August 1983 | RICHARD D. SCHLICHTING and FRED B. SCHNEIDER
The paper presents a methodology for designing fault-tolerant computing systems based on the concept of a fail-stop processor, which automatically halts in response to internal failures. The authors address the challenge of implementing processors that behave like fail-stop processors with high probability and describe axiomatic program verification techniques for developing provably correct programs for these processors. The design of a process control system is used as an example to illustrate the application of the methodology. The paper also discusses the implementation of fail-stop processors using current hardware, the development of recovery protocols, and the termination and response time considerations in fault-tolerant programs. Additionally, it compares the proposed approach with other methods for designing fault-tolerant systems and concludes with a discussion on the practicality and effectiveness of the fail-stop processor concept.