April 1988 | CYNTHIA DWORk AND NANCY LYNCH AND LARRY STOCKMEYER
This paper introduces the concept of partial synchrony in distributed systems, which lies between fully synchronous and fully asynchronous systems. In a fully synchronous system, there are known upper bounds on message delivery time and processor speed. In an asynchronous system, these bounds do not exist. In partial synchrony, these bounds may exist but are not known a priori, or they may be known but only guaranteed to hold after some unknown time T. The paper presents consensus protocols for various fault models in partially synchronous systems, including fail-stop, omission, and Byzantine faults. It also provides lower bounds showing that these protocols are optimal in terms of the number of faults they can tolerate. The protocols use fault-tolerant distributed clocks to allow processors to reach a common notion of time. The paper shows that for fail-stop or omission faults, t-resilient consensus is possible if N ≥ 2t + 1, and for Byzantine faults with authentication, t-resilient consensus is possible if N ≥ 3t + 1. For Byzantine faults without authentication, the same condition applies. The time required for consensus is polynomial in N and Δ for the unknown Δ model, and GST plus a polynomial in N and Δ for the GST model. The paper also discusses the implications of partial synchrony on processor and communication synchrony, and presents protocols for both cases. The results show that partial synchrony allows for higher resiliency than fully asynchronous systems but less than fully synchronous systems. The paper also discusses the use of distributed clocks to simulate time in partially synchronous systems.This paper introduces the concept of partial synchrony in distributed systems, which lies between fully synchronous and fully asynchronous systems. In a fully synchronous system, there are known upper bounds on message delivery time and processor speed. In an asynchronous system, these bounds do not exist. In partial synchrony, these bounds may exist but are not known a priori, or they may be known but only guaranteed to hold after some unknown time T. The paper presents consensus protocols for various fault models in partially synchronous systems, including fail-stop, omission, and Byzantine faults. It also provides lower bounds showing that these protocols are optimal in terms of the number of faults they can tolerate. The protocols use fault-tolerant distributed clocks to allow processors to reach a common notion of time. The paper shows that for fail-stop or omission faults, t-resilient consensus is possible if N ≥ 2t + 1, and for Byzantine faults with authentication, t-resilient consensus is possible if N ≥ 3t + 1. For Byzantine faults without authentication, the same condition applies. The time required for consensus is polynomial in N and Δ for the unknown Δ model, and GST plus a polynomial in N and Δ for the GST model. The paper also discusses the implications of partial synchrony on processor and communication synchrony, and presents protocols for both cases. The results show that partial synchrony allows for higher resiliency than fully asynchronous systems but less than fully synchronous systems. The paper also discusses the use of distributed clocks to simulate time in partially synchronous systems.