[slides and audio] Impossibility of distributed consensus with one faulty process

The paper "Impossibility of Distributed Consensus with One Faulty Process" by Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson addresses the fundamental problem of reaching agreement among remote processes in an asynchronous system, where some processes may be unreliable. The authors prove that no completely asynchronous consensus protocol can tolerate even a single unannounced process death, even under the assumption that the message system is reliable. This means that any distributed commit protocol will have a "window of vulnerability" where the delay or inaccessibility of a single process can cause the entire algorithm to wait indefinitely. The proof relies on the fact that processes have no access to synchronized clocks and cannot detect the death of other processes. The paper also presents a partially correct consensus protocol that works as long as a majority of the processes are nonfaulty and no process dies during the execution of the protocol. This protocol involves two stages: constructing a directed graph and then computing the transitive closure to identify initial cliques, from which the processes can make decisions based on their initial values. The authors conclude that their results highlight the need for more refined models of distributed computing that better reflect realistic assumptions about processor and communication timings.The paper "Impossibility of Distributed Consensus with One Faulty Process" by Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson addresses the fundamental problem of reaching agreement among remote processes in an asynchronous system, where some processes may be unreliable. The authors prove that no completely asynchronous consensus protocol can tolerate even a single unannounced process death, even under the assumption that the message system is reliable. This means that any distributed commit protocol will have a "window of vulnerability" where the delay or inaccessibility of a single process can cause the entire algorithm to wait indefinitely. The proof relies on the fact that processes have no access to synchronized clocks and cannot detect the death of other processes. The paper also presents a partially correct consensus protocol that works as long as a majority of the processes are nonfaulty and no process dies during the execution of the protocol. This protocol involves two stages: constructing a directed graph and then computing the transitive closure to identify initial cliques, from which the processes can make decisions based on their initial values. The authors conclude that their results highlight the need for more refined models of distributed computing that better reflect realistic assumptions about processor and communication timings.

Impossibility of Distributed Consensus with One Faulty Process

April 1985 | MICHAEL J. FISCHER, NANCY A. LYNCH, AND MICHAEL S. PATERSON