Understanding Reliable Communication in the Presence of Failures

The paper presents a set of communication primitives that support distributed computations in environments where failures can occur. It focuses on *halting failures*, in which a process simply stops executing rather than performing incorrect actions. The primitives aim to achieve high concurrency while respecting application-specific constraints on delivery order. The key contributions are:
1. **Fault-Tolerant Process Groups**: Processes belonging to a fault-tolerant process group observe consistent orderings of the events that affect the group as a whole, including process failures, recoveries, migration, and dynamic changes to group properties such as member rankings.
2. **Reliable Multicast Protocols**: A family of reliable multicast protocols, usable in both local- and wide-area networks, whose cost and performance vary with the degree of delivery ordering required. In particular, a protocol that enforces causal delivery orderings is presented as a valuable alternative to conventional asynchronous communication protocols.
3. **Group Broadcast Primitive (GBCAST)**: GBCAST informs the operational members of a group about failures, recoveries, and changes in group properties, so that all members share the same view of the group's state.
4. **Atomic Broadcast Primitive (ABCAST)**: ABCAST delivers messages in the same order at all destinations, which makes it suitable for updating replicated data structures consistently.
5. **Causal Broadcast Primitive (CBCAST)**: CBCAST enforces a delivery order only when one is needed: if one broadcast potentially causally precedes another, the first is delivered before the second at every destination that receives both. Broadcasts that are not causally related may be delivered in different orders at different destinations, which maximizes concurrency and asynchrony without compromising correctness.
6. **Implementation Details**: The paper describes an implementation of these primitives on a local network, covering the intersite layer, site view management, and garbage collection.
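The same-order guarantee of ABCAST can be illustrated with a classic two-phase, priority-based total-order scheme. This is a minimal sketch under stated assumptions, not the paper's implementation: it ignores failures, runs all sites in one process, and the class `AbcastSite` with its `propose`/`commit` methods is hypothetical. In phase 1 each destination queues the message as undeliverable and proposes a priority; in phase 2 the sender commits the maximum proposal, and a site delivers a message only once it is committed and has the lowest priority in its queue.

```python
class AbcastSite:
    """Hypothetical ABCAST destination (illustrative sketch; failures ignored)."""

    def __init__(self, site_id: int):
        self.site_id = site_id
        self.max_seen = 0   # largest priority counter proposed or committed here
        self.queue = {}     # msg_id -> [priority, committed?, payload]
        self.delivered = []

    def propose(self, msg_id, payload):
        # Phase 1: queue the message as uncommitted with a priority above any
        # seen so far; (counter, site_id) pairs keep every proposal unique.
        self.max_seen += 1
        priority = (self.max_seen, self.site_id)
        self.queue[msg_id] = [priority, False, payload]
        return priority

    def commit(self, msg_id, final):
        # Phase 2: adopt the sender's final priority (the max of all proposals).
        self.queue[msg_id][0] = final
        self.queue[msg_id][1] = True
        self.max_seen = max(self.max_seen, final[0])
        # Deliver while the lowest-priority queued message is committed.
        while self.queue:
            head = min(self.queue, key=lambda m: self.queue[m][0])
            _, committed, payload = self.queue[head]
            if not committed:
                break
            del self.queue[head]
            self.delivered.append(payload)


# Two concurrent broadcasts; site 2 sees B's phase-1 request before A's.
sites = [AbcastSite(i) for i in range(3)]
prop_a = [sites[0].propose("A", "x=1"), sites[1].propose("A", "x=1")]
prop_b = [sites[2].propose("B", "x=2"), sites[1].propose("B", "x=2"),
          sites[0].propose("B", "x=2")]
prop_a.append(sites[2].propose("A", "x=1"))
for s in sites:
    s.commit("A", max(prop_a))
for s in sites:
    s.commit("B", max(prop_b))
print([s.delivered for s in sites])  # [['x=2', 'x=1'], ['x=2', 'x=1'], ['x=2', 'x=1']]
```

Every site delivers B before A even though the requests arrived in different orders. In this sketch, delivery waits for a round of proposals from every destination, which illustrates why a total order is more expensive than a purely causal one.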
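The causal delivery rule behind CBCAST can be sketched with vector timestamps. This is a swapped-in technique and not necessarily the mechanism used in the paper's implementation; the sketch assumes a single group in which every member receives every broadcast, ignores failures, and uses a hypothetical `CausalProcess` class. A process holds back an incoming message until it is the next message expected from its sender and every message the sender had delivered before broadcasting it has also been delivered locally.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: int      # index of the broadcasting process
    vt: list[int]    # sender's vector timestamp when it broadcast
    payload: str


class CausalProcess:
    """Hypothetical group member that delivers broadcasts in causal order.

    Assumes every member receives every broadcast and that no process fails.
    """

    def __init__(self, pid: int, n: int):
        self.pid = pid
        self.vt = [0] * n                 # vt[j]: broadcasts from j delivered here
        self.pending: list[Message] = []  # received but not yet deliverable
        self.delivered: list[str] = []

    def broadcast(self, payload: str) -> Message:
        # A sender delivers its own broadcast immediately.
        self.vt[self.pid] += 1
        self.delivered.append(payload)
        return Message(self.pid, list(self.vt), payload)

    def _deliverable(self, m: Message) -> bool:
        # m must be the next broadcast from its sender, and everything the
        # sender had delivered from other processes must be delivered here.
        if m.vt[m.sender] != self.vt[m.sender] + 1:
            return False
        return all(m.vt[k] <= self.vt[k]
                   for k in range(len(self.vt)) if k != m.sender)

    def receive(self, m: Message) -> None:
        self.pending.append(m)
        progress = True
        while progress:                   # delivering one message may unblock others
            progress = False
            for msg in list(self.pending):
                if self._deliverable(msg):
                    self.pending.remove(msg)
                    self.vt[msg.sender] += 1
                    self.delivered.append(msg.payload)
                    progress = True


p0, p1, p2 = (CausalProcess(i, 3) for i in range(3))
m1 = p0.broadcast("a")
p1.receive(m1)
m2 = p1.broadcast("b")   # m2 potentially depends on m1
p2.receive(m2)           # arrives first, so it is held back
p2.receive(m1)
print(p2.delivered)      # ['a', 'b']
```

Messages that are not causally related pass this test independently, so different processes may deliver them in different orders; that freedom is what lets a causal broadcast avoid the cost of a total order.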
The approach is demonstrated through the ISIS system, which supports fault-tolerant resilient objects and bulletin boards; these uses illustrate how significantly the proposed communication primitives simplify higher-level algorithms.

Reliable Communication in the Presence of Failures

February 1987 | KENNETH P. BIRMAN and THOMAS A. JOSEPH