Nov. 3–6, 2013 | Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, Ion Stoica
This paper introduces D-Streams, a new processing model for fault-tolerant streaming computation at scale. D-Streams overcome the limitations of existing distributed stream processing models, which often require expensive fault recovery mechanisms like hot replication or suffer long recovery times, and which do not handle stragglers. D-Streams enable a parallel recovery mechanism that improves efficiency over traditional replication and backup schemes, and that tolerates stragglers. They support a rich set of operators while attaining per-node throughput similar to single-node systems, linear scaling to 100 nodes, sub-second latency, and sub-second fault recovery. D-Streams can easily be composed with batch and interactive query models like MapReduce, enabling rich applications that combine these modes. The authors implement D-Streams in a system called Spark Streaming.
D-Streams structure streaming computations as a series of stateless, deterministic batch computations on small time intervals. This allows for efficient recovery and parallel processing. The system uses Resilient Distributed Datasets (RDDs) to keep data in memory and can recover it without replication by tracking the lineage graph of operations that were used to build it. This enables sub-second end-to-end latencies.
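The discretization idea can be illustrated with a minimal Python sketch. This is a toy model, not Spark's API: `discretize` and `count_batch` are hypothetical names, and a word-count stands in for an arbitrary deterministic batch computation. The point is that each interval's result is a pure function of its input partition, so a lost result can be recomputed from retained input (its lineage) rather than restored from a replica.

```python
from collections import Counter

def discretize(records, interval_ms):
    """Group (timestamp_ms, word) records into per-interval input batches."""
    batches = {}
    for ts, word in records:
        batches.setdefault(ts // interval_ms, []).append(word)
    return batches

def count_batch(words):
    """A deterministic batch computation over one interval's input."""
    return Counter(words)

records = [(50, "a"), (120, "b"), (130, "a"), (260, "b")]
batches = discretize(records, 100)                      # intervals 0, 1, 2
counts = {i: count_batch(w) for i, w in batches.items()}

# "Recovery": if interval 1's result is lost, re-run the same deterministic
# computation on the surviving input partition; no replica of the result
# is needed, and independent partitions could be recomputed in parallel.
recovered = count_batch(batches[1])
assert recovered == counts[1]
```

Because the per-interval computations are deterministic and independent, recovery work for a failed node can be spread across the cluster, which is the basis of the parallel recovery mechanism described above.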
The system can process over 60 million records/second on 100 nodes at sub-second latency, and can recover from faults and stragglers in sub-second time. Spark Streaming's per-node throughput is comparable to commercial streaming databases while scaling linearly to 100 nodes, and it is 2–5× faster than the open-source Storm and S4 systems while offering fault recovery guarantees that they lack. The authors illustrate Spark Streaming's expressiveness through ports of two real applications: a video distribution monitoring system and an online machine learning system.
D-Streams use the same processing model and data structures (RDDs) as batch jobs, allowing streaming queries to be seamlessly combined with batch and interactive computation. This feature is leveraged in Spark Streaming to let users run ad-hoc queries on streams using Spark, or join streams with historical data computed as an RDD. This is a powerful feature in practice, giving users a single API to combine previously disparate computations.
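A toy Python sketch of this composition, with hypothetical names (`join_with_history` is not a Spark operator): because each micro-batch and each historical dataset take the same keyed-collection form, one generic join works on both, which is the essence of sharing RDDs across the streaming and batch models.

```python
# Stands in for an RDD computed earlier by a batch job (e.g. user -> country).
historical = {"u1": "US", "u2": "DE"}

# One micro-batch of the live stream: (user, event) pairs.
stream_batch = [("u1", "click"), ("u2", "view"), ("u3", "click")]

def join_with_history(batch, table):
    """Inner-join a micro-batch of (key, event) pairs with a lookup table."""
    return [(k, ev, table[k]) for k, ev in batch if k in table]

joined = join_with_history(stream_batch, historical)
# -> [("u1", "click", "US"), ("u2", "view", "DE")]
```

In Spark Streaming the same effect is achieved directly, since the RDDs underlying a D-Stream can be joined with any other RDD using the ordinary batch operators.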
The system architecture of Spark Streaming includes a master that tracks the D-Stream lineage graph and schedules tasks to compute new RDD partitions; worker nodes that receive data, store the partitions of input and computed RDDs, and execute tasks; and a client library used to send data into the system. Spark Streaming reuses many components of Spark, but it also modifies existing components and adds new ones to enable streaming.
The system has been evaluated using several benchmark applications and by porting two real applications to it: a commercial video distribution monitoring system and a machine learning algorithm for estimating traffic conditions from automobile GPS data. The results show that Spark Streaming scales nearly linearly and can process up to 6 GB/s (64M records/s) at sub-second latency on 100 nodes.