1995 | Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, Yuli Zhou
Cilk is a C-based runtime system for multithreaded parallel programming. This paper documents the efficiency of Cilk's work-stealing scheduler, both empirically and analytically. It shows that the "work" and "critical path" of a Cilk computation can accurately model performance, allowing programmers to focus on reducing these factors rather than load balancing. For "fully strict" programs, Cilk's scheduler achieves space, time, and communication bounds within a constant factor of optimal.
Cilk runs on several MPP and SMP systems, including the CM5, Paragon, Power Challenge, and MIT Phish. Applications include protein folding, graphic rendering, backtrack search, and the *Socrates chess program. The Cilk language extends C with continuation-passing style, enabling threads to communicate and synchronize. Threads are defined similarly to C functions, with closures holding arguments and join counters. Spawning is specified with the spawn keyword, while spawn_next creates successor threads.
The Cilk scheduler uses work-stealing, where a processor (thief) selects another (victim) to steal work. It maintains local ready queues, with closures grouped by level. When a thread spawns a child, the scheduler allocates a closure, initializes arguments, and posts it to the ready queue. Work-stealing is implemented with a simple request-reply protocol, ensuring efficient scheduling.
Empirical results show that Cilk performs well on dynamic, asynchronous, tree-like computations. Applications like fib, queens, pfold, ray, and *Socrates demonstrate high efficiency and speedup. Theoretical analysis proves that for "fully strict" programs, Cilk's scheduler achieves optimal space, time, and communication bounds. The work and critical path model allows predicting runtime accurately.
Cilk's performance is measured using work (T1) and critical path (T∞). The runtime on P processors is bounded by T1/P and T∞. Theoretical results show that expected execution time is O(T1/P + T∞), and communication is O(T∞PSmax). These bounds are proven for "fully strict" programs, ensuring efficient scheduling.
Cilk's model allows programmers to focus on work and critical path, independent of machine configuration. It provides guaranteed performance based on these measures. While Cilk excels in dynamic, asynchronous computations, it is not ideal for traditional parallel applications. Future work aims to extend Cilk's capabilities while preserving performance guarantees.Cilk is a C-based runtime system for multithreaded parallel programming. This paper documents the efficiency of Cilk's work-stealing scheduler, both empirically and analytically. It shows that the "work" and "critical path" of a Cilk computation can accurately model performance, allowing programmers to focus on reducing these factors rather than load balancing. For "fully strict" programs, Cilk's scheduler achieves space, time, and communication bounds within a constant factor of optimal.
Cilk runs on several MPP and SMP systems, including the CM5, Paragon, Power Challenge, and MIT Phish. Applications include protein folding, graphic rendering, backtrack search, and the *Socrates chess program. The Cilk language extends C with continuation-passing style, enabling threads to communicate and synchronize. Threads are defined similarly to C functions, with closures holding arguments and join counters. Spawning is specified with the spawn keyword, while spawn_next creates successor threads.
The Cilk scheduler uses work-stealing, where a processor (thief) selects another (victim) to steal work. It maintains local ready queues, with closures grouped by level. When a thread spawns a child, the scheduler allocates a closure, initializes arguments, and posts it to the ready queue. Work-stealing is implemented with a simple request-reply protocol, ensuring efficient scheduling.
Empirical results show that Cilk performs well on dynamic, asynchronous, tree-like computations. Applications like fib, queens, pfold, ray, and *Socrates demonstrate high efficiency and speedup. Theoretical analysis proves that for "fully strict" programs, Cilk's scheduler achieves optimal space, time, and communication bounds. The work and critical path model allows predicting runtime accurately.
Cilk's performance is measured using work (T1) and critical path (T∞). The runtime on P processors is bounded by T1/P and T∞. Theoretical results show that expected execution time is O(T1/P + T∞), and communication is O(T∞PSmax). These bounds are proven for "fully strict" programs, ensuring efficient scheduling.
Cilk's model allows programmers to focus on work and critical path, independent of machine configuration. It provides guaranteed performance based on these measures. While Cilk excels in dynamic, asynchronous computations, it is not ideal for traditional parallel applications. Future work aims to extend Cilk's capabilities while preserving performance guarantees.