Dark Silicon and the End of Multicore Scaling

This paper investigates the limits of multicore scaling by combining models of device scaling, single-core scaling, and multicore scaling to measure the speedup potential of parallel workloads across five technology generations. The study shows that multicore scaling is power-limited: a significant portion of the chip must remain "dark silicon" (powered off or underutilized) at nodes such as 22 nm and 8 nm. At 8 nm, over 50% of the chip is dark silicon, and only a 7.9× average speedup is possible across commonly used parallel workloads, leaving a nearly 24-fold gap from a target of doubled performance per generation. The study considers several multicore chip organizations, including CPU-like and GPU-like designs with different topologies, and evaluates each under power and area constraints. Its model of multicore performance incorporates application behavior, microarchitectural features, and physical constraints. The paper concludes that power limits will constrain future multicore designs, and that the gains from multicore scaling are further bounded by the parallelism available in applications. As a result, the gap between expected and actual performance will widen, and radical microarchitectural innovations are needed to deliver performance gains commensurate with Moore's Law.
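The "nearly 24-fold gap" can be checked with a few lines of arithmetic. The sketch below is one reading that reproduces the figure, not the paper's own calculation: it assumes the target compounds as one doubling per generation over the five generations mentioned above, and that the gap is the difference between that target and the 7.9× projection. All variable names are illustrative.

```python
# Assumption: "doubled performance per generation" compounds over five generations.
GENERATIONS = 5
TARGET_GROWTH_PER_GENERATION = 2.0

# Average speedup across parallel workloads, as stated in the summary above.
PROJECTED_SPEEDUP = 7.9

# Target speedup after five doublings.
target_speedup = TARGET_GROWTH_PER_GENERATION ** GENERATIONS

# Assumption: the "24-fold gap" is the difference between target and projection.
gap_difference = target_speedup - PROJECTED_SPEEDUP

# For comparison, the same shortfall expressed as a ratio.
gap_ratio = target_speedup / PROJECTED_SPEEDUP

print(f"target speedup: {target_speedup:.1f}x")
print(f"gap as difference: {gap_difference:.1f}")
print(f"gap as ratio: {gap_ratio:.2f}x")
```

Under these assumptions the target is 32×, so the difference from 7.9× is about 24, which matches the quoted figure. Expressed as a ratio, the shortfall is roughly a factor of four.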

June 4–8, 2011 | Hadi Esmaeilzadeh, Emily Blem, Renée St. Amant, Karthikeyan Sankaralingam, Doug Burger