Parallel and Heterogeneous Timing Analysis: Partition, Algorithm, and System

March 12-15, 2024 | Tsung-Wei Huang, Boyang Zhang, Dian-Lun Lin, and Cheng-Hsiang Chiu
This paper presents a comprehensive approach to enhancing static timing analysis (STA) through parallel and heterogeneous computing. The authors introduce task-based parallelism, task graph partitioning, and GPU kernel algorithms to improve the performance of STA applications, and develop a task-parallel programming system that generalizes these solutions to broader scientific computing workloads.

The paper discusses four key strategies: (1) task-parallel STA algorithms that accelerate timing propagation in graph-based analysis (GBA); (2) task graph partitioning that enhances the scheduling performance of those task-parallel STA algorithms; (3) GPU-accelerated STA algorithms that speed up time-consuming path-based analysis (PBA); and (4) a task-parallel programming system that generalizes the approach to applications beyond STA.

The authors demonstrate that task-parallel strategies outperform loop-based parallelism in runtime scalability and performance. They introduce a CPU-parallel partitioning algorithm, C-PASTA, which significantly improves the performance of task-parallel STA by partitioning the task graph to reduce scheduling overhead while minimizing the impact on the original timing graph parallelism. For PBA, they propose a GPU kernel-based approach to path enumeration; the resulting GPU-accelerated PBA algorithm achieves substantial speedups over CPU-based baselines, particularly for large designs. Finally, the authors present Taskflow, a task-parallel programming system that generalizes these solutions to benefit applications beyond STA. Taskflow supports both static and dynamic task graph programming models, enabling efficient parallel programming for a wide range of applications, including machine learning, hardware fuzzing, logic simulation, quantum computing, and physical design.
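To make the GBA timing-propagation idea concrete, here is a minimal, hypothetical sketch (not the authors' implementation): the timing graph is levelized so that all pins in one level are independent, and arrival times within a level are computed in parallel. The `TimingGraph` layout and `propagate_arrival` function are illustrative names only; a production tool would use a task scheduler rather than one `std::async` call per node.

```cpp
#include <algorithm>
#include <future>
#include <vector>

// Hypothetical sketch: forward arrival-time propagation over a timing DAG.
// Nodes are levelized so all nodes in one level are independent.
struct TimingGraph {
  int n;                                   // number of nodes (pins)
  std::vector<std::vector<int>> fanin;     // fanin[v] = predecessors of v
  std::vector<std::vector<double>> delay;  // delay[v][i] = delay from fanin[v][i] to v
};

std::vector<double> propagate_arrival(const TimingGraph& g) {
  // Levelize: level of a node is 1 + max level over its fanins.
  std::vector<int> level(g.n, 0);
  int max_level = 0;
  for (int v = 0; v < g.n; ++v) {          // assumes nodes are in topological order
    for (int u : g.fanin[v]) level[v] = std::max(level[v], level[u] + 1);
    max_level = std::max(max_level, level[v]);
  }
  std::vector<double> at(g.n, 0.0);        // arrival times; primary inputs start at 0
  for (int l = 1; l <= max_level; ++l) {   // levels run in order...
    std::vector<std::future<void>> tasks;  // ...but nodes within a level run in parallel
    for (int v = 0; v < g.n; ++v) {
      if (level[v] != l) continue;
      tasks.push_back(std::async(std::launch::async, [&, v] {
        double worst = 0.0;
        for (std::size_t i = 0; i < g.fanin[v].size(); ++i)
          worst = std::max(worst, at[g.fanin[v][i]] + g.delay[v][i]);
        at[v] = worst;                     // max-plus update for late (setup) analysis
      }));
    }
    for (auto& t : tasks) t.get();         // barrier between levels
  }
  return at;
}
```

The per-level barrier here is exactly the synchronization cost that a task-dependency formulation (one task per node, edges as dependencies) avoids, which is one reason the paper finds task parallelism scaling better than loop-based level-by-level parallelism.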
The paper concludes that leveraging CPU-GPU heterogeneous parallelism can significantly enhance the performance of STA. The authors hope that their work will inspire more solutions for improving the performance of STA.
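To illustrate the task-graph-partitioning idea behind C-PASTA, here is a minimal, hypothetical sketch (not the published algorithm): it coarsens a task DAG by merging linear chains, so each chain is scheduled as one task, cutting scheduler overhead without removing any parallelism. The `TaskGraph` struct and `contract_chains` function are illustrative names only; C-PASTA itself uses a more sophisticated parallel partitioning strategy.

```cpp
#include <vector>

// Hypothetical sketch: coarsen a task DAG by merging linear chains
// (node u with exactly one successor v, where v has exactly one
// predecessor). Tasks in one chain run as a single scheduled unit.
struct TaskGraph {
  std::vector<std::vector<int>> succ;  // succ[u] = successors of task u
};

// Returns a partition id per task; tasks sharing an id form one chain.
std::vector<int> contract_chains(const TaskGraph& g) {
  int n = static_cast<int>(g.succ.size());
  std::vector<int> indeg(n, 0);
  for (int u = 0; u < n; ++u)
    for (int v : g.succ[u]) ++indeg[v];
  // Mark nodes that will be absorbed into a predecessor's chain.
  std::vector<char> absorbed(n, 0);
  for (int p = 0; p < n; ++p)
    if (g.succ[p].size() == 1 && indeg[g.succ[p][0]] == 1)
      absorbed[g.succ[p][0]] = 1;
  std::vector<int> part(n, -1);
  int next_id = 0;
  for (int u = 0; u < n; ++u) {
    if (absorbed[u]) continue;           // handled when walking its chain head
    part[u] = next_id;
    int cur = u;
    // Extend the chain while cur->v is the only edge in and out.
    while (g.succ[cur].size() == 1 && indeg[g.succ[cur][0]] == 1) {
      cur = g.succ[cur][0];
      part[cur] = next_id;
    }
    ++next_id;
  }
  return part;
}
```

Because only single-in/single-out edges are contracted, every fork and join in the original graph survives, which is the property the paper emphasizes: partitioning must not destroy the timing graph's inherent parallelism.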