June 2024 | Hongzheng Chen, Niansong Zhang, Shaojie Xiang, Zhichen Zeng, Mengjia Dai, and Zhiru Zhang
Allo is a composable programming model for efficient spatial accelerator design. It decouples hardware customizations (compute, memory, communication, and data types) from the algorithm specification, encapsulating them as primitives. Allo preserves the hierarchical structure of input programs by combining customizations from different functions in a bottom-up, type-safe manner, enabling holistic optimizations across function boundaries. Comprehensive experiments on HLS benchmarks and deep learning models show that Allo outperforms state-of-the-art HLS tools and ADLs, achieving 1.7× lower inference latency and 5.4× higher energy efficiency for the GPT2 model compared to an NVIDIA A100 GPU.
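The core idea of decoupling can be illustrated with a toy sketch in plain Python (this mimics the style of Allo's programming model but is not its real API): the algorithm is written once as ordinary loop code, while customizations are recorded separately as a list of schedule primitives.

```python
import numpy as np

def gemm(A, B):
    """Algorithm specification only: a plain matrix multiply, with no
    hardware-specific annotations mixed into the loop body."""
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(M):
        for j in range(N):
            for k in range(K):
                C[i, j] += A[i, k] * B[k, j]
    return C

class Schedule:
    """Hypothetical stand-in for a schedule object: it collects
    customization primitives without modifying the algorithm."""
    def __init__(self, fn):
        self.fn = fn
        self.primitives = []
    def split(self, axis, factor):
        self.primitives.append(("split", axis, factor))
        return self
    def pipeline(self, axis):
        self.primitives.append(("pipeline", axis))
        return self

# Customizations are applied as chained primitives, kept apart from gemm.
s = Schedule(gemm).split("j", 4).pipeline("j_in")
```

Because the primitives live in the schedule rather than the kernel body, the same algorithm can be retargeted with a different schedule without being rewritten.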
Allo supports parameterized kernel templates, allowing users to declare type variables during kernel creation and instantiate the kernel when building the hardware executable. It also provides composable schedules, enabling users to construct kernels incrementally from the bottom up, adding customizations one at a time while validating each submodule. Allo introduces holistic dataflow optimizations, using a hierarchical dataflow graph to support the composition of multiple kernels within a complex design while maintaining function boundaries. It models interface unification as a type inference problem and solves it efficiently through dataflow analysis.
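A parameterized kernel template can be sketched as follows (a hypothetical illustration, not Allo's actual template syntax): the element type `T` and size `M` are template parameters declared at kernel creation and fixed only when the kernel is instantiated for a concrete build.

```python
import numpy as np

def make_vector_add(T, M):
    """Template: returns a concrete kernel specialized for element
    type T and vector length M."""
    def vadd(a, b):
        assert a.shape == (M,) and a.dtype == T
        out = np.empty(M, dtype=T)
        for i in range(M):
            # Arithmetic is carried out in the declared element type.
            out[i] = T(a[i] + b[i])
        return out
    return vadd

# Instantiate the template at "build" time with concrete parameters.
vadd_i32 = make_vector_add(np.int32, 4)
```

One template thus yields many specialized kernels, which is how polymorphism over data types and matrix sizes is obtained without duplicating the kernel body.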
Allo's frontend is implemented in Python, allowing flexible programming with minimal type annotations. An end-to-end optimizing compiler lets users write Python programs and generate hardware bitstreams. An MLIR dialect supports decoupled hardware customizations at the IR level and can potentially support multiple input languages.
Allo addresses two major challenges in high-performance accelerator design: balancing manual control with automated compiler optimizations and bridging the gap from single-kernel optimization to complex multi-kernel designs. It provides progressive hardware customizations, reusable parameterized kernel templates, composable schedules, and holistic dataflow optimizations. Allo's ability to compose individual kernels and construct large-scale, high-performance designs makes it distinct from other ADLs.
Allo's compilation flow comprises a Python-embedded ADL, the Allo compiler, and an MLIR dialect. It supports multiple backend targets, generating LLVM IR for CPU simulation and HLS C/C++ for hardware synthesis. Customizable hardware transformations let users express complex single-kernel designs such as systolic arrays. Allo's verification procedures, including functional simulation testing and formal equivalence checking, ensure the correctness of the generated accelerator.
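Functional simulation testing can be sketched as a golden-model comparison (a simplified illustration: here a plain Python loop nest stands in for the CPU-simulation build of the kernel, and NumPy serves as the reference):

```python
import numpy as np

def gemm_kernel(A, B):
    """Stand-in for the CPU-simulation (LLVM IR) build of an
    accelerator GEMM kernel."""
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N), dtype=np.int64)
    for i in range(M):
        for k in range(K):
            for j in range(N):
                C[i, j] += A[i, k] * B[k, j]
    return C

# Functional simulation test: check the kernel against a NumPy golden
# model on random inputs before committing to hardware synthesis.
rng = np.random.default_rng(0)
A = rng.integers(0, 10, (8, 8))
B = rng.integers(0, 10, (8, 8))
assert np.array_equal(gemm_kernel(A, B), A @ B)
```

Catching a functional bug at this stage is far cheaper than discovering it after hours of synthesis and place-and-route.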
Allo's parameterized kernel templates allow users to define functions with type parameters, enabling polymorphism and flexibility in handling variable-sized input matrices. Allo's composable schedules enable the integration of external kernels and holistic optimization of the design. Allo's hierarchical dataflow graph preserves the hierarchy of modules during scheduling, facilitating analysis of interfaces between functions. Allo's schedule replay algorithm allows the composition of multiple schedules, ensuring that primitives are applied correctly and conflicts are resolved. Allo's memory layout composition ensures consistency between function call arguments and actual function definitions, maintaining data layout integrity.
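A toy model of a hierarchical dataflow graph (hypothetical, not Allo's internal representation) shows how function boundaries survive composition: each node may contain a nested subgraph of sub-kernels, so composing schedules walks the hierarchy rather than flattening it.

```python
class Node:
    """A function in the design; children are its nested sub-kernels,
    edges are dataflow connections to sibling nodes."""
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.edges = []

    def connect(self, other):
        self.edges.append(other)

def boundaries(node, depth=0):
    """List every function boundary in the hierarchy, top-down,
    as (nesting depth, name) pairs."""
    out = [(depth, node.name)]
    for child in node.children:
        out += boundaries(child, depth + 1)
    return out

# Example hierarchy loosely modeled on a transformer block.
attn = Node("attention", [Node("qk_matmul"), Node("softmax"), Node("v_matmul")])
ffn = Node("ffn")
top = Node("gpt2_block", [attn, ffn])
attn.connect(ffn)  # dataflow between sibling kernels
```

Because every boundary remains visible, interface analysis between functions (e.g., unifying argument layouts across a call) can be run per edge instead of over one monolithic flat graph.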