SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks

23 May 2017 | Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, William J. Dally
The paper introduces the Sparse CNN (SCNN) accelerator architecture, which improves performance and energy efficiency by exploiting the sparsity of both weights and activations in convolutional neural networks (CNNs). SCNN employs a novel dataflow that keeps sparse weights and activations in a compressed encoding, eliminating unnecessary data transfers and reducing storage requirements. The compressed operands are delivered efficiently to a multiplier array, where they are reused extensively, and the resulting products are accumulated in a dedicated accumulator array. SCNN's design comprises multiple processing elements (PEs), each containing a multiplier array and operating on a disjoint 3D tile of input activations. Each PE includes a weight buffer, input/output activation RAMs, a multiplier array, a scatter crossbar, and accumulator buffers. On contemporary networks such as AlexNet, GoogLeNet, and VGGNet, SCNN achieves a 2.7× speedup and a 2.3× energy reduction over a comparably provisioned dense CNN accelerator. The paper also analyzes how network sparsity affects performance and the trade-off between PE granularity and intra-PE fragmentation.
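
To make the dataflow concrete, below is a minimal, single-channel Python sketch of the idea: both operands are kept in a compressed (value, coordinate) form, every nonzero activation is multiplied by every nonzero weight, and each product is scattered into an accumulator addressed by its output coordinate. The function names (compress, sparse_conv2d_single_channel) are illustrative only; the paper implements this scheme across many channels, tiles, and processing elements in hardware rather than in software.

import numpy as np

def compress(tensor):
    """Return (values, coords) for the nonzero entries of a dense tensor,
    mimicking the compressed encoding SCNN keeps in its buffers."""
    coords = np.argwhere(tensor != 0)
    values = tensor[tensor != 0]
    return values, coords

def sparse_conv2d_single_channel(activations, weights):
    """Toy single-channel 'valid' convolution over compressed operands.

    Every nonzero activation is multiplied by every nonzero weight
    (a Cartesian product, loosely analogous to SCNN's multiplier array),
    and each product is scattered into the accumulator entry addressed
    by its output coordinate; out-of-range coordinates are discarded.
    """
    H, W = activations.shape
    R, S = weights.shape
    out = np.zeros((H - R + 1, W - S + 1))       # "accumulator buffer"

    a_vals, a_coords = compress(activations)
    w_vals, w_coords = compress(weights)

    for a, (y, x) in zip(a_vals, a_coords):      # nonzero activations only
        for w, (r, s) in zip(w_vals, w_coords):  # nonzero weights only
            oy, ox = y - r, x - s                # output coordinate
            if 0 <= oy < out.shape[0] and 0 <= ox < out.shape[1]:
                out[oy, ox] += a * w             # scatter-accumulate
    return out

On a dense reference the result should match scipy.signal.correlate2d(activations, weights, mode='valid'), but the inner loops here only ever touch nonzero operands, which is the source of SCNN's compute and data-movement savings.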