SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks


23 May 2017 | Angshuman Parashar†, Minsoo Rhu†, Anurag Mukkara‡, Antonio Puglielli†, Rangharajan Venkatesan†, Brucek Khailany†, Joel Emer†‡, Stephen W. Keckler†, William J. Dally†∞
SCNN is an accelerator for compressed-sparse convolutional neural networks (CNNs) that improves performance and energy efficiency by exploiting sparsity in both weights and activations. The architecture uses a novel dataflow that keeps sparse weights and activations in a compressed encoding, reducing data transfers and storage requirements, and delivers them efficiently to a multiplier array where they are reused extensively. The products of these multiplications are accumulated in a novel accumulator array. On contemporary neural networks, SCNN achieves a 2.7× speedup and a 2.3× energy reduction over a comparable dense CNN accelerator.

Each processing element (PE) contains a multiplier array that accepts vectors of weights and activations; the dataflow ensures that only non-zero weights and activations are fetched from the input storage arrays. A scatter crossbar routes the resulting partial sums to accumulator banks, and input and output activations are kept local to each PE to reduce energy-hungry data transmission. The evaluated configuration comprises an 8×8 array of PEs, each with a 4×4 multiplier array and 32 accumulator banks, for a total of 1,024 multipliers and 1 MB of activation RAM.

SCNN's performance and energy efficiency are evaluated with a cycle-level simulator and an analytical model, and the design is compared against dense baselines (DCNN and DCNN-opt) as well as prior accelerators such as Eyeriss and Cnvlutin. On AlexNet, GoogLeNet, and VGGNet, SCNN achieves average performance improvements of 2.37×, 2.19×, and 3.52×, respectively, over the dense baseline, along with a 2.3× improvement in energy efficiency. For layers too large to fit on chip, SCNN can either provision larger on-chip memory or fall back to a tiling approach; tiling is required for only a small number of layers and incurs an average energy penalty of 18%. A study of PE granularity shows that performance is affected by both cross-PE global barriers and intra-PE multiplier-array fragmentation, with intra-PE fragmentation being the more critical of the two to address.
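To make the compressed encoding mentioned above concrete, the following is a minimal Python sketch of a zero-run-length index encoding in the spirit of SCNN's compressed format for weights and activations. The function names and encoding details here are illustrative assumptions, not the paper's exact format.

```python
# Minimal sketch of a zero-run-length compressed-sparse encoding, similar in
# spirit to the format SCNN keeps weights and activations in. The names
# compress_sparse/decompress_sparse are illustrative, not from the paper.

def compress_sparse(values):
    """Encode a dense 1-D list as (nonzero value, zeros preceding it) pairs."""
    encoded = []
    zero_run = 0
    for v in values:
        if v == 0:
            zero_run += 1
        else:
            encoded.append((v, zero_run))  # value plus count of zeros before it
            zero_run = 0
    return encoded, len(values)            # keep the length to restore trailing zeros

def decompress_sparse(encoded, length):
    """Rebuild the dense list from the (value, zero-run) pairs."""
    dense = []
    for v, zeros in encoded:
        dense.extend([0] * zeros)
        dense.append(v)
    dense.extend([0] * (length - len(dense)))
    return dense

# Example: only the two non-zero values are stored and later multiplied.
pairs, n = compress_sparse([0, 0, 3, 0, 5, 0, 0])
assert decompress_sparse(pairs, n) == [0, 0, 3, 0, 5, 0, 0]
```

Because coordinates are carried as zero-run counts rather than stored explicitly, the multiplier array can reconstruct each operand's position while fetching only non-zero values.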
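As a rough illustration of the Cartesian-product dataflow and the scatter step described above, here is a simplified 1-D convolution sketch. The bank count, the modulo bank selection, and the 1-D coordinate arithmetic are simplifications assumed for clarity; the actual design operates on 2-D activation planes, multiplies vectors of weights and activations each cycle, and uses a hardware crossbar.

```python
# Simplified 1-D sketch of SCNN's Cartesian-product dataflow: every non-zero
# activation is multiplied by every non-zero weight, and each product is
# scattered to an accumulator bank chosen from its output coordinate.

NUM_BANKS = 32  # the configuration described above uses 32 accumulator banks per PE

def sparse_conv1d(activations, weights, out_len):
    # Keep only non-zeros, paired with their coordinates (compressed form).
    nz_acts = [(x, v) for x, v in enumerate(activations) if v != 0]
    nz_wts  = [(r, w) for r, w in enumerate(weights) if w != 0]

    banks = [dict() for _ in range(NUM_BANKS)]  # accumulator banks keyed by output x

    # Cartesian product: all-to-all multiplication of the non-zero operands.
    for x, v in nz_acts:
        for r, w in nz_wts:
            out_x = x - r                                 # output coordinate of this product
            if 0 <= out_x < out_len:
                bank = banks[out_x % NUM_BANKS]           # scatter step: pick a bank
                bank[out_x] = bank.get(out_x, 0) + v * w  # accumulate the partial sum

    # Gather the banked partial sums back into a dense output.
    out = [0] * out_len
    for bank in banks:
        for out_x, acc in bank.items():
            out[out_x] = acc
    return out

# Example: sparse input and kernel; only non-zero operand pairs produce multiplications.
print(sparse_conv1d([0, 2, 0, 0, 3], [1, 0, 4], out_len=3))  # -> [0, 2, 12]
```

The point of the sketch is that the number of multiplications scales with the product of the non-zero counts rather than the dense layer size, while the bank-selection step stands in for the scatter crossbar that resolves which accumulator each partial sum updates.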