3 May 2016 | Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally
EIE is an energy-efficient inference engine designed to accelerate inference on compressed deep neural networks (DNNs). The main obstacle to deploying DNNs on embedded systems is their high computational and memory demands; in particular, fetching weights from off-chip DRAM dominates energy consumption. EIE addresses this by operating directly on compressed DNNs that are small enough to be stored in on-chip SRAM, which both reduces energy consumption and improves performance.
The compression technique, known as Deep Compression, shrinks a DNN by pruning redundant connections and having the remaining weights share a small set of values. This allows large networks such as AlexNet and VGGNet to fit entirely in on-chip SRAM. EIE operates on the compressed model directly, performing sparse matrix-vector multiplication with weight sharing, which cuts energy consumption substantially: moving weights from DRAM to SRAM saves 120×, exploiting the sparsity of the pruned matrix saves 10×, weight sharing saves 8×, and skipping zero activations (produced by ReLU) saves a further 3×.
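As a rough illustration of those two steps, the hypothetical Python sketch below prunes near-zero weights and then quantizes the survivors to indices into a shared 16-entry codebook. The function name, threshold, and uniform codebook are stand-ins of my own (Deep Compression derives its codebook with k-means clustering); the point is only the shape of the compressed representation.

```python
import numpy as np

def compress_layer(W, prune_threshold=0.01, n_clusters=16):
    """Sketch of Deep Compression's two steps: prune, then share weights.

    Hypothetical illustration; the real pipeline retrains after pruning
    and fits the codebook with k-means rather than a uniform grid.
    """
    # Pruning: zero out small weights (this creates the static sparsity).
    W = np.where(np.abs(W) < prune_threshold, 0.0, W)

    # Weight sharing: replace each surviving weight with a 4-bit index
    # into a small codebook of shared values.
    nz = W[W != 0]
    codebook = np.linspace(nz.min(), nz.max(), n_clusters)
    indices = np.abs(nz[:, None] - codebook[None, :]).argmin(axis=1)

    # mask of surviving weights, 4-bit codebook indices, shared values
    return W != 0, indices.astype(np.uint8), codebook
```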
Evaluated on nine DNN benchmarks, EIE is 189× and 13× faster than CPU and GPU implementations of the same network without compression, respectively. It processes the fully connected layers of AlexNet at 1.88×10⁴ frames/sec while dissipating only 600 mW. Compared with other accelerators such as DaDianNao, EIE offers better throughput, energy efficiency, and area efficiency.
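Taken together, those two figures imply an energy budget of roughly 0.6 W ÷ 1.88×10⁴ frames/sec ≈ 32 µJ per frame for AlexNet's fully connected layers.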
Architecturally, EIE is a scalable array of processing elements (PEs), each holding a partition of the network in its local SRAM. Weights are stored in a variant of compressed sparse column (CSC) format, and the design exploits both static sparsity (weights zeroed by pruning) and dynamic sparsity (activations zeroed by ReLU). A central control unit, per-PE activation queues, and load-balancing mechanisms keep the PEs busy while processing the sparse matrix-vector multiplication, as sketched below.
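To make that dataflow concrete, here is a minimal Python sketch of sparse matrix-vector multiplication over a CSC-encoded matrix. It is an illustration under simplifying assumptions, not the authors' implementation: the names (csc_spmv, values, row_idx, col_ptr) are mine, and it omits EIE's 4-bit relative row indexing, the weight-sharing codebook lookup, and the interleaving of rows across PEs.

```python
import numpy as np

def csc_spmv(values, row_idx, col_ptr, x, n_rows):
    """Compute y = W @ x where W is stored in CSC form.

    values  : nonzero weights, stored column by column
    row_idx : row index of each nonzero weight
    col_ptr : col_ptr[j]..col_ptr[j+1] delimits column j's nonzeros
    """
    y = np.zeros(n_rows)
    for j, a in enumerate(x):
        if a == 0:
            continue                        # dynamic sparsity: skip zero activations
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[k]] += values[k] * a  # static sparsity: only stored weights
    return y
```

In EIE's terms, the outer loop corresponds to broadcasting each nonzero input activation to the PEs, and the inner loop to each PE walking the nonzero weights it holds for that column.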
EIE's hardware implementation targets 45 nm CMOS technology; each PE occupies 0.638 mm² and dissipates 9.16 mW at 800 MHz. The design executes compressed DNNs efficiently, with significant energy savings over traditional CPU and GPU implementations, making EIE well suited to real-time applications where latency is critical.
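As a sanity check on the earlier power figure: the paper's reported configuration uses 64 PEs, and 64 × 9.16 mW ≈ 586 mW, consistent with the roughly 600 mW quoted for the full accelerator.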