1 Dec 2016 | Yaman Umuroglu*, Nicholas J. Fraser*, Giulio Gambardella*, Michaela Blott*, Philip Leong*, Magnus Jahre† and Kees Vissers*
FINN is a framework for fast and scalable binarized neural network (BNN) inference on FPGAs. It enables efficient mapping of BNNs to hardware, supporting fully connected, convolutional, and pooling layers with customizable compute resources. On a ZC706 FPGA drawing less than 25 W, FINN achieves 12.3 million image classifications per second with 0.31 µs latency and 95.8% accuracy on MNIST, and 21,906 classifications per second with 283 µs latency on CIFAR-10 and SVHN with 80.1% and 94.9% accuracy, respectively. At the time of publication, these were the fastest classification rates reported on these benchmarks.
The paper introduces FINN, a framework for building scalable and fast BNN inference accelerators on FPGAs. It presents a heterogeneous streaming architecture with novel optimizations for efficient BNN mapping. The framework supports customizable throughput and demonstrates performance on MNIST, CIFAR-10, and SVHN datasets, achieving classification rates 48×, 2.2×, and 8× faster than previous results. The framework's contributions include a roofline model for BNN performance on FPGAs, novel optimizations for BNN mapping, a BNN architecture and accelerator construction tool, and prototypes on off-the-shelf FPGAs.
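FINN's customizable throughput comes from folding: each layer's matrix-vector product is time-multiplexed over a configurable number of processing elements (PEs) and SIMD lanes, and in a streaming pipeline the least-folded layer sets the frame rate. The sketch below is a simplified illustration of that accounting, not the tool's actual interface; the function and parameter names are assumptions chosen for clarity.

```python
def layer_cycles(rows, cols, pe, simd):
    """Cycles for one matrix-vector product when a (rows x cols) weight
    matrix is folded across `pe` processing elements and `simd` input
    lanes: neuron fold times synapse fold."""
    assert rows % pe == 0 and cols % simd == 0
    return (rows // pe) * (cols // simd)

def pipeline_fps(layer_cycle_counts, clock_hz=200e6):
    """Steady-state throughput of a streaming pipeline is limited by
    the slowest (least-folded) layer."""
    return clock_hz / max(layer_cycle_counts)

# Example: two identically folded 1024x1024 layers at an assumed 200 MHz clock.
cycles = [layer_cycles(1024, 1024, pe=32, simd=32),
          layer_cycles(1024, 1024, pe=32, simd=32)]
print(pipeline_fps(cycles))  # ~195k matrix-vector products per second
```

Balancing the folding factors so every layer needs roughly the same number of cycles is what lets the architecture trade resources for throughput without leaving compute engines idle.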
The paper discusses BNNs, their hardware implementations, and performance on FPGAs. It presents a roofline model for estimating BNN performance, showing that BNNs can achieve up to 66 TOPS on a Zynq UltraScale+ ZU19EG FPGA, significantly higher than 8-bit fixed-point implementations. It also explores the trade-offs between network size, precision, and accuracy, showing that BNNs can achieve comparable accuracy with fewer parameters and operations.
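As a rough worked example of the roofline argument (the bandwidth and intensity figures below are illustrative assumptions, not the paper's exact parameters), attainable performance is the minimum of the compute roof and the bandwidth roof, and binarization helps on both fronts: 1-bit weights pack 8 synapses per byte and are often small enough to hold entirely on-chip, pushing arithmetic intensity toward the compute-bound regime.

```python
def attainable_tops(peak_tops, mem_bw_gb_s, ops_per_byte):
    """Classic roofline: attainable performance is capped by either the
    compute roof or by memory bandwidth times arithmetic intensity."""
    bandwidth_roof_tops = mem_bw_gb_s * ops_per_byte / 1e3  # Gop/s -> Top/s
    return min(peak_tops, bandwidth_roof_tops)

# Illustrative numbers only: 66 TOPS compute roof (binary ops on a ZU19EG,
# per the paper's estimate), an assumed 60 GB/s of off-chip bandwidth.
print(attainable_tops(66.0, 60.0, ops_per_byte=2))      # ~0.12 TOPS, 8-bit weights streamed from DRAM
print(attainable_tops(66.0, 60.0, ops_per_byte=16))     # ~0.96 TOPS, binary weights streamed from DRAM
print(attainable_tops(66.0, 60.0, ops_per_byte=10000))  # 66 TOPS, weights held on-chip (compute-bound)
```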
The paper describes the architecture of FINN: a heterogeneous streaming design with a dedicated compute engine per layer. It presents BNN-specific operator optimizations, such as replacing accumulation with a popcount, folding batch normalization and the sign activation into a single threshold comparison, and implementing max-pooling with a Boolean OR. The framework is built around a matrix-vector-threshold unit (MVTU) for efficient BNN computation and a sliding window unit that lowers convolutions to matrix-vector products.
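A bit-level sketch of these operator transformations (plain Python/NumPy rather than the paper's HLS implementation, with names and data chosen for illustration): encoding {-1,+1} values as {0,1} turns multiplication into XNOR, accumulation into a popcount, the folded batchnorm-plus-sign activation into a comparison against a precomputed threshold, and max-pooling over binary activations into a Boolean OR.

```python
import numpy as np

def mvtu_neuron(weight_bits, input_bits, threshold):
    """One MVTU-style output neuron: XNOR-popcount dot product followed
    by a threshold activation (batchnorm + sign folded into `threshold`)."""
    xnor = ~(weight_bits ^ input_bits) & 1      # elementwise XNOR on 0/1 bits
    popcount = int(xnor.sum())                  # accumulation as popcount
    return 1 if popcount >= threshold else 0    # thresholding replaces batchnorm + sign

def binary_maxpool(window_bits):
    """Max-pooling over binarized activations reduces to a Boolean OR."""
    return int(np.any(window_bits))

# Tiny example with a fan-in of 8 (values are made up for illustration).
w = np.array([1, 0, 1, 1, 0, 0, 1, 1], dtype=np.uint8)
x = np.array([1, 1, 1, 0, 0, 1, 1, 0], dtype=np.uint8)
print(mvtu_neuron(w, x, threshold=4))                          # fires if at least half the bits match
print(binary_maxpool(np.array([0, 0, 1, 0], dtype=np.uint8)))  # -> 1
```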
The paper evaluates FINN on the MNIST, CIFAR-10, and SVHN datasets, showing that it achieves high throughput, low latency, and low power consumption. Compared with prior work, FINN outperforms existing approaches in FPS, power efficiency, and resource utilization. The paper concludes that BNNs are well-suited to FPGA implementation due to their high computational performance, low power consumption, and flexibility. Future work will focus on supporting non-binary low-precision networks, implementing larger networks such as AlexNet, and exploring the design space more thoroughly.