1 Dec 2016 | Yaman Umuroglu*, Nicholas J. Fraser*, Giulio Gambardella*, Michaela Blott*, Philip Leong*, Magnus Jahre† and Kees Vissers*
The paper introduces FINN, a framework for building fast and flexible FPGA accelerators for binarized neural networks (BNNs). BNNs, which constrain weights and activations to binary values, offer significant computational efficiency and energy savings compared to traditional floating-point neural networks. FINN employs a heterogeneous streaming architecture and a set of optimizations to map BNNs efficiently to hardware, achieving high classification rates with low latency and power consumption. The framework lets users specify throughput requirements and supports a range of network topologies. Experimental results demonstrate that FINN achieves up to 12.3 million image classifications per second with 0.31 μs latency on the MNIST dataset, and 21,906 image classifications per second with 283 μs latency on the CIFAR-10 and SVHN datasets. These results surpass the previous best-known performance on these benchmarks. The paper also discusses the accuracy-computation trade-offs, energy efficiency, and resource efficiency of the proposed design.
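A key reason BNNs map so efficiently to FPGAs is that a dot product over {-1, +1} values packed as bits reduces to an XNOR followed by a popcount, replacing multiply-accumulate hardware with cheap bitwise logic. The sketch below illustrates this trick in Python; it is an illustrative model of the arithmetic, not FINN's actual HLS implementation, and the helper names (`binary_dot`, `pack`) are made up for this example.

```python
def pack(v):
    """Pack a {-1, +1} vector into a bit mask: +1 -> bit 1, -1 -> bit 0."""
    return sum(1 << i for i, s in enumerate(v) if s == 1)

def binary_dot(w_bits: int, x_bits: int, n: int) -> int:
    """Dot product of two n-element {-1, +1} vectors packed as bit masks.

    XNOR yields a 1 wherever the signs agree, so with p matching positions
    the dot product is p - (n - p) = 2*p - n.
    """
    xnor = ~(w_bits ^ x_bits) & ((1 << n) - 1)  # 1 where signs agree
    return 2 * bin(xnor).count("1") - n

# Cross-check against the naive {-1, +1} computation.
w = [+1, -1, +1, +1]
x = [+1, +1, -1, +1]
assert binary_dot(pack(w), pack(x), len(w)) == sum(a * b for a, b in zip(w, x))
print(binary_dot(pack(w), pack(x), len(w)))  # → 0
```

On an FPGA, the XNOR and popcount become a few LUTs per bit rather than a DSP-based multiplier, which is what enables the throughput figures reported above.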