6 Sep 2017 | Surat Teerapittayanon, Bradley McDanel, H.T. Kung
BranchyNet is a novel deep network architecture designed to speed up inference and reduce energy consumption. It adds side-branch classifiers that allow a large portion of test samples to exit the network early once they can be classified with sufficient confidence. The approach exploits the observation that features learned at early layers often suffice to classify many inputs, so not every sample needs to traverse the full network. Training solves a joint optimization problem that minimizes a weighted sum of the loss functions associated with the exit points; the side branches also act as regularizers and help mitigate vanishing gradients. Evaluations on well-known networks (LeNet, AlexNet, ResNet) and datasets (MNIST, CIFAR-10) show that BranchyNet significantly reduces inference time while maintaining or improving accuracy. Key contributions are fast inference via early-exit branches, joint optimization that provides regularization, and mitigation of vanishing gradients. The architecture is flexible and can be adapted to tasks beyond classification, such as image segmentation.
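A minimal PyTorch sketch of the idea follows. The backbone shapes, branch placement, loss weights, and entropy threshold here are illustrative assumptions for a LeNet-style network on MNIST, not the paper's exact settings:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BranchyLeNet(nn.Module):
    """LeNet-style backbone with one early-exit side branch (hypothetical sizes)."""
    def __init__(self, num_classes=10):
        super().__init__()
        # Main backbone, split into two stages
        self.stage1 = nn.Sequential(
            nn.Conv2d(1, 6, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2))   # -> 6x14x14
        self.stage2 = nn.Sequential(
            nn.Conv2d(6, 16, 5), nn.ReLU(), nn.MaxPool2d(2))             # -> 16x5x5
        # Side-branch classifier attached after stage1 (early exit)
        self.branch1 = nn.Sequential(
            nn.Flatten(), nn.Linear(6 * 14 * 14, num_classes))
        # Final (main) exit
        self.exit_final = nn.Sequential(
            nn.Flatten(), nn.Linear(16 * 5 * 5, num_classes))

    def forward(self, x):
        h1 = self.stage1(x)
        out1 = self.branch1(h1)      # early-exit logits
        out2 = self.exit_final(self.stage2(h1))  # final-exit logits
        return out1, out2

def joint_loss(logits_list, target, weights=(1.0, 0.3)):
    """Weighted sum of per-exit cross-entropy losses (the joint training
    objective); the weights here are illustrative, not the paper's values."""
    return sum(w * F.cross_entropy(z, target)
               for w, z in zip(weights, logits_list))

def entropy(logits):
    """Softmax entropy, used as the confidence measure for early exit."""
    p = F.softmax(logits, dim=1)
    return -(p * torch.log(p + 1e-12)).sum(dim=1)

@torch.no_grad()
def fast_inference(model, x, threshold=0.5):
    """Take the first exit whose entropy falls below the threshold
    (single-sample input assumed; a real deployment would stop computing
    the remaining layers once an early exit fires)."""
    out1, out2 = model(x)
    if entropy(out1).item() < threshold:
        return out1.argmax(dim=1)   # confident: exit at the side branch
    return out2.argmax(dim=1)       # otherwise fall through to the final exit
```

Lowering the threshold makes exits more conservative (fewer samples leave early, closer to baseline accuracy), while raising it trades some accuracy for larger inference-time savings, which matches the accuracy/runtime trade-off the paper reports.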