Top-down Neural Attention by Excitation Backprop

1 Aug 2016 | Jianming Zhang, Zhe Lin, Jonathan Brandt, Xiaohui Shen, Stan Sclaroff
This paper proposes Excitation Backprop, a method for modeling top-down attention in Convolutional Neural Networks (CNNs) that generates task-specific attention maps. Inspired by a top-down model of human visual attention, it passes top-down signals down through the network hierarchy via a probabilistic Winner-Take-All (WTA) process, and introduces the concept of contrastive attention to make the resulting attention maps more discriminative.

The method is evaluated on MS COCO, PASCAL VOC07, and ImageNet, where it outperforms existing approaches on weakly supervised localization. It is further validated on a text-to-region association task on the Flickr30k Entities dataset, achieving promising phrase-localization performance by leveraging the top-down attention of a CNN model trained on weakly labeled web images. The paper also analyzes how attention quality varies across network layers and compares the method with prior work, showing gains in both accuracy and generalizability, particularly on challenging cases such as localizing small objects and text-to-region association.

Implemented in Caffe, the method produces interpretable, highly discriminative attention maps and localizes dominant objects in images effectively. The authors conclude that Excitation Backprop offers a novel and effective approach to modeling top-down attention in CNNs, with potential applications across a wide range of computer vision tasks.
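The probabilistic WTA pass summarized above can be illustrated one layer at a time: each output neuron distributes its winning probability to its inputs in proportion to the input activation times the positive (excitatory) part of the connecting weight, so total probability is conserved as the signal moves downward. The following NumPy sketch is an illustrative reading of that rule, not the authors' Caffe implementation; function name and shapes are assumptions, and the paper's contrastive-attention step (subtracting the map obtained with negated top-layer weights) is omitted:

```python
import numpy as np

def excitation_backprop_layer(p_out, weights, a_in):
    """One backward step of the probabilistic WTA process (illustrative sketch).

    p_out   : winning probabilities of the output neurons, shape (m,)
    weights : layer weight matrix, shape (m, n)
    a_in    : input activations, shape (n,)
    returns : marginal winning probabilities of the input neurons, shape (n,)
    """
    w_pos = np.maximum(weights, 0.0)            # keep only excitatory connections
    contrib = w_pos * a_in[None, :]             # a_j * w_ij^+ for each (i, j)
    z = contrib.sum(axis=1, keepdims=True)      # per-output normalizer Z_i
    z[z == 0.0] = 1.0                           # guard against dead outputs
    cond = contrib / z                          # conditional probability P(a_j | a_i)
    return cond.T @ p_out                       # p_j = sum_i P(a_j | a_i) * p_i
```

Starting from a one-hot probability on the target output unit and applying this step layer by layer down to an early convolutional layer yields an attention map whose values sum to one, which matches the probability-conserving behavior the paper attributes to the WTA formulation.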