Non-local Neural Networks

13 Apr 2018 | Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He
This paper introduces non-local neural networks, which capture long-range dependencies through non-local operations. A non-local operation computes the response at a position as a weighted sum of the features at all positions in the input, which allows long-range dependencies to be captured directly. Non-local operations can be integrated into a variety of computer vision architectures and are effective in both the spatial and temporal domains.

In video classification, non-local networks compete with or outperform the state-of-the-art models of the time on the Kinetics and Charades datasets, without optical flow or other complex techniques. The paper reports that they are more accurate and more computationally efficient than their 3D convolutional counterparts, and that non-local operations are complementary to 3D convolutions: adding them improves video classification further. In static image recognition, non-local networks improve object detection, segmentation, and pose estimation on the COCO dataset.

Non-local operations are efficient, preserve variable input sizes, and combine readily with other operations such as convolutions. The paper presents several instantiations that differ in how the pairwise weights are computed: Gaussian, embedded Gaussian, dot product, and concatenation versions, all of which are reported to be effective. It also covers implementation details, including how non-local operations are integrated into existing architectures and wrapped with residual connections.
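For reference, the weighted-sum definition and the four instantiations named in this summary can be written out as follows. The notation is taken from the original paper rather than from this summary, so treat it as a reading aid: x is the input feature map, y the output, i the position being computed, j an index over all positions, f a scalar pairwise function, g a per-position embedding, C(x) a normalizer, θ and φ learned embeddings, N the number of positions, and the last line is the residual form of the block with a learned weight matrix W_z.

```latex
\begin{align}
y_i &= \frac{1}{\mathcal{C}(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j) \\
\text{Gaussian:}\quad f(x_i, x_j) &= e^{x_i^{\top} x_j},
  & \mathcal{C}(x) &= \sum_{\forall j} f(x_i, x_j) \\
\text{Embedded Gaussian:}\quad f(x_i, x_j) &= e^{\theta(x_i)^{\top} \phi(x_j)},
  & \mathcal{C}(x) &= \sum_{\forall j} f(x_i, x_j) \\
\text{Dot product:}\quad f(x_i, x_j) &= \theta(x_i)^{\top} \phi(x_j),
  & \mathcal{C}(x) &= N \\
\text{Concatenation:}\quad f(x_i, x_j) &= \mathrm{ReLU}\!\left(w_f^{\top} [\theta(x_i), \phi(x_j)]\right),
  & \mathcal{C}(x) &= N \\
z_i &= W_z\, y_i + x_i
\end{align}
```

With the two Gaussian forms, dividing by C(x) amounts to a softmax over j, so the weights for each position i are non-negative and sum to one.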
Overall, non-local networks perform well on static image recognition as well as video, improving object detection, segmentation, and pose estimation, and the paper concludes that non-local operations are a valuable component for future neural network architectures.
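To make the weighted sum over all positions concrete, here is a minimal NumPy sketch of an embedded Gaussian non-local block with a residual connection. It is an illustration under stated assumptions, not the authors' implementation: the function names (`attention_weights`, `non_local_block`), the weight names, and the choice to represent the 1x1 embeddings as plain matrix multiplications on a flattened (positions, channels) array are all hypothetical.

```python
import numpy as np

def softmax(a, axis=-1):
    """Numerically stable softmax along one axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(x, w_theta, w_phi):
    """Pairwise weights f(x_i, x_j) / C(x) for the embedded Gaussian form.

    x: (N, C) features at N positions (space or space-time, flattened).
    w_theta, w_phi: (C, C_inner) embedding matrices (assumed names).
    Returns an (N, N) matrix whose rows sum to 1.
    """
    theta = x @ w_theta            # (N, C_inner)
    phi = x @ w_phi                # (N, C_inner)
    return softmax(theta @ phi.T, axis=-1)

def non_local_block(x, w_theta, w_phi, w_g, w_z):
    """Residual non-local block: z = x + (weights @ g(x)) @ w_z."""
    g = x @ w_g                                   # (N, C_inner)
    y = attention_weights(x, w_theta, w_phi) @ g  # weighted sum over all j
    return x + y @ w_z                            # residual connection

# Usage: a tiny "video" feature map with T=2, H=4, W=4 and C=8 channels.
rng = np.random.default_rng(0)
T, H, W, C, C_inner = 2, 4, 4, 8, 4
x = rng.standard_normal((T * H * W, C))
w_theta, w_phi, w_g = (rng.standard_normal((C, C_inner)) * 0.1 for _ in range(3))
w_z = rng.standard_normal((C_inner, C)) * 0.1
z = non_local_block(x, w_theta, w_phi, w_g, w_z)
print(z.shape)  # (32, 8): same shape as the input
```

Because the block only needs the number of positions N at call time, the same weights work for inputs of different sizes, which matches the variable-input-size property described above. Setting `w_z` to zeros turns the block into an identity mapping, which is one way such a block can be added to an existing network without changing its initial behavior.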