An intriguing failing of convolutional neural networks and the CoordConv solution

3 Dec 2018 | Rosanne Liu, Joel Lehman, Piero Molino, Felipe Petroski Such, Eric Frank, Alex Sergeev, Jason Yosinski
Convolutional neural networks (CNNs) have been widely used for tasks involving spatial data, but this paper reveals a significant limitation: CNNs struggle with coordinate transformations between Cartesian and pixel spaces. The authors introduce CoordConv, a modification that gives convolutional layers access to their own input coordinates, allowing them to learn either translation invariance or translation dependence as the task requires. CoordConv outperforms standard CNNs on tasks such as coordinate classification, regression, and rendering, achieving perfect accuracy with fewer parameters and faster training. The paper demonstrates that replacing standard convolutions with CoordConv improves performance across diverse tasks, including image classification, object detection, generative modeling, and reinforcement learning: CoordConv reduces mode collapse in GANs, improves detection accuracy in Faster R-CNN, and boosts performance on Atari games. The study highlights the need for further investigation into how CNNs' limitations in coordinate transformations affect other tasks. The authors provide code for CoordConv at https://github.com/uber-research/coordconv.
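The core idea is simple: before applying a standard convolution, concatenate extra channels that encode each pixel's (x, y) coordinates, so the filters can condition on position when useful. The sketch below shows one way this could look in PyTorch; the `CoordConv2d` class name, argument names, and normalization to [-1, 1] are illustrative assumptions, not the authors' reference implementation (which is available at the repository linked above).

```python
import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    """Minimal sketch of a CoordConv layer (assumed API, not the authors' code):
    append normalized x and y coordinate channels to the input, then apply a
    standard Conv2d over the augmented tensor."""

    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__()
        # Two extra input channels carry the x and y coordinate grids.
        self.conv = nn.Conv2d(in_channels + 2, out_channels, kernel_size, **kwargs)

    def forward(self, x):
        batch, _, height, width = x.shape
        # Coordinate grids scaled to [-1, 1] (one common normalization choice).
        ys = torch.linspace(-1.0, 1.0, height, device=x.device, dtype=x.dtype)
        xs = torch.linspace(-1.0, 1.0, width, device=x.device, dtype=x.dtype)
        grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
        coords = torch.stack([grid_x, grid_y]).expand(batch, -1, -1, -1)
        # Concatenate along the channel dimension and convolve as usual.
        return self.conv(torch.cat([x, coords], dim=1))

# Example: drop-in replacement for a Conv2d on a 64x64 feature map.
layer = CoordConv2d(3, 16, kernel_size=3, padding=1)
out = layer(torch.randn(8, 3, 64, 64))  # -> shape (8, 16, 64, 64)
```

Because the coordinate channels are fixed rather than learned, the layer adds only the parameters of the slightly wider convolution; if the filters assign zero weight to those channels, the layer reduces to an ordinary convolution.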