An intriguing failing of convolutional neural networks and the CoordConv solution

3 Dec 2018 | Rosanne Liu, Joel Lehman, Piero Molino, Felipe Petroski Such, Eric Frank, Alex Sergeev, Jason Yosinski
Convolutional neural networks (CNNs) have been widely used for tasks involving spatial data, but this paper reveals a significant limitation: CNNs struggle with coordinate transformations between Cartesian and pixel spaces. The authors introduce CoordConv, a modification that gives convolutional layers access to their own input coordinates, allowing them to learn either translation invariance or translation dependence as the task requires. CoordConv outperforms standard CNNs on tasks such as coordinate classification, regression, and rendering, achieving perfect accuracy with fewer parameters and faster training. The paper demonstrates that replacing standard convolutions with CoordConv improves performance across diverse tasks, including image classification, object detection, generative modeling, and reinforcement learning: CoordConv reduces mode collapse in GANs, improves detection accuracy in Faster R-CNN, and boosts performance on Atari games. The study highlights the need for further investigation into how CNNs' limitations in coordinate transformations affect other tasks. The authors provide code for CoordConv at https://github.com/uber-research/coordconv.
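The core idea is simple: before applying a standard convolution, concatenate extra channels that encode each pixel's (x, y) coordinates, so the filters can condition on position when useful. The sketch below shows one way this could look in PyTorch; the `CoordConv2d` class name, argument names, and normalization to [-1, 1] are illustrative assumptions, not the authors' reference implementation (which is available at the repository linked above).

```python
import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    """Minimal sketch of a CoordConv layer (assumed API, not the authors' code):
    append normalized x and y coordinate channels to the input, then apply a
    standard Conv2d over the augmented tensor."""

    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__()
        # Two extra input channels carry the x and y coordinate grids.
        self.conv = nn.Conv2d(in_channels + 2, out_channels, kernel_size, **kwargs)

    def forward(self, x):
        batch, _, height, width = x.shape
        # Coordinate grids scaled to [-1, 1] (one common normalization choice).
        ys = torch.linspace(-1.0, 1.0, height, device=x.device, dtype=x.dtype)
        xs = torch.linspace(-1.0, 1.0, width, device=x.device, dtype=x.dtype)
        grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
        coords = torch.stack([grid_x, grid_y]).expand(batch, -1, -1, -1)
        # Concatenate along the channel dimension and convolve as usual.
        return self.conv(torch.cat([x, coords], dim=1))

# Example: drop-in replacement for a Conv2d on a 64x64 feature map.
layer = CoordConv2d(3, 16, kernel_size=3, padding=1)
out = layer(torch.randn(8, 3, 64, 64))  # -> shape (8, 16, 64, 64)
```

Because the coordinate channels are fixed rather than learned, the layer adds only the parameters of the slightly wider convolution; if the filters assign zero weight to those channels, the layer reduces to an ordinary convolution.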