25 Jan 2017 | Wenjie Luo*, Yujia Li*, Raquel Urtasun, Richard Zemel
This paper investigates the effective receptive field (ERF) in deep convolutional neural networks (CNNs). The ERF is defined as the region within the receptive field whose input pixels have a non-negligible impact on the output. The study shows that the distribution of impact within the receptive field is approximately Gaussian, and since a Gaussian decays rapidly away from its center, the ERF occupies only a fraction of the theoretical receptive field.
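The paper measures this impact by backpropagating a gradient from a single central output unit down to the input. Below is a minimal sketch of that probe, assuming PyTorch and a small stack of constant-weight 3×3 convolutions (the framework, depth, and image size are our illustrative choices, not the paper's):

```python
import torch
import torch.nn as nn

# Build a stack of 1-channel 3x3 convolutions with constant weights,
# one of the weight settings the paper analyzes theoretically.
n_layers, k = 10, 3
layers = []
for _ in range(n_layers):
    conv = nn.Conv2d(1, 1, k, padding=k // 2, bias=False)
    nn.init.constant_(conv.weight, 1.0 / k**2)
    layers.append(conv)
net = nn.Sequential(*layers)

x = torch.ones(1, 1, 64, 64, requires_grad=True)
out = net(x)

# Seed a unit gradient at the central output pixel only; the resulting
# input gradient map shows each pixel's impact on that one unit.
seed = torch.zeros_like(out)
seed[0, 0, 32, 32] = 1.0
out.backward(seed)

erf = x.grad[0, 0].abs()
print(erf[32])  # a slice through the center: Gaussian-like decay
```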
The ERF is analyzed across several CNN architectures, covering the effects of nonlinear activations, dropout, subsampling, dilated convolutions, and skip connections. For a stack of n convolutional layers, the ERF grows as O(√n), whereas the theoretical receptive field grows linearly in n, so the ERF's share of the theoretical receptive field shrinks as O(1/√n). Subsampling and dilated convolutions are effective ways to enlarge the ERF, while skip connections shrink it.
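As a quick arithmetic check of the scaling claim, the sketch below contrasts the theoretical receptive field of n stacked k×k, stride-1 convolutions, which is n(k-1)+1 and thus linear in n, with the O(√n) growth the paper derives for the ERF (the depths and kernel size are illustrative choices):

```python
import math

k = 3
for n in (5, 10, 50, 100):
    theoretical = n * (k - 1) + 1   # linear in depth
    erf_scale = math.sqrt(n)        # ERF radius grows ~ sqrt(n)
    print(f"n={n:4d}  theoretical RF={theoretical:4d}  "
          f"sqrt(n)={erf_scale:6.2f}  ratio ~ 1/sqrt(n)={erf_scale / theoretical:.3f}")
```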
Empirical experiments confirm the theoretical findings, showing that the ERF is Gaussian-like and that the ERF size increases during training. The study also suggests ways to increase the ERF, such as modifying the initial weights or changing the CNN architecture. The findings have implications for understanding how CNNs process visual information and for improving their performance in tasks requiring large receptive fields.
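To make the "modify the initial weights" suggestion concrete, here is a hedged sketch that pushes kernel mass toward the border (this edge-heavy initialization is our own illustrative choice, not the paper's exact scheme) and compares the measured ERF width against a constant-weight baseline using the same central-gradient probe as above:

```python
import torch
import torch.nn as nn

def edge_heavy_init(conv: nn.Conv2d) -> None:
    # Our illustrative init, not the paper's scheme: weight grows with
    # distance from the kernel center, then is normalized so each layer
    # roughly preserves activation scale.
    k = conv.kernel_size[0]
    c = (k - 1) / 2.0
    yy, xx = torch.meshgrid(torch.arange(k).float(),
                            torch.arange(k).float(), indexing="ij")
    w = ((yy - c) ** 2 + (xx - c) ** 2).sqrt() + 0.1
    w = w / w.sum()
    with torch.no_grad():
        conv.weight.copy_(w.expand_as(conv.weight))

def erf_width(net: nn.Module, size: int = 64) -> float:
    # Effective width: pixels whose impact exceeds 1% of the peak,
    # counted and converted to an approximate side length.
    x = torch.ones(1, 1, size, size, requires_grad=True)
    out = net(x)
    seed = torch.zeros_like(out)
    seed[0, 0, size // 2, size // 2] = 1.0
    out.backward(seed)
    g = x.grad[0, 0].abs()
    return (g > 0.01 * g.max()).float().sum().sqrt().item()

def make_net(init_fn):
    layers = []
    for _ in range(10):
        conv = nn.Conv2d(1, 1, 3, padding=1, bias=False)
        init_fn(conv)
        layers.append(conv)
    return nn.Sequential(*layers)

uniform = lambda c: nn.init.constant_(c.weight, 1.0 / 9)
print("constant init ERF width:  ", erf_width(make_net(uniform)))
print("edge-heavy init ERF width:", erf_width(make_net(edge_heavy_init)))
```

Spreading kernel mass away from the center increases the per-layer variance of the gradient's Gaussian spread, so the measured ERF widens at no cost in depth or parameter count.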