Wavelet Convolutions for Large Receptive Fields

15 Jul 2024 | Shahaf E. Finder, Roy Amoyal, Eran Treister, and Oren Freifeld
This paper introduces WTConv, a wavelet-based convolutional layer that achieves large receptive fields without a significant increase in trainable parameters. Rather than simply enlarging kernel sizes, which leads to over-parameterization and diminishing returns, WTConv uses the Wavelet Transform (WT) to decompose the input into different frequency bands and performs small-kernel convolutions on each band. This allows the layer to capture low-frequency information more effectively, improving shape bias and robustness to image corruption.

WTConv serves as a drop-in replacement for depth-wise convolutions in existing architectures such as ConvNeXt and MobileNetV2, improving performance in image classification, semantic segmentation, and object detection. Because each additional decomposition level adds a fixed number of small-kernel weights while doubling the effective receptive field, the layer's parameter count grows only logarithmically with receptive-field size, in contrast to methods that scale quadratically with kernel size. The results show state-of-the-art performance on the ImageNet-1K, ADE20K, and COCO benchmarks, with gains in classification accuracy, mIoU, and AP. The paper further demonstrates improved scalability, robustness to various types of image corruption, and a stronger response to shape over texture. The method is implemented in Python and is available at https://github.com/BGU-CS-VIL/WTConv.
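To make the mechanism concrete, below is a minimal single-level sketch in PyTorch, assuming an orthonormal Haar wavelet. The names here (WTConvSketch, haar_filters) are illustrative, not the authors' API; see the linked repository for the reference implementation. Analysis is a fixed stride-2 grouped convolution, each of the four sub-bands then gets its own small depth-wise convolution, and a transposed convolution with the same orthonormal filters inverts the transform.

```python
# Minimal single-level sketch of a wavelet convolution layer (assumptions:
# PyTorch, orthonormal Haar wavelet, even input height/width). Class and
# helper names are hypothetical, not the authors' API.
import torch
import torch.nn as nn
import torch.nn.functional as F

def haar_filters() -> torch.Tensor:
    """Return the four 2x2 orthonormal Haar analysis filters (LL, LH, HL, HH)."""
    lo = torch.tensor([1.0, 1.0]) / 2.0 ** 0.5   # low-pass
    hi = torch.tensor([1.0, -1.0]) / 2.0 ** 0.5  # high-pass
    return torch.stack([
        torch.outer(lo, lo),  # LL: low-pass in both directions
        torch.outer(lo, hi),  # LH: horizontal detail
        torch.outer(hi, lo),  # HL: vertical detail
        torch.outer(hi, hi),  # HH: diagonal detail
    ])  # shape (4, 2, 2)

class WTConvSketch(nn.Module):
    """WT -> small depth-wise convs per sub-band -> inverse WT.

    Operates per channel, so it can stand in for a depth-wise convolution.
    """
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.channels = channels
        # One fixed analysis bank per channel: (4*C, 1, 2, 2) for grouped conv.
        filt = haar_filters().unsqueeze(1).repeat(channels, 1, 1, 1)
        self.register_buffer("filt", filt)
        # Learned small depth-wise conv on each of the 4 sub-bands per channel.
        self.band_conv = nn.Conv2d(4 * channels, 4 * channels, kernel_size,
                                   padding=kernel_size // 2,
                                   groups=4 * channels, bias=False)
        # Learned depth-wise conv at the original resolution (base path).
        self.base_conv = nn.Conv2d(channels, channels, kernel_size,
                                   padding=kernel_size // 2,
                                   groups=channels, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Analysis: stride-2 grouped conv computes the per-channel Haar WT.
        bands = F.conv2d(x, self.filt, stride=2, groups=self.channels)
        bands = self.band_conv(bands)
        # Synthesis: a transposed conv with the same orthonormal bank is the
        # exact inverse of the analysis step.
        y = F.conv_transpose2d(bands, self.filt, stride=2, groups=self.channels)
        return self.base_conv(x) + y

# Drop-in usage in place of a depth-wise convolution:
x = torch.randn(1, 64, 32, 32)
print(WTConvSketch(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```

Cascading this construction by recursively decomposing the LL band clarifies the parameter scaling: each extra level adds a fixed 4·k²·C depth-wise weights, while the k×k kernels at level i act on a feature map downsampled by 2^i, so the effective receptive field roughly doubles per level. Parameters therefore grow linearly in the number of levels, i.e., logarithmically in the receptive field, versus quadratic growth when kernels are enlarged directly.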