PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution


16 Mar 2024 | Honghao Chen, Xiangxiang Chu, Yongjian Ren, Xin Zhao, Kaiqi Huang
This paper proposes PeLK, a convolutional neural network architecture built on a peripheral convolution that sharply reduces parameter complexity while preserving the performance of large-kernel convolutions. The design is inspired by human peripheral vision, in which fine-grained processing is concentrated at the center of the visual field and resolution falls off toward the periphery. By mimicking this mechanism, the peripheral convolution reduces the parameter complexity of dense grid convolution from O(K²) to O(log K), enabling extremely large kernel sizes without a significant increase in model size.

Built on the peripheral convolution, PeLK outperforms modern vision Transformers and ConvNet architectures such as Swin, ConvNeXt, RepLKNet, and SLaK on a range of vision tasks, including ImageNet classification, semantic segmentation on ADE20K, and object detection on MS COCO. PeLK scales the kernel size of CNNs to an unprecedented 101×101 and shows consistent improvements as the kernel grows.

The paper also compares dense grid convolution with stripe convolution and finds that dense grid convolution consistently outperforms the stripe form across multiple kernel sizes, indicating an essential advantage of dense convolution. However, the quadratic complexity of large dense convolution leads to a proliferation of parameters, causing rapidly increasing model size and greater optimization difficulty. To address this, the peripheral convolution keeps the dense computational form while reducing parameter count through three key designs: i) a focus-and-blur mechanism, ii) exponentially increasing sharing granularity, and iii) kernel-wise positional embedding. Together, these designs make it practical to build large dense-kernel ConvNets with strong performance.

Extensive experiments show that PeLK achieves state-of-the-art results across a variety of vision tasks, demonstrating the potential of a pure CNN architecture equipped with extremely large kernels. An analysis of the effective receptive field (ERF) shows that PeLK has a much larger ERF than previous large-kernel paradigms, which is believed to contribute to its strong performance. Overall, the paper presents a human-vision-inspired route to large-kernel ConvNets: the peripheral convolution keeps dense computation while cutting parameters, enabling extremely large kernels with strong performance.