Capsule Networks With Residual Pose Routing

2025 | Yi Liu, De Cheng, Dingwen Zhang, Shoukun Xu, and Jungong Han
This paper presents Residual CapsNet (ResCaps), a deep Capsule Network (CapsNet) built on a novel residual pose routing algorithm. The method improves on traditional CapsNets by lowering the computational cost of routing and by avoiding gradient vanishing through a residual learning framework. The architecture has five capsule routing blocks, each containing a Primary Capsule (PriCaps) layer and residual pose routing (ResP) layers, arranged in a ResNet-like structure so that the network can be made deep while remaining efficient for image classification.

ResCaps is evaluated on MNIST, AffNIST, SmallNORB, and CIFAR-10/100. It achieves a test error of 0.72% on MNIST and 0.91% on SmallNORB, reported as significantly outperforming existing capsule-based and CNN-based methods. The model is also extended to 3D object reconstruction and 2D image saliency detection, showing that it can handle more complex, real-world tasks.

The residual pose routing algorithm reduces both network parameters and computational complexity while maintaining high performance. It uses a sparsely-connected routing mechanism, which needs fewer parameters than fully-connected routing, and its residual formulation avoids gradient vanishing, which makes deeper networks practical to design.

ResCaps is compared with other methods, including EM-Caps and RCCaps, and is found to outperform them in accuracy, error rate, and computational efficiency. The paper also includes an ablation study that evaluates individual components of the model, including the number of routing layers and the impact of residual connections.
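
The link between residual connections and gradient vanishing can be illustrated with a toy calculation. The sketch below is not the paper's routing algorithm; it assumes a stack of scalar layers that each have the same local slope, and the function names `plain_chain_gradient` and `residual_chain_gradient` are hypothetical. It only shows why a skip connection of the form y = x + f(x) keeps the end-to-end gradient from shrinking toward zero as depth grows.

```python
# Toy scalar illustration (an assumption for exposition, not the paper's method):
# compare the end-to-end gradient of a deep stack with and without skip connections.


def plain_chain_gradient(local_grad: float, depth: int) -> float:
    """Gradient of output w.r.t. input for `depth` stacked layers y = f(x),
    where every layer has the same local slope `local_grad`."""
    grad = 1.0
    for _ in range(depth):
        grad *= local_grad  # chain rule: multiply the local slopes
    return grad


def residual_chain_gradient(local_grad: float, depth: int) -> float:
    """Gradient of output w.r.t. input for `depth` stacked layers y = x + f(x).
    The identity path contributes 1 to every local slope."""
    grad = 1.0
    for _ in range(depth):
        grad *= 1.0 + local_grad  # d(x + f(x))/dx = 1 + f'(x)
    return grad


if __name__ == "__main__":
    for depth in (5, 10, 20):
        plain = plain_chain_gradient(0.1, depth)
        residual = residual_chain_gradient(0.1, depth)
        print(f"depth={depth:2d}  plain={plain:.3e}  residual={residual:.3e}")
```

With a small local slope, the plain stack's gradient decays geometrically with depth, while the residual stack's gradient stays at or above 1. Real capsule layers are matrix-valued and nonlinear, so this is only a qualitative picture.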
The results show that the model's deep architecture provides better robustness to affine transformations and improved recognition ability for complex images. The model is implemented in PyTorch and is available for public use. The proposed method demonstrates the potential of CapsNets in deep learning applications, particularly in tasks requiring part-whole relationships and robustness to variations in input data.
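
The parameter saving attributed to sparsely-connected routing can be made concrete with a back-of-the-envelope count. The sketch below rests on assumptions that are not taken from the paper: every connection between an input capsule and an output capsule owns one square pose transformation matrix, and the sparse variant lets each output capsule connect to only `fan_in` input capsules. The layer sizes and the helper names `fully_connected_params` and `sparse_params` are hypothetical and chosen purely for illustration.

```python
# Illustrative parameter count for capsule routing layers.
# Assumptions (hypothetical, not from the paper): one pose_dim x pose_dim
# transformation matrix per capsule-to-capsule connection, and a sparse layer
# in which each output capsule sees only `fan_in` input capsules.


def fully_connected_params(n_in: int, n_out: int, pose_dim: int = 4) -> int:
    """Every input capsule connects to every output capsule."""
    return n_in * n_out * pose_dim * pose_dim


def sparse_params(n_out: int, fan_in: int, pose_dim: int = 4) -> int:
    """Each output capsule connects to only `fan_in` input capsules."""
    return n_out * fan_in * pose_dim * pose_dim


if __name__ == "__main__":
    n_in, n_out, fan_in = 1152, 10, 32  # illustrative sizes only
    dense = fully_connected_params(n_in, n_out)
    sparse = sparse_params(n_out, fan_in)
    print(f"fully connected: {dense} parameters")
    print(f"sparse (fan_in={fan_in}): {sparse} parameters")
    print(f"reduction factor: {dense / sparse:.1f}x")
```

Under these assumptions the saving scales with the ratio of input capsules to `fan_in`, so it grows as layers get wider. The actual connectivity pattern and savings in ResCaps may differ.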