Understanding Dynamic Routing Between Capsules

The paper introduces Capsule Networks (CapsNet), a novel approach to visual recognition in which capsules (groups of neurons) represent entities and their properties. The length of a capsule's activity vector represents the probability that the entity exists, and its orientation encodes the entity's instantiation parameters, such as pose. Capsules at one level make predictions for higher-level capsules through learned transformation matrices. When multiple predictions agree, the corresponding higher-level capsule becomes active.
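
To make the prediction-and-agreement idea concrete, here is a minimal pure-Python sketch of the squashing nonlinearity and the iterative routing-by-agreement loop. It follows the paper's equations: prediction vectors u_hat(j|i) = W_ij u_i, coupling coefficients c_ij from a softmax over routing logits b_ij, outputs v_j = squash(sum_i c_ij u_hat(j|i)), and logits increased by the agreement u_hat(j|i) . v_j. This is an illustration, not the authors' implementation. The helper names (`matvec`, `squash`, `route`), the plain-list representation, the small `eps` guard in `squash`, and the toy 2-D capsules and matrices are all assumptions. In the paper, routing runs from 8D PrimaryCapsules to 16D DigitCaps with learned matrices and three iterations.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # shape (d_out, d_in)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(w: Matrix, u: Vector) -> Vector:
    """Prediction vector u_hat = W u for one (lower, higher) capsule pair."""
    return [dot(row, u) for row in w]


def squash(s: Vector, eps: float = 1e-9) -> Vector:
    """Scale s to length ||s||^2 / (1 + ||s||^2) while keeping its direction."""
    sq_norm = dot(s, s)
    # eps (an assumption, not from the paper) avoids dividing by zero when s = 0
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm + eps)
    return [scale * x for x in s]


def softmax(logits: Vector) -> Vector:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """Routing by agreement.

    u_hat[i][j] is lower capsule i's prediction for higher capsule j.
    Returns the output vectors v_j of the higher-level capsules.
    """
    n_lower, n_higher, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_higher for _ in range(n_lower)]  # routing logits start at zero
    v: List[Vector] = []
    for _ in range(iterations):
        # Coupling coefficients: softmax over higher capsules, per lower capsule.
        c = [softmax(row) for row in b]
        # Each higher capsule squashes the coupling-weighted sum of its predictions.
        v = []
        for j in range(n_higher):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_lower))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Raise the logit wherever a prediction agrees (large dot product) with the output.
        for i in range(n_lower):
            for j in range(n_higher):
                b[i][j] += dot(u_hat[i][j], v[j])
    return v


# Hypothetical toy example: two 2-D lower capsules, two 2-D higher capsules.
u = [[1.0, 0.0], [0.0, 1.0]]
W = [
    [[[2.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [2.0, 0.0]]],   # W[0][0], W[0][1]
    [[[0.0, 2.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, -2.0]]],  # W[1][0], W[1][1]
]
u_hat = [[matvec(W[i][j], u[i]) for j in range(2)] for i in range(2)]
v = route(u_hat, iterations=3)
lengths = [math.sqrt(dot(vj, vj)) for vj in v]
# Capsule 0 (both predictions agree) ends up long; capsule 1 (predictions cancel) has length 0.
print(lengths)
```

Both lower capsules predict the same vector for higher capsule 0, so its routing logits grow on every iteration and its output length moves toward 1. Their contradictory predictions for capsule 1 cancel, leaving it inactive. Because the softmax runs over higher-level capsules, each lower capsule divides a fixed total coupling among its possible parents.
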
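The paper demonstrates that a discriminatively trained, multi-layer capsule system using an iterative routing-by-agreement mechanism achieves state-of-the-art performance on MNIST and is considerably better than a convolutional neural network (CNN) at recognizing highly overlapping digits. The CapsNet architecture is shallow. It has only two convolutional layers, the second being the convolutional PrimaryCapsules layer, and one fully connected layer (DigitCaps). Dynamic routing is used only between the two consecutive capsule layers, PrimaryCapsules and DigitCaps. Training combines two losses. A margin loss rewards a long output vector from each digit capsule whose digit is present and a short one otherwise. A reconstruction loss, scaled down so that it does not dominate, encourages the digit capsules to encode accurate instantiation parameters.

Experiments on MNIST, MultiMNIST (pairs of highly overlapping digits), CIFAR10, smallNORB, and SVHN show that CapsNet is on par with the state of the art on MNIST and smallNORB and is particularly strong at segmenting highly overlapping digits. On CIFAR10, its 10.6% test error (from an ensemble of seven models) is roughly what standard CNNs achieved when they were first applied to that dataset. The authors compare CapsNet to previous work and highlight its advantages in handling viewpoint variations and in segmentation tasks. One example of the former is greater robustness than a comparable CNN to affine transformations on affNIST.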
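
The two training objectives can be written down compactly. In the paper, the margin loss for digit capsule k is L_k = T_k max(0, m_plus - ||v_k||)^2 + lambda (1 - T_k) max(0, ||v_k|| - m_minus)^2. Here T_k = 1 when digit k is present, m_plus = 0.9, m_minus = 0.1, and lambda = 0.5. The reconstruction loss (sum of squared differences between decoder outputs and pixel intensities) is scaled by 0.0005. The sketch below computes both from capsule lengths and decoder outputs that are assumed to be already available. The function names are hypothetical, and the decoder network itself, including the masking of all but the correct digit capsule, is omitted.

```python
from typing import Sequence


def margin_loss(lengths: Sequence[float], present: Sequence[bool],
                m_plus: float = 0.9, m_minus: float = 0.1, lam: float = 0.5) -> float:
    """Sum over digit capsules k of the per-class margin loss L_k."""
    total = 0.0
    for length, t_k in zip(lengths, present):
        if t_k:
            # Present class: penalize a capsule output shorter than m_plus.
            total += max(0.0, m_plus - length) ** 2
        else:
            # Absent class: penalize (down-weighted by lam) an output longer than m_minus.
            total += lam * max(0.0, length - m_minus) ** 2
    return total


def reconstruction_loss(reconstruction: Sequence[float], pixels: Sequence[float]) -> float:
    """Sum of squared differences between decoder outputs and pixel intensities."""
    return sum((r - p) ** 2 for r, p in zip(reconstruction, pixels))


def total_loss(lengths: Sequence[float], present: Sequence[bool],
               reconstruction: Sequence[float], pixels: Sequence[float],
               recon_weight: float = 0.0005) -> float:
    """Margin loss plus a down-weighted reconstruction loss."""
    return (margin_loss(lengths, present)
            + recon_weight * reconstruction_loss(reconstruction, pixels))


# Hypothetical example: three classes with class 0 present, and a four-pixel "image".
lengths = [0.5, 0.3, 0.05]
present = [True, False, False]
# (0.9 - 0.5)^2 + 0.5 * (0.3 - 0.1)^2 + 0 = 0.18, up to float rounding
print(margin_loss(lengths, present))
print(total_loss(lengths, present, [0.5, 0.5, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0]))
```

The paper motivates the lambda factor as a way to stop initial learning from shrinking the activity vectors of all digit capsules. Because the loss has a separate term per class, several digits can be marked present at once, as in the overlapping-digit experiments.
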

Dynamic Routing Between Capsules

7 Nov 2017 | Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton