7 Nov 2017 | Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton
Dynamic routing between capsules is a method for hierarchical processing of visual information. A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity, such as an object or an object part. The length of the activity vector represents the probability that the entity exists, while its orientation represents the instantiation parameters. Active capsules at one level make predictions, via transformation matrices, for the instantiation parameters of higher-level capsules, and when multiple predictions agree, a higher-level capsule becomes active. This dynamic routing mechanism, based on agreement between predictions, allows for effective segmentation of highly overlapping objects.
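For the length of an activity vector to be read as a probability, it has to lie between 0 and 1. The paper uses a "squashing" non-linearity for this, which shrinks short vectors toward zero length and long vectors toward a length just below 1 while keeping their direction. The following is a minimal NumPy sketch of that idea; the function name, the `eps` stabilizer, and the example vectors are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Rescale vector(s) so the length lies in [0, 1) and the direction is kept."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)          # close to 0 for short, close to 1 for long vectors
    return scale * s / np.sqrt(sq_norm + eps)  # eps (an assumption) avoids division by zero

short = squash(np.array([0.1, 0.0]))    # weak evidence: length stays near 0
long = squash(np.array([30.0, 40.0]))   # strong evidence: length approaches 1
```

Here `long` keeps the direction (0.6, 0.8) of its input, which is the part of the vector that carries the instantiation parameters.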
The paper introduces a multi-layer capsule system that achieves state-of-the-art performance on MNIST and is considerably better than a convolutional network at recognizing highly overlapping digits. The system uses an iterative routing-by-agreement mechanism, where a lower-level capsule prefers to send its output to higher-level capsules whose activity vectors have a large scalar product with the prediction coming from that lower-level capsule. The paper argues that this mechanism is more effective than the primitive form of routing implemented by max-pooling, because it allows the model to recognize overlapping objects.
Because the output of a capsule is a vector that represents the properties of an entity, dynamic routing can be used to ensure that the output is sent to an appropriate parent in the layer above. Each lower-level capsule distributes its output among its possible parents according to coupling coefficients that sum to 1. The routing algorithm iteratively adjusts these coupling coefficients based on the agreement between each parent's current output and the prediction made for that parent by the lower-level capsule.
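The routing loop described above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' implementation: the prediction vectors `u_hat` are taken as given (in the paper they come from learned transformation matrices), and the names `route`, `u_hat`, `b`, `c`, and the toy inputs are chosen here for illustration.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(x, axis):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def route(u_hat, num_iters=3):
    """Routing by agreement.

    u_hat: predictions with shape (num_children, num_parents, dim), where
           u_hat[i, j] is child i's prediction for parent j.
    Returns the parent outputs v (num_parents, dim) and the coupling
    coefficients c (num_children, num_parents) used to compute them.
    """
    num_children, num_parents, _ = u_hat.shape
    b = np.zeros((num_children, num_parents))      # routing logits, start uniform
    for _ in range(num_iters):
        c = softmax(b, axis=1)                     # each child's couplings sum to 1
        s = np.einsum("ij,ijd->jd", c, u_hat)      # weighted sum of predictions per parent
        v = squash(s)                              # parent outputs
        b = b + np.einsum("ijd,jd->ij", u_hat, v)  # reward agreement (scalar product)
    return v, c

# Toy input (hypothetical): three children agree about parent 0, disagree about parent 1.
u_hat = np.array([
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
])
v, c = route(u_hat)
```

In this toy case every child ends up sending more than half of its output to parent 0, and parent 0's output vector is much longer than parent 1's, which is the "agreement activates the parent" behaviour the text describes.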
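The paper uses a separate margin loss for each digit class, where the length of the instantiation vector of a digit capsule represents the probability that the digit exists; using one loss per class allows multiple digits to be present in the same image. The CapsNet architecture consists of two convolutional layers (a standard convolutional layer followed by a convolutional primary-capsule layer) and one fully connected digit-capsule layer. The model is tested on MNIST and other datasets, achieving high accuracy on MNIST and better robustness to small affine transformations (evaluated on affNIST) than a comparable convolutional baseline.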
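As a rough sketch of the margin loss, the function below penalizes the capsule of a present class when its length falls below an upper margin and the capsules of absent classes when their length exceeds a lower margin. The default values 0.9, 0.1, and 0.5 are the ones reported in the paper; the function name, argument names, and the example batch are assumptions made for illustration.

```python
import numpy as np

def margin_loss(lengths, targets, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Per-example margin loss.

    lengths: (batch, classes) lengths of the digit-capsule output vectors.
    targets: (batch, classes) one-hot (or multi-hot) class indicators.
    """
    present = targets * np.maximum(0.0, m_plus - lengths) ** 2
    # lam down-weights the loss for absent classes
    absent = lam * (1.0 - targets) * np.maximum(0.0, lengths - m_minus) ** 2
    return np.sum(present + absent, axis=1)

lengths = np.array([[0.95, 0.05, 0.30]])  # hypothetical capsule lengths for 3 classes
targets = np.array([[1.0, 0.0, 0.0]])     # class 0 is present
loss = margin_loss(lengths, targets)      # only class 2 is penalized: 0.5 * 0.2**2 = 0.02
```

Because the loss is summed over classes independently, `targets` can mark more than one class as present, which is what makes the same loss usable for overlapping digits.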
Reconstruction is used as a regularization method: a decoder is trained to reconstruct the input image from the output of the correct digit capsule, with the outputs of all other digit capsules masked out during training. This encourages the digit capsules to encode the instantiation parameters of the input digit and improves performance.
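A minimal sketch of the two pieces involved, masking and the reconstruction loss, is shown below. The decoder network itself is omitted: `reconstructions` stands in for its output and is a hypothetical placeholder. The 0.0005 scale factor is the value the paper reports for keeping the reconstruction term from dominating the margin loss; all function and variable names are illustrative assumptions.

```python
import numpy as np

def mask_capsules(digit_caps, labels):
    """Zero out every digit capsule except the one for the true class.

    digit_caps: (batch, classes, dim) capsule outputs.
    labels:     (batch,) integer class ids.
    Returns a flattened (batch, classes * dim) array to feed a decoder.
    """
    mask = np.zeros(digit_caps.shape[:2])
    mask[np.arange(len(labels)), labels] = 1.0
    return (digit_caps * mask[:, :, None]).reshape(len(labels), -1)

def reconstruction_loss(images, reconstructions, scale=0.0005):
    """Scaled sum of squared differences between pixels and reconstructions."""
    diff = images.reshape(len(images), -1) - reconstructions
    return scale * np.sum(diff ** 2, axis=1)

digit_caps = np.arange(12, dtype=float).reshape(1, 3, 4)  # toy: 3 classes, 4-D capsules
labels = np.array([1])
masked = mask_capsules(digit_caps, labels)   # only the slots of class 1 survive

images = np.ones((1, 2, 2))                  # toy 2x2 "image"
reconstructions = np.zeros((1, 4))           # placeholder for a decoder's output
loss = reconstruction_loss(images, reconstructions)
```

In a full model this term would be added to the margin loss, and the gradient flowing back through the masked capsule is what pushes it to carry enough information to redraw the digit.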
The paper concludes that capsules provide an effective way to represent and process visual information, particularly for tasks involving overlapping objects. The dynamic routing mechanism allows the model to recognize multiple objects even when they overlap, and the capsule architecture is more robust to affine transformations than traditional convolutional networks. The results show state-of-the-art performance on MNIST; on CIFAR10 the reported results are comparable to those of early convolutional networks rather than state of the art.