DINO AS A von MISES-FISHER MIXTURE MODEL

17 May 2024 | Hariprasath Govindarajan, Per Sidén, Jacob Roll, Fredrik Lindsten
The paper reinterprets the DINO (self-DIstillation with NO labels) self-supervised learning (SSL) method as a von Mises-Fisher (vMF) mixture model. DINO computes a cross-entropy loss between $K$-dimensional probability vectors obtained from the dot products of image representations with learned prototypes; the authors show that when the prototypes are $L^2$-normalized, this implicitly assumes equal precision for all mixture components. They propose DINO-vMF, which adds the appropriate vMF normalization constants to the cluster assignment probabilities, allowing greater flexibility in the latent space while keeping training stable. The modification is particularly beneficial for larger ViT models, which use unnormalized prototypes.
Experiments demonstrate that DINO-vMF consistently outperforms DINO on a range of downstream tasks, including image classification and few-shot learning, by improving the quality of the learned representations. The paper also analyzes the impact of vMF normalization and probability centering on performance, and offers insights into the learned vMF mixture model, including prototype utilization and the interpretation of the precision parameters.
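The core modification can be illustrated with a short sketch. The snippet below computes cluster assignment probabilities from the dot products of an $L^2$-normalized representation with prototype vectors, optionally adding a per-prototype vMF log-normalization term whose precision is taken from the prototype norm. This is a minimal illustration under assumed conventions (the temperature `tau`, the names `z` and `W`, and the precision choice $\kappa_k = \lVert w_k \rVert / \tau$ are illustrative), not the paper's actual implementation.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function I_v(x)*exp(-x)


def log_vmf_const(kappa, d):
    """Log of the vMF normalization constant C_d(kappa) on the (d-1)-sphere:
    log C_d(kappa) = (d/2 - 1) log kappa - (d/2) log(2*pi) - log I_{d/2-1}(kappa).
    Uses ive for numerical stability: log I_v(kappa) = log(ive(v, kappa)) + kappa."""
    nu = d / 2.0 - 1.0
    return nu * np.log(kappa) - (d / 2.0) * np.log(2 * np.pi) - (np.log(ive(nu, kappa)) + kappa)


def assignment_probs(z, W, tau=0.1, vmf=True):
    """Softmax cluster assignment probabilities for one representation.

    z   : L2-normalized representation, shape (d,)
    W   : prototype matrix, shape (K, d) (possibly unnormalized rows)
    vmf : if True, add the per-component vMF log-normalizer (DINO-vMF-style);
          if False, plain DINO-style softmax over dot products.
    """
    logits = W @ z / tau
    if vmf:
        kappa = np.linalg.norm(W, axis=1) / tau  # per-prototype precision (assumed convention)
        logits = logits + log_vmf_const(kappa, z.shape[0])
    e = np.exp(logits - logits.max())  # stable softmax
    return e / e.sum()
```

With equal-norm prototypes the vMF term is constant across components and cancels in the softmax, recovering plain DINO; with unnormalized prototypes it reweights components by their precision.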