DINO AS A VON MISES-FISHER MIXTURE MODEL

May 17, 2024 | Hariprasath Govindarajan, Per Sidén, Jacob Roll, Fredrik Lindsten
This paper presents a novel interpretation of the DINO self-supervised learning method as a mixture model of von Mises-Fisher (vMF) distributions. DINO is a self-distillation method trained with a cross-entropy loss between K-dimensional probability vectors, which are obtained by applying a softmax to the dot products between L2-normalized image representations and K learned prototype vectors. The authors show that this objective in DINO, and in methods derived from it such as iBOT, can be interpreted as fitting a mixture of vMF components on the hypersphere, with each prototype acting as the direction of one component and its norm governing the component's concentration.

Building on this interpretation, the authors propose DINO-vMF, which includes the appropriate vMF normalization constants when computing the cluster assignment probabilities. Accounting for these constants lets the prototype norms vary freely, yielding a more flexible mixture model and more stable pre-training, particularly for larger backbones such as ViT-Base. Empirically, DINO-vMF consistently outperforms DINO across a range of downstream tasks, including image classification, and the larger ViT-Base models achieve significantly improved few-shot classification performance. The same modification also benefits iBOT. The paper further provides a detailed analysis of the learned vMF mixture model, examining the cluster assignments and the impact of different parameters on performance, and concludes that the proposed modification leads to better image representations together with more flexible and stable training.
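To make the modification concrete, the sketch below illustrates how cluster assignment probabilities could be computed under the vMF mixture view: the usual prototype-representation dot products serve as logits, and each prototype's logit is shifted by the log normalization constant of a vMF distribution whose concentration is set by the prototype's norm. This is a minimal illustration under stated assumptions, not the authors' implementation; the function names, temperature value, and use of NumPy/SciPy are choices made here for clarity.

```python
# Minimal sketch of vMF-normalized cluster assignment probabilities,
# assuming an L2-normalized representation z and unnormalized prototypes W.
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function I_v


def log_vmf_normalizer(kappa, d):
    """log C_d(kappa) for a d-dimensional vMF distribution.

    C_d(kappa) = kappa^(d/2 - 1) / ((2*pi)^(d/2) * I_{d/2 - 1}(kappa)),
    with the Bessel term evaluated in log space for numerical stability.
    """
    nu = d / 2.0 - 1.0
    # I_nu(kappa) = ive(nu, kappa) * exp(kappa)  for kappa > 0
    log_bessel = np.log(ive(nu, kappa)) + kappa
    return nu * np.log(kappa) - (d / 2.0) * np.log(2.0 * np.pi) - log_bessel


def vmf_cluster_probs(z, W, tau=0.1):
    """Cluster assignment probabilities p(k | z) under the vMF mixture view.

    z : (d,)   L2-normalized representation
    W : (K, d) unnormalized prototype vectors
    """
    d = z.shape[0]
    kappa = np.linalg.norm(W, axis=1) / tau               # concentration per component
    logits = W @ z / tau + log_vmf_normalizer(kappa, d)   # add log C_d(kappa_k) per prototype
    logits -= logits.max()                                # shift for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


# Example with illustrative sizes: 8-dim representations, 16 prototypes.
rng = np.random.default_rng(0)
z = rng.normal(size=8)
z /= np.linalg.norm(z)
W = rng.normal(size=(16, 8))
print(vmf_cluster_probs(z, W).round(3))
```

Omitting the log C_d(kappa_k) terms recovers the standard DINO softmax over dot products; including them is what allows prototypes with different norms (i.e., different concentrations) to compete on equal footing in the assignment.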