28 Mar 2024 | Dong-Hwan Jang*, Sangdo Yun, and Dongyoon Han
This paper introduces an efficient fine-tuning method for large pre-trained models that delivers strong in-distribution (ID) and out-of-distribution (OOD) performance. The proposed method, called Model Stock, uses significantly fewer models to obtain the final weights yet yields superior accuracy. It leverages the geometric properties of weight space and the pre-trained model's anchoring effect to approximate a center-close weight from only two fine-tuned models. Model Stock outperforms state-of-the-art methods such as Model Soup, achieving remarkable performance on both ID and OOD tasks at minimal computational cost. Demonstrated on pre-trained CLIP architectures, the method reaches 87.8% ImageNet top-1 accuracy and an average of 74.9% across five distribution shift benchmarks. The study shows that fine-tuned weights lie on a thin shell in weight space, and that closer proximity to the center of this shell improves performance. Model Stock's layer-wise weight averaging is shown to be more efficient than existing methods, with results indicating that the optimal interpolation ratio depends on the angle between the fine-tuned models. The method is validated through extensive experiments, demonstrating its effectiveness across various models and benchmarks. The study also highlights the importance of reducing variance in weight distributions for improved performance in out-of-distribution scenarios. The findings suggest that Model Stock is a practical and efficient approach for fine-tuning pre-trained models, with potential applications in both academia and industry.
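To make the core idea concrete, here is a minimal sketch of an angle-dependent, layer-wise merge of two fine-tuned models around a pre-trained anchor. The specific interpolation ratio `t = 2*cos(theta) / (1 + cos(theta))` is an assumption for illustration (the abstract only states that the optimal ratio depends on the angle between the fine-tuned models); weights are represented as plain NumPy dicts rather than a real model's state dict.

```python
import numpy as np

def model_stock_merge(w0, w1, w2):
    """Layer-wise merge of two fine-tuned models (w1, w2) around a
    pre-trained anchor (w0), given as {layer_name: ndarray} dicts.

    Per layer, the interpolation ratio is a function of the angle
    between the two fine-tuned weight deltas. We assume
    t = 2*cos(theta) / (1 + cos(theta)) as an illustration: when the
    fine-tuned models agree (theta -> 0), t -> 1 and the result is
    their plain average; when they disagree (theta -> 90 degrees),
    t -> 0 and the result is pulled back toward the anchor w0.
    """
    merged = {}
    for name in w0:
        d1 = (w1[name] - w0[name]).ravel()  # delta of model 1 from anchor
        d2 = (w2[name] - w0[name]).ravel()  # delta of model 2 from anchor
        cos = d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2) + 1e-12)
        t = 2 * cos / (1 + cos)  # assumed angle-dependent ratio
        w_avg = (w1[name] + w2[name]) / 2
        merged[name] = t * w_avg + (1 - t) * w0[name]
    return merged
```

In this sketch the merged weight always lies on the segment between the pre-trained anchor and the average of the two fine-tuned models, which mirrors the paper's use of the anchor to approximate a center-close weight from only two fine-tuned models.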