Model Stock: All we need is just a few fine-tuned models

28 Mar 2024 | Dong-Hwan Jang*, Sangdoo Yun, and Dongyoon Han
This paper introduces *Model Stock*, a fine-tuning method that aims to achieve superior performance with far fewer fine-tuned models than weight-averaging methods such as Model Soup. Model Stock exploits a geometric property of fine-tuned weights: they are distributed on a thin shell around a center in weight space. This property lets the method approximate the merged weights near that center using only two fine-tuned models. The approach significantly reduces computational cost while maintaining or improving accuracy on both in-distribution (ID) and out-of-distribution (OOD) tasks. The method is demonstrated on pre-trained CLIP architectures, where it achieves high accuracy on ImageNet and on several distribution shift benchmarks.
The authors also provide a detailed analysis of the geometric properties of fine-tuned weights and interpret the performance gains observed in methods like WiSE-FT and Model Soup through the lens of their proximity to the center of the weight distribution.
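
To make the two-model merge concrete, here is a minimal sketch in plain Python. It assumes the interpolation rule described in the Model Stock paper for two fine-tuned models: measure the angle θ between the two fine-tuned weight vectors relative to the pre-trained weights, set t = 2cos θ / (1 + cos θ), and interpolate between the average of the fine-tuned weights and the pre-trained weights. The function name `model_stock_merge`, the flat-list representation, and the error handling are illustrative assumptions, not the authors' code. The paper describes applying the merge layer by layer, whereas this sketch handles a single weight vector.

```python
import math


def model_stock_merge(w0, w1, w2):
    """Merge two fine-tuned weight vectors with their pre-trained anchor.

    w0 is the pre-trained weight vector; w1 and w2 are fine-tuned from it.
    All three are flat lists of floats (an assumption of this sketch).
    Returns the merged weights and the interpolation ratio t.
    """
    # Displacement of each fine-tuned model from the pre-trained weights.
    d1 = [a - b for a, b in zip(w1, w0)]
    d2 = [a - b for a, b in zip(w2, w0)]

    norm1 = math.sqrt(sum(a * a for a in d1))
    norm2 = math.sqrt(sum(a * a for a in d2))
    if norm1 == 0.0 or norm2 == 0.0:
        raise ValueError("a fine-tuned model is identical to the pre-trained one")

    # Cosine of the angle between the two displacements.
    cos_theta = sum(a * b for a, b in zip(d1, d2)) / (norm1 * norm2)
    if 1.0 + cos_theta <= 0.0:
        raise ValueError("displacements are antiparallel; ratio is undefined")

    # Interpolation ratio for two fine-tuned models (N = 2).
    t = 2.0 * cos_theta / (1.0 + cos_theta)

    # Move from the pre-trained weights toward the fine-tuned average by t.
    merged = [
        t * (a + b) / 2.0 + (1.0 - t) * c
        for a, b, c in zip(w1, w2, w0)
    ]
    return merged, t
```

Under this rule, orthogonal displacements give t = 0 and the merge falls back to the pre-trained weights, while nearly parallel displacements give t close to 1 and the merge is close to the plain average of the two fine-tuned models.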