2 Apr 2024 | Yunshi Huang, Fereshteh Shakeri, Jose Dolz, Malik Boudiaf, Houda Bahig, Ismail Ben Ayed
LP++ is a strong linear probe for few-shot CLIP adaptation. The paper proposes a generalization of the standard Linear Probe (LP) baseline, in which the linear classifier weights are learnable functions of the text embeddings, with class-wise multipliers blending image and text knowledge. The authors develop a computationally efficient block coordinate Majorize-Minimize (MM) descent algorithm with implicit step sizes, unlike standard gradient-descent practice, which requires intensive validation searches for learning rates. By examining the mathematical properties of the loss function, they build majorizing functions that yield data-driven learning rates and approximate the loss's minima, providing data-informed initialization of the variables. The image-language objective, together with these optimization insights, yields highly competitive few-shot CLIP performance. LP++ operates in a black-box setting, relaxes the need for intensive hyper-parameter validation searches, and runs orders of magnitude faster than state-of-the-art few-shot CLIP adaptation methods. The code is available at: https://github.com/FereshteShakeri/FewShot-CLIP-Strong-Baseline.git. The experiments show that LP++ outperforms existing few-shot CLIP adaptation methods, with significant performance gains, particularly in low-labeled-data regimes, while incurring a computational overhead several orders of magnitude smaller than that of state-of-the-art methods. LP++ is thus a strong baseline for few-shot CLIP adaptation, and its results suggest that the potential of LP has been severely underestimated in the existing literature.
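To make the classifier parameterization concrete, here is a minimal, illustrative sketch of a few-shot linear probe in the spirit of LP++, assuming the weight of class k takes the form w_k = v_k + alpha_k * t_k (a learnable vision component plus a class-wise multiplier on the frozen text embedding). The variable names and the plain gradient-descent loop are assumptions for illustration only; the paper instead optimizes with a block coordinate Majorize-Minimize scheme whose step sizes and initialization are derived from the data rather than tuned on a validation set.

```python
import torch
import torch.nn.functional as F

def lp_plus_plus_sketch(X, y, T, num_iters=300, lr=0.1):
    """Illustrative few-shot linear probe blending image and text knowledge.

    X : (N, d) L2-normalized image embeddings of the few-shot support set
    y : (N,)   integer class labels
    T : (K, d) L2-normalized text (prompt) embeddings, one per class

    Classifier weight for class k:  w_k = v_k + alpha_k * t_k
    """
    K, d = T.shape
    V = torch.zeros(K, d, requires_grad=True)      # learnable vision weights
    alpha = torch.ones(K, requires_grad=True)      # class-wise text multipliers
    opt = torch.optim.SGD([V, alpha], lr=lr)       # plain GD stand-in for the paper's MM updates

    for _ in range(num_iters):
        W = V + alpha.unsqueeze(1) * T             # blended class weights, shape (K, d)
        logits = X @ W.t()                         # (N, K) class scores
        loss = F.cross_entropy(logits, y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    return (V + alpha.unsqueeze(1) * T).detach()   # final classifier weights
```

Because only precomputed embeddings X and T are needed, such a probe treats the CLIP encoders as a black box, which is consistent with the paper's emphasis on black-box adaptation.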