2 Apr 2024 | Yunshi Huang, Fereshteh Shakeri, Jose Dolz, Malik Boudiaf, Houda Bahig, Ismail Ben Ayed
The paper introduces LP++, a novel linear probe for few-shot CLIP adaptation. LP++ generalizes the standard linear probe by incorporating learnable per-class multipliers that blend the visual prototypes with the class text embeddings, allowing the classifier to leverage both modalities. The authors propose a Block Majorize-Minimize (BMM) optimization procedure that updates the visual prototypes and blending parameters efficiently, without the need for extensive hyper-parameter tuning. This approach yields competitive performance in few-shot learning tasks, outperforming existing methods in both accuracy and computational efficiency. Because it operates only on pre-computed features, LP++ works in a black-box manner, making it suitable for deployment in low-resource and privacy-preserving scenarios. The paper also provides detailed experimental results and comparisons with state-of-the-art methods, demonstrating the effectiveness and efficiency of LP++.
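To make the structure of the blended classifier concrete, here is a minimal PyTorch sketch: class weights are formed as learnable visual prototypes plus per-class learnable multipliers times the fixed CLIP text embeddings, and logits are inner products with the image features. The names `BlendedLinearProbe` and `fit_probe`, the zero/ones initialization, and the SGD fitting loop are illustrative assumptions; the paper optimizes its objective with the BMM procedure (with analytically derived step sizes), not plain SGD.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BlendedLinearProbe(nn.Module):
    """Illustrative sketch (not the authors' exact implementation):
    class weights blend learnable visual prototypes with fixed CLIP
    text embeddings via learnable per-class multipliers."""

    def __init__(self, text_embeddings: torch.Tensor):
        super().__init__()
        # text_embeddings: (num_classes, dim), L2-normalized CLIP text features.
        self.register_buffer("text_embeddings", text_embeddings)
        num_classes, dim = text_embeddings.shape
        # Learnable visual prototypes (could also be initialized at class means).
        self.prototypes = nn.Parameter(torch.zeros(num_classes, dim))
        # One learnable blending multiplier per class.
        self.alpha = nn.Parameter(torch.ones(num_classes))

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, dim), L2-normalized CLIP image features.
        # Effective weight of class k: prototype_k + alpha_k * text_embedding_k.
        weights = self.prototypes + self.alpha.unsqueeze(1) * self.text_embeddings
        return image_features @ weights.t()  # (batch, num_classes) logits


def fit_probe(probe, feats, labels, steps=300, lr=1e-2):
    """Plain cross-entropy fitting on pre-computed few-shot features.
    The paper instead uses BMM updates, which remove the learning-rate
    hyper-parameter; SGD here is only for illustration."""
    opt = torch.optim.SGD(probe.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(probe(feats), labels)
        loss.backward()
        opt.step()
    return probe
```

In use, one would pre-compute the normalized support-set image features with a frozen CLIP image encoder and pass them as `feats`; since only these output embeddings are needed, adaptation stays black-box with respect to the encoders.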