CONVOLUTION MEETS LoRA: PARAMETER EFFICIENT FINETUNING FOR SEGMENT ANYTHING MODEL

2024 | Zihan Zhong, Zhiqiang Tang, Tong He, Haoyang Fang, Chun Yuan
Conv-LoRA is a parameter-efficient fine-tuning (PEFT) method for the Segment Anything Model (SAM), designed to enhance its performance on real-world semantic segmentation tasks. SAM, a foundation model for image segmentation, excels at zero-shot generalization but struggles in specialized domains such as medical imaging and remote sensing.

Conv-LoRA integrates lightweight convolutional parameters into Low-Rank Adaptation (LoRA) to inject image-related inductive biases into SAM's plain ViT encoder, reinforcing its local prior. This approach not only preserves SAM's segmentation knowledge but also revives its ability to learn high-level image semantics, which is constrained by SAM's foreground-background segmentation pretraining. Conv-LoRA further introduces a multi-scale local prior through a Mixture-of-Experts (MoE) mechanism that dynamically selects the feature scale at which the prior is injected, and it extends SAM to multi-class semantic segmentation by adding a classification branch to the mask decoder. Across diverse benchmarks spanning natural images, agriculture, remote sensing, and healthcare, Conv-LoRA outperforms other PEFT methods. The study highlights the importance of parameter-efficient fine-tuning in adapting SAM to new domains and shows that SAM's pretraining can be leveraged to enhance its ability to learn high-level semantic information, making Conv-LoRA a simple, generic, and effective solution for real-world semantic segmentation tasks.
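The core idea — inserting a lightweight convolution between LoRA's down- and up-projections so the low-rank update sees the tokens' 2D layout — can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the single fixed 3×3 conv (the paper's MoE would gate between convs at multiple scales), and the use of numpy instead of a deep-learning framework are all illustrative assumptions.

```python
import numpy as np

def conv2d_same(feat, kernel):
    """Naive 3x3 'same' convolution. feat: (H, W, C), kernel: (3, 3, C, C_out)."""
    H, W, _ = feat.shape
    padded = np.pad(feat, ((1, 1), (1, 1), (0, 0)))   # zero-pad spatial dims
    out = np.zeros((H, W, kernel.shape[-1]))
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + 3, j:j + 3, :]        # (3, 3, C) receptive field
            out[i, j] = np.tensordot(patch, kernel, axes=([0, 1, 2], [0, 1, 2]))
    return out

def conv_lora_forward(x, W0, A, B, K, hw):
    """Conv-LoRA-style forward pass (sketch; names are hypothetical).

    x:  (N, d_in) flattened ViT tokens, N = H * W
    W0: (d_out, d_in) frozen pretrained weight
    A:  (r, d_in) LoRA down-projection
    B:  (d_out, r) LoRA up-projection (zero-initialized, as in standard LoRA)
    K:  (3, 3, r, r) lightweight conv applied in the low-rank space
    hw: (H, W) spatial layout of the tokens
    """
    H, W = hw
    low = x @ A.T                        # (N, r): project tokens to low rank
    feat = low.reshape(H, W, -1)         # restore 2D layout for the conv
    feat = conv2d_same(feat, K)          # inject the local (convolutional) prior
    low = feat.reshape(H * W, -1)
    return x @ W0.T + low @ B.T          # frozen path + low-rank update
```

Because `B` follows the standard LoRA zero initialization, the update path contributes nothing at the start of fine-tuning and the layer initially reproduces the frozen pretrained output; the conv parameters then shape the low-rank update as training proceeds.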