Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery

12 Jan 2024 | Beilei Cui, Mobarakol Islam, Long Bai, and Hongliang Ren
Surgical-DINO adapts the DINOv2 foundation model to depth estimation in endoscopic surgery using low-rank adaptation (LoRA). Rather than fine-tuning the whole network, the method keeps the pre-trained DINOv2 image encoder frozen and adds trainable LoRA layers to its Transformer blocks, so that the encoder adapts to surgical scenes. A depth decoder is trained on top of the encoder to predict depth maps from surgical images. The model is validated on the SCARED and Hamlyn datasets, where it outperforms state-of-the-art methods.
Ablation studies show that LoRA significantly improves depth estimation accuracy and that larger pre-trained models improve it further. LoRA is also more effective than naive fine-tuning for surgical depth estimation. With only a small number of trainable parameters and efficient inference, Surgical-DINO is suitable for real-world applications. The results indicate that foundation models need domain-specific adaptation to perform well on surgical tasks, and they highlight both the potential of foundation models for surgical depth estimation and the importance of domain adaptation techniques.
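To make the frozen-encoder-plus-LoRA idea concrete, here is a minimal NumPy sketch of a single linear projection with a low-rank update. It follows the general LoRA formulation and is not the authors' implementation. The class name `LoRALinear`, the rank, the scaling factor `alpha`, and the width of 384 are all assumptions chosen for illustration; the summary does not say which projections inside the Transformer blocks receive LoRA layers.

```python
import numpy as np


class LoRALinear:
    """Frozen linear map plus a trainable low-rank update (hypothetical helper)."""

    def __init__(self, weight, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight  # pre-trained weight, kept frozen
        # Trainable down-projection, small random init.
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))
        # Trainable up-projection, zero init so the update starts at zero.
        self.B = np.zeros((d_out, rank))
        self.scale = alpha / rank

    def __call__(self, x):
        # x has shape (batch, d_in): frozen path plus scaled low-rank path.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        # Only A and B would receive gradient updates during adaptation.
        return self.A.size + self.B.size


d = 384  # assumed embedding width, for illustration only
rng = np.random.default_rng(1)
W = rng.normal(size=(d, d))  # stand-in for one pre-trained projection
layer = LoRALinear(W, rank=4)
x = rng.normal(size=(2, d))

# At initialisation B is zero, so the layer reproduces the frozen projection.
same_at_init = np.allclose(layer(x), x @ W.T)
full_params = W.size
lora_params = layer.trainable_params()
print(same_at_init, full_params, lora_params)
```

Because `B` starts at zero, the adapted layer initially behaves exactly like the pre-trained one, and training only has to learn `rank * (d_in + d_out)` values per adapted projection instead of `d_in * d_out`. This is the sense in which the number of trainable parameters stays small.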