Understanding Surgical-DINO: adapter learning of foundation models for depth estimation in endoscopic surgery

**Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery**

**Authors:** Beilei Cui, Mobarakol Islam, Long Bai, Hongliang Ren

**Institution:** The Chinese University of Hong Kong, Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London, National University of Singapore

**Abstract:** This paper presents Surgical-DINO, a low-rank adaptation (LoRA) of the DINOv2 foundation model for depth estimation in endoscopic surgery. The method integrates LoRA layers into DINO to adapt it to surgery-specific domain knowledge, freezing the DINO image encoder and optimizing only the LoRA layers and the depth decoder. Extensive validation on the MICCAI challenge dataset SCARED shows that Surgical-DINO significantly outperforms state-of-the-art models in endoscopic depth estimation tasks. Ablation studies confirm the effectiveness of the LoRA layers and of the adaptation.

**Key Contributions:**
1. Extending DINOv2 to medical image depth estimation.
2. Presenting a LoRA-based adaptation strategy for DINOv2 in the surgical image domain.
3. Validating Surgical-DINO on two public datasets, demonstrating superior performance over other depth estimation methods.

**Methodology:**
- **DINOv2:** A self-supervised foundation model for vision tasks.
- **LoRA:** A technique that adds trainable rank decomposition matrices alongside frozen pretrained weights, reducing the number of trainable parameters for downstream tasks.
- **Surgical-DINO Architecture:** The DINO image encoder is frozen, LoRA layers are added to it to capture learnable information, and a trainable depth decoder follows.

**Experiments:**
- **Datasets:** SCARED and Hamlyn.
- **Implementation Details:** PyTorch on an NVIDIA RTX 3090 GPU with specific hyperparameters.
- **Performance Metrics:** Abs Rel, Sq Rel, RMSE, RMSE log, and δ.
- **Results:** Surgical-DINO outperforms other methods in all evaluation metrics, highlighting the effectiveness of LoRA adaptation.

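The performance metrics listed under Experiments can be illustrated with a small sketch. The summary does not spell out the formulas, so the code below assumes the definitions commonly used in monocular depth estimation, with a conventional accuracy threshold of 1.25 for δ. The function name `depth_metrics` and its parameters are hypothetical, and flat lists of positive depth values stand in for image tensors.

```python
import math


def depth_metrics(pred, gt, threshold=1.25):
    """Commonly used depth metrics over paired, strictly positive depths.

    Assumption: these are the conventional definitions, not confirmed
    against the paper's own evaluation code.
    """
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and the same length")
    if min(pred) <= 0 or min(gt) <= 0:
        raise ValueError("depth values must be strictly positive")

    n = len(gt)
    pairs = list(zip(pred, gt))

    # Abs Rel: mean of |p - g| / g
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Sq Rel: mean of (p - g)^2 / g
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    # RMSE: root of the mean squared error
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # RMSE log: same as RMSE, but on log-depths
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
    )
    # delta: fraction of points with max(p/g, g/p) below the threshold
    delta = sum(1 for p, g in pairs if max(p / g, g / p) < threshold) / n

    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        "delta": delta,
    }


if __name__ == "__main__":
    # Toy values only; not data from the paper.
    print(depth_metrics([2.0, 4.0], [1.0, 4.0]))
```

For the first four metrics lower is better, while for δ higher is better, which is worth keeping in mind when reading "outperforms in all evaluation metrics".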

**Conclusion:** Surgical-DINO demonstrates the successful adaptation of foundation models to the surgical domain for depth estimation, emphasizing the importance of LoRA adaptation over zero-shot prediction or naive fine-tuning.
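The LoRA idea summarized under Methodology (a frozen pretrained weight plus trainable rank decomposition matrices) can be sketched for a single linear layer. This is a minimal, dependency-free illustration and not the authors' implementation: the class `LoRALinear`, the `alpha / rank` scaling, and the initialization (small random `A`, zero `B`) are assumptions following common LoRA practice, and plain Python lists stand in for the PyTorch tensors a real model would use.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update B @ A (hypothetical sketch)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight       # frozen: never updated during adaptation
        self.scale = alpha / rank  # assumed scaling convention
        # A (rank x d_in) starts small and random; B (d_out x rank) starts
        # at zero, so the adapted layer initially equals the frozen layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)               # W x
        update = matvec(self.B, matvec(self.A, x))  # B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_parameters(self):
        """Count only the entries of A and B, the parts that would be trained."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


if __name__ == "__main__":
    d = 8
    identity = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    layer = LoRALinear(identity, rank=2)
    x = [float(i) for i in range(d)]
    print(layer.forward(x) == x)                   # B is zero, so output equals W x
    print(layer.trainable_parameters(), "vs", d * d)
```

With an 8x8 frozen weight and rank 2, only 32 values would be trained instead of 64. The saving grows with layer width, which is what makes adapting a large frozen encoder practical.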

Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery

12 Jan 2024 | Beilei Cui, Mobarakol Islam, Long Bai, Hongliang Ren