12 Jan 2024 | Beilei Cui, Mobarakol Islam, Long Bai, Hongliang Ren
**Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery**
**Authors:** Beilei Cui, Mobarakol Islam, Long Bai, Hongliang Ren
**Institution:** The Chinese University of Hong Kong, Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London, National University of Singapore
**Abstract:**
This paper presents Surgical-DINO, a low-rank adaptation (LoRA) of the DINOv2 foundation model for depth estimation in endoscopic surgery. The method integrates LoRA layers into DINO to adapt it to surgery-specific domain knowledge, freezing the DINO image encoder and optimizing only the LoRA layers and the depth decoder. Extensive validation on the MICCAI challenge dataset SCARED shows that Surgical-DINO significantly outperforms state-of-the-art models in endoscopic depth estimation tasks. Ablation studies confirm the effectiveness of the LoRA layers and of the adaptation.
**Key Contributions:**
1. Extending DINOv2 to medical image depth estimation.
2. Presenting a LoRA-based adaptation strategy for DINOv2 in the surgical image domain.
3. Validating Surgical-DINO on two public datasets, demonstrating superior performance over other depth estimation methods.
**Methodology:**
- **DINOv2:** A self-supervised foundation model for vision tasks.
- **LoRA:** A technique that keeps the pretrained weights frozen and adds trainable low-rank decomposition matrices alongside them, reducing the number of trainable parameters for downstream tasks.
- **Surgical-DINO Architecture:** Freezes the DINO image encoder, adds trainable LoRA layers to it to capture surgery-specific information, and follows the encoder with a trainable depth decoder.
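
The LoRA idea described above can be illustrated with a minimal NumPy sketch of a single dense layer. This is not the authors' implementation: the class name `LoRALinear`, the rank of 4, the scaling factor `alpha`, and the layer width of 384 are all illustrative assumptions, and the summary does not state which encoder layers receive LoRA updates. The sketch only shows the general mechanism: the pretrained weight stays fixed, and two small matrices `A` and `B` carry all trainable parameters.

```python
import numpy as np


class LoRALinear:
    """Frozen dense layer plus a low-rank update: y = x W^T + (alpha / r) * x A^T B^T."""

    def __init__(self, weight, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight  # pretrained weight, treated as frozen
        # Trainable down-projection (small random init) and up-projection (zero init).
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))
        self.B = np.zeros((out_dim, rank))
        self.scale = alpha / rank

    def __call__(self, x):
        frozen = x @ self.weight.T           # output of the original layer
        update = (x @ self.A.T) @ self.B.T   # low-rank correction
        return frozen + self.scale * update

    def trainable_params(self):
        # Only A and B would be optimized; the frozen weight is excluded.
        return self.A.size + self.B.size


# Hypothetical layer width, chosen only for illustration.
rng = np.random.default_rng(1)
W = rng.normal(size=(384, 384))
layer = LoRALinear(W, rank=4)
x = rng.normal(size=(2, 384))
y = layer(x)

print("frozen parameters:   ", W.size)
print("trainable parameters:", layer.trainable_params())
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, so training begins from the pretrained behavior. In this sketch the trainable matrices hold 3,072 values against 147,456 in the frozen weight.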
**Experiments:**
- **Datasets:** SCARED and Hamlyn.
- **Implementation Details:** Using PyTorch on NVIDIA RTX 3090 GPU with specific hyperparameters.
- **Performance Metrics:** Abs Rel, Sq Rel, RMSE, RMSE log, and δ.
- **Results:** Surgical-DINO outperforms other methods in all evaluation metrics, highlighting the effectiveness of LoRA adaptation.
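
The performance metrics listed above have conventional definitions in the depth estimation literature, sketched below in NumPy. The summary does not give the exact formulas or the δ threshold used, so the function name `depth_metrics`, the threshold of 1.25, and the masking of non-positive ground-truth pixels are assumptions based on common practice rather than details confirmed by the paper.

```python
import numpy as np


def depth_metrics(pred, gt, threshold=1.25, eps=1e-8):
    """Common depth metrics over pixels with valid (positive) ground truth."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)

    valid = gt > 0                       # assumed convention: ignore missing depth
    pred = np.maximum(pred[valid], eps)  # avoid log(0) and division by zero
    gt = gt[valid]

    diff = pred - gt
    ratio = np.maximum(pred / gt, gt / pred)

    return {
        "abs_rel": float(np.mean(np.abs(diff) / gt)),
        "sq_rel": float(np.mean(diff ** 2 / gt)),
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
        "rmse_log": float(np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))),
        # Fraction of pixels whose prediction is within the threshold ratio.
        "delta": float(np.mean(ratio < threshold)),
    }


# Toy example: every prediction is exactly twice the true depth.
gt = np.array([1.0, 2.0, 4.0])
doubled = depth_metrics(2.0 * gt, gt)
perfect = depth_metrics(gt, gt)
print(doubled)
print(perfect)
```

Lower is better for Abs Rel, Sq Rel, RMSE, and RMSE log, while higher is better for δ. A perfect prediction gives zero error and δ equal to 1.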
**Conclusion:**
Surgical-DINO demonstrates the successful adaptation of foundation models to the surgical domain for depth estimation, emphasizing the importance of LoRA adaptation over zero-shot prediction or naive fine-tuning.