17 Jul 2024 | Ji Woong Kim¹, Tony Z. Zhao², Samuel Schmidgall¹, Anton Deguet¹, Marin Kobilarov¹, Chelsea Finn², Axel Krieger¹
This paper presents a method for learning surgical tasks on the da Vinci Research Kit (dVRK) using imitation learning. The dVRK is the research counterpart of the da Vinci surgical system, a widely used platform with over 10 million surgeries performed globally. However, the dVRK's forward kinematics are inaccurate due to imprecise joint measurements, making direct imitation learning challenging. The authors propose a relative action formulation that enables successful policy training and deployment using only this approximate kinematics data. This approach opens the door to using the large repository of existing clinical data for robot learning without further kinematic corrections.
The key idea is to model policy actions as relative motion rather than absolute poses, which is more consistent with the dVRK's kinematics. Three action representations are considered: camera-centric, tool-centric, and hybrid-relative. The hybrid-relative approach, which models translation actions relative to a fixed frame and rotation actions relative to a moving frame, achieves the highest success rates in surgical tasks.
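To make the hybrid-relative idea concrete, the sketch below encodes a target pose as an action whose translation is a delta in a fixed frame and whose rotation is a delta in the moving tool frame. This is a minimal illustration of the representation described above, not the authors' implementation; the function names and the choice of rotation matrices are assumptions for the example.

```python
import numpy as np

def hybrid_relative_action(pos_t, R_t, pos_next, R_next):
    """Encode a target pose as a hybrid-relative action (illustrative sketch).

    pos_t, pos_next: (3,) positions in the fixed (e.g. camera/base) frame.
    R_t, R_next: (3, 3) rotation matrices of the tool in that fixed frame.

    Translation action: a delta expressed in the FIXED frame.
    Rotation action: a delta expressed in the MOVING tool frame,
    i.e. d_R = R_t^T @ R_next.
    """
    d_pos = pos_next - pos_t      # fixed-frame translation delta
    d_R = R_t.T @ R_next          # tool-frame rotation delta
    return d_pos, d_R

def apply_hybrid_relative_action(pos_t, R_t, d_pos, d_R):
    """Recover the commanded pose from the current pose and the action."""
    return pos_t + d_pos, R_t @ d_R
```

Because only pose differences appear in the action, any slowly varying bias in the robot's absolute kinematics largely cancels out, which is why this representation tolerates the dVRK's imprecise joint readings.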
The authors also explore the use of wrist cameras in surgical workflows. While not commonly used in clinical settings, wrist cameras have shown effectiveness in improving policy performance and facilitating generalization to out-of-distribution scenarios. The authors design removable brackets to enable easy sharing across various surgical instruments.
The experiments demonstrate that relative motion on the dVRK is far more consistent than its absolute motion, and that a carefully chosen relative action representation suffices to train policies that achieve high success rates on surgical manipulation tasks. Wrist cameras significantly improve policy performance, especially during phases of a procedure where precise depth estimation is crucial. The trained policies are also robust to novel scenarios, such as unseen 3D suture pads and animal tissue.
The main contributions of this work include: (1) a successful demonstration of imitation learning on the dVRK using approximate kinematics data without requiring further corrections; (2) experiments showing that imitation learning can effectively learn complex surgical tasks and generalize to novel scenarios; (3) ablative experiments demonstrating the importance of wrist cameras for learning surgical manipulation tasks.