2024-07-17 | Ji Woong Kim, Tony Z. Zhao, Samuel Schmidgall, Anton Deguet, Marin Kobilarov, Chelsea Finn, Axel Krieger
The Surgical Robot Transformer (SRT) is a novel approach that explores imitation learning for surgical tasks on the da Vinci robot. Although the da Vinci system is widely used in surgery, its forward kinematics are inconsistent, primarily because of imprecise joint measurements, which makes its recorded kinematics difficult to learn from directly. To address this, the authors introduce a relative action formulation that enables successful policy training and deployment using the robot's approximate kinematics data. A promising implication is that the large repository of existing clinical data, which also contains approximate kinematics, could potentially be used for robot learning directly, without further corrections.
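To illustrate why a relative action formulation can tolerate approximate kinematics, here is a minimal planar sketch. It is a toy model, not the authors' code: poses are simplified to (x, y, theta), and it assumes, purely for illustration, that the kinematic error behaves like a fixed rigid offset applied on the world side of every measured pose over a short horizon. The real da Vinci errors are 3D and only approximately consistent over time, and the pose and offset values below are made up. Under this assumption the absolute pose is wrong, but the motion expressed in the tool's own frame is unchanged.

```python
import math

# Planar pose (x, y, theta): a simplified 2D stand-in for a 3D end-effector pose.
Pose = tuple[float, float, float]


def wrap(theta: float) -> float:
    """Wrap an angle to [-pi, pi]."""
    return math.atan2(math.sin(theta), math.cos(theta))


def compose(a: Pose, b: Pose) -> Pose:
    """Rigid-transform product a * b, where b is expressed in a's frame."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(at), math.sin(at)
    return (ax + c * bx - s * by, ay + s * bx + c * by, wrap(at + bt))


def inverse(a: Pose) -> Pose:
    """Inverse rigid transform: rotation R^T, translation -R^T p."""
    x, y, th = a
    c, s = math.cos(th), math.sin(th)
    return (-(c * x + s * y), s * x - c * y, wrap(-th))


def relative(a: Pose, b: Pose) -> Pose:
    """Motion from pose a to pose b, expressed in a's (tool) frame."""
    return compose(inverse(a), b)


# Hypothetical true end-effector poses at two consecutive timesteps.
true_t = (0.10, 0.05, 0.3)
true_next = (0.12, 0.06, 0.35)

# Hypothetical kinematic error, assumed constant over this short horizon.
err = (0.02, -0.01, 0.05)
meas_t = compose(err, true_t)
meas_next = compose(err, true_next)

# The absolute (camera-centric style) label inherits the full error ...
abs_error = math.dist(meas_next[:2], true_next[:2])
# ... while the tool-frame relative motion cancels it: (E T1)^-1 (E T2) = T1^-1 T2.
rel_true = relative(true_t, true_next)
rel_meas = relative(meas_t, meas_next)
rel_error = max(abs(u - v) for u, v in zip(rel_true, rel_meas))

print(f"absolute position error: {abs_error:.4f}")
print(f"relative action error:   {rel_error:.2e}")
```

The cancellation only holds exactly under the constant-offset assumption; with real, slowly drifting errors the relative motion is expected to be approximately, rather than perfectly, preserved.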
The study focuses on three fundamental surgical tasks: tissue manipulation, needle handling, and knot-tying. The authors compare three action representations: camera-centric, tool-centric, and hybrid-relative. Camera-centric actions are absolute end-effector poses, whereas tool-centric and hybrid-relative actions encode relative motion, which is more consistent over time and therefore more robust to joint measurement errors. The hybrid-relative formulation further improves accuracy by expressing translations with respect to a fixed reference frame.
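The three action representations can be contrasted with a small planar sketch. Everything here is a simplifying assumption rather than the authors' implementation: poses are reduced to (x, y, theta), the fixed reference frame for hybrid-relative translations is assumed to be the camera frame, and the function names (`camera_centric`, `tool_centric`, `hybrid_relative`, `apply`) are hypothetical. Each encoder turns a current and a target pose into an action label, and `apply` shows how a predicted action would be turned back into a target pose at execution time.

```python
import math

# Planar pose (x, y, theta) in a fixed camera frame; a 2D stand-in for 3D poses.
Pose = tuple[float, float, float]


def wrap(theta: float) -> float:
    """Wrap an angle to [-pi, pi]."""
    return math.atan2(math.sin(theta), math.cos(theta))


def camera_centric(current: Pose, target: Pose) -> Pose:
    # Absolute target pose in the camera frame; the current pose is ignored.
    return target


def tool_centric(current: Pose, target: Pose) -> Pose:
    # Translation delta rotated into the current tool frame (R_current^T @ delta).
    dx, dy = target[0] - current[0], target[1] - current[1]
    c, s = math.cos(current[2]), math.sin(current[2])
    return (c * dx + s * dy, -s * dx + c * dy, wrap(target[2] - current[2]))


def hybrid_relative(current: Pose, target: Pose) -> Pose:
    # Translation delta kept in the fixed (assumed camera) frame; rotation delta
    # relative to the tool. In 2D the tool-frame and camera-frame rotation deltas
    # coincide; in 3D the rotation part would be R_current^T @ R_target.
    dx, dy = target[0] - current[0], target[1] - current[1]
    return (dx, dy, wrap(target[2] - current[2]))


def apply(rep: str, current: Pose, action: Pose) -> Pose:
    """Turn a predicted action back into a target pose at execution time."""
    if rep == "camera":
        return action
    if rep == "tool":
        # Rotate the tool-frame translation back into the camera frame.
        c, s = math.cos(current[2]), math.sin(current[2])
        ax, ay = action[0], action[1]
        return (current[0] + c * ax - s * ay,
                current[1] + s * ax + c * ay,
                wrap(current[2] + action[2]))
    if rep == "hybrid":
        return (current[0] + action[0], current[1] + action[1],
                wrap(current[2] + action[2]))
    raise ValueError(f"unknown representation: {rep}")


ENCODERS = {"camera": camera_centric, "tool": tool_centric, "hybrid": hybrid_relative}

current = (0.10, 0.05, 0.3)  # hypothetical current end-effector pose
target = (0.12, 0.06, 0.35)  # hypothetical next pose from a demonstration
for name, encode in ENCODERS.items():
    action = encode(current, target)
    recovered = apply(name, current, action)
    print(f"{name:>7}: action={[round(v, 4) for v in action]} "
          f"recovered={[round(v, 4) for v in recovered]}")
```

In this sketch the only difference between the tool-centric and hybrid-relative encoders is whether the translation delta is rotated into the tool frame. The camera-centric label does not depend on the current pose at all, so any error in the measured absolute pose passes straight into it.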
Experiments show that the relative motion formulations significantly outperform the absolute (camera-centric) formulation in task success rate. Wrist cameras are also crucial for achieving high success rates, especially during phases that require precise depth estimation. Finally, the model generalizes well to novel scenarios, such as animal tissue and an unseen 3D suture pad.
The main contributions of the work include a successful demonstration of imitation learning on the da Vinci robot using approximate kinematics data, the effectiveness of relative motion in handling inconsistent kinematics, and the importance of wrist cameras in surgical manipulation tasks. The study highlights the potential of end-to-end imitation learning for autonomous surgery, paving the way for more general-purpose systems in this domain.