8 Jun 2024 | Ola Shorinwa*, Johnathan Tucker*, Aliyah Smith, Aiden Swann, Timothy Chen, Roya Firoozi, Monroe Kennedy III, Mac Schwager
Splat-MOVER is a modular robotics stack for open-vocabulary robotic manipulation that leverages the editability of Gaussian Splatting (GSplat) scene representations to enable multi-stage manipulation tasks. The system consists of three main components:
1. **ASK-Splat**: A GSplat representation that distills semantic and grasp-affordance features into the 3D scene, enabling geometric, semantic, and affordance understanding of the environment.
2. **SEE-Splat**: A real-time scene-editing module that uses 3D semantic masking and infilling to visualize the motions of objects resulting from robot interactions in the real world, creating a "digital twin" of the evolving environment.
3. **Grasp-Splat**: A grasp generation module that uses ASK-Splat and SEE-Splat to propose affordance-aligned candidate grasps for open-world objects.
The scene representation is trained in real time from RGB images captured during a brief scanning phase before operation, and the full system runs in real time during operation. In hardware experiments on a Kinova robot, Splat-MOVER outperforms two recent baselines (LERF-TOGO and F3RM) in both single-stage and multi-stage manipulation tasks. The project page is available at <https://splatmover.github.io>, and the code will be made available after review.
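The three modules compose into a multi-stage pipeline: ASK-Splat answers semantic/affordance queries, Grasp-Splat proposes grasps, and SEE-Splat updates the digital twin so later stages see the scene after each move. A minimal, purely illustrative Python sketch of that control flow follows; all class names, method signatures, and data shapes here are hypothetical stand-ins, not the authors' actual API.

```python
# Illustrative sketch of the Splat-MOVER pipeline structure. Everything here
# (class names, methods, data shapes) is a hypothetical stand-in; the real
# system trains a Gaussian Splatting scene and runs learned models.

class ASKSplat:
    """GSplat scene with distilled semantic and grasp-affordance features."""
    def __init__(self, rgb_images):
        # Placeholder for the trained Gaussian scene representation.
        self.scene = {"images": list(rgb_images), "edits": []}

    def query(self, text):
        # Stub: return a semantic/affordance mask for the named object.
        return {"object": text}


class SEESplat:
    """Scene editor: masks, infills, and moves objects in the digital twin."""
    @staticmethod
    def move_object(scene, mask, target_pose):
        # Record the edit so subsequent stages reason over the updated scene.
        edited = dict(scene)
        edited["edits"] = scene["edits"] + [(mask["object"], target_pose)]
        return edited


class GraspSplat:
    """Proposes affordance-aligned candidate grasps from the current scene."""
    @staticmethod
    def propose_grasps(scene, mask):
        # Stub: a single dummy grasp candidate with a fixed score.
        return [{"object": mask["object"], "score": 0.9}]


def run_task(rgb_images, stages):
    """Execute multi-stage manipulation: stages = [(object_name, target_pose)]."""
    ask = ASKSplat(rgb_images)
    scene, executed = ask.scene, []
    for obj_name, target_pose in stages:
        mask = ask.query(obj_name)                       # semantic lookup
        grasps = GraspSplat.propose_grasps(scene, mask)  # grasp candidates
        executed.append(max(grasps, key=lambda g: g["score"]))
        scene = SEESplat.move_object(scene, mask, target_pose)  # update twin
    return executed, scene
```

The key design point the sketch tries to capture is that SEE-Splat's edits feed forward: each stage plans against the scene as it will look after the previous stage, which is what makes multi-stage tasks possible from a single initial scan.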