RISE: 3D Perception Makes Real-World Robot Imitation Simple and Effective

21 Apr 2024 | Chenxi Wang, Hongjie Fang, Hao-Shu Fang, Cewu Lu
RISE is an end-to-end policy designed for real-world robot imitation learning, taking noisy single-view partial point clouds as input and predicting continuous robot actions as output. The method leverages 3D perception to efficiently model spatial information, which is crucial for precise robot manipulation. RISE employs a sparse 3D encoder to compress point clouds into tokens, followed by sparse positional encoding and a transformer to extract features. These features are then decoded into continuous actions using a diffusion head. Trained with 50 demonstrations per task, RISE outperforms existing 2D and 3D policies in accuracy and efficiency, demonstrating strong generalization and robustness to environmental changes. The paper evaluates RISE on six real-world tasks, including pick-and-place, 6-DoF pouring, push-to-goal, and long-horizon tasks, showing its effectiveness in handling complex scenarios and adapting to varying object locations and camera views.
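The point-cloud-to-token step described above can be illustrated with a minimal sketch. The function below is a hypothetical simplification (not the paper's actual encoder, which uses a sparse 3D convolutional network): it quantizes a point cloud into voxels, turns each occupied voxel into one token (here, the centroid of its points), and keeps the integer voxel coordinate as the sparse position used for positional encoding.

```python
import numpy as np

def voxelize_to_tokens(points, voxel_size=0.05):
    """Quantize a point cloud of shape (N, 3) into sparse voxel tokens.

    Hypothetical sketch of sparse tokenization: each occupied voxel
    yields one token feature (the centroid of its points) plus its
    integer voxel coordinate, which can feed a sparse positional encoding.
    """
    coords = np.floor(points / voxel_size).astype(np.int64)  # (N, 3) voxel indices
    uniq, inverse = np.unique(coords, axis=0, return_inverse=True)
    tokens = np.zeros((len(uniq), 3))
    np.add.at(tokens, inverse, points)        # sum points falling in each voxel
    tokens /= np.bincount(inverse)[:, None]   # average -> centroid per voxel
    return uniq, tokens                       # sparse positions, token features

# Example: 1000 random points in a 1 m cube collapse to far fewer tokens.
np.random.seed(0)
pts = np.random.rand(1000, 3)
pos, tok = voxelize_to_tokens(pts, voxel_size=0.1)
```

In the full pipeline, the tokens would pass through a transformer and the resulting features would condition a diffusion head that denoises continuous action sequences; those stages are omitted here.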