18 May 2025 | Zhengyi Luo, Jinkun Cao, Sammy Christen, Alexander Winkler, Kris Kitani, Weipeng Xu
The paper "Omnigrasp: Grasping Diverse Objects with Simulated Humanoids" presents a method for controlling a simulated humanoid to grasp and manipulate objects, following complex trajectories. The key challenge in this task is the dexterity required for precise object manipulation, especially with a humanoid robot that has a high degree of freedom. To address this, the authors introduce a novel approach that leverages a pre-trained universal dexterous humanoid motion representation, which significantly speeds up training and enables the controller to handle a wide range of objects and trajectories.
The method, called Omnigrasp, uses a hierarchical reinforcement learning (RL) framework. It trains on a large dataset of diverse object meshes and trajectories without requiring paired full-body motion data. The core idea is pre-grasp-guided training: the policy is guided using only the hand pose just before the grasp, which stabilizes grasping and improves the success rate. The reward function encourages the policy to approach the object, initiate a grasp, and then follow the desired trajectory.
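As a rough illustration of that reward structure, the sketch below switches from a pre-grasp pose-matching term to an object trajectory-tracking term at a designated grasp time. The function name, weights, and exponential scales are assumptions made for illustration, not the paper's coefficients.

```python
import numpy as np

def phased_grasp_reward(t, hand_pose, pregrasp_pose, obj_pos, obj_ref_pos,
                        grasp_start_t, w_pregrasp=1.0, w_traj=1.0):
    """Illustrative phased reward, not the paper's exact formulation.

    Before the designated grasp time, reward matching the pre-grasp hand
    pose near the object; afterwards, reward keeping the object on its
    reference trajectory.
    """
    if t < grasp_start_t:
        pose_err = np.linalg.norm(hand_pose - pregrasp_pose)   # approach phase
        return w_pregrasp * np.exp(-5.0 * pose_err)
    traj_err = np.linalg.norm(obj_pos - obj_ref_pos)           # manipulation phase
    return w_traj * np.exp(-10.0 * traj_err)
```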
The authors also introduce PULSE-X, a physics-based universal dexterous humanoid motion representation that extends the PULSE framework with articulated fingers. It is trained on a large-scale human motion dataset augmented with finger motion and then distilled into a compact latent space. That latent space serves as the action space for the RL policy, allowing simple state and reward designs.
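A latent space of this kind is commonly obtained by distilling an expert imitation policy into a variational encoder-decoder; the sketch below shows one plausible distillation step under that assumption. The dimensions, network sizes, and `beta` weight are arbitrary placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM, PROPRIO_DIM, ACTION_DIM = 48, 358, 69  # illustrative sizes

class Encoder(nn.Module):
    """Variational encoder over (current proprioception, imitation target) pairs."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(PROPRIO_DIM * 2, 512), nn.SiLU())
        self.mu = nn.Linear(512, LATENT_DIM)
        self.logvar = nn.Linear(512, LATENT_DIM)

    def forward(self, proprio, target):
        h = self.backbone(torch.cat([proprio, target], dim=-1))
        return self.mu(h), self.logvar(h)

encoder = Encoder()
decoder = nn.Sequential(nn.Linear(LATENT_DIM + PROPRIO_DIM, 512), nn.SiLU(),
                        nn.Linear(512, ACTION_DIM))

def distillation_step(proprio, target, expert_action, beta=1e-3):
    """Reconstruct the expert imitator's action from a sampled latent code,
    with a KL term keeping the latent space compact and well-shaped."""
    mu, logvar = encoder(proprio, target)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
    pred = decoder(torch.cat([z, proprio], dim=-1))
    recon = F.mse_loss(pred, expert_action)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
    return recon + beta * kl

# Example call on random placeholder tensors (batch of 8).
loss = distillation_step(torch.randn(8, PROPRIO_DIM), torch.randn(8, PROPRIO_DIM),
                         torch.randn(8, ACTION_DIM))
```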
Experiments on the GRAB, OakInk, and OMOMO datasets demonstrate that Omnigrasp can grasp and manipulate a wide variety of objects, achieving high success rates in both grasping and trajectory following. The method generalizes to unseen objects and trajectories and is robust to input noise, suggesting potential for real-world deployment with appropriate modifications.
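Success criteria in this setting typically reduce to simple geometric thresholds. The checks below are hypothetical stand-ins (lift height, hold duration, and deviation limit are invented for illustration), not the paper's evaluation protocol.

```python
import numpy as np

def grasp_success(obj_heights, lift_thresh=0.03, hold_steps=30):
    """Hypothetical check: object lifted above a height threshold (in meters)
    and held there for a minimum number of consecutive simulation steps."""
    run = 0
    for lifted in np.asarray(obj_heights) > lift_thresh:
        run = run + 1 if lifted else 0
        if run >= hold_steps:
            return True
    return False

def trajectory_success(obj_pos, ref_pos, max_err=0.25):
    """Hypothetical check: object never deviates from the reference
    trajectory by more than max_err meters."""
    err = np.linalg.norm(np.asarray(obj_pos) - np.asarray(ref_pos), axis=-1)
    return bool(np.all(err < max_err))
```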
The paper concludes by discussing limitations, such as the remaining headroom in trajectory-following success rates and grasp diversity, and outlines future work, including enhancing the humanoid motion representation and improving the object representation for better generalization.