29 May 2024 | Yi-Lin Wei, Jian-Jian Jiang, Cheng-Yi Xing, Xian-Tuo Tan, Xiao-Ming Wu, Hao Li, Mark Cutkosky, Wei-Shi Zheng
This paper introduces a novel task called "Dexterous Grasp as You Say" (DexGYS), enabling robots to perform dexterous grasping based on natural language commands. The key challenge is the lack of datasets with natural human guidance, so the authors propose DexGYSNet, a language-guided dexterous grasp dataset with high-quality annotations and flexible language guidance. The dataset is constructed cost-effectively using human hand-object interaction retargeting and an LLM-assisted annotation system, containing 50,000 pairs of dexterous grasps and corresponding language guidance for 1,800 common household objects.
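To make the dataset description concrete, here is a minimal sketch of what one DexGYSNet-style training pair might look like. The field names and values are illustrative assumptions, not the authors' actual schema; the paper only states that each pair couples a dexterous grasp with flexible language guidance for a household object.

```python
# Hypothetical shape of one language-guided grasp pair (illustrative only;
# field names are assumptions, not the DexGYSNet schema).
sample = {
    "object_id": "mug_0042",  # one of ~1,800 common household objects
    "grasp": {
        # Toy parameterization: 6-DoF wrist pose plus finger joint angles.
        "wrist_pose": [0.10, 0.00, 0.20, 0.0, 1.57, 0.0],
        "joint_angles": [0.3] * 22,
    },
    # Natural guidance of the kind an LLM-assisted annotator could produce:
    "language_guidance": "Grasp the mug by its handle to pour water.",
}

print(sample["language_guidance"])
```

The retargeting step would populate `grasp` from captured human hand-object interactions, while the LLM-assisted system writes `language_guidance`, which keeps annotation costs low compared to manual labeling.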
To generate dexterous grasps from human language instructions, the authors introduce the DexGYSgrasp framework, which decomposes the learning process into two progressive objectives. The first component learns a grasp distribution that is intention-aligned and diverse; the second refines grasp quality while preserving intention consistency. This progressive strategy resolves the tension between grasp quality, intention alignment, and diversity: the commonly used object-penetration loss, which the authors find hinders learning of a diverse, intention-aligned distribution, is kept out of the first-stage objective.
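The two-stage idea above can be sketched as follows. This is a conceptual toy, not the authors' implementation: the learned conditional generator and refinement network are stood in by a Gaussian sampler and a small gradient-descent loop, and all dimensions and objectives are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed parameterization: 6-DoF wrist pose + 22 finger joints = 28 numbers.
GRASP_DIM = 28

def intention_generator(lang_embedding, n_samples=8):
    """Stage 1 (sketch): sample a *diverse* set of grasps conditioned on a
    language embedding. A learned conditional generative model would go
    here; we stand in a Gaussian around a language-dependent mean. No
    penetration penalty is applied at this stage, so sampling is free to
    cover the full intention-aligned distribution."""
    mean = np.tanh(lang_embedding[:GRASP_DIM])  # toy conditioning
    return mean + 0.3 * rng.standard_normal((n_samples, GRASP_DIM))

def quality_refiner(coarse_grasps, quality_grad, step=0.1, n_iters=10):
    """Stage 2 (sketch): nudge each coarse grasp toward higher physical
    quality (e.g. less penetration, better contact) with small updates,
    so the refined grasp stays near the original and keeps its intention."""
    refined = coarse_grasps.copy()
    for _ in range(n_iters):
        refined -= step * quality_grad(refined)
    return refined

# Toy quality objective: gradient of 0.5 * ||g||^2, pulling grasps toward
# a stand-in "physically feasible" region at the origin.
quality_grad = lambda g: g

lang = rng.standard_normal(64)           # stand-in language embedding
coarse = intention_generator(lang)       # diverse, intention-aligned samples
refined = quality_refiner(coarse, quality_grad)
```

Because the refinement steps are small and local, each refined grasp remains close to its coarse sample, which is the mechanism by which the second stage can improve quality without destroying the intention alignment established by the first.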
Extensive experiments on DexGYSNet and in real-world scenarios show that the framework generates intention-consistent, high-quality, and diverse grasps. It outperforms existing methods in intention consistency and grasp diversity while achieving comparable grasp quality, and real-world trials confirm its practicality across a variety of objects. The study highlights the promise of language-guided dexterous grasp generation for industrial and domestic applications.