Grasp as You Say: Language-guided Dexterous Grasp Generation


29 May 2024 | Yi-Lin Wei, Jian-Jian Jiang, Cheng-Yi Xing, Xian-Tuo Tan, Xiao-Ming Wu, Hao Li, Mark Cutkosky, Wei-Shi Zheng
This paper introduces a novel task, "Dexterous Grasp as You Say" (DexGYS), which enables robots to perform dexterous grasps based on natural language commands from humans. Because progress on this task is hindered by the lack of datasets with natural human guidance, the authors propose DexGYSNet, a language-guided dexterous grasp dataset offering high-quality grasp annotations together with flexible, fine-grained language guidance. DexGYSNet is constructed cost-efficiently using a hand-object interaction retargeting strategy and an LLM-assisted language guidance annotation system.

To address the challenges of generating dexterous grasps that align with human intention, achieve high quality, and maintain diversity, the authors introduce the DexGYSGrasp framework. It decomposes the complex learning process into two progressive objectives: first learning a grasp distribution with intention alignment and diversity, then refining grasp quality while preserving intention consistency. Extensive experiments on DexGYSNet and in real-world environments demonstrate that DexGYSGrasp significantly outperforms existing methods in intention consistency, grasp quality, and diversity, and real-robot trials further validate its practical applicability to robotic grasping tasks.
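The two-stage decomposition described above can be sketched as follows. This is a minimal illustrative sketch only: the function names, the toy 9-dimensional pose representation, and the shrink-toward-target refinement rule are all assumptions for exposition, not the paper's actual models (which learn a conditional grasp distribution and a quality-refinement network).

```python
import random

def generate_grasp(command, seed=None):
    """Stage 1 (hypothetical sketch): sample a diverse grasp conditioned on
    the language command, prioritizing intention alignment and diversity
    over physical quality."""
    rng = random.Random(seed)
    # Toy 'pose': 6-DoF wrist parameters plus a few joint angles,
    # sampled freely so repeated calls yield diverse candidates.
    pose = [rng.uniform(-1.0, 1.0) for _ in range(9)]
    return {"command": command, "pose": pose}

def refine_grasp(grasp, max_delta=0.1):
    """Stage 2 (hypothetical sketch): nudge the pose toward higher quality
    while staying close to the stage-1 output, preserving the intention
    expressed by the command."""
    refined = dict(grasp)
    # Bounded per-dimension update stands in for a learned refinement step.
    refined["pose"] = [p + max_delta * (0.0 - p) for p in grasp["pose"]]
    return refined
```

The key design point the sketch mirrors is the separation of concerns: the first stage is free to cover the full intention-conditioned distribution, while the second stage only makes small, consistency-preserving corrections.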