This paper presents a self-supervised approach to robot grasping that leverages a large-scale dataset of 50,000 trial-and-error grasps collected over 700 hours of robot interaction. The method trains a Convolutional Neural Network (CNN) to predict grasp locations without relying on human-labeled data: the robot generates its own supervision by attempting grasps and recording their outcomes. Learning from this volume of trial and error is what allows the model to generalize to unseen objects.
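For intuition, here is a minimal sketch of what one iteration of such self-supervised data collection could look like. The robot and camera interfaces (`camera.capture`, `robot.execute_grasp`, and so on) are hypothetical placeholders for illustration, not the paper's actual control stack:

```python
import random

def collect_grasp_attempt(robot, camera):
    """Execute one random grasp and label it by its own outcome."""
    image = camera.capture()              # workspace image (hypothetical API)
    x, y = robot.sample_point_on_table()  # random grasp position (hypothetical API)
    theta = random.uniform(0.0, 180.0)    # random gripper angle in degrees
    robot.execute_grasp(x, y, theta)
    # The label comes from the robot itself (e.g. checking whether the
    # gripper still holds something after lifting), so no human
    # annotation is ever needed.
    success = robot.gripper_holds_object()
    return image, (x, y, theta), int(success)
```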
The paper formulates grasp prediction as an 18-way binary classification problem over image patches, where each output corresponds to a discrete grasp angle. A multi-stage learning approach is introduced in which the CNN trained at one stage is used to collect hard negatives for the next stage, improving the model's ability to discriminate effective grasp configurations.
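A minimal PyTorch-style sketch of this formulation is given below, assuming 18 angle bins spanning 0-180 degrees and some CNN backbone. The class and function names are our own; the point is the 18-logit output head and the masked loss, in which each trial supervises only the angle bin that was actually executed:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraspCNN(nn.Module):
    """Sketch: a backbone maps an image patch to features, and a linear
    head emits one graspability logit per angle bin. Names are ours,
    not the paper's."""

    def __init__(self, backbone: nn.Module, feat_dim: int, n_angles: int = 18):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(feat_dim, n_angles)  # 18 independent binary outputs

    def forward(self, patch: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(patch))     # (B, 18) logits

def masked_bce_loss(logits, angle_bin, success):
    """Pick the single logit for the angle bin that was tried and
    apply binary cross-entropy against the observed grasp outcome."""
    picked = logits.gather(1, angle_bin.long().unsqueeze(1)).squeeze(1)
    return F.binary_cross_entropy_with_logits(picked, success.float())
```

Treating the angles as 18 separate binary problems rather than one softmax lets a single patch be graspable at several angles at once.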
The approach is compared against several baselines, including heuristic methods and learning-based algorithms such as k-Nearest Neighbors (kNN) and a linear SVM. The proposed method reaches 79.5% classification accuracy on a held-out test set of unseen objects, demonstrating the value of large-scale data collection and multi-stage learning for grasping.
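For concreteness, here is a rough sketch of how such learning-based baselines might be set up with scikit-learn. The random arrays stand in for patch features and success labels and are not the paper's data or pipeline:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

# Stand-in data: flattened patch features and binary success labels.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((1000, 256))
y_train = rng.integers(0, 2, size=1000)
X_test = rng.standard_normal((200, 256))
y_test = rng.integers(0, 2, size=200)

# Each baseline predicts grasp success directly from the features.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
svm = LinearSVC().fit(X_train, y_train)
print("kNN accuracy:", knn.score(X_test, y_test))
print("SVM accuracy:", svm.score(X_test, y_test))
```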
The method is also validated on a real robot, which grasps and lifts novel objects with a 66% success rate and remains robust in cluttered scenes, highlighting its potential for practical applications. The paper concludes that large-scale datasets and self-supervised learning are essential for advancing robot grasping, and the collected dataset is released for further research.
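As an illustration of what the test-time procedure in such an evaluation could look like, the sketch below scores candidate patches at all 18 angles with a trained network (e.g. the GraspCNN sketch above) and returns the best patch-angle pair. The patch sampling and tensor shapes are our assumptions:

```python
import torch

@torch.no_grad()
def select_grasp(model, patches, centers):
    """patches: (N, C, H, W) crops sampled from the scene image;
    centers: their (x, y) pixel locations. Both are assumed inputs."""
    scores = torch.sigmoid(model(patches))    # (N, 18) success probabilities
    best = scores.argmax().item()             # flat index of best pair
    i, angle_bin = divmod(best, scores.shape[1])
    return centers[i], angle_bin * 10.0       # grasp point and angle in degrees
```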