This paper addresses the challenge of learning to grasp objects using a large-scale self-supervised approach. Traditional methods rely on human-labeled datasets, which are costly and prone to bias. The authors propose a novel method that leverages trial-and-error experiments to collect 50K data points over 700 hours of robot grasping attempts. This approach significantly increases the amount of training data, allowing a high-capacity Convolutional Neural Network (CNN) to be trained without severe overfitting. The CNN is trained to predict grasp locations by formulating the problem as an 18-way binary classification over image patches. Additionally, a multi-stage learning approach is introduced, in which a CNN trained in one stage is used to collect hard negatives for subsequent stages. The experiments demonstrate the effectiveness of large-scale datasets and multi-stage training in improving grasp prediction accuracy. The method achieves state-of-the-art performance in generalizing to unseen objects and shows robustness in real robot testing, including clutter removal tasks.
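The 18-way binary classification formulation can be illustrated with a minimal NumPy sketch. It assumes grasp angles are discretized into eighteen 10-degree bins and that, during training, a binary cross-entropy loss is applied only to the angle bin the robot actually attempted; the function names and the plain-array interface are illustrative, not the authors' implementation.

```python
import numpy as np

NUM_ANGLE_BINS = 18  # grasp angles discretized into 10-degree bins (0..170)

def angle_to_bin(theta_deg):
    """Map a grasp angle in degrees to one of the 18 bins (10 degrees each)."""
    return int(theta_deg % 180) // 10

def grasp_loss(logits, theta_deg, success):
    """Binary cross-entropy on the single angle bin that was actually tried.

    logits: shape-(18,) raw network outputs for one image patch
    theta_deg: grasp angle executed by the robot for that patch
    success: 1 if the grasp succeeded, else 0 (the self-supervised label)
    """
    b = angle_to_bin(theta_deg)
    p = 1.0 / (1.0 + np.exp(-logits[b]))  # sigmoid on the tried bin only
    return -(success * np.log(p) + (1 - success) * np.log(1 - p))

def best_grasp_angle(logits):
    """At test time, pick the angle bin with the highest predicted success."""
    return int(np.argmax(logits)) * 10
```

Treating each angle bin as an independent binary predictor, rather than a single 18-way softmax, lets one trial label supervise only the angle that was actually executed, which matches the trial-and-error data collection scheme.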