21 May 2024 | Jensen Gao, Annie Xie, Ted Xiao, Chelsea Finn, Dorsa Sadigh
This paper studies efficient data collection for robotic manipulation via compositional generalization. The authors investigate whether robot policies can compose environmental factors (e.g., object types, table heights) seen in their in-domain training data to succeed on unseen factor combinations. They propose data collection strategies that exploit this compositional generalization to reduce the amount of data needed for generalization. The study includes experiments both in simulation and on a real robot.
The authors find that policies do exhibit compositional generalization, although leveraging prior robotic datasets is critical for this on a real robot. Their in-domain data collection strategies exploit this property and can induce better generalization than naive approaches for the same data collection effort. On a real robot, a policy using data from such a strategy achieves a success rate of 77.5% when transferred to entirely new environments that encompass unseen combinations of environmental factors, whereas policies trained using data collected without accounting for environmental variation fail to transfer effectively, with a success rate of only 2.5%.
The authors also investigate the effectiveness of different data collection strategies, including Complete, Random, Single Factor, Diagonal, L, and Stair. They find that strategies like Stair, L, and Diagonal may outperform Random by exploiting compositional generalization when possible. Stair generally performs the best, especially in the N=5 setting. They also find that incorporating prior robot data can promote stronger composition of factor values in the in-domain data.
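The summary names the strategies without defining them, so the following is only a minimal sketch of how such strategies might select factor combinations. It assumes two factors with k values each, indexed from 0, and it assumes the shapes that the names suggest (a single row, a diagonal, an L, a staircase); the function names and these definitions are illustrative assumptions, not the authors' code. Random, which would sample combinations, is omitted.

```python
from itertools import product


def complete(k):
    """Every combination of the two factors: k * k settings."""
    return list(product(range(k), repeat=2))


def single_factor(k):
    """Assumed: vary only the first factor, keep the second at its base value."""
    return [(i, 0) for i in range(k)]


def diagonal(k):
    """Assumed: change both factors together at every step."""
    return [(i, i) for i in range(k)]


def l_shape(k):
    """Assumed: vary one factor at a time, always starting from the base setting."""
    return [(i, 0) for i in range(k)] + [(0, j) for j in range(1, k)]


def stair(k):
    """Assumed: alternate which factor changes, building on the previous setting."""
    combos = [(0, 0)]
    for i in range(1, k):
        combos.append((i, i - 1))  # step the first factor
        combos.append((i, i))      # then step the second factor
    return combos


def covers_all_values(combos, k):
    """True if every value of each factor appears in at least one combination."""
    first = {a for a, _ in combos}
    second = {b for _, b in combos}
    return first == set(range(k)) and second == set(range(k))


def unseen(combos, k):
    """Combinations a policy would need to reach through composition."""
    seen = set(combos)
    return [c for c in complete(k) if c not in seen]


if __name__ == "__main__":
    k = 3
    for name, fn in [("complete", complete), ("single_factor", single_factor),
                     ("diagonal", diagonal), ("l_shape", l_shape), ("stair", stair)]:
        combos = fn(k)
        print(name, len(combos), covers_all_values(combos, k), len(unseen(combos, k)))
```

Under these assumed definitions, Diagonal, L, and Stair each show every factor value at least once while visiting far fewer settings than Complete, which is the sense in which they would rely on composition to cover the remaining combinations.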
The authors further evaluate the transfer abilities of policies trained on their datasets to entirely new environments that capture some of the factor variety accounted for during data collection. They find that varied in-domain data from BaseKitch, and BridgeData V2 as prior data, are both critical for effective transfer to these new kitchens. Stair outperforms L, although both achieve significant levels of transfer. Co-fine-tuning generally performs better than only fine-tuning.
The authors also evaluate the impact of unaccounted factors, such as distractor objects and lighting, on the performance of their policies. They find that policies using prior data are more robust to these factors. They also study whether camera position composes with table texture: policies trained on both sub-datasets together achieve a perfect success rate, which suggests that these two factors compose effectively.
Overall, the authors conclude that policies can exhibit composition to generalize to unseen settings, their data collection strategies are sufficient to achieve some of this composition, and that prior data is important for this composition to happen effectively.