3 Mar 2018 | Xue Bin Peng, Marcin Andrychowicz, Wojciech Zaremba, Pieter Abbeel
This paper presents a method to bridge the "reality gap" between simulated and real-world environments in robotic control tasks. By randomizing the dynamics of the simulator during training, the authors develop policies that adapt to a wide range of dynamics, including ones that differ substantially from those seen during training. This adaptability enables the policies to generalize to real-world dynamics without any additional training on the physical system. The approach is demonstrated on an object-pushing task with a robotic arm: policies trained exclusively in simulation maintain similar performance when deployed on a real robot, reliably moving an object to a desired location from random initial configurations.
The paper explores various design decisions and shows that the resulting policies are robust to significant calibration errors. The authors also compare different network architectures and evaluate the impact of randomizing various dynamics parameters, demonstrating the importance of coping with controller latency and sensor noise. The results highlight the effectiveness of the proposed method in transferring learned skills from simulation to the real world.
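The core idea of dynamics randomization can be sketched in a few lines: before each training episode, a fresh set of simulator parameters is drawn from fixed ranges, so the policy never experiences a single fixed dynamics model. The parameter names and ranges below are illustrative assumptions, not the paper's actual values; `sample_dynamics` is a hypothetical helper standing in for the step that configures the physics engine before a rollout.

```python
import random

# Illustrative parameter ranges (assumed, not the paper's actual values).
# Note that latency and observation noise are randomized alongside the
# physical parameters, matching the summary's point about coping with
# controller latency and sensor noise.
PARAM_RANGES = {
    "link_mass_scale":  (0.5, 1.5),   # multiplier on nominal link masses
    "object_friction":  (0.5, 1.1),   # tabletop friction coefficient
    "action_latency_s": (0.0, 0.04),  # controller latency in seconds
    "obs_noise_std":    (0.0, 0.02),  # std of Gaussian observation noise
}

def sample_dynamics(rng: random.Random) -> dict:
    """Draw one set of dynamics parameters for the next training episode."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}

# Example: draw parameters for a few episodes. In the full system these
# values would be written into the simulator (masses, friction, delays)
# before each rollout used for policy-gradient training.
rng = random.Random(0)
for _ in range(3):
    params = sample_dynamics(rng)
    for name, (lo, hi) in PARAM_RANGES.items():
        assert lo <= params[name] <= hi
```

Because every episode presents different dynamics, a policy with memory (the paper compares architectures, including recurrent ones) can infer the current dynamics from recent observations and adapt its behavior online, which is what enables zero-shot transfer to the real robot's unknown dynamics.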