2 Mar 2018 | Felipe Codevilla, Matthias Müller, Antonio López, Vladlen Koltun, Alexey Dosovitskiy
This paper introduces a method for end-to-end driving based on conditional imitation learning, in which a vehicle trained to imitate expert driving can additionally be guided by high-level commands. The approach addresses a key limitation of traditional imitation learning: a policy trained purely to reproduce low-level controls cannot be directed at test time. By conditioning the learning process on high-level commands, the system learns both low-level sensorimotor coordination and how to respond to navigational instructions such as "turn right at the next intersection." The method operates on vision-based inputs and is evaluated in realistic simulated urban environments and on a 1/5-scale robotic truck, demonstrating its effectiveness in complex settings. The paper also reviews related work on imitation learning and reinforcement learning for driving, and presents the conditional imitation learning framework in detail. The results show that the proposed approach significantly improves performance in both simulated and physical environments, with the branched architecture outperforming the baselines, and the study highlights the importance of data augmentation and noise injection for generalization and stability. Because the resulting policies can be controlled by a passenger or a topological planner, conditional imitation learning is a promising approach for future autonomous driving systems.
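To make the branched, command-conditional idea concrete, the following is a minimal sketch (not the authors' code) of such a policy in PyTorch: a shared image encoder feeds one output head per high-level command (e.g., follow-lane, turn-left, turn-right, go-straight), and the command selects which head produces the action. All layer sizes, the command set, and the two-dimensional action (e.g., steering and throttle) are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class BranchedPolicy(nn.Module):
    """Command-conditional policy sketch: shared encoder, one branch per command."""

    def __init__(self, num_commands=4, num_actions=2):
        super().__init__()
        # Shared convolutional image encoder (illustrative sizes).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 128), nn.ReLU(),
        )
        # One action head ("branch") per high-level command.
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, num_actions))
            for _ in range(num_commands)
        )

    def forward(self, image, command):
        # image: (B, 3, H, W); command: (B,) integer command index.
        features = self.encoder(image)
        # Evaluate all branches, then pick each sample's branch by its command.
        all_actions = torch.stack([b(features) for b in self.branches], dim=1)  # (B, C, A)
        idx = command.view(-1, 1, 1).expand(-1, 1, all_actions.size(-1))
        return all_actions.gather(1, idx).squeeze(1)  # (B, A)

# Example: two images with hypothetical commands "turn-left" (1) and "go-straight" (3).
policy = BranchedPolicy()
actions = policy(torch.randn(2, 3, 88, 200), torch.tensor([1, 3]))
print(actions.shape)  # torch.Size([2, 2]) -> e.g. (steering, throttle)
```

At training time, only the branch matching the recorded command would receive a gradient for each sample; at test time, a passenger or planner supplies the command, which switches the active branch.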