25 Apr 2016 | Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, Xin Zhang, Jake Zhao, Karol Zieba
The paper presents an end-to-end learning approach for self-driving cars: a convolutional neural network (CNN) maps raw pixels from a single front-facing camera directly to steering commands. Trained with minimal human-labeled data, the system drives on local roads with or without lane markings, on highways, and in areas with unclear visual guidance, such as parking lots and unpaved roads. Using only the human steering angle as a training signal, the CNN automatically learns internal representations of the necessary processing steps, such as detecting useful road features. Optimizing all processing steps simultaneously leads to better performance and a smaller network than explicit decomposition of the problem. The system operates at 30 frames per second (FPS) and has been tested under diverse lighting and weather conditions. The paper also describes the data collection process, the network architecture, and the evaluation methods, which include both simulation and on-road tests. The results demonstrate that the CNN learns meaningful road features and maintains high levels of autonomy with minimal human intervention.
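The core idea of the pixels-to-steering mapping can be sketched as a forward pass through a small convolutional network. The 66×200-pixel input plane below matches the paper's network, but everything else — the single convolutional layer, the random weights, and the tanh squashing of the output — is an illustrative assumption, not the paper's trained architecture:

```python
import numpy as np

def conv2d(x, w, stride):
    """Valid 2-D convolution with ReLU. x: (H, W, C_in), w: (k, k, C_in, C_out)."""
    k = w.shape[0]
    H, W, _ = x.shape
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    out = np.zeros((out_h, out_w, w.shape[3]))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride:i * stride + k, j * stride:j * stride + k, :]
            out[i, j, :] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return np.maximum(out, 0)  # ReLU nonlinearity

rng = np.random.default_rng(0)
img = rng.random((66, 200, 3))                 # camera frame (input plane size from the paper)
w1 = rng.standard_normal((5, 5, 3, 24)) * 0.01  # hypothetical first conv layer: 24 5x5 kernels
h1 = conv2d(img, w1, stride=2)                  # feature maps: (31, 98, 24)
w_fc = rng.standard_normal(h1.size) * 1e-4      # hypothetical dense layer straight to the output
steering = float(np.tanh(w_fc @ h1.ravel()))    # single scalar steering command in (-1, 1)
print(steering)
```

Because the whole pipeline is one differentiable function from pixels to a steering scalar, training reduces to regressing the network output against the recorded human steering angle, which is exactly the end-to-end signal the summary describes.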
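The "high autonomy levels" mentioned above refer to the paper's on-road metric: the percentage of test time the car drives itself, where each human intervention is charged as six seconds of manual driving. A minimal implementation of that formula:

```python
def autonomy(num_interventions, elapsed_seconds):
    """Autonomy metric from the paper: percentage of time not attributed
    to interventions, with each intervention counted as 6 seconds."""
    return (1.0 - (num_interventions * 6.0) / elapsed_seconds) * 100.0

# For instance, 10 interventions during a 600-second drive:
print(autonomy(10, 600))  # -> 90.0
```

An intervention-free run scores 100%, so the metric directly captures "minimal human intervention."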