End to End Learning for Self-Driving Cars


25 Apr 2016 | Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, Xin Zhang, Jake Zhao, Karol Zieba
This paper presents an end-to-end learning approach for self-driving cars, in which a convolutional neural network (CNN) is trained to map raw pixel data from a single front-facing camera directly to steering commands. The system learns to drive in varied conditions, including traffic on local roads with or without lane markings, highways, parking lots, and unpaved roads, using minimal training data from human drivers. The CNN automatically learns internal representations of the necessary processing steps, such as detecting useful road features, without being explicitly trained on tasks like lane detection.

Compared with traditional pipelines that decompose the problem into separate components (e.g., lane detection, path planning, and control), the end-to-end system optimizes all processing steps simultaneously. This leads to both better performance and smaller systems: internal components self-optimize to maximize overall system performance rather than human-selected intermediate criteria, and the network learns to solve the problem with the minimal number of processing steps.

The system was trained on an NVIDIA DevBox using Torch 7 and tested on an NVIDIA DRIVE PX self-driving car computer, also running Torch 7, operating at 30 frames per second (FPS). Training data was collected from a variety of roads and weather conditions, including highways, local roads, and residential streets, totaling about 72 hours of driving by March 2016.

The CNN architecture consists of 9 layers: a normalization layer, 5 convolutional layers, and 3 fully connected layers. The input image is split into YUV planes and passed through the network; the first layer performs image normalization, and the convolutional layers perform feature extraction.
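The reported layer structure can be sanity-checked with a short shape walk-through. The sketch below uses the kernel sizes, strides, and filter counts shown in the paper's architecture figure (three 5×5 stride-2 convolutions with 24/36/48 filters, then two 3×3 stride-1 convolutions with 64 filters, applied to a 66×200 YUV input); the `conv_out` helper and the printed summary are illustrative, not from the paper.

```python
def conv_out(size, kernel, stride):
    """Output length of a 'valid' (no padding) convolution along one axis."""
    return (size - kernel) // stride + 1

# Layer spec: (out_channels, kernel_size, stride), following the paper's figure
layers = [(24, 5, 2), (36, 5, 2), (48, 5, 2), (64, 3, 1), (64, 3, 1)]

h, w = 66, 200  # height and width of the normalized YUV input planes
for ch, k, s in layers:
    h, w = conv_out(h, k, s), conv_out(w, k, s)
    print(f"{ch:3d} feature maps of size {h}x{w}")

flat = 64 * h * w  # features flattened into the fully connected layers
print("flattened feature count:", flat)
```

Walking the five convolutional layers this way shrinks the 66×200 input down to 64 feature maps of 1×18, which are then flattened and fed through the three fully connected layers to produce a single steering output.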
The fully connected layers act as a controller for steering, although with end-to-end training there is no clean boundary between the feature-extraction and control parts of the network. Training data was selected based on the driver's activity, with a focus on lane following, and data augmentation was used to teach the network how to recover from poor positions or orientations. The system was evaluated both in simulation and in on-road tests: simulation showed high autonomy (up to 90%), and on-road tests in Monmouth County, NJ demonstrated 98% autonomy. Activation maps of the first two feature-map layers show that the CNN learned to detect useful road features without any explicit labels during training. The system shows promise for autonomous driving, but further work is needed to improve robustness and the visualization of internal processing steps.
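The autonomy percentages quoted above follow the paper's metric: each human intervention is charged a fixed 6 seconds (the approximate time to recover and re-engage) against the total elapsed driving time. A minimal sketch of that calculation:

```python
def autonomy(num_interventions, elapsed_seconds, penalty_seconds=6.0):
    """Autonomy metric from the paper: percentage of time the car drives
    itself, charging a fixed per-intervention penalty against elapsed time."""
    return 100.0 * (1.0 - num_interventions * penalty_seconds / elapsed_seconds)

# Example: 10 interventions over 600 seconds of driving
print(autonomy(10, 600))  # 90.0
```

Under this metric, a drive with zero interventions scores 100% regardless of length, and each intervention costs the same fixed amount of credited time.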