RoadRunner - Learning Traversability Estimation for Autonomous Off-road Driving

2024 | Jonas Frey, Shehryar Khattak, Manthan Patel, Deegan Atha, Julian Nubert, Curtis Padgett, Marco Hutter, Patrick Spieler
RoadRunner is a novel framework for predicting terrain traversability and elevation maps directly from camera and LiDAR inputs, enabling reliable autonomous navigation in off-road environments. The framework fuses sensory information, handles uncertainty, and produces context-informed predictions of terrain geometry and traversability at low latency. Unlike existing methods that classify terrain into semantic classes and then apply heuristics, RoadRunner is trained end-to-end in a self-supervised manner. It adopts sensor fusion network architectures popular in autonomous driving, embedding LiDAR and camera information into a common Bird's Eye View (BEV) representation. Training data is generated in hindsight from real-world off-road datasets using an existing traversability estimation stack. RoadRunner reduces system latency by a factor of ~4, from 500 ms to 140 ms, while improving the accuracy of both traversability cost and elevation map predictions. The framework is tested in multiple real-world driving scenarios in unstructured desert environments, demonstrating that it enables safe and reliable high-speed off-road navigation.

RoadRunner's key contributions are a novel network architecture that predicts traversability costs and elevation maps directly from multi-LiDAR and multi-camera data at low latency; a general framework for generating pseudo-ground-truth elevation maps and traversability costs by aggregating temporal data in hindsight; an overview of NASA's X-Racer off-road autonomy research stack; and an exhaustive evaluation of the RoadRunner architecture on real-world data. By leveraging both visual and geometric data, RoadRunner outperforms X-Racer in elevation mapping and traversability estimation, and it can infer missing elevation and traversability information from learned context.
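The idea of generating labels in hindsight can be illustrated with a minimal sketch that builds an elevation map for one timestep from scans recorded both before and after it. It assumes the scans are already registered in a common world frame (for example, via odometry) and uses a simple mean height per BEV cell. The function name `aggregate_elevation_in_hindsight`, its parameters, and the mean-height rule are illustrative assumptions, not the X-Racer pipeline, which also produces the traversability costs that this sketch omits.

```python
import numpy as np


def aggregate_elevation_in_hindsight(scans, t, window, center_xy,
                                     grid_size=64, resolution=0.5):
    """Hypothetical sketch: pseudo ground-truth elevation map for timestep t.

    scans: list of (N_i, 3) point arrays, assumed already in a common world frame.
    t: index of the query timestep; scans t - window .. t + window are fused,
       so measurements recorded *after* t also contribute (hindsight).
    center_xy: world x/y position the square BEV grid is centered on.
    Returns a (grid_size, grid_size) array of mean heights, NaN where unobserved.
    """
    lo, hi = max(0, t - window), min(len(scans), t + window + 1)
    pts = np.concatenate(scans[lo:hi], axis=0)

    # Map x/y to integer cell indices, with (0, 0) at the grid's lower corner.
    half_extent = grid_size * resolution / 2.0
    offset = pts[:, :2] - np.asarray(center_xy) + half_extent
    ij = np.floor(offset / resolution).astype(int)
    inside = np.all((ij >= 0) & (ij < grid_size), axis=1)
    ij, z = ij[inside], pts[inside, 2]

    # Sum heights and count points per cell, then average the observed cells.
    flat = ij[:, 0] * grid_size + ij[:, 1]
    n_cells = grid_size * grid_size
    sums = np.bincount(flat, weights=z, minlength=n_cells)
    counts = np.bincount(flat, minlength=n_cells)
    elevation = np.full(n_cells, np.nan)
    observed = counts > 0
    elevation[observed] = sums[observed] / counts[observed]
    return elevation.reshape(grid_size, grid_size)


# Usage: two scans hit the same cell, at 2.0 m (now) and 4.0 m (later).
scans = [np.array([[0.1, 0.1, 2.0]]), np.array([[0.1, 0.1, 4.0]])]
hindsight = aggregate_elevation_in_hindsight(scans, t=0, window=1,
                                             center_xy=(0.0, 0.0),
                                             grid_size=4, resolution=1.0)
online = aggregate_elevation_in_hindsight(scans, t=0, window=0,
                                          center_xy=(0.0, 0.0),
                                          grid_size=4, resolution=1.0)
print(hindsight[2, 2], online[2, 2])  # 3.0 2.0
```

Setting `window=0` restricts the map to the current scan, which mimics an online estimate; a larger window lets later scans fill cells that were occluded or out of range at time `t`, which is why hindsight labels can be denser than what the vehicle observed online.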
RoadRunner also detects obstacles at longer ranges than X-Racer, improving traversability cost estimation by 52.3% in MSE and elevation map estimation by 36.0% in MAE, while reducing perception-to-traversability latency by a factor of ~4. Training is self-supervised: the X-Racer software stack generates pseudo-ground-truth labels, and the label generation fuses information from past and future measurements to obtain reliable traversability and elevation estimates.

The network takes both camera and LiDAR data as input: features are extracted from the camera images with EfficientNet-B0, and point cloud features are extracted from the LiDAR data. A lifting step projects the camera features from the image plane into 3D space, after which the visual and geometric information is fused across modalities. The network is trained with a weighted mean squared error loss, where the weights are computed from the normalized frequency of the elevation and traversability values, and the final optimization objective combines the traversability and elevation losses. RoadRunner has 24.0 M parameters and is trained on an Nvidia RTX 3090 GPU with a batch size of 6. The framework is evaluated on 16.5 km of off-road driving data collected at Paso Rob
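The lifting step can be sketched in the style of Lift-Splat-Shoot: each pixel predicts a categorical distribution over discrete depths, its feature vector is spread along its viewing ray in proportion to that distribution, and the resulting 3D points are sum-pooled into BEV cells. This is an assumed, simplified formulation (single camera at the vehicle origin, precomputed unit ray directions, NumPy instead of a deep learning framework); `lift_and_splat` and its arguments are hypothetical and not taken from the RoadRunner code.

```python
import numpy as np


def lift_and_splat(features, depth_logits, depth_bins, rays,
                   grid_size, resolution):
    """Hypothetical Lift-Splat-style sketch (not RoadRunner's code).

    features:     (P, C) image features for P pixels.
    depth_logits: (P, D) unnormalized scores over D discrete depth bins.
    depth_bins:   (D,) candidate depths in meters.
    rays:         (P, 3) unit ray directions in the vehicle frame, assumed
                  precomputed from camera intrinsics/extrinsics, with the
                  camera placed at the origin for simplicity.
    Returns a (grid_size, grid_size, C) BEV feature grid.
    """
    # Softmax over depth bins gives each pixel a depth distribution.
    shifted = depth_logits - depth_logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)

    # Lift: outer product -> (P, D, C) depth-weighted frustum features.
    frustum = probs[:, :, None] * features[:, None, :]

    # 3D position of each frustum point: ray direction times bin depth.
    xyz = rays[:, None, :] * depth_bins[None, :, None]  # (P, D, 3)

    # Splat: sum-pool frustum features into the BEV cell under each point.
    half_extent = grid_size * resolution / 2.0
    ij = np.floor((xyz[..., :2] + half_extent) / resolution).astype(int)
    ij = ij.reshape(-1, 2)
    flat_feats = frustum.reshape(-1, features.shape[1])
    inside = np.all((ij >= 0) & (ij < grid_size), axis=1)
    bev = np.zeros((grid_size, grid_size, features.shape[1]))
    np.add.at(bev, (ij[inside, 0], ij[inside, 1]), flat_feats[inside])
    return bev


# Usage: one pixel looking along +x, split evenly between depths 1 m and 3 m.
bev = lift_and_splat(features=np.array([[1.0, 2.0]]),
                     depth_logits=np.zeros((1, 2)),
                     depth_bins=np.array([1.0, 3.0]),
                     rays=np.array([[1.0, 0.0, 0.0]]),
                     grid_size=8, resolution=1.0)
print(bev[5, 4], bev[7, 4])  # each cell receives half of the feature vector
```

Because a pixel's depth probabilities sum to one, sum-pooling preserves its total feature mass whenever all of its frustum points land inside the grid. One plausible fusion step, again an assumption rather than RoadRunner's documented operator, is to concatenate this camera BEV grid with a LiDAR BEV grid of the same resolution along the channel axis before further convolutions.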
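The frequency-weighted loss can be sketched as follows. The text only says the weights are based on the normalized frequency of the target values, so this sketch assumes inverse-frequency weights over a fixed-bin histogram, ignores cells without a pseudo label (marked NaN), and combines the two losses as a weighted sum with a hypothetical `elev_weight`. All function names, bin counts, and value ranges (including the assumed elevation range of -5 m to 5 m) are illustrative.

```python
import numpy as np


def frequency_weights(values, num_bins=10, value_range=(0.0, 1.0)):
    """Assumed scheme: weight = 1 / normalized frequency of the value's bin."""
    values = np.clip(values, *value_range)  # keep every value inside a bin
    counts, edges = np.histogram(values, bins=num_bins, range=value_range)
    freq = counts / counts.sum()                # normalized bin frequency
    bin_idx = np.digitize(values, edges[1:-1])  # bin of each value, 0..num_bins-1
    return 1.0 / freq[bin_idx]                  # rarer values -> larger weight


def weighted_mse(pred, target, num_bins=10, value_range=(0.0, 1.0)):
    """Frequency-weighted MSE over cells that carry a pseudo label."""
    valid = ~np.isnan(target)  # NaN marks cells with no pseudo ground truth
    if not valid.any():
        return 0.0
    p, t = pred[valid], target[valid]
    w = frequency_weights(t, num_bins, value_range)
    return float(np.mean(w * (p - t) ** 2))


def total_loss(pred_trav, gt_trav, pred_elev, gt_elev,
               elev_weight=1.0, elev_range=(-5.0, 5.0)):
    """Hypothetical combination: traversability loss + weighted elevation loss."""
    return (weighted_mse(pred_trav, gt_trav)
            + elev_weight * weighted_mse(pred_elev, gt_elev,
                                         value_range=elev_range))


# Usage: three common low-cost cells, one rare high-cost cell, one unlabeled cell.
gt = np.array([0.05, 0.05, 0.05, 0.95, np.nan])
pred = np.array([0.15, 0.15, 0.15, 1.05, 0.30])
print(frequency_weights(gt[:4]))
print(weighted_mse(pred, gt))
```

With inverse-frequency weights, the mean works out to the sum of the per-bin mean errors, so every occupied value bin contributes equally regardless of how many cells fall into it; under this assumption, rare high-cost cells are not drowned out by abundant low-cost ones.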