This paper presents a novel vision-based quadrotor system that can navigate through a sequence of gates at high speeds (up to 40 km/h) and accelerations (up to 2 g) without explicit state estimation. The system leverages only the first-person-view (FPV) video stream from the drone's onboard camera, much as professional FPV pilots fly their aircraft. The key contributions include:
1. **Pixel-Based Control**: The system maps pixels directly to control commands (collective thrust and body rates), enabling agile flight without specialized hardware or onboard state estimation (see the policy sketch after this list).
2. **Reinforcement Learning (RL)**: The control policies are trained with reinforcement learning, specifically PPO, in an asymmetric actor-critic framework in which the critic receives privileged information about the drone's state that is unavailable to the actor.
3. **Efficient Training**: The inner gate edges serve as a task-relevant abstraction of the visual input, allowing efficient simulation during training without high-quality image rendering (see the projection sketch below).
4. **Robust Gate Detection**: A Swin-Transformer-based gate detector robustly detects gate edges in real-world images, providing the same task-relevant abstraction at deployment time (see the detector sketch below).
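
To make the asymmetric actor-critic setup concrete, below is a minimal sketch of the two networks, assuming PyTorch and illustrative layer sizes; the class names, input encodings, and dimensions are hypothetical stand-ins, not the paper's exact architecture. The actor sees only pixels and outputs a collective thrust plus body rates, while the critic evaluates the privileged simulator state.

```python
# Minimal sketch of an asymmetric actor-critic pair (layer sizes, input
# encodings, and names are illustrative assumptions, not the paper's exact
# architecture).
import torch
import torch.nn as nn

class PixelActor(nn.Module):
    """Maps an observation image to collective thrust + body rates."""
    def __init__(self, in_channels: int = 1):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # 4 outputs: collective thrust and body rates about x, y, z.
        self.head = nn.Linear(64, 4)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(img))

class PrivilegedCritic(nn.Module):
    """Value function over the full (privileged) simulator state."""
    def __init__(self, state_dim: int = 13):  # e.g. position, attitude, velocities
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

actor = PixelActor()
critic = PrivilegedCritic()
cmd = actor(torch.zeros(1, 1, 64, 64))  # -> [thrust, wx, wy, wz]
value = critic(torch.zeros(1, 13))      # value from privileged state
```

During PPO training the critic's value estimates guide the actor's updates, but only the pixel-based actor is deployed on the drone, so no privileged state is needed at flight time.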
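
The gate-edge abstraction can be understood as simple geometry: during training, the simulator only needs to project the 3D corners of each gate's inner edge into the camera image, rather than render photorealistic frames. Below is a minimal sketch assuming a pinhole camera model; the intrinsics, extrinsics, and gate dimensions are illustrative placeholders.

```python
# Minimal sketch of the task-relevant abstraction: project inner-edge gate
# corners to pixels instead of rendering images (pinhole model; all numeric
# values are hypothetical placeholders).
import numpy as np

def project_gate_corners(corners_w, R_cw, t_cw, K):
    """Project 3D inner-edge corners (world frame) to pixel coordinates.

    corners_w: (N, 3) corner positions in the world frame
    R_cw, t_cw: rotation/translation taking world points into the camera frame
    K: (3, 3) camera intrinsics
    """
    pts_c = (R_cw @ corners_w.T).T + t_cw   # world -> camera frame
    pts_c = pts_c[pts_c[:, 2] > 0]          # keep points in front of the camera
    uv = (K @ pts_c.T).T                    # pinhole projection
    return uv[:, :2] / uv[:, 2:3]           # normalize by depth

K = np.array([[300.0, 0.0, 320.0],
              [0.0, 300.0, 240.0],
              [0.0, 0.0, 1.0]])
corners = np.array([[ 0.75, 5.0,  0.75],   # a 1.5 m square gate, 5 m ahead
                    [-0.75, 5.0,  0.75],
                    [-0.75, 5.0, -0.75],
                    [ 0.75, 5.0, -0.75]])
# Camera looking along the world +y axis (illustrative extrinsics).
R = np.array([[1.0, 0.0,  0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0,  0.0]])
print(project_gate_corners(corners, R, np.zeros(3), K))
```

Because each simulated observation reduces to a handful of matrix products, millions of training frames can be generated cheaply, which is what makes large-scale RL on this task tractable.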
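
At deployment, a detector must recover the same abstraction from real camera frames. The sketch below pairs torchvision's Swin-T backbone with a hypothetical one-channel heatmap head as a stand-in; the paper's actual detector architecture and training setup are not reproduced here.

```python
# Minimal sketch of a Swin-backed gate-edge detector (the torchvision swin_t
# backbone and the heatmap head are illustrative assumptions, not the paper's
# detector).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import swin_t

class GateDetector(nn.Module):
    """Predicts a per-pixel heatmap of inner gate edges/corners."""
    def __init__(self):
        super().__init__()
        self.backbone = swin_t(weights=None).features  # (B, H/32, W/32, 768)
        self.head = nn.Conv2d(768, 1, kernel_size=1)   # 1-channel edge map

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(img).permute(0, 3, 1, 2)  # to (B, C, H, W)
        heatmap = self.head(feats)
        # Upsample back to input resolution for a dense prediction.
        return F.interpolate(heatmap, size=img.shape[-2:],
                             mode="bilinear", align_corners=False)

detector = GateDetector()
out = detector(torch.zeros(1, 3, 224, 224))  # -> (1, 1, 224, 224) logits
```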
The system demonstrates successful autonomous flight in both simulated and real-world environments, achieving a 100% success rate in hardware-in-the-loop experiments. The results show that the pixel-based approach achieves performance comparable to state-based policies, with only minor differences in lap times and gate-passing errors. The method has broader implications for applications such as indoor navigation and inspection tasks, where salient landmarks can be used for guidance.