This paper presents a novel vision-based quadrotor system that can navigate through a sequence of gates at high speeds (up to 40 km/h) and accelerations (up to 2 g) without explicit state estimation. The system leverages only the first-person-view (FPV) video stream from the drone's onboard camera, much as professional FPV pilots fly their aircraft. The key contributions include:
1. **Pixel-Based Control**: The system maps pixels directly to control commands (collective thrust and body rates), enabling agile flight without specialized hardware or onboard state estimation (see the policy sketch after this list).
2. **Reinforcement Learning (RL)**: The control policies are trained with reinforcement learning, specifically PPO, in an asymmetric actor-critic framework in which the critic receives privileged information about the drone's state that is unavailable to the actor.
3. **Efficient Training**: The inner gate edges serve as a task-relevant abstraction of the visual input, allowing efficient simulation during training without high-quality image rendering (see the projection sketch below).
4. **Robust Gate Detection**: A Swin-Transformer-based gate detector robustly detects gate edges in real-world images, providing the same task-relevant abstraction at deployment time (see the detector sketch below).
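
To make the asymmetric actor-critic setup concrete, below is a minimal sketch of the two networks, assuming PyTorch and illustrative layer sizes; the class names, input encodings, and dimensions are hypothetical stand-ins, not the paper's exact architecture. The actor sees only pixels and outputs a collective thrust plus body rates, while the critic evaluates the privileged simulator state.

```python
# Minimal sketch of an asymmetric actor-critic pair (layer sizes, input
# encodings, and names are illustrative assumptions, not the paper's exact
# architecture).
import torch
import torch.nn as nn

class PixelActor(nn.Module):
    """Maps an observation image to collective thrust + body rates."""
    def __init__(self, in_channels: int = 1):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # 4 outputs: collective thrust and body rates about x, y, z.
        self.head = nn.Linear(64, 4)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(img))

class PrivilegedCritic(nn.Module):
    """Value function over the full (privileged) simulator state."""
    def __init__(self, state_dim: int = 13):  # e.g. position, attitude, velocities
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

actor = PixelActor()
critic = PrivilegedCritic()
cmd = actor(torch.zeros(1, 1, 64, 64))  # -> [thrust, wx, wy, wz]
value = critic(torch.zeros(1, 13))      # value from privileged state
```

During PPO training the critic's value estimates guide the actor's updates, but only the pixel-based actor is deployed on the drone, so no privileged state is needed at flight time.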
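
The gate-edge abstraction can be understood as simple geometry: during training, the simulator only needs to project the 3D corners of each gate's inner edge into the camera image, rather than render photorealistic frames. Below is a minimal sketch assuming a pinhole camera model; the intrinsics, extrinsics, and gate dimensions are illustrative placeholders.

```python
# Minimal sketch of the task-relevant abstraction: project inner-edge gate
# corners to pixels instead of rendering images (pinhole model; all numeric
# values are hypothetical placeholders).
import numpy as np

def project_gate_corners(corners_w, R_cw, t_cw, K):
    """Project 3D inner-edge corners (world frame) to pixel coordinates.

    corners_w: (N, 3) corner positions in the world frame
    R_cw, t_cw: rotation/translation taking world points into the camera frame
    K: (3, 3) camera intrinsics
    """
    pts_c = (R_cw @ corners_w.T).T + t_cw   # world -> camera frame
    pts_c = pts_c[pts_c[:, 2] > 0]          # keep points in front of the camera
    uv = (K @ pts_c.T).T                    # pinhole projection
    return uv[:, :2] / uv[:, 2:3]           # normalize by depth

K = np.array([[300.0, 0.0, 320.0],
              [0.0, 300.0, 240.0],
              [0.0, 0.0, 1.0]])
corners = np.array([[ 0.75, 5.0,  0.75],   # a 1.5 m square gate, 5 m ahead
                    [-0.75, 5.0,  0.75],
                    [-0.75, 5.0, -0.75],
                    [ 0.75, 5.0, -0.75]])
# Camera looking along the world +y axis (illustrative extrinsics).
R = np.array([[1.0, 0.0,  0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0,  0.0]])
print(project_gate_corners(corners, R, np.zeros(3), K))
```

Because each simulated observation reduces to a handful of matrix products, millions of training frames can be generated cheaply, which is what makes large-scale RL on this task tractable.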
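
At deployment, a detector must recover the same abstraction from real camera frames. The sketch below pairs torchvision's Swin-T backbone with a hypothetical one-channel heatmap head as a stand-in; the paper's actual detector architecture and training setup are not reproduced here.

```python
# Minimal sketch of a Swin-backed gate-edge detector (the torchvision swin_t
# backbone and the heatmap head are illustrative assumptions, not the paper's
# detector).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import swin_t

class GateDetector(nn.Module):
    """Predicts a per-pixel heatmap of inner gate edges/corners."""
    def __init__(self):
        super().__init__()
        self.backbone = swin_t(weights=None).features  # (B, H/32, W/32, 768)
        self.head = nn.Conv2d(768, 1, kernel_size=1)   # 1-channel edge map

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(img).permute(0, 3, 1, 2)  # to (B, C, H, W)
        heatmap = self.head(feats)
        # Upsample back to input resolution for a dense prediction.
        return F.interpolate(heatmap, size=img.shape[-2:],
                             mode="bilinear", align_corners=False)

detector = GateDetector()
out = detector(torch.zeros(1, 3, 224, 224))  # -> (1, 1, 224, 224) logits
```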
The system demonstrates successful autonomous flight in both simulated and real-world environments, achieving a 100% success rate in hardware-in-the-loop experiments. The results show that the pixel-based approach achieves performance comparable to state-based policies, with only minor differences in lap times and gate-passing errors. The method has broader implications for applications such as indoor navigation and inspection tasks, where salient landmarks can be used for guidance.