This paper presents the first vision-based quadrotor system that autonomously navigates through a sequence of gates at high speed without explicit state estimation. The system maps pixels from an onboard camera directly to control commands using reinforcement learning (RL) with an asymmetric actor-critic architecture: the critic is trained with privileged simulator state, which makes learning efficient, while the actor observes only a pixel-level sensor abstraction, the inner gate edges. Because this abstraction can be computed geometrically, training requires no image rendering. At deployment, a Swin-transformer-based gate detector recovers the same abstraction from real camera images. The system reaches speeds of up to 40 km/h and accelerations of up to 2 g on standard, off-the-shelf hardware, without specialized sensors or IMU measurements. In both simulation and real-world experiments, the policy achieves high success rates and performance comparable to state-based policies, demonstrating that agile flight directly from pixels is feasible and suggesting applications beyond drone racing in other structured environments.
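To make the asymmetric actor-critic idea concrete, the following is a minimal sketch, not the authors' implementation: all module names, layer sizes, and dimensions are assumptions, and the abstraction is parameterized here as gate-corner pixel coordinates for illustration. The key structural point from the paper is preserved: the critic consumes privileged state available only in simulation, while the actor sees only the pixel-level abstraction.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps the pixel-level gate abstraction (assumed here: corner
    coordinates of the next gates) to low-level control commands."""
    def __init__(self, num_gates=2, corners_per_gate=4, action_dim=4):
        super().__init__()
        obs_dim = num_gates * corners_per_gate * 2  # (u, v) per corner
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.Tanh(),
            nn.Linear(128, 128), nn.Tanh(),
            nn.Linear(128, action_dim),
        )

    def forward(self, corners):
        return self.net(corners)

class Critic(nn.Module):
    """Sees privileged simulator state (e.g. pose, velocity, gate poses)
    that is unavailable on the real robot; used only during training."""
    def __init__(self, priv_dim=22):  # dimension is an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(priv_dim, 128), nn.Tanh(),
            nn.Linear(128, 128), nn.Tanh(),
            nn.Linear(128, 1),
        )

    def forward(self, priv_state):
        return self.net(priv_state)

# Forward pass with dummy data: the actor never sees the privileged state,
# so at deployment only a gate detector is needed to feed it.
actor, critic = Actor(), Critic()
corners = torch.rand(32, 16)   # normalized (u, v) of 8 gate corners
priv = torch.randn(32, 22)     # privileged state, training only
action = actor(corners)        # e.g. collective thrust + body rates
value = critic(priv)           # value estimate for the RL update
```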
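The claim that the abstraction can be simulated without rendering follows from standard projective geometry: given the gate and camera poses, the inner gate features project directly into the image plane. Below is a hedged sketch under a pinhole-camera assumption; the function name, calibration values, and the corner-point parameterization are illustrative, not taken from the paper.

```python
import numpy as np

def gate_corners_in_image(corners_world, T_world_cam, K,
                          width=640, height=480):
    """Project inner gate corners (world frame, meters) to pixel
    coordinates with a pinhole model; returns None for points behind
    the camera or outside the image. Replaces rendering in simulation."""
    R, t = T_world_cam[:3, :3], T_world_cam[:3, 3]
    pts_cam = (R.T @ (corners_world - t).T).T  # world -> camera frame
    uv = []
    for X, Y, Z in pts_cam:
        if Z <= 0:                             # behind the camera
            uv.append(None)
            continue
        u = K[0, 0] * X / Z + K[0, 2]
        v = K[1, 1] * Y / Z + K[1, 2]
        uv.append((u, v) if 0 <= u < width and 0 <= v < height else None)
    return uv

# Example: a 1.5 m square gate 5 m in front of a camera at the origin.
K = np.array([[400.0, 0, 320], [0, 400.0, 240], [0, 0, 1]])
T = np.eye(4)                                  # camera pose in world
half = 0.75
corners = np.array([[ half,  half, 5.0], [-half,  half, 5.0],
                    [-half, -half, 5.0], [ half, -half, 5.0]])
print(gate_corners_in_image(corners, T, K))
```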