This paper presents a novel deep navigation policy for aerial robots to achieve collision-free flight, leveraging a modular approach that combines deep collision encoding and reinforcement learning. The proposed solution uses a Deep Collision Encoder (DCE) to compress high-dimensional depth images into a low-dimensional latent space that retains collision information while accounting for the robot's size. This compressed encoding is combined with the robot's odometry and target location to train a deep reinforcement learning (DRL) navigation policy that offers low-latency computation and robust performance in real-world deployments. The method is evaluated through simulation and experimental studies in diverse environments, demonstrating its efficiency and resilience. Key contributions include a modular architecture that reduces the sim2real gap, a DRL policy applicable to versatile navigation tasks, and performance verified in both simulation and real-world experiments. By handling sensor noise and previously unseen obstacles robustly while keeping inference latency low, the approach is well suited to complex and dynamic environments.
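To make the modular pipeline concrete, the sketch below shows one plausible way to wire a convolutional depth encoder to a goal-conditioned policy network: the encoder's latent vector is concatenated with odometry and the relative target position before being mapped to an action. This is a minimal illustrative sketch, not the paper's implementation; the layer sizes, the observation dimensions (`odom_dim`, `goal_dim`), and the velocity-command action space are assumptions.

```python
import torch
import torch.nn as nn

class DepthEncoder(nn.Module):
    """Compresses a depth image into a low-dimensional latent vector.
    Layer sizes are illustrative, not taken from the paper."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, latent_dim)

    def forward(self, depth):               # depth: (B, 1, H, W)
        return self.fc(self.conv(depth).flatten(1))

class NavigationPolicy(nn.Module):
    """MLP policy that consumes the collision latent together with odometry
    and the relative goal, and outputs a (hypothetical) velocity command."""
    def __init__(self, latent_dim=32, odom_dim=9, goal_dim=3, action_dim=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + odom_dim + goal_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, action_dim), nn.Tanh(),
        )

    def forward(self, latent, odom, goal):
        return self.mlp(torch.cat([latent, odom, goal], dim=-1))

# Example inference step (shapes and values are placeholders):
encoder, policy = DepthEncoder(), NavigationPolicy()
depth = torch.rand(1, 1, 270, 480)          # one depth frame
odom = torch.zeros(1, 9)                    # e.g. velocity + attitude estimate
goal = torch.tensor([[5.0, 0.0, 1.0]])      # target position, body frame
with torch.no_grad():
    action = policy(encoder(depth), odom, goal)
```

In this kind of modular setup, the encoder can be trained separately (e.g. on collision-labeled depth data) and then frozen while the policy is trained with DRL, which is one way the sim2real gap and inference latency could be kept small, consistent with the design goals described above.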