Reinforcement Learning for Collision-free Flight Exploiting Deep Collision Encoding

6 Feb 2024 | Mihir Kulkarni and Kostas Alexis
This paper presents a deep reinforcement learning (DRL) navigation policy for collision-free flight of aerial robots, built around a deep collision encoder (DCE). The DCE compresses high-dimensional depth images into a low-dimensional latent space that encodes collision information while accounting for the robot's size. This latent representation is combined with the robot's odometry and the target location to train a DRL policy that navigates safely through cluttered environments with low-latency computation and robust sim-to-real performance. The method is evaluated in both simulation and real-world experiments, demonstrating its efficiency and resilience in diverse environments.

The DCE is trained on both simulated and real depth images using supervised learning, allowing it to compress depth data into a latent space that retains collision information. This modular approach reduces the sim-to-real gap by enabling separate verification of the latent space and the assimilation of real-world data.
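To make the encoder idea concrete, below is a minimal sketch of a collision-encoding autoencoder in the spirit of the DCE, assuming a PyTorch implementation. The layer sizes, the 64-dimensional latent, the 64×80 input resolution, and the robot-size-inflated depth target are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class DeepCollisionEncoder(nn.Module):
    """Compresses a depth image into a latent vector that retains
    collision information (sketch; all dimensions are assumptions)."""

    def __init__(self, latent_dim: int = 64):
        super().__init__()
        # Encoder: 1-channel depth image (assumed 64x80) -> latent vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ELU(),   # -> 32x40
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ELU(),  # -> 16x20
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ELU(),  # -> 8x10
            nn.Flatten(),
            nn.Linear(64 * 8 * 10, latent_dim),
        )
        # Decoder: reconstructs a robot-size-inflated "collision image".
        # It is only needed during supervised training of the latent space.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 8 * 10),
            nn.Unflatten(1, (64, 8, 10)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ELU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ELU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, depth: torch.Tensor):
        z = self.encoder(depth)
        return z, self.decoder(z)


def train_step(model, optimizer, depth, inflated_target):
    """One supervised step: regress the reconstruction onto a depth image
    pre-inflated by the robot's collision geometry (assumed target)."""
    _, recon = model(depth)
    loss = nn.functional.mse_loss(recon, inflated_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a scheme like this, only the encoder is retained at deployment time; the decoder exists solely to supervise the latent space, which is why the latent can be verified separately before the policy ever sees it.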
The DRL policy is trained within a partially observable Markov decision process (POMDP) framework, using a reward function that encourages safe navigation by rewarding progress toward the target and penalizing collisions. Training follows a curriculum learning approach in simulation, gradually increasing the complexity of the environment.
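As a concrete illustration of such a reward, here is a minimal Python sketch. The weights, the per-step cost, and the goal radius are hypothetical values chosen for illustration; the paper's actual reward shaping may differ.

```python
import numpy as np

def navigation_reward(pos, prev_pos, goal, collided,
                      w_progress=1.0, step_cost=0.01,
                      collision_penalty=10.0,
                      goal_radius=0.5, goal_bonus=5.0):
    """Reward progress toward the goal, penalize collisions and dithering.
    All coefficients here are illustrative assumptions."""
    pos, prev_pos, goal = map(np.asarray, (pos, prev_pos, goal))
    # Positive when the robot moved closer to the goal this step.
    progress = np.linalg.norm(prev_pos - goal) - np.linalg.norm(pos - goal)
    reward = w_progress * progress - step_cost   # small per-step cost
    if collided:
        reward -= collision_penalty              # strong penalty on impact
    if np.linalg.norm(pos - goal) < goal_radius:
        reward += goal_bonus                     # bonus for reaching target
    return reward
```

Under a curriculum, a reward of this shape can stay fixed while obstacle density and episode difficulty are gradually increased across training stages.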
The trained policy is deployed on a real aerial robot, the Learning-based Micro Flyer (LMF), and tested in cluttered environments. The robot navigates safely through cluttered settings, flying above obstacles when possible and maneuvering around them when necessary. The DCE is robust to sensor noise and imperfect depth data, and the policy performs well even in environments containing novel obstacles. Inference takes only 15 ms on an NVIDIA Orin NX board, confirming real-time capability. Across simulation and real-world experiments the approach remains robust to sensor noise and varying robot dynamics; the DCE's low-dimensional encoding of collision information and the modular architecture are key to this strong sim-to-real transfer. The results demonstrate that the proposed DRL policy enables safe and efficient collision-free flight of aerial robots in cluttered environments.