Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning

16 Sep 2016 | Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J. Lim, Abhinav Gupta, Li Fei-Fei, Ali Farhadi
This paper presents a target-driven visual navigation model for indoor scenes based on deep reinforcement learning (DRL). The model addresses two key challenges of standard DRL navigation: generalization to new targets and data efficiency. To improve generalization, the target goal is supplied to the policy as an input, so actions are conditioned jointly on the current state and the target; navigating to a new target therefore requires no retraining. To improve data efficiency, the authors developed the AI2-THOR framework, which provides high-quality 3D indoor scenes with a physics engine, letting agents interact with objects and collect large amounts of training experience in simulation.

The policy is learned by a deep siamese actor-critic network that embeds the current observation and the target goal and fuses the two into a joint representation. The model is trained end to end and requires no hand-engineered features, explicit feature matching, or 3D reconstruction of the environment.

The model is evaluated on three tasks: target generalization (navigating to targets not used during training), scene generalization (navigating to targets in new scenes), and real-world generalization (navigating to targets using a real robot). The results show that the model outperforms state-of-the-art DRL methods in both data efficiency and generalization. It operates in both discrete and continuous spaces and can be adapted to real-world scenarios with minimal fine-tuning.

AI2-THOR is used both to train and to evaluate the model: it supports a variety of indoor scene types and realistic object interaction, and supplies the large amounts of simulated training data the approach relies on. In both simulated and real-world environments, the model navigates to new targets in new scenes without retraining, demonstrating that it generalizes better and learns more efficiently than prior DRL methods.
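The central mechanism the summary describes, a siamese network that embeds the observation and the target with shared weights and conditions the actor and critic heads on their joint embedding, can be sketched in a few lines. The NumPy sketch below is a minimal illustration under assumed layer sizes and a simple concatenation-based fusion; it is not the authors' architecture, and all names and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    # He-style random initialization for a toy dense layer (no training here)
    return rng.standard_normal((in_dim, out_dim)) * np.sqrt(2.0 / in_dim)

class SiameseActorCritic:
    """Toy target-driven actor-critic: one shared embedding matrix is applied
    to both the observation and the target (siamese branches), the two
    embeddings are fused, and actor/critic heads read the fused vector."""

    def __init__(self, obs_dim=2048, embed_dim=512, n_actions=4):
        self.W_embed = linear(obs_dim, embed_dim)       # shared by both inputs
        self.W_fuse = linear(2 * embed_dim, embed_dim)  # joint embedding
        self.W_policy = linear(embed_dim, n_actions)    # actor head
        self.W_value = linear(embed_dim, 1)             # critic head

    def forward(self, obs_feat, target_feat):
        # The same weights embed observation and target (siamese sharing)
        e_obs = np.maximum(obs_feat @ self.W_embed, 0.0)
        e_tgt = np.maximum(target_feat @ self.W_embed, 0.0)
        fused = np.maximum(np.concatenate([e_obs, e_tgt]) @ self.W_fuse, 0.0)
        # Actor: softmax over action logits, conditioned on state AND target
        logits = fused @ self.W_policy
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        # Critic: scalar value estimate for this (state, target) pair
        value = float(fused @ self.W_value)
        return probs, value

net = SiameseActorCritic()
obs = rng.standard_normal(2048)    # e.g. a CNN feature of the current frame
goal = rng.standard_normal(2048)   # feature of the target image
probs, value = net.forward(obs, goal)
action = int(np.argmax(probs))     # greedy action toward this target
```

Because the target enters as an input rather than being baked into the weights, switching to a new goal only changes `goal`; the same forward pass yields a new action distribution, which is why no retraining is needed per target.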