PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators


28 Jun 2024 | Kuo-Hao Zeng, Zichen Zhang, Kiana Ehsani, Rose Hendrix, Jordi Salvador, Alvaro Herrasti, Ross Girshick, Aniruddha Kembhavi, Luca Weihs
POLIFORMER is a transformer-based policy trained with on-policy reinforcement learning (RL) in simulation, achieving state-of-the-art results in indoor navigation across two robotic embodiments, LoCoBot and Stretch RE-1. It pairs a foundational vision transformer encoder with a causal transformer decoder that provides long-term memory and reasoning over the observation history. Trained on hundreds of millions of interactions across diverse environments, POLIFORMER outperforms previous methods in object goal navigation, achieving an 85.5% success rate on the CHORES-S benchmark, a 28.5% improvement over prior models, and it also leads on the ProcTHOR, ArchitecTHOR, and AI2-iTHOR navigation benchmarks.

The model's success is attributed to three key design choices: scaling the model architecture, scaling the number of parallel rollouts, and scaling the diversity of training environments; performance continues to improve as training scale increases. A KV-cache keeps both training and inference efficient despite long observation histories. POLIFORMER can also be extended to downstream tasks such as object tracking and open-vocabulary navigation without fine-tuning, and POLIFORMER-BOXNAV, a variant that takes bounding boxes as goal specifications, transfers zero-shot to real-world tasks, demonstrating strong generalization and scalability in both simulation and the real world.
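To make the KV-cache point concrete, below is a minimal sketch in PyTorch, not the authors' code, of how a causal transformer decoder with a KV-cache can act as a per-step navigation policy: each environment step encodes only the newest frame, appends its keys and values to the cache, and attends over the full history rather than re-running the decoder over all past frames. The names (PolicyStep, d_model, n_actions) and the use of a single attention layer are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyStep(nn.Module):
    # Hypothetical single-layer stand-in for a multi-layer causal decoder.
    def __init__(self, d_model=512, n_actions=6):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)
        self.actor = nn.Linear(d_model, n_actions)   # action logits
        self.critic = nn.Linear(d_model, 1)          # value head for on-policy RL

    def forward(self, frame_emb, cache=None):
        # frame_emb: (B, 1, d) -- embedding of only the newest frame,
        # e.g. pooled features from a frozen vision transformer.
        q, k, v = self.qkv(frame_emb).chunk(3, dim=-1)
        if cache is not None:
            k = torch.cat([cache[0], k], dim=1)   # keys over the whole episode
            v = torch.cat([cache[1], v], dim=1)
        # The lone query is the latest timestep, so attending over the
        # full cache is causal by construction; no mask is required.
        h = self.proj(F.scaled_dot_product_attention(q, k, v))
        return self.actor(h), self.critic(h), (k, v)

policy, cache = PolicyStep(), None
for t in range(10):                       # one environment step per iteration
    emb = torch.randn(1, 1, 512)          # stand-in for a real frame embedding
    logits, value, cache = policy(emb, cache)
    action = torch.distributions.Categorical(logits=logits[:, -1]).sample()

In the actual system, multiple decoder layers, positional information, and batched parallel rollouts would sit on top of this, but the cache mechanics above are what make per-step inference cheap during long episodes.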