GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation


9 Apr 2024 | Mukul Khanna¹*, Ram Ramrakhya¹*, Gunjan Chhablani¹, Sriram Yenamandra¹, Theophile Gervet², Matthew Chang³, Zsolt Kira¹, Devendra Singh Chaplot⁴, Dhruv Batra¹, Roozbeh Mottaghi⁵ | ¹Georgia Institute of Technology, ²Carnegie Mellon University, ³University of Illinois Urbana-Champaign, ⁴Mistral AI, ⁵University of Washington
GOAT-Bench is a benchmark designed to evaluate multi-modal lifelong navigation systems. In the underlying "Go to Any Thing" (GOAT) task, an agent navigates to a sequence of open-vocabulary goals specified through category names, language descriptions, or images. The benchmark aims to facilitate the development of universal, multi-modal, lifelong navigation agents that can handle any of these goal types and leverage past experience in the environment.

The benchmark comprises 181 HM3DSem scenes, 312 object categories, and 680k episodes, and has two key characteristics: open vocabulary and lifelong learning. The open-vocabulary aspect allows for a broad range of targets, including ones not encountered during training. The lifelong aspect means that each episode consists of 5 to 10 targets specified through distinct modalities, yielding a realistic setting in which agents must remember and reuse past experience in the same environment.

Two classes of methods are evaluated: modular approaches and end-to-end trained reinforcement-learning (RL) approaches. Modular methods rely on semantic mapping and planning, while RL methods use sensor-to-action neural networks. Evaluation uses success rate and Success weighted by Path Length (SPL), which measures efficiency. Results show that modular methods generally outperform RL methods, particularly in SPL, owing to their ability to leverage implicit or explicit memory representations.
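For concreteness, the sketch below illustrates how a GOAT episode (a sequence of subtasks, each specified by one of the three modalities) and the SPL metric could be represented. The `Subtask` dataclass and its field names are hypothetical illustrations rather than the benchmark's actual API; the SPL computation follows the standard definition (per-subtask success weighted by the ratio of shortest-path to traveled-path length, averaged over subtasks).

```python
from dataclasses import dataclass
from typing import List, Literal

# Hypothetical representation of one GOAT subtask; field names are
# illustrative assumptions, not the benchmark's actual data format.
@dataclass
class Subtask:
    modality: Literal["category", "language", "image"]  # how the goal is specified
    goal: str                      # category name, description, or image path
    success: bool                  # did the agent stop close enough to the goal?
    shortest_path_length: float    # geodesic start-to-goal distance (l_i)
    agent_path_length: float       # distance the agent actually traveled (p_i)

def spl(subtasks: List[Subtask]) -> float:
    """Success weighted by Path Length, averaged over all subtasks."""
    if not subtasks:
        return 0.0
    total = 0.0
    for t in subtasks:
        if t.success:
            total += t.shortest_path_length / max(t.agent_path_length,
                                                  t.shortest_path_length)
    return total / len(subtasks)

# A GOAT episode is a sequence of 5-10 subtasks mixing the three modalities;
# the three entries below are made-up examples.
episode = [
    Subtask("category", "chair", True, 4.0, 5.5),
    Subtask("language", "the blue couch next to the window", True, 6.0, 6.0),
    Subtask("image", "goal_views/plant_001.png", False, 3.5, 9.0),
]
print(f"SPL = {spl(episode):.3f}")  # (4/5.5 + 6/6 + 0) / 3 ≈ 0.576
```

The max() in the denominator caps SPL at 1.0 even if a path estimate is slightly shorter than the recorded geodesic distance, so an agent that follows the shortest path to every goal scores 1.0 and inefficient or failed subtasks pull the score down.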
The paper also analyzes the performance of these methods across different modalities, the importance of memory for efficient navigation, and their robustness to noise in goal specifications. The findings highlight the need for effective memory representations and the limitations of current methods in handling certain types of goals, such as image and language goals. Overall, GOAT-Bench provides a comprehensive framework for researchers to develop and evaluate multi-modal lifelong navigation systems, contributing to the advancement of embodied AI in real-world applications.