GOAT-Bench is a benchmark for multi-modal lifelong navigation, designed to evaluate agents that navigate to a sequence of goals specified through category names, language descriptions, or images. The benchmark comprises 181 HM3DSem scenes, 312 object categories, and 680k episodes. It features open-vocabulary, multi-modal goals in a lifelong setting: each episode consists of 5-10 goals, each specified through one of the different modalities.

Two families of methods are compared: modular learning methods and end-to-end SenseAct-NN policies trained with and without memory. Modular methods, which build semantic maps, perform better in terms of efficiency (SPL) and robustness to noise in goal specifications, while SenseAct-NN methods achieve higher success rates but are less efficient. These results highlight the importance of effective memory representations for navigation efficiency. The benchmark also evaluates performance across goal modalities and robustness to noisy goal specifications, providing a comprehensive analysis of how current multi-modal lifelong navigation methods handle the various goal types.
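For context, SPL (Success weighted by Path Length) is the standard efficiency metric in embodied navigation. The summary above does not define it; assuming the usual formulation from Anderson et al. (2018), it is

$$
\mathrm{SPL} = \frac{1}{N} \sum_{i=1}^{N} S_i \, \frac{\ell_i}{\max(p_i, \ell_i)},
$$

where $S_i \in \{0, 1\}$ indicates success on goal $i$, $\ell_i$ is the shortest-path distance from the agent's starting position to that goal, and $p_i$ is the length of the path the agent actually traversed. A higher SPL means the agent not only reached its goals but did so along near-optimal paths, which is why it is used here to quantify efficiency.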