GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation


9 Apr 2024 | Mukul Khanna, Ram Ramrakhya, Gunjan Chhablani, Sriram Yenamandra, Theophile Gervet, Matthew Chang, Zsolt Kira, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi
GOAT-Bench is a benchmark for multi-modal lifelong navigation, designed to evaluate agents navigating to a sequence of goals specified through category names, language descriptions, or images. The benchmark includes 181 HM3DSem scenes, 312 object categories, and 680k episodes. It features open-vocabulary, multi-modal goals and lifelong learning, where each episode consists of 5-10 goals specified through different modalities.

The benchmark compares two types of methods: modular learning methods and SenseAct-NN policies trained with and without memory. Modular methods, which use semantic maps, achieve better efficiency (SPL) and greater robustness to noise, while SenseAct-NN methods achieve higher success rates but are less efficient. The results highlight the importance of effective memory representations for improving navigation efficiency. The benchmark also evaluates performance across different goal modalities and robustness to noise in goal specifications. Overall, GOAT-Bench provides a comprehensive analysis of multi-modal lifelong navigation methods and their effectiveness in handling various goal types.
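The efficiency metric mentioned above, SPL (Success weighted by Path Length), can be made concrete with a short sketch. This uses the standard SPL definition from the embodied-navigation literature; the episode tuples below are illustrative values, not GOAT-Bench data.

```python
def spl(episodes):
    """Success weighted by Path Length.

    episodes: list of (success, shortest_path_length, agent_path_length)
    tuples. A failed episode contributes 0; a successful one contributes
    the ratio of the shortest path to the (at least as long) taken path.
    """
    total = 0.0
    for success, shortest, taken in episodes:
        if success:
            total += shortest / max(taken, shortest)
    return total / len(episodes)

# Illustrative: one optimal success, one detoured success, one failure.
result = spl([(True, 5.0, 5.0), (True, 4.0, 8.0), (False, 6.0, 3.0)])
print(result)  # → 0.5
```

Under this metric, an agent that always succeeds but takes long detours (like the SenseAct-NN policies described above) scores lower than one that reaches goals along near-shortest paths, which is why memory and semantic maps matter for efficiency.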