Vision-and-Language Navigation via Causal Learning

16 Apr 2024 | Liuyi Wang, Zongtao He, Ronghao Dang, Mengjiao Shen, Chengju Liu, Qijun Chen*
This paper introduces GOAT, a novel approach for vision-and-language navigation (VLN) that addresses dataset bias through causal learning. The method employs back-door and front-door adjustment causal learning (BACL and FACL) modules to mitigate observable and unobservable confounders, respectively. Additionally, a cross-modal feature pooling (CFP) module is introduced to enhance cross-modal representations and improve generalization. The proposed framework is validated across multiple VLN datasets (R2R, REVERIE, RxR, and SOON), demonstrating superior performance compared to existing state-of-the-art methods. The causal learning pipeline enables unbiased feature learning and decision-making by integrating causal inference principles into the model.
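BACL and FACL are named after two standard identification formulas from causal inference. With an observed confounder Z, back-door adjustment computes P(Y | do(X)) = Σ_z P(Y | X, z) P(z). When the confounder is unobserved but the effect of X on Y passes entirely through a mediator M, front-door adjustment computes P(Y | do(X)) = Σ_m P(m | X) Σ_x' P(Y | x', m) P(x'). The Python sketch below evaluates both formulas on small binary distributions so the de-biasing effect is visible. It illustrates the textbook formulas only. It is not GOAT's implementation, which applies these ideas to learned visual and language features, and every toy model, probability, and helper name in it (marginal, backdoor, frontdoor, naive) is hypothetical.

```python
from itertools import product

VALS = (0, 1)  # every toy variable is binary


def bern(p1, v):
    """Probability of value v for a binary variable with P(v = 1) = p1."""
    return p1 if v == 1 else 1.0 - p1


def marginal(joint, keep):
    """Sum a joint table {(a, b, ...): p} down to the index positions in `keep`."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + p
    return out


def naive(joint, x_pos, y_pos, x, y):
    """Plain conditional P(y | x), which is biased when X and Y share a confounder."""
    return marginal(joint, [x_pos, y_pos])[(x, y)] / marginal(joint, [x_pos])[(x,)]


def backdoor(joint_zxy, x, y):
    """Back-door adjustment: P(y | do(x)) = sum_z P(y | x, z) P(z).

    joint_zxy maps (z, x, y) -> probability; Z is an observed confounder.
    """
    p_z = marginal(joint_zxy, [0])
    p_zx = marginal(joint_zxy, [0, 1])
    return sum(joint_zxy[(z, x, y)] / p_zx[(z, x)] * p_z[(z,)] for z in VALS)


def frontdoor(joint_xmy, x, y):
    """Front-door adjustment through mediator M:
    P(y | do(x)) = sum_m P(m | x) * sum_x' P(y | x', m) P(x').

    joint_xmy maps (x, m, y) -> probability; the confounder is never observed.
    """
    p_x = marginal(joint_xmy, [0])
    p_xm = marginal(joint_xmy, [0, 1])
    total = 0.0
    for m in VALS:
        p_m_given_x = p_xm[(x, m)] / p_x[(x,)]
        inner = sum(joint_xmy[(xp, m, y)] / p_xm[(xp, m)] * p_x[(xp,)] for xp in VALS)
        total += p_m_given_x * inner
    return total


# Hypothetical toy model 1: observed confounder Z -> X, Z -> Y, plus X -> Y.
joint_zxy = {
    (z, x, y): bern(0.3, z) * bern(0.8 if z else 0.2, x) * bern(0.1 + 0.3 * x + 0.5 * z, y)
    for z, x, y in product(VALS, repeat=3)
}

# Hypothetical toy model 2: hidden confounder U -> X, U -> Y, mediator X -> M -> Y.
joint_uxmy = {
    (u, x, m, y): bern(0.5, u) * bern(0.9 if u else 0.1, x)
    * bern(0.8 if x else 0.2, m) * bern(0.1 + 0.4 * m + 0.4 * u, y)
    for u, x, m, y in product(VALS, repeat=4)
}
joint_xmy = marginal(joint_uxmy, [1, 2, 3])  # U is dropped: only (x, m, y) is observed

print(naive(joint_zxy, 1, 2, 1, 1), backdoor(joint_zxy, 1, 1))
print(naive(joint_xmy, 0, 2, 1, 1), frontdoor(joint_xmy, 1, 1))
```

In the first toy model the naive conditional P(Y=1 | X=1) is about 0.716, while the back-door estimate recovers the interventional value 0.55. In the second, conditioning gives 0.78, while the front-door estimate recovers the interventional value 0.62 without ever observing U.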
The experiments show that GOAT significantly improves navigation accuracy and generalization capabilities, highlighting the effectiveness of causal learning in enhancing VLN systems. The method also provides a systematic approach for extracting coherent representations from sequence inputs, offering valuable insights for future research in similar tasks.
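The summary does not say how CFP turns a feature sequence into a single representation. As a generic illustration of pooling one modality's sequence under guidance from another, the Python sketch below uses scaled dot-product attention pooling. An instruction summary serves as the query, and each visual feature is weighted by its softmax-normalized similarity to that query. This technique, the mean-pooled query, and the toy vectors are all assumptions chosen for illustration, not a description of the CFP module.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(seq):
    """Average a non-empty list of equal-length vectors."""
    d = len(seq[0])
    return [sum(v[j] for v in seq) / len(seq) for j in range(d)]


def attention_pool(query, seq):
    """Pool a sequence of d-dim vectors into one vector, weighting each
    element by its scaled dot-product similarity to `query`."""
    d = len(query)
    scores = [sum(q * h for q, h in zip(query, v)) / math.sqrt(d) for v in seq]
    weights = softmax(scores)
    pooled = [sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)]
    return pooled, weights


# Hypothetical toy features: 3 instruction tokens and 4 visual views, d = 2.
text_tokens = [[1.0, 0.0], [0.8, 0.2], [0.9, 0.1]]
visual_seq = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.9, 0.1]]

query = mean_pool(text_tokens)  # summarize the instruction as a single query vector
pooled, weights = attention_pool(query, visual_seq)
print([round(w, 3) for w in weights])
print([round(x, 3) for x in pooled])
```

Views that resemble the instruction query receive larger weights, so the pooled vector leans toward them; plain mean pooling would count every view equally.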