RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model

29 May 2024 | Jianhao Yuan, Shuyang Sun, Daniel Omeiza, Bo Zhao, Paul Newman, Lars Kunze, Matthew Gadd
RAG-Driver is a retrieval-augmented multi-modal large language model for generalisable and explainable autonomous driving. Its core contribution is a retrieval mechanism that finds driving scenarios similar to the current one and supplies them to the model as examples for in-context learning (ICL). This improves the model's description and prediction accuracy and makes it more generalisable to new deployment domains. Given a driving video, the system outputs natural-language descriptions of driving actions, justifications for those actions, and numerical control signals (speed and steering angle). The model achieves state-of-the-art performance on driving action explanation and justification tasks, as measured by CIDEr, and demonstrates strong zero-shot generalisation to unseen scenarios without additional training. The method addresses challenges such as data scarcity, domain gaps, and catastrophic forgetting, making it a promising step towards trustworthy and transparent autonomous driving systems.
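
To make the retrieval-augmented ICL idea concrete, here is a minimal Python sketch of the overall flow: embed the current driving clip, retrieve the most similar annotated scenarios, and prepend them as in-context examples before querying a multi-modal LLM. The memory bank contents, the `embed_video` encoder, and the `query_mllm` call are hypothetical placeholders for illustration only, not the authors' actual implementation.

```python
# Minimal sketch of retrieval-augmented in-context learning (ICL) for driving
# explanations. Assumes a precomputed memory bank of scenario embeddings paired
# with their textual annotations and control signals; the video encoder and the
# multi-modal LLM call are placeholders, not RAG-Driver's real components.

import numpy as np

# Hypothetical memory bank: embedding + action, justification, control signal.
memory_bank = [
    {
        "embedding": np.random.rand(512),
        "action": "The car slows down.",
        "justification": "A pedestrian is crossing ahead.",
        "control": {"speed": 2.1, "steering": 0.0},
    },
    {
        "embedding": np.random.rand(512),
        "action": "The car turns left.",
        "justification": "The planned route requires a left turn at the junction.",
        "control": {"speed": 4.5, "steering": -0.3},
    },
]

def embed_video(video_frames) -> np.ndarray:
    """Placeholder for a video encoder producing a scenario embedding."""
    return np.random.rand(512)

def retrieve_examples(query_emb: np.ndarray, k: int = 2):
    """Return the k memory-bank entries most similar to the query (cosine similarity)."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    ranked = sorted(memory_bank, key=lambda e: cosine(query_emb, e["embedding"]), reverse=True)
    return ranked[:k]

def build_icl_prompt(examples, question: str) -> str:
    """Prepend retrieved scenarios as in-context examples before the current query."""
    parts = []
    for ex in examples:
        parts.append(
            f"Example:\nAction: {ex['action']}\nJustification: {ex['justification']}\n"
            f"Control: speed={ex['control']['speed']}, steering={ex['control']['steering']}\n"
        )
    parts.append(f"Current scenario:\n{question}")
    return "\n".join(parts)

def query_mllm(prompt: str, video_frames) -> str:
    """Placeholder for the multi-modal LLM; returns a canned response here."""
    return "Action: ... Justification: ... Control: speed=..., steering=..."

# Usage: retrieve similar scenarios for the current clip, then query the model.
current_video = None  # stands in for the current driving clip
query_embedding = embed_video(current_video)
examples = retrieve_examples(query_embedding, k=2)
prompt = build_icl_prompt(
    examples,
    "Describe the driving action, justify it, and predict speed and steering angle.",
)
print(query_mllm(prompt, current_video))
```

Because the retrieved examples are supplied purely through the prompt, adapting to a new deployment domain amounts to updating the memory bank rather than retraining the model, which is how this style of ICL sidesteps catastrophic forgetting.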