RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness

27 May 2024 | Tianyu Yu, Haoye Zhang, Yuan Yao, Yunkai Dang, Da Chen, Xiaoman Lu, Ganqu Cui, Taiwen He, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun
RLAIF-V is a novel framework that aligns multimodal large language models (MLLMs) using open-source AI feedback to enhance their trustworthiness, surpassing even proprietary models like GPT-4V. The framework improves feedback quality through a deconfounded candidate response generation strategy and a divide-and-conquer approach for response evaluation. It also employs an iterative alignment method that mitigates distribution shift, improving learning efficiency and performance.

Extensive experiments on seven benchmarks show that RLAIF-V significantly reduces hallucination rates without compromising performance on other tasks. Using a 34B model as the labeler, RLAIF-V 7B reduces object hallucination by 82.9% and overall hallucination by 42.1%, outperforming the labeler model itself. Remarkably, a 12B model using RLAIF-V can learn from its own feedback, reaching a hallucination rate below 29.5% and surpassing GPT-4V by a large margin. These results demonstrate a promising path to enhancing the trustworthiness of leading-edge MLLMs through self-alignment. RLAIF-V is also compatible with other feedback sources and generalizes to improving the trustworthiness of different MLLMs. The framework's contributions include the deconfounded candidate generation strategy, divide-and-conquer evaluation, iterative alignment, and comprehensive experimental validation. All code, data, and model weights are released for further research.
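To make the three mechanisms named above concrete, here is a minimal sketch of how deconfounded candidate generation, divide-and-conquer evaluation, and iterative alignment could fit together in one feedback loop. It is an illustration under assumptions, not the authors' released implementation: `policy`, `labeler`, `generate`, `split_into_claims`, `verify`, and `dpo_update` are hypothetical placeholders, and the decoding and scoring details are guesses.

```python
# Hedged sketch of an RLAIF-V-style AI-feedback loop.
# All model/method names below are hypothetical placeholders.

from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str      # response with fewer unsupported (hallucinated) claims
    rejected: str    # response with more unsupported claims


def generate_candidates(policy, image, prompt, n=4):
    """Deconfounded candidate generation (assumed): sample n responses from the
    SAME policy with identical decoding settings, varying only the seed, so
    score differences reflect content trustworthiness rather than style."""
    return [policy.generate(image, prompt, temperature=0.7, seed=s) for s in range(n)]


def score_response(labeler, image, prompt, response):
    """Divide-and-conquer evaluation (assumed): split the response into atomic
    claims, verify each claim with the open-source labeler, and use the count
    of rejected claims as a hallucination score (lower is better)."""
    claims = labeler.split_into_claims(response)
    return sum(1 for c in claims if not labeler.verify(image, prompt, c))


def collect_preferences(policy, labeler, dataset):
    """Build preference pairs from AI feedback for one alignment round."""
    pairs = []
    for image, prompt in dataset:
        candidates = generate_candidates(policy, image, prompt)
        scores = [score_response(labeler, image, prompt, r) for r in candidates]
        best = min(range(len(scores)), key=scores.__getitem__)
        worst = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] < scores[worst]:  # keep only pairs with a clear winner
            pairs.append(PreferencePair(prompt, candidates[best], candidates[worst]))
    return pairs


def iterative_alignment(policy, labeler, dataset, num_iterations=4):
    """Iterative alignment (assumed): re-collect feedback from the current
    policy each round so preference data tracks the shifting response
    distribution, then apply a DPO-style preference optimization update."""
    for _ in range(num_iterations):
        pairs = collect_preferences(policy, labeler, dataset)
        policy = policy.dpo_update(pairs)  # hypothetical training step
    return policy
```

In this sketch, the self-feedback setting mentioned in the abstract corresponds to using the same model as both `policy` and `labeler`; using a stronger open-source model as `labeler` corresponds to the 34B-labeler configuration.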