RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness

27 May 2024 | Tianyu Yu, Haoye Zhang, Yuan Yao, Yunkai Dang, Da Chen, Xiaoman Lu, Ganqu Cui, Taiwen He, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun
RLAIF-V is a novel framework designed to align multimodal large language models (MLLMs) with human preferences through open-source AI feedback, aiming to enhance the trustworthiness of these models. Traditional methods rely on manual labeling, which is labor-intensive and time-consuming, while recent approaches using models as automatic labelers have shown promise but often depend on costly proprietary models like GPT-4V, leading to scalability issues.

RLAIF-V addresses these challenges by leveraging high-quality feedback data and an online feedback learning algorithm. It introduces a deconfounded candidate response generation strategy and a divide-and-conquer approach to improve the accuracy and efficiency of pairwise feedback data. The framework also employs an iterative alignment method to mitigate distribution shift problems, improving both learning performance and efficiency.

Extensive experiments on seven benchmarks demonstrate that RLAIF-V significantly reduces hallucination rates without sacrificing performance on other tasks, outperforming GPT-4V in trustworthiness. The results highlight the potential of open-source MLLMs to achieve self-alignment, with a 12B model achieving a hallucination rate of less than 29.5%, surpassing GPT-4V by a large margin. The framework's effectiveness is further validated through qualitative analysis and comparisons with other methods, showing its potential to enhance the trustworthiness of leading-edge MLLMs.
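Although the summary above does not give implementation details, the described pipeline can be sketched in a few lines. This is a minimal illustration only: the function names, the sentence-level claim splitting, and the best-versus-worst pairing rule are assumptions for exposition, not the paper's actual code.

```python
# Illustrative sketch of an RLAIF-V-style feedback pipeline, based only on
# the summary above. Helper names and the scoring rule are hypothetical.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Candidate:
    text: str
    score: float = 0.0  # fraction of atomic claims judged correct

def generate_candidates(policy: Callable[[str], str], prompt: str, n: int = 4) -> list[str]:
    """Deconfounded generation: sample n responses from the SAME model with
    the SAME decoding configuration, so candidates differ only through
    sampling randomness rather than confounds such as style or verbosity."""
    return [policy(prompt) for _ in range(n)]

def split_into_claims(response: str) -> list[str]:
    """Divide: break a response into atomic claims (naively, by sentence)."""
    return [s.strip() for s in response.split(".") if s.strip()]

def score_response(response: str, judge: Callable[[str], bool]) -> float:
    """Conquer: have an open-source labeler judge each claim as a simple
    yes/no question, then aggregate into a response-level score."""
    claims = split_into_claims(response)
    if not claims:
        return 0.0
    return sum(judge(claim) for claim in claims) / len(claims)

def build_preference_pair(prompt: str,
                          policy: Callable[[str], str],
                          judge: Callable[[str], bool],
                          n: int = 4):
    """Pair the best- and worst-scoring candidates as (chosen, rejected)
    feedback data for a preference-learning update."""
    cands = [Candidate(t, score_response(t, judge))
             for t in generate_candidates(policy, prompt, n)]
    cands.sort(key=lambda c: c.score, reverse=True)
    if cands[0].score > cands[-1].score:
        return cands[0].text, cands[-1].text
    return None  # no usable preference signal for this prompt
```

In the framework as summarized, the judge is itself an open-source MLLM rather than a proprietary labeler, and the collection loop is rerun each training iteration so that the feedback data tracks the current policy's output distribution, which is how the iterative alignment step mitigates distribution shift.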