MDPO: Conditional Preference Optimization for Multimodal Large Language Models

17 Jun 2024 | Fei Wang, Wenxuan Zhou, James Y. Huang, Nan Xu, Sheng Zhang, Hoifung Poon, Muhao Chen
The paper "MDPO: Conditional Preference Optimization for Multimodal Large Language Models" addresses the challenges of applying direct preference optimization (DPO) to multimodal scenarios, where the model often overlooks visual information. The authors identify the *unconditional preference* problem, where the model prioritizes language-only preferences over image preferences. To tackle this issue, they propose mDPO, a multimodal DPO objective that includes conditional preference optimization and anchored preference optimization. Conditional preference optimization ensures the model learns preferences based on both visual and language cues, while anchored preference optimization maintains the likelihood of chosen responses. Experiments on two multimodal LLMs (Bunny-v1.0-3B and LLaVA-v1.5-7B) and three benchmarks (MMHalBench, Object HalBench, and AMBER) demonstrate that mDPO significantly improves model performance, particularly in reducing hallucinations. The paper also provides detailed analyses and qualitative results to support its findings.The paper "MDPO: Conditional Preference Optimization for Multimodal Large Language Models" addresses the challenges of applying direct preference optimization (DPO) to multimodal scenarios, where the model often overlooks visual information. The authors identify the *unconditional preference* problem, where the model prioritizes language-only preferences over image preferences. To tackle this issue, they propose mDPO, a multimodal DPO objective that includes conditional preference optimization and anchored preference optimization. Conditional preference optimization ensures the model learns preferences based on both visual and language cues, while anchored preference optimization maintains the likelihood of chosen responses. Experiments on two multimodal LLMs (Bunny-v1.0-3B and LLaVA-v1.5-7B) and three benchmarks (MMHalBench, Object HalBench, and AMBER) demonstrate that mDPO significantly improves model performance, particularly in reducing hallucinations. The paper also provides detailed analyses and qualitative results to support its findings.