DOPRA: Decoding Over-accumulation Penalization and Re-allocation in Specific Weighting Layer

October 28-November 1, 2024, Melbourne, VIC, Australia | Jinfeng Wei* and Xiaofeng Zhang*
This paper introduces DOPRA, a novel approach to mitigating hallucinations in multi-modal large language models (MLLMs). Unlike existing methods, which often require costly supplementary training data or external knowledge sources, DOPRA intervenes only at decoding time, penalizing over-accumulated attention and re-allocating it in specific weighting layers, offering an economical and effective solution.

DOPRA is grounded in an analysis of the intrinsic mechanisms behind hallucination in MLLMs, in particular the models' tendency to over-rely on a subset of "summary tokens" in the self-attention matrix while neglecting critical image-related information. To counteract this over-reliance, DOPRA applies weighted overlay penalties and re-allocation in specific layers, such as the 12th layer, during decoding. Additionally, DOPRA includes a retrospective allocation process that re-examines the sequence of generated tokens, allowing the algorithm to re-select tokens that better align with the actual image content and thereby reducing hallucinatory descriptions in auto-generated captions. A minimal sketch of this decoding-time intervention is given below.
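The paper's exact implementation is not reproduced here, so the following PyTorch fragment is only a rough sketch of the core idea under stated assumptions: given the row-normalized attention weights of one targeted layer (e.g., the 12th), key columns whose accumulated attention exceeds a threshold are treated as candidate summary tokens, their weights are penalized, and each row is renormalized so the freed probability mass is re-allocated to the remaining tokens, including image tokens. The function name and the `threshold` and `penalty` values are illustrative assumptions, not the paper's settings.

```python
import torch

def penalize_over_accumulation(attn: torch.Tensor,
                               threshold: float = 0.5,
                               penalty: float = 0.5) -> torch.Tensor:
    """Hypothetical sketch of DOPRA-style over-accumulation penalization.

    attn: (num_queries, num_keys) attention weights from one head of the
    targeted layer, with each row summing to 1.
    """
    # Average attention mass each key column accumulates across queries;
    # columns with unusually high accumulation behave like summary tokens.
    accumulation = attn.sum(dim=0) / attn.shape[0]
    over_accumulated = accumulation > threshold

    # Weighted overlay penalty on the over-accumulated columns.
    penalized = attn.clone()
    penalized[:, over_accumulated] *= penalty

    # Re-allocation: renormalize each row so the probability mass removed
    # from summary tokens flows back to the remaining (image-related) keys.
    return penalized / penalized.sum(dim=-1, keepdim=True)

# Toy usage: 4 query positions attending over 6 keys.
attn = torch.softmax(torch.randn(4, 6), dim=-1)
print(penalize_over_accumulation(attn).sum(dim=-1))  # rows still sum to 1
```

In a full implementation this adjustment would run inside the decoding loop of the targeted layer, paired with the retrospective step that backtracks over already-generated tokens and re-selects those that better match the image.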
**Key Contributions:**

- DOPRA addresses hallucination in MLLMs purely at inference time, without requiring external data, knowledge repositories, or additional training procedures.
- Through careful analysis, DOPRA identifies the critical role that summary tokens play in the formation of hallucinations and develops a penalty-based decoding technique, augmented with a backtracking re-allocation strategy, to counteract excessive attention accumulation.
- Comprehensive evaluations demonstrate DOPRA's superior performance, showing it to be a practically cost-free intervention that effectively mitigates hallucinations and thereby improves the credibility and reliability of MLLMs in real-world applications.

**Experiments:**

- DOPRA is evaluated with the Caption Hallucination Assessment with Image Relevance (CHAIR) and Polling-based Object Probing Evaluation (POPE) metrics on the MSCOCO dataset (a sketch of the CHAIR computation follows the Conclusion).
- DOPRA clearly outperforms baseline decoding methods on both the C_S and C_I metrics, achieving the best hallucination-mitigation results, especially on longer sequences.

**Discussion and Limitations:**

- Hallucinations in LLMs may stem from inadequate generalization or from model knowledge that is not updated in a timely manner.
- DOPRA is a decoding-time remedy rather than a root-cause fix; future research may explore end-to-end fine-tuning and improvements in perceptual capability.
- Future work could also pursue more meticulous data-alignment procedures and incorporate fine-grained visual features to overcome the perceptual limitations of current architectures.

**Conclusion:**

DOPRA introduces a novel and cost-effective approach to the prevalent problem of hallucination in MLLMs, enhancing their precision and reliability for practical applications. By penalizing decoding over-accumulation and re-allocating attention in specific weighting layers, DOPRA avoids the need for additional resources, setting it apart from existing methods.
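For concreteness, here is a small, self-contained sketch of how the CHAIR scores referenced in the Experiments above are typically computed, following the standard definition of CHAIR; the object sets in the example are invented. C_I is the fraction of mentioned object instances that do not appear in the image, and C_S is the fraction of captions containing at least one such hallucinated object.

```python
def chair_scores(captions, ground_truth_objects):
    """Compute CHAIR_S (C_S) and CHAIR_I (C_I) for a set of captions.

    captions: list of sets of object words mentioned in each caption.
    ground_truth_objects: list of sets of objects actually in each image.
    """
    total_mentions = hallucinated_mentions = hallucinated_captions = 0
    for mentioned, present in zip(captions, ground_truth_objects):
        fake = mentioned - present          # objects mentioned but absent
        total_mentions += len(mentioned)
        hallucinated_mentions += len(fake)
        hallucinated_captions += bool(fake)
    c_s = hallucinated_captions / max(len(captions), 1)
    c_i = hallucinated_mentions / max(total_mentions, 1)
    return c_s, c_i

# Toy example: the first caption hallucinates a "dog".
print(chair_scores(
    [{"person", "surfboard", "dog"}, {"cat", "sofa"}],
    [{"person", "surfboard", "wave"}, {"cat", "sofa", "lamp"}],
))  # -> (0.5, 0.2): half the captions hallucinate; 1 of 5 mentions is fake
```

Lower values are better on both metrics; DOPRA's reported gains are reductions in C_S and C_I relative to baseline decoding strategies.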