This paper proposes Ensemble via Vocabulary Alignment (EVA), a method that enables fine-grained ensembling of large language models (LLMs) at each generation step. The main challenge it addresses is the vocabulary discrepancy among different LLMs, which has limited previous studies to selecting or blending fully generated outputs. EVA bridges this lexical gap by aligning the models' vocabularies, so that their predictions can be combined token by token.

The method first learns mappings between the vocabularies of different LLMs from their overlapping tokens. These mappings are then used to project each model's output distribution into a unified space, where the distributions can be fused at every decoding step. A filtering strategy further excludes models that generate unfaithful tokens at a given step. Experimental results on commonsense reasoning, arithmetic reasoning, machine translation, and data-to-text generation show that EVA outperforms both the individual LLMs and previous ensemble methods, and further analyses confirm that it leverages complementary knowledge from the different models and yields consistent improvements.
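To make the alignment step concrete, the following is a minimal sketch of how a cross-vocabulary mapping could be learned from overlapping tokens and turned into a projection matrix. The function names, the reliance on token embeddings, and the orthogonal Procrustes solution are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def learn_alignment(emb_a, emb_b, vocab_a, vocab_b):
    """Fit a linear map from model A's embedding space to model B's on the
    tokens shared by both vocabularies (orthogonal Procrustes solution)."""
    shared = sorted(set(vocab_a) & set(vocab_b))
    X = emb_a[[vocab_a[t] for t in shared]]   # shared tokens, A-side embeddings
    Y = emb_b[[vocab_b[t] for t in shared]]   # shared tokens, B-side embeddings
    U, _, Vt = np.linalg.svd(X.T @ Y, full_matrices=False)
    return U @ Vt                             # d_a x d_b mapping matrix

def build_projection(emb_a, emb_b, W, top_k=1):
    """Build a |V_a| x |V_b| matrix that moves probability mass from each
    A-vocabulary token onto its nearest B-vocabulary token(s)."""
    mapped = emb_a @ W                        # all A tokens mapped into B's space
    a_norm = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    b_norm = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sim = a_norm @ b_norm.T                   # cosine similarity, |V_a| x |V_b|
    proj = np.zeros_like(sim)
    nearest = np.argsort(-sim, axis=1)[:, :top_k]
    np.put_along_axis(proj, nearest, 1.0 / top_k, axis=1)
    return proj                               # each row sums to 1

# Usage: given model A's next-token distribution p_a (shape |V_a|),
# p_a @ build_projection(emb_a, emb_b, W) yields a distribution over V_b.
```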
EVA requires only an additional projection matrix, eliminating the need for extra fusion models or supervised training corpora. The contributions are threefold: a novel LLM ensemble method that operates at the token level during generation, an effective filtering strategy that excludes unfaithful tokens, and empirical evidence of the method's effectiveness and superiority. Evaluated on commonsense reasoning, arithmetic reasoning, machine translation, and data-to-text generation, EVA significantly improves overall performance across these natural language processing tasks.
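The per-step fusion and filtering can be pictured roughly as below. This is a hypothetical sketch: it assumes each model's distribution has already been projected into a shared vocabulary, and the majority-agreement rule is only a stand-in for the paper's actual criterion for detecting unfaithful tokens.

```python
import numpy as np

def ensemble_step(projected_dists, threshold=0.5):
    """Fuse per-model next-token distributions (already projected into one
    shared vocabulary) and drop models that disagree with the majority."""
    dists = np.stack(projected_dists)            # (n_models, |V|)
    top1 = dists.argmax(axis=1)                  # each model's preferred token
    majority = np.bincount(top1).argmax()        # most common top-1 choice
    # Keep a model if it agrees with the majority token, or at least puts a
    # substantial share of its peak probability on it (stand-in filter).
    kept = [d for d, t in zip(dists, top1)
            if t == majority or d[majority] >= threshold * d[t]]
    fused = np.mean(kept if kept else dists, axis=0)
    return int(fused.argmax())                   # greedy choice of next token
```

Greedy selection is used here only for simplicity; any decoding strategy could operate on the fused distribution instead.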