Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization

17 Apr 2024 | Costas Mavromatis, Petros Karypis, George Karypis
The paper introduces Pack of LLMs (PackLLM), a test-time fusion method for combining knowledge from multiple Large Language Models (LLMs). PackLLM leverages each LLM's expertise by solving an optimization problem that determines the importance weight of each LLM so as to minimize perplexity over the input prompt. The method does not require training fusion models and can incorporate new LLMs during inference. The authors validate that perplexity is a reliable measure for LLM fusion and demonstrate that PackLLM outperforms existing test-time fusion baselines by 1.89% accuracy points. Additionally, PackLLM can leverage new LLMs to improve performance over learning-based fusion approaches by 3.92–11.94% accuracy points. Experiments are conducted on over 100 LLMs across various tasks, including language modeling and downstream tasks, showing the effectiveness and scalability of PackLLM.
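To make the idea concrete, below is a minimal Python sketch of perplexity-weighted test-time fusion. It is not the paper's exact optimization procedure: it scores the prompt's perplexity under each model, converts negative log-perplexities into importance weights with a softmax (the temperature `tau` is an assumed knob), and averages the models' next-token distributions. The model names (`gpt2`, `distilgpt2`) are illustrative placeholders chosen because they share a vocabulary; the actual method also addresses fusing LLMs with heterogeneous tokenizers.

```python
# Sketch of perplexity-weighted test-time fusion (illustrative, not the authors'
# exact PackLLM algorithm). Assumes all models share one tokenizer/vocabulary.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_names = ["gpt2", "distilgpt2"]  # placeholder models with a shared vocab
tokenizer = AutoTokenizer.from_pretrained(model_names[0])
models = [AutoModelForCausalLM.from_pretrained(n).eval() for n in model_names]

prompt = "The capital of France is"
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]

@torch.no_grad()
def prompt_nll(model, input_ids):
    # Mean next-token negative log-likelihood of the prompt (log-perplexity).
    return model(input_ids, labels=input_ids).loss

# Importance weights: softmax over negative log-perplexities (temperature tau).
tau = 0.1
nlls = torch.stack([prompt_nll(m, input_ids) for m in models])
weights = torch.softmax(-nlls / tau, dim=0)

# Fuse next-token distributions as a weighted average and pick the top token.
with torch.no_grad():
    probs = [torch.softmax(m(input_ids).logits[0, -1], dim=-1) for m in models]
fused = sum(w * p for w, p in zip(weights, probs))
print(tokenizer.decode([fused.argmax().item()]))
```

Because the weights depend only on the prompt's perplexity, this fusion happens entirely at inference time: adding a new model requires no training, only one extra forward pass over the prompt.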