Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization

17 Apr 2024 | Costas Mavromatis, Petros Karypis, George Karypis
This paper introduces Pack of LLMs (PackLLM), a test-time fusion method that combines knowledge from multiple Large Language Models (LLMs) to improve performance across tasks. PackLLM treats perplexity on the input prompt as a measure of each LLM's expertise: fusion is cast as an optimization problem that assigns each LLM an importance weight by minimizing perplexity over the prompt.

Two variants are presented: PackLLM_sim, which uses a simple perplexity-based weighting, and PackLLM_opt, which approximately solves the perplexity-minimization problem via a greedy algorithm. Experiments with over 100 LLMs on a diverse set of tasks show that PackLLM outperforms existing test-time fusion baselines by 1.89% accuracy points and, by leveraging newly released LLMs, improves over learning-based fusion approaches by 3.92–11.94% accuracy points.

PackLLM is effective in both language modeling and downstream tasks, including knowledge-intensive tasks, commonsense reasoning, and domain-specific knowledge. The method is modular, so new LLMs can be incorporated without retraining; it also remains robust across varying input prompt lengths and is computationally efficient. The results demonstrate that perplexity is a reliable signal for LLM fusion and that PackLLM provides a robust, effective approach for combining knowledge from multiple LLMs at test time.
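To make the core idea concrete, here is a minimal sketch of the PackLLM_sim-style weighting: each model is scored by its perplexity on the prompt, weights come from a softmax over negative log-perplexities (so lower perplexity means higher weight), and the fused next-token distribution is the weighted average of each model's softmax. All function names, the `temperature` parameter, and the toy inputs are illustrative assumptions, not the authors' code.

```python
import math

def perplexity(token_logprobs):
    # PPL = exp(-mean log p(token)) over the prompt tokens.
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def packllm_sim_weights(per_model_prompt_logprobs, temperature=1.0):
    # Softmax over negative log-perplexities: lower PPL -> larger weight.
    scores = [-math.log(perplexity(lp)) / temperature
              for lp in per_model_prompt_logprobs]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def fuse_next_token_probs(per_model_logits, weights):
    # Fused distribution: importance-weighted average of each model's softmax.
    fused = [0.0] * len(per_model_logits[0])
    for w, logits in zip(weights, per_model_logits):
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        for i, e in enumerate(exps):
            fused[i] += w * e / z
    return fused
```

In a real setting, `per_model_prompt_logprobs` would come from scoring the prompt with each LLM; the dummy lists here stand in for those scores. The greedy PackLLM_opt variant would instead iteratively add models only when doing so lowers the fused perplexity.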