25 Jul 2024 | Michael Hassid, Tal Remez, Jonas Gehring, Roy Schwartz, Yossi Adi
The paper "The Larger the Better? Improved LLM Code-Generation via Budget Reallocation" explores the effectiveness of using smaller language models (LLMs) compared to larger ones when operating under the same computational budget. The authors analyze code generation tasks and compare the performance of various LLM sizes, such as running a 70B model once versus generating five outputs from a 13B model. They use a standard unit-test setup to select the best output from the smaller model. The findings reveal that repeated use of smaller models can yield consistent improvements, with gains of up to 15% across five tasks. However, in scenarios without unit-tests, a ranking-based selection of candidates from the smaller model falls short of the performance of a single output from larger models. The study highlights the potential of using smaller models instead of larger ones and emphasizes the importance of developing effective ranking approaches for LLM outputs. The authors also release 2,000 Code Llama 7B outputs for each example in HumanEval and MBPP to support further research.The paper "The Larger the Better? Improved LLM Code-Generation via Budget Reallocation" explores the effectiveness of using smaller language models (LLMs) compared to larger ones when operating under the same computational budget. The authors analyze code generation tasks and compare the performance of various LLM sizes, such as running a 70B model once versus generating five outputs from a 13B model. They use a standard unit-test setup to select the best output from the smaller model. The findings reveal that repeated use of smaller models can yield consistent improvements, with gains of up to 15% across five tasks. However, in scenarios without unit-tests, a ranking-based selection of candidates from the smaller model falls short of the performance of a single output from larger models. The study highlights the potential of using smaller models instead of larger ones and emphasizes the importance of developing effective ranking approaches for LLM outputs. The authors also release 2,000 Code Llama 7B outputs for each example in HumanEval and MBPP to support further research.