LLaMA PRO: Progressive LLaMA with Block Expansion

30 May 2024 | Chengyue Wu, Yukang Gan, Yixiao Ge, Zeyu Lu, Jiahao Wang, Ye Feng, Ying Shan, Ping Luo
The paper introduces a post-pretraining method for Large Language Models (LLMs) called *block expansion*, which adds domain-specific capabilities while preserving general abilities. The method expands the Transformer stack of an off-the-shelf LLM by interleaving copies of existing blocks whose output linear layers are zero-initialized, so that each copy computes an identity mapping at initialization; only these new blocks are then fine-tuned on domain-specific corpora (code and math), while the original blocks remain frozen to avoid forgetting general knowledge. Applied to LLaMA2-7B, this yields LLaMA PRO, a versatile foundation model with 8.3B parameters that excels in general tasks, programming, and mathematics, outperforming existing models on various benchmarks. The paper also presents LLaMA PRO-INSTRUCT, an instruction-following version of LLaMA PRO that delivers strong performance across a wide range of tasks. The method is evaluated on extensive datasets, including both traditional and agent-oriented tasks, demonstrating its effectiveness and its potential in broader, more complex applications. Overall, the study shows that block expansion balances general and domain-specific capabilities, making LLaMA PRO a promising model for diverse NLP tasks.
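
To make the mechanism concrete, below is a minimal PyTorch sketch of the block-expansion idea. It is not the paper's implementation: it uses a simplified residual feed-forward block instead of a full LLaMA decoder block, and the names `SimpleBlock` and `expand_blocks` are illustrative. The key points it demonstrates are (1) zero-initializing the copied block's output layer so it starts as an identity map, and (2) freezing the original blocks while training only the new ones.

```python
import copy
import torch
import torch.nn as nn

class SimpleBlock(nn.Module):
    """Toy stand-in for a Transformer decoder block (pre-norm, residual)."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        return x + self.ffn(self.norm(x))

def expand_blocks(blocks: nn.ModuleList, blocks_per_group: int = 4) -> nn.ModuleList:
    """Insert one zero-initialized copy after every `blocks_per_group` blocks.

    Zeroing the copy's last linear layer makes its residual branch output 0,
    so each new block is an identity map at initialization and the expanded
    stack initially reproduces the original model's outputs.
    """
    expanded = []
    for i, block in enumerate(blocks):
        block.requires_grad_(False)            # freeze pretrained blocks
        expanded.append(block)
        if (i + 1) % blocks_per_group == 0:
            new_block = copy.deepcopy(block)   # copy the block's weights
            nn.init.zeros_(new_block.ffn[-1].weight)   # zero output projection
            nn.init.zeros_(new_block.ffn[-1].bias)
            new_block.requires_grad_(True)     # only new blocks are trained
            expanded.append(new_block)
    return nn.ModuleList(expanded)

# Usage: expand a 32-block stack into 40 blocks (8 new identity blocks).
dim = 64
base = nn.ModuleList([SimpleBlock(dim) for _ in range(32)])
model = expand_blocks(base, blocks_per_group=4)

x = torch.randn(2, 16, dim)
with torch.no_grad():
    y = x
    for blk in model:
        y = blk(y)
print(len(model), y.shape)   # 40 torch.Size([2, 16, 64])
```

During post-pretraining, only the parameters with `requires_grad=True` (the inserted blocks) would be passed to the optimizer, which is how the sketch mirrors the paper's recipe of tuning the expanded blocks on domain corpora while leaving the original weights untouched.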