TinyLlama: An Open-Source Small Language Model

4 Jun 2024 | Peiyuan Zhang*, Guangtao Zeng*, Tianduo Wang, Wei Lu
TinyLlama is a compact 1.1B-parameter language model pretrained on approximately 1 trillion tokens for up to 3 epochs. It is based on the architecture and tokenizer of Llama 2 and leverages advances from the open-source community, such as FlashAttention and Lit-GPT, to achieve better computational efficiency. Despite its small size, TinyLlama performs well on a range of downstream tasks and outperforms existing open-source language models of similar size. The model is open-source and available on GitHub.

The pre-training data mixes natural language from SlimPajama with code from StarCoder. The model uses a decoder-only Transformer architecture with RoPE positional embeddings, pre-norm with RMSNorm, SwiGLU activation, and grouped-query attention. It also incorporates speed optimizations such as FSDP and FlashAttention to improve training efficiency.
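For concreteness, the sketch below instantiates a Llama-style model with the hyperparameters commonly reported for TinyLlama-1.1B (22 layers, hidden size 2048, 32 attention heads with 4 key-value heads, SwiGLU intermediate size 5632, 32k vocabulary). These values are assumptions taken from the released configuration rather than from the summary above; consult the official checkpoints for the exact settings.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Hyperparameters assumed from the published TinyLlama-1.1B configuration.
config = LlamaConfig(
    vocab_size=32000,
    hidden_size=2048,
    intermediate_size=5632,       # SwiGLU feed-forward width
    num_hidden_layers=22,
    num_attention_heads=32,
    num_key_value_heads=4,        # grouped-query attention
    hidden_act="silu",            # SiLU gate used by SwiGLU
    max_position_embeddings=2048,
    rms_norm_eps=1e-5,            # pre-norm with RMSNorm
)

model = LlamaForCausalLM(config)  # RoPE is built into the Llama attention layers
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.2f}B parameters")  # ~1.1B
```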
TinyLlama v1.1 was trained with a three-stage pre-training process: basic pre-training, continual pre-training on specific domains, and a cooldown phase. This data scheduling allows the model to adapt to different tasks and domains and yields three variants, TinyLlama v1.1, TinyLlama v1.1 Math&Code, and TinyLlama v1.1 Chinese, each tailored for specific applications.

TinyLlama demonstrates strong performance on commonsense reasoning tasks, problem-solving tasks, and Chinese understanding tasks. It outperforms existing open-source models in these areas, particularly in tasks involving code and mathematical reasoning.

The model is designed to be accessible to researchers and practitioners, offering a lightweight platform for exploring language model innovations. The paper also highlights the effectiveness of multi-stage training and data scheduling and emphasizes the importance of open-source research in advancing language model development. The model's code and checkpoints are publicly available, promoting transparency and collaboration in the field.
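The summary does not give the token budgets or sampling ratios used in the three stages, so the sketch below only illustrates the general shape of such a data schedule. The stage names mirror the paper, but every token budget and data-source weight is a placeholder, not a value used to train TinyLlama.

```python
# Illustrative three-stage data schedule in the spirit of TinyLlama v1.1's
# basic pre-training -> continual domain pre-training -> cooldown pipeline.
# All token budgets and mixture weights below are placeholders.
STAGES = [
    # (stage name, token budget, sampling weights over data sources)
    ("basic",     1.5e12, {"slimpajama": 0.7, "starcoder": 0.3, "domain": 0.0}),
    ("continual", 1.0e12, {"slimpajama": 0.5, "starcoder": 0.3, "domain": 0.2}),
    ("cooldown",  0.5e12, {"slimpajama": 0.5, "starcoder": 0.3, "domain": 0.2}),
]

def current_stage(tokens_seen: float):
    """Return the active stage name and data-mixture weights for a token count."""
    consumed = 0.0
    for name, budget, weights in STAGES:
        consumed += budget
        if tokens_seen < consumed:
            return name, weights
    return STAGES[-1][0], STAGES[-1][2]

print(current_stage(2.0e12))  # -> ('continual', {...})
```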
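Because the checkpoints are published on the Hugging Face Hub, a released variant can be loaded with the standard transformers API. The model id below is an assumption used for illustration; check the TinyLlama GitHub page for the exact names of the v1.1 variants.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint name; see the TinyLlama repository for the full list
# of released variants (v1.1, v1.1 Math&Code, v1.1 Chinese, chat models).
model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain grouped-query attention in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```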