27 Feb 2024 | Marco Bellagente, Jonathan Tow, Dakota Mahan, Duy Phung, Maksym Zhuravinskyi, Reshinth Adithyan, James Baicoianu, Ben Brooks, Nathan Cooper, Ashish Datta, Meng Lee, Emad Mostaque, Michael Pieler, Nikhil Pinnaparaju, Paulo Rocha, Harry Saini, Hannah Teufel, Niccolò Zanichelli, Carlos Riquelme
StableLM 2 1.6B is a new generation of language models introduced by Stability AI. This technical report details the data and training process for the base and instruction-tuned versions of StableLM 2 1.6B, both of which are available on Hugging Face. The model is evaluated on a range of benchmarks, including zero- and few-shot tasks, multilingual benchmarks, and MT-Bench for multi-turn dialogue. At the time of publication, StableLM 2 1.6B was the state-of-the-art open model under 2B parameters. Throughput is also measured on edge devices, and several quantized checkpoints are provided for performance comparisons.
The model was pre-trained on a diverse set of data sources, including RefinedWeb, the Pile, RedPajama, The Stack, OpenWebText, OpenWebMath, and parts of CulturaX. The training mixture was carefully selected to balance different languages and domains. The model uses the Arcade100k tokenizer, which includes special tokens for code and splits runs of digits into individual tokens. The architecture is a decoder-only transformer similar to LLaMA, with rotary position embeddings and LayerNorm with learned bias terms.
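The digit-split behavior mentioned above means runs of digits are tokenized one character at a time, which tends to help arithmetic. A minimal regex sketch of that pre-tokenization idea (illustrative only; this is not the actual Arcade100k implementation):

```python
import re

def split_digits(text: str) -> list[str]:
    # Keep runs of non-digits whole, but emit each digit as its own piece,
    # mirroring digit-split pre-tokenization (illustrative sketch only).
    return re.findall(r"\d|\D+", text)

print(split_digits("pi is 314"))  # ['pi is ', '3', '1', '4']
```

A real tokenizer would apply a rule like this before byte-pair encoding, so that numbers never merge into opaque multi-digit tokens.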
The model was fine-tuned in three main stages: supervised fine-tuning (SFT), direct preference optimization (DPO), and self-knowledge learning. SFT trains on instruction datasets, DPO aligns the model with human preferences, and self-knowledge learning improves the model's awareness of its own capabilities and limitations.
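DPO works by increasing the log-probability margin of preferred responses over rejected ones, measured relative to a frozen reference model. A minimal scalar sketch of the per-example loss (the helper and its arguments are illustrative, not the report's training code):

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # Logistic loss on the margin; beta controls deviation from the reference model.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy matches the reference, the margin is zero and the loss is ln 2; as the policy prefers the chosen response more strongly than the reference does, the loss decreases.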
The model was evaluated on the Open LLM Leaderboard, multilingual benchmarks, and MT-Bench. The results show StableLM 2 1.6B outperforming comparably sized models on several tasks, notably in multilingual and conversational settings. Quantized versions are also available for efficient deployment on edge devices.
The report also discusses the environmental and societal impact of training StableLM 2 1.6B, including its carbon footprint. The model is released under an open non-commercial license to promote transparency and accessibility in AI research. The report concludes by highlighting the model's performance and its potential for future research and development.