27 Feb 2024 | Marco Bellagente, Jonathan Tow, Dakota Mahan, Duy Phung, Maksym Zhuravinskyi, Reshinth Adithyan, James Baicoianu, Ben Brooks, Nathan Cooper, Ashish Datta, Meng Lee, Emad Mostaque, Michael Pieler, Nikhil Pinnaparaju, Paulo Rocha, Harry Saini, Hannah Teufel, Niccolo Zanichelli, Carlos Riquelme
The technical report introduces StableLM 2 1.6B, a new generation of language models from Stability AI. It details the data and training procedures for both the base and instruction-tuned versions of the model, which are available for download via Hugging Face. The models are evaluated on a range of benchmarks covering zero- and few-shot learning, multilingual tasks, and multi-turn dialogue. At the time of publication, StableLM 2 1.6B was the state-of-the-art open model under 2 billion parameters. The report also includes throughput measurements on edge devices, as well as quantized checkpoints together with their performance metrics.
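Since the checkpoints are distributed through Hugging Face, a minimal loading sketch with the `transformers` library might look like the following; the repository id `stabilityai/stablelm-2-1_6b`, the dtype, and the generation settings are assumptions to be checked against the model card rather than details taken from the report.

```python
# Minimal sketch: load the (assumed) base checkpoint and generate a few tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-2-1_6b"  # assumed repository id; verify on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 1.6B model small in memory
)  # older transformers versions may additionally require trust_remote_code=True

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```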
The introduction highlights the importance of transparency in AI development, especially for large models, and outlines the training process, which includes pre-training, fine-tuning, and alignment. The pre-training stage focuses on predicting the next token in a sequence using diverse data sources, while the fine-tuning stage enhances conversational skills through supervised fine-tuning, direct preference optimization, and self-knowledge learning.
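As an illustration of the direct preference optimization step mentioned above, the following is a minimal PyTorch sketch of the standard DPO loss (Rafailov et al., 2023); the function name, the beta value, and the toy log-probabilities are illustrative assumptions, not the report's actual training code.

```python
# Illustrative DPO loss: push the policy's log-ratio for the chosen response
# above the log-ratio for the rejected response, relative to a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Log-ratios of policy vs. reference for preferred and dispreferred responses.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Maximize the margin between chosen and rejected log-ratios, scaled by beta.
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()

# Toy usage with made-up summed log-probabilities for two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-15.0, -11.0]),
                torch.tensor([-12.5, -9.8]), torch.tensor([-14.0, -10.5]))
print(loss.item())
```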
The experimental results section compares StableLM 2 1.6B with other similarly sized models, demonstrating its superior performance across a range of tasks, including multilingual benchmarks. The report also provides throughput and power-usage data for inference on different devices and discusses future research directions, such as data quality, hallucination mitigation, long-context handling, and conditional computation.
The report concludes by emphasizing the model's compact size and efficiency, and it includes acknowledgments and references.