7 Jun 2024 | Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi
OLMo is a truly open language model designed to enable the scientific study of language models. Unlike previous efforts that released only model weights and inference code, OLMo is accompanied by open training data, training and evaluation code, intermediate checkpoints, and training logs. The release aims to empower the open research community and inspire innovation. OLMo includes four 7B-scale variants and one 1B-scale model, all trained on at least 2T tokens. The models use a decoder-only transformer architecture with several departures from the vanilla transformer, including no bias terms, non-parametric layer norm, the SwiGLU activation function, and rotary positional embeddings (RoPE). The pretraining data, Dolma, is a diverse, multi-source corpus containing trillions of tokens across billions of documents. OLMo is also adapted with instruction fine-tuning and Direct Preference Optimization (DPO) to improve performance and safety, and is evaluated on a range of downstream tasks and intrinsic language modeling benchmarks. The OLMo framework includes tools for training, adaptation, and evaluation, with all code and weights released under the Apache 2.0 License, in order to promote open research and scientific progress in understanding and improving language models.
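As a rough illustration of the architectural choices listed above, here is a minimal PyTorch sketch of a single decoder block with bias-free linear layers, non-parametric layer norm, a SwiGLU feed-forward block, and rotary positional embeddings. This is not the official OLMo implementation; the module names, dimensions, and the rotary-embedding helper are illustrative assumptions.

```python
# Minimal sketch (not the official OLMo code) of the summary's architecture choices:
# no bias terms, non-parametric layer norm, SwiGLU feed-forward, and RoPE.
import torch
import torch.nn as nn
import torch.nn.functional as F


def rotary_embedding(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary positional embeddings to a (batch, heads, seq, head_dim) tensor."""
    b, h, s, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(0, half, dtype=x.dtype, device=x.device) / half)
    angles = torch.arange(s, dtype=x.dtype, device=x.device)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()              # (seq, half), broadcast over batch/heads
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


class SwiGLU(nn.Module):
    """Feed-forward block with the SwiGLU activation and no bias terms."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)
        self.w_up = nn.Linear(d_model, d_hidden, bias=False)
        self.w_down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x):
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))


class DecoderBlock(nn.Module):
    """One decoder-only transformer block with the modifications above (toy dimensions)."""
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, d_model // n_heads
        # Non-parametric layer norm: no learned scale or shift.
        self.norm1 = nn.LayerNorm(d_model, elementwise_affine=False)
        self.norm2 = nn.LayerNorm(d_model, elementwise_affine=False)
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.out = nn.Linear(d_model, d_model, bias=False)
        self.mlp = SwiGLU(d_model, 4 * d_model)

    def forward(self, x):
        b, s, d = x.shape
        q, k, v = self.qkv(self.norm1(x)).chunk(3, dim=-1)
        shape = (b, s, self.n_heads, self.head_dim)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))
        q, k = rotary_embedding(q), rotary_embedding(k)  # positions injected via RoPE
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.out(attn.transpose(1, 2).reshape(b, s, d))
        return x + self.mlp(self.norm2(x))


if __name__ == "__main__":
    block = DecoderBlock()
    print(block(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])
```

Running the block on a random batch, as in the main guard, only checks that the shapes are consistent; the full model stacks many such blocks and adds token embeddings and an output head.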