OpenELM: An Efficient Language Model Family with Open Training and Inference Framework

2 May 2024 | Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Peter Zatloukal, Mohammad Rastegari
OpenELM is an open-source language model family that outperforms comparably sized open LLMs, such as OLMo, by 2.36% in average accuracy while requiring 2× fewer pre-training tokens. It uses a layer-wise scaling strategy to allocate parameters non-uniformly across the layers of the transformer, leading to improved accuracy. OpenELM is trained on publicly available datasets, including RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6, totaling approximately 1.8 trillion tokens.

The model uses a decoder-only transformer architecture, incorporating techniques such as pre-normalization, rotary positional embeddings, and SwiGLU feed-forward networks. OpenELM is evaluated on standard zero-shot tasks as well as OpenLLM leaderboard and LLM360 leaderboard tasks, demonstrating strong performance across multiple metrics. The models are also fine-tuned with instruction tuning and parameter-efficient fine-tuning methods (e.g., LoRA and DoRA), further improving accuracy.

OpenELM can be run on Apple devices through the MLX library. Benchmarks on different hardware show that while OpenELM achieves higher accuracy, its inference throughput is lower than OLMo's. The release of OpenELM aims to empower the open research community by providing access to a state-of-the-art language model together with open training and inference frameworks. The source code and pre-trained weights are available on GitHub and HuggingFace.
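To make the layer-wise scaling idea concrete, below is a minimal Python sketch of how per-layer attention-head counts and FFN widths could be interpolated from the first to the last transformer layer. The function name, the parameters (alpha_min, alpha_max, beta_min, beta_max), and the example values are illustrative assumptions, not the exact configuration used by OpenELM; consult the paper and released configs for the real settings.

```python
# Sketch of layer-wise scaling: shallower layers get fewer attention heads and
# narrower FFNs, deeper layers get more, under a simple linear interpolation.
# All parameter names and default values here are hypothetical.
def layerwise_scaling(num_layers, d_model, head_dim,
                      alpha_min=0.5, alpha_max=1.0,
                      beta_min=0.5, beta_max=4.0):
    configs = []
    for i in range(num_layers):
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        alpha = alpha_min + (alpha_max - alpha_min) * t   # attention width scale
        beta = beta_min + (beta_max - beta_min) * t       # FFN width multiplier
        num_heads = max(1, round(alpha * d_model / head_dim))
        ffn_dim = int(round(beta * d_model))
        configs.append({"layer": i, "num_heads": num_heads, "ffn_dim": ffn_dim})
    return configs

# Example: per-layer budgets for a small 4-layer model.
for cfg in layerwise_scaling(num_layers=4, d_model=1280, head_dim=64):
    print(cfg)
```

The point of such a scheme is that a fixed total parameter budget is spent where it helps most, instead of giving every layer an identical width.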
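The SwiGLU feed-forward network mentioned above replaces the conventional ReLU/GELU FFN with a SiLU-gated variant. Below is a minimal PyTorch sketch of such a block; the bias-free projections and the 4× expansion in the usage example are assumptions for illustration, not a statement of OpenELM's exact hyperparameters.

```python
import torch
import torch.nn as nn

class SwiGLUFFN(nn.Module):
    """Minimal SwiGLU feed-forward block: a SiLU-gated linear unit followed by
    a down-projection. Bias-free projections are an assumption of this sketch."""
    def __init__(self, d_model: int, ffn_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, ffn_dim, bias=False)
        self.up_proj = nn.Linear(d_model, ffn_dim, bias=False)
        self.down_proj = nn.Linear(ffn_dim, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU(x) = (SiLU(x W_gate) * (x W_up)) W_down
        return self.down_proj(nn.functional.silu(self.gate_proj(x)) * self.up_proj(x))

x = torch.randn(2, 16, 1280)               # (batch, sequence, d_model)
print(SwiGLUFFN(1280, 4 * 1280)(x).shape)  # torch.Size([2, 16, 1280])
```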
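Since the pre-trained weights are published on HuggingFace, a checkpoint can be loaded with the transformers library. The sketch below is a hypothetical usage example: the model id (apple/OpenELM-270M), the trust_remote_code flag, and the use of the Llama-2 tokenizer are assumptions; consult the official model cards and the GitHub repository for the exact loading instructions.

```python
# Hypothetical usage sketch for loading an OpenELM checkpoint from HuggingFace.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M",      # assumed model id; other sizes are also released
    trust_remote_code=True,    # assumed: custom modeling code on the Hub
)
# Assumed tokenizer; the Llama-2 repository is gated and may require access approval.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Once upon a time there was", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```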