JetMoE: Reaching Llama2 Performance with 0.1M Dollars

11 Apr 2024 | Yikang Shen, Zhen Guo, Tianle Cai, Zengyi Qin
JetMoE-8B is a new large language model (LLM) trained with a budget of less than $0.1 million, using 1.25 trillion tokens from mixed open-source corpora and 30,000 H100 GPU hours. Despite its low cost, JetMoE-8B demonstrates impressive performance, outperforming the Llama2-7B model and surpassing the Llama2-13B-Chat model. The model is based on an efficient Sparsely-gated Mixture-of-Experts (SMoE) architecture, which reduces inference computation by about 70% compared to Llama2-7B. JetMoE-8B is highly open and academia-friendly, using only public datasets and training code. The detailed training parameters and data mixtures are provided to facilitate future efforts in developing open foundation models. This transparency aims to encourage collaboration and further advancements in the field of accessible and efficient LLMs. The models are publicly available at https://github.com/myshell-ai/JetMoE.
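To put the headline figures in perspective, the short sketch below derives the per-GPU-hour cost ceiling and the average training throughput implied by the abstract's numbers. It is back-of-the-envelope arithmetic only, not a calculation from the paper: it assumes the full $0.1 million budget and all 30,000 H100 hours went to processing the 1.25 trillion tokens, and the function name `implied_rates` is hypothetical.

```python
# Figures quoted in the abstract. The budget is an upper bound
# ("less than $0.1 million"), so the cost per hour is a ceiling.
BUDGET_USD = 100_000
GPU_HOURS = 30_000
TRAINING_TOKENS = 1.25e12


def implied_rates(budget_usd, gpu_hours, tokens):
    """Return (max USD per GPU hour, tokens per GPU hour, tokens per GPU second).

    Assumption: every GPU hour was spent on the one training run
    that consumed all of the tokens.
    """
    usd_per_gpu_hour = budget_usd / gpu_hours
    tokens_per_gpu_hour = tokens / gpu_hours
    tokens_per_gpu_second = tokens_per_gpu_hour / 3600
    return usd_per_gpu_hour, tokens_per_gpu_hour, tokens_per_gpu_second


usd_per_hour, tok_per_hour, tok_per_sec = implied_rates(
    BUDGET_USD, GPU_HOURS, TRAINING_TOKENS
)
print(f"at most ${usd_per_hour:.2f} per H100 hour")
print(f"about {tok_per_hour / 1e6:.1f}M tokens per GPU hour")
print(f"about {tok_per_sec:,.0f} tokens per GPU second")
```

Under these assumptions the budget works out to at most about $3.33 per H100 hour, and the average throughput to roughly 41.7 million tokens per GPU hour.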