LLaMA: Open and Efficient Foundation Language Models

27 Feb 2023 | Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothee Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample
LLaMA is a family of open foundation language models ranging from 7B to 65B parameters, trained on trillions of tokens drawn entirely from publicly available data, with no reliance on proprietary datasets. LLaMA-13B outperforms GPT-3 on most benchmarks despite being more than ten times smaller, and LLaMA-65B is competitive with the strongest models of its time, such as Chinchilla-70B and PaLM-540B. All of the models are released to the research community for further study and development.

The pre-training corpus mixes publicly available sources, including CommonCrawl, C4, GitHub, Wikipedia, and books, filtered and deduplicated for quality and diversity. The models use the transformer architecture with several established improvements: pre-normalization of each sub-layer input (with RMSNorm), the SwiGLU activation in the feed-forward blocks, and rotary positional embeddings in place of absolute positional embeddings. Training uses the AdamW optimizer with a cosine learning-rate schedule.
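To make the architectural changes concrete, here is a minimal PyTorch sketch of the three components named above: RMSNorm pre-normalization, a SwiGLU feed-forward block, and rotary positional embeddings. It follows standard formulations from the literature; the class names, shapes, and defaults are illustrative assumptions, not the released LLaMA code.

```python
# Illustrative sketch only; not the official LLaMA implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Pre-normalization: applied to the *input* of each transformer sub-layer."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms


class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: silu(x W1) * (x W3), projected back with W2."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))


def rotary_embedding(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to a (batch, seq, heads, head_dim) tensor."""
    _, seq_len, _, head_dim = x.shape
    assert head_dim % 2 == 0
    half = head_dim // 2
    freqs = 1.0 / (base ** (torch.arange(0, half, dtype=torch.float32) / half))
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), freqs)  # (seq, half)
    cos = angles.cos()[None, :, None, :]
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each pair of dimensions by a position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


# Example shapes (illustrative): per-head queries of size (batch=2, seq=16, heads=8, head_dim=64).
q = torch.randn(2, 16, 8, 64)
q_rot = rotary_embedding(q)
```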
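The optimizer setup can be sketched in the same spirit: AdamW with a linear warmup followed by a cosine decay of the learning rate. The concrete values below (peak learning rate, warmup length, total steps, decay floor) are placeholders for illustration; the paper uses different peak learning rates for different model sizes.

```python
# Hedged sketch of AdamW + warmup-then-cosine schedule; hyperparameters are illustrative.
import math
import torch


def build_optimizer_and_schedule(model: torch.nn.Module,
                                 peak_lr: float = 3e-4,
                                 final_lr_ratio: float = 0.1,
                                 warmup_steps: int = 2000,
                                 total_steps: int = 100_000):
    optimizer = torch.optim.AdamW(
        model.parameters(),
        lr=peak_lr,
        betas=(0.9, 0.95),   # beta2 below the 0.999 default, as is common for LLM pre-training
        weight_decay=0.1,
    )

    def lr_lambda(step: int) -> float:
        # Linear warmup, then cosine decay from 1.0 down to final_lr_ratio.
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        cosine = 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))
        return final_lr_ratio + (1.0 - final_lr_ratio) * cosine

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```

In a training loop, `scheduler.step()` would be called after each optimizer step so the learning rate follows the warmup-then-cosine curve.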
On downstream evaluations, the models perform well across common sense reasoning, closed-book question answering, reading comprehension, mathematical reasoning, and code generation. LLaMA-65B achieves state-of-the-art results on many of these tasks, while LLaMA-13B remains competitive with much larger models. A brief instruction fine-tuning stage also pays off: the resulting LLaMA-I outperforms existing instruction-tuned models on MMLU.

Evaluations of bias and toxicity reveal biases inherited from the training data, most notably in the religion and age categories, as well as gender bias on tasks such as co-reference resolution. A carbon-footprint analysis shows that pre-training is energy-intensive; the trained models, however, can run on a single GPU, which makes them broadly accessible. By training competitive models exclusively on publicly available data and releasing them to the research community, LLaMA aims to democratize access to large language models.