Large Language Models: A Survey

20 Feb 2024 | Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, Jianfeng Gao
Large Language Models (LLMs) have drawn significant attention since the release of ChatGPT in November 2022, owing to their strong performance on a wide range of natural language tasks. LLMs acquire this ability by training models with billions of parameters on massive amounts of text data, following empirical scaling laws that relate performance to model size, data size, and compute. This paper reviews prominent LLMs, including the GPT, LLaMA, and PaLM families, and discusses their characteristics, contributions, and limitations. It also covers techniques for building and augmenting LLMs, popular datasets, evaluation metrics, and benchmark comparisons, and it highlights open challenges and future research directions.

Language modeling has a long history, dating back to Shannon's work in the 1950s, which applied statistical models to predict natural-language text. Recent transformer-based LLMs, pre-trained on web-scale text corpora, have substantially extended what language models can do. They perform complex tasks such as multi-step reasoning and instruction following, can be augmented with external knowledge and tools, and are becoming fundamental building blocks of AI agents and of efforts toward artificial general intelligence (AGI).

LLMs are large-scale, pre-trained statistical language models based on neural networks. Their success is the result of decades of research that can be divided into four waves: statistical language models, neural language models, pre-trained language models, and LLMs. Statistical language models (SLMs) use n-gram models to estimate word probabilities from corpus counts, while neural language models (NLMs) map words to embeddings and use neural networks to predict the next word. Pre-trained language models (PLMs) are task-agnostic: a single pre-trained model is fine-tuned for many downstream tasks. LLMs scale this recipe further; they are larger, more powerful, and exhibit emergent abilities such as in-context learning, instruction following, and multi-step reasoning.
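To make the first wave concrete, the following is a minimal sketch of an n-gram statistical language model: a bigram model that estimates next-word probabilities by maximum likelihood from raw counts. The toy corpus, function names, and smoothing-free estimate are illustrative assumptions, not details taken from the survey.

```python
from collections import Counter, defaultdict

def train_bigram_lm(corpus):
    """Estimate P(curr | prev) by maximum likelihood over bigram counts."""
    context_counts = Counter()            # how often each word appears as context
    bigram_counts = defaultdict(Counter)  # counts of (prev -> curr) pairs
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[prev][curr] += 1

    def prob(prev, curr):
        # Unsmoothed maximum-likelihood estimate; real SLMs add smoothing/back-off.
        if context_counts[prev] == 0:
            return 0.0
        return bigram_counts[prev][curr] / context_counts[prev]

    return prob

# Toy corpus (illustrative only)
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
p = train_bigram_lm(corpus)
print(p("the", "cat"))  # 0.25 -- "the" is followed by "cat" in 1 of its 4 occurrences
print(p("sat", "on"))   # 1.0
```

Higher-order n-gram models condition on longer histories in the same way, at the cost of increasingly sparse counts, which is one of the limitations that motivated neural language models.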
The paper reviews three major LLM families in detail: GPT, LLaMA, and PaLM. GPT models, such as GPT-3 and GPT-4, are decoder-only models from OpenAI and have shown strong performance across a broad range of tasks. LLaMA models, developed by Meta, are open-source and have been widely adopted by the research community. PaLM models, developed by Google, are large-scale models that have achieved state-of-the-art results on many benchmarks. Other notable LLMs covered include FLAN, Gopher, T0, ERNIE 3.0, RETRO, GLaM, LaMDA, OPT, Chinchilla, Galactica, CodeGen, AlexaTM, Sparrow, Minerva, MoD, BLOOM, GLM, Pythia, Orca, StarCoder, KOSMOS, and Gemini, each of which has pushed the boundaries of what LLMs can do.

The paper also discusses how LLMs are built, covering data preparation, tokenization, pre-training, instruction tuning, and alignment. The dominant LLM architectures are encoder-only, decoder-only, and encoder-decoder, all built on the Transformer. The Transformer's self-attention mechanism lets every token attend to every other token in the context, so long-range dependencies are captured directly and the computation parallelizes well on modern hardware.
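To illustrate the self-attention operation at the heart of the Transformer, here is a minimal NumPy sketch of scaled dot-product attention. The shapes, the single head, and the absence of masking and learned projection matrices are simplifying assumptions; production implementations use multi-head attention with learned Q/K/V projections.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for a single attention head.

    Q, K: (seq_len, d_k) query and key vectors; V: (seq_len, d_v) value vectors.
    Each output row is a weighted average of all value vectors, so every
    position can draw on information from every other position in one step.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq_len, seq_len) similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # (seq_len, d_v)

# Toy example: 4 tokens, 8-dimensional queries/keys/values
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```

Decoder-only LLMs such as the GPT family add a causal mask so that each position attends only to earlier positions, which is what makes autoregressive next-token prediction possible.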