BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

24 May 2019 | Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
BERT is a pre-trained language representation model that uses deep bidirectional Transformers to improve language understanding. Unlike previous models, BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for tasks such as question answering and language inference, without substantial task-specific architecture changes. BERT is conceptually simple and empirically powerful: it achieves new state-of-the-art results on eleven natural language processing tasks, including a 7.7% absolute improvement in GLUE score, a 4.6% absolute improvement in MultiNLI accuracy, and new bests on the SQuAD v1.1 and v2.0 question answering benchmarks.

BERT improves on prior fine-tuning approaches by using a "masked language model" (MLM) pre-training objective, inspired by the Cloze task: some input tokens are masked and the model is trained to predict them from their context. This lets the representation fuse left and right context and enables pre-training of a deep bidirectional Transformer. In addition to the MLM, a "next sentence prediction" task is used to jointly pre-train text-pair representations. BERT outperforms many task-specific architectures and achieves state-of-the-art results across a wide range of tasks. The code and pre-trained models are available at https://github.com/google-research/bert.

The paper also surveys related work, including unsupervised feature-based approaches, unsupervised fine-tuning approaches, and transfer learning from supervised data. It describes the BERT model architecture and the pre-training and fine-tuning procedures, and evaluates BERT on a range of NLP benchmarks, including GLUE, SQuAD, and SWAG, where it significantly outperforms previous models. Ablation studies examine the impact of the different pre-training tasks and of model size on performance.
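As a rough illustration of the masked-language-model objective described above, the sketch below applies BERT-style per-token masking: 15% of tokens are selected for prediction, and of those, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged. The helper name and tokenization are illustrative assumptions, not the released implementation.

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Hypothetical sketch of BERT-style masking for the MLM objective."""
    rng = random.Random(seed)
    masked = list(tokens)
    labels = [None] * len(tokens)          # prediction targets (None = not predicted)
    for i, tok in enumerate(tokens):
        if tok in ("[CLS]", "[SEP]"):      # never mask special tokens
            continue
        if rng.random() < mask_prob:
            labels[i] = tok                # the model must recover the original token
            roll = rng.random()
            if roll < 0.8:
                masked[i] = "[MASK]"       # 80%: replace with the mask token
            elif roll < 0.9:
                masked[i] = rng.choice(vocab)  # 10%: replace with a random token
            # else 10%: keep the original token unchanged
    return masked, labels

tokens = ["[CLS]", "the", "man", "went", "to", "the", "store", "[SEP]"]
vocab = ["the", "man", "went", "to", "store", "dog", "park"]
print(mask_tokens(tokens, vocab, seed=0))
```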
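The next-sentence-prediction task can be sketched in a similar spirit: half of the training pairs use the actual next sentence (label IsNext), and half pair sentence A with a random sentence from the corpus (label NotNext). The data layout and helper below are assumptions made for illustration only.

```python
import random

def make_nsp_pairs(documents, seed=None):
    """Hypothetical sketch of next-sentence-prediction example construction.

    `documents` is a list of documents, each a list of tokenized sentences.
    A real implementation would also avoid sampling the negative sentence
    from the same document; this sketch skips that check for brevity.
    """
    rng = random.Random(seed)
    pairs = []
    for doc in documents:
        for i in range(len(doc) - 1):
            sent_a = doc[i]
            if rng.random() < 0.5:
                sent_b, label = doc[i + 1], "IsNext"      # true continuation
            else:
                other_doc = rng.choice(documents)          # random sentence elsewhere
                sent_b, label = rng.choice(other_doc), "NotNext"
            pairs.append((["[CLS]"] + sent_a + ["[SEP]"] + sent_b + ["[SEP]"], label))
    return pairs

docs = [
    [["the", "man", "went", "to", "the", "store"], ["he", "bought", "milk"]],
    [["penguins", "are", "flightless", "birds"], ["they", "live", "in", "antarctica"]],
]
for inputs, label in make_nsp_pairs(docs, seed=0):
    print(label, inputs)
```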
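Finally, the "one additional output layer" used for fine-tuning can be pictured as a single softmax classifier over the final hidden vector of the [CLS] token. The sketch below stubs out the encoder and uses BERT-Base's hidden size of 768; the weight initialization, label count, and function name are illustrative assumptions, not the authors' code.

```python
import numpy as np

hidden_size, num_labels = 768, 3           # e.g. three labels for an NLI-style task

rng = np.random.default_rng(0)
W = rng.normal(scale=0.02, size=(num_labels, hidden_size))  # new, task-specific weights
b = np.zeros(num_labels)

def classify(cls_vector):
    """Map the [CLS] representation to label probabilities via one linear layer."""
    logits = W @ cls_vector + b
    exp = np.exp(logits - logits.max())    # numerically stable softmax
    return exp / exp.sum()

cls_vector = rng.normal(size=hidden_size)  # stand-in for the encoder's [CLS] output
print(classify(cls_vector))
```

During fine-tuning, these task-specific parameters are trained jointly with all pre-trained encoder weights, which is why no further architecture changes are needed per task.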