SpanBERT: Improving Pre-training by Representing and Predicting Spans

18 Jan 2020 | Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, Omer Levy
SpanBERT is a pre-training method that improves the representation and prediction of text spans. It extends BERT by masking contiguous random spans instead of individual tokens, and by training span boundary representations to predict the entire masked span without relying on the individual token representations inside it. SpanBERT outperforms BERT and other baselines on span selection tasks such as question answering and coreference resolution. With the same training data and model size as BERT-large, SpanBERT achieves 94.6% and 88.7% F1 on SQuAD 1.1 and 2.0, respectively. It also reaches a new state of the art on the OntoNotes coreference resolution task (79.6% F1) and strong performance on TACRED and GLUE.

SpanBERT combines a masking scheme based on a geometric distribution with a novel span boundary objective (SBO) that predicts each masked token using only the span's boundary tokens and a relative position embedding. It also trains on single contiguous segments instead of using BERT's next sentence prediction objective, and is built on a well-tuned reimplementation of BERT with a single-sequence data pipeline. A minimal sketch of the span masking scheme and the SBO head follows.
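The sketch below illustrates the two ideas in PyTorch: span lengths are drawn from a geometric distribution, and a two-layer feed-forward head predicts each masked token from the span's boundary representations plus a relative position embedding. The hyperparameters (p = 0.2, spans clipped at 10 tokens, a 15% masking budget) are the values reported in the paper; the function and module names, layer sizes, and overall structure are illustrative assumptions, not the released fairseq implementation.

```python
# Illustrative sketch of SpanBERT-style span masking and the span boundary
# objective (SBO). Hyperparameters follow the paper; everything else is an
# assumption for illustration, not the official implementation.
import numpy as np
import torch
import torch.nn as nn


def sample_masked_spans(seq_len, mask_budget=0.15, p=0.2, max_span_len=10, rng=None):
    """Sample contiguous spans until ~mask_budget of the tokens are covered.

    Span lengths are drawn from a geometric distribution Geo(p), clipped at
    max_span_len, as described in the paper.
    """
    rng = rng or np.random.default_rng()
    budget = int(seq_len * mask_budget)
    masked = set()
    while len(masked) < budget:
        span_len = min(rng.geometric(p), max_span_len)
        start = rng.integers(0, max(1, seq_len - span_len))
        span = range(start, min(start + span_len, seq_len))
        if masked.isdisjoint(span):          # keep sampled spans non-overlapping
            masked.update(span)
    return sorted(masked)


class SpanBoundaryObjective(nn.Module):
    """Predict each masked token from the span's boundary representations and
    a relative position embedding, via a two-layer feed-forward network."""

    def __init__(self, hidden_size, vocab_size, max_span_len=10):
        super().__init__()
        self.pos_emb = nn.Embedding(max_span_len, hidden_size)
        self.mlp = nn.Sequential(
            nn.Linear(3 * hidden_size, hidden_size),
            nn.GELU(),
            nn.LayerNorm(hidden_size),
            nn.Linear(hidden_size, hidden_size),
            nn.GELU(),
            nn.LayerNorm(hidden_size),
        )
        self.decoder = nn.Linear(hidden_size, vocab_size)

    def forward(self, hidden, span_start, span_end, target_positions):
        # hidden: (seq_len, hidden_size) encoder output for one sequence.
        # The boundary tokens are the observed tokens just outside the span.
        left = hidden[span_start - 1]                       # x_{s-1}
        right = hidden[span_end + 1]                        # x_{e+1}
        rel = self.pos_emb(target_positions - span_start)   # relative position p_{i-s+1}
        features = torch.cat(
            [left.expand(len(target_positions), -1),
             right.expand(len(target_positions), -1),
             rel], dim=-1)
        return self.decoder(self.mlp(features))             # logits over the vocabulary
```

In the full model, these logits are trained with cross-entropy against the original tokens of the masked span, and the SBO loss is added to the usual masked language modeling loss.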
SpanBERT is evaluated on a comprehensive suite of tasks: seven extractive question answering benchmarks, OntoNotes coreference resolution, relation extraction on TACRED, and the nine tasks of the GLUE benchmark. The model is implemented in fairseq and pre-trained with a batch size of 256 sequences of up to 512 tokens each.

Compared against three BERT baselines on 17 benchmarks, SpanBERT outperforms BERT on almost every task. On 14 tasks it performs better than all baselines; on two (MRPC and QQP) it is on par with single-sequence-trained BERT while still outperforming the other baselines; and on one (SST-2) Google's BERT baseline is better by 0.4% accuracy. The improvements are particularly notable in extractive question answering, where SpanBERT gains 2.0% F1 even though the baseline already exceeds human performance. Across the English benchmarks for question answering, relation extraction, and coreference resolution, as well as GLUE, it posts significant improvements on MRQA, TACRED, and GLUE, and the pre-training method proves especially effective at improving span-based reasoning. On sentence-level classification, sentence-pair similarity, and natural language inference tasks, SpanBERT remains competitive, with the largest gains concentrated on span selection tasks.
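For reference, the pre-training setup described above can be summarized as a small configuration sketch. The batch size and sequence length come from the text, the other values are the hyperparameters reported in the paper, and the dictionary keys are descriptive labels rather than actual fairseq arguments.

```python
# Hypothetical summary of SpanBERT's pre-training configuration; keys are
# descriptive labels, not real fairseq flags.
spanbert_pretraining_config = {
    "model_size": "BERT-large (24 layers, 1024 hidden, 16 attention heads)",
    "batch_size_sequences": 256,       # sequences per batch (from the text)
    "max_tokens_per_sequence": 512,    # single contiguous segment, no NSP
    "masking_budget": 0.15,            # fraction of tokens masked (paper)
    "span_length_distribution": "Geo(p=0.2), clipped at 10 tokens (paper)",
    "training_objectives": ["masked language modeling", "span boundary objective (SBO)"],
}
```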