This paper presents a simple re-implementation of BERT for query-based passage re-ranking, achieving state-of-the-art results on the TREC-CAR dataset and the top entry on the leaderboard of the MS MARCO passage retrieval task. The system outperforms the previous state of the art by 27% (relative) in MRR@10. The code to reproduce the results is available at https://github.com/nyu-dl/dl4marco-bert.
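For reference, MRR@10 is the mean, over all queries, of the reciprocal rank of the first relevant passage within the top ten returned results. A minimal sketch of the metric, with illustrative variable names, is:

```python
# Illustrative MRR@10 computation (not the official MS MARCO evaluation script).
def mrr_at_10(ranked_lists, relevant_sets):
    """ranked_lists: one ranked list of passage ids per query;
    relevant_sets: one set of relevant passage ids per query."""
    total = 0.0
    for ranking, relevant in zip(ranked_lists, relevant_sets):
        for rank, pid in enumerate(ranking[:10], start=1):
            if pid in relevant:
                total += 1.0 / rank   # reciprocal rank of first hit in top 10
                break                 # queries with no hit in the top 10 contribute 0
    return total / len(ranked_lists)
```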
The paper discusses the progress in machine reading comprehension, driven by the introduction of large-scale datasets and the adoption of neural models. The information retrieval community has also seen the development of neural ranking models. However, until recently, large datasets for passage ranking were scarce, with TREC-CAR being a notable exception.
The authors argue that the same two ingredients that drove progress on reading comprehension tasks are now available for passage ranking: the MS MARCO passage ranking dataset and BERT, a powerful general-purpose natural language processing model.
The paper describes how BERT is re-purposed as a passage re-ranker and achieves state-of-the-art results on the MS MARCO passage re-ranking task. The re-ranker uses BERT to estimate the relevance of a candidate passage to a query: the query is treated as sentence A and the passage text as sentence B. The query is truncated to at most 64 tokens, and the passage text is truncated so that the concatenation of query, passage, and separator tokens is at most 512 tokens long. A BERT_LARGE model is used as a binary classifier: its [CLS] output vector is fed to a single-layer neural network to obtain the probability that the passage is relevant.
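A minimal sketch of this scoring step is shown below, using the Hugging Face transformers library as a stand-in (the original work used Google's TensorFlow BERT code; the model and tokenizer names here are illustrative, not the authors' released checkpoints):

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
model = BertForSequenceClassification.from_pretrained("bert-large-uncased", num_labels=2)
model.eval()

def relevance_score(query: str, passage: str) -> float:
    """Return P(relevant | query, passage) under the binary classifier."""
    # Truncate the query to at most 64 tokens, as described above.
    query_tokens = tokenizer.tokenize(query)[:64]
    query = tokenizer.convert_tokens_to_string(query_tokens)
    # Sentence A = query, sentence B = passage; the passage is truncated so the
    # full [CLS] query [SEP] passage [SEP] sequence fits in 512 tokens.
    inputs = tokenizer(query, passage, truncation="only_second",
                       max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits          # shape (1, 2)
    return torch.softmax(logits, dim=-1)[0, 1].item()

# Candidate passages retrieved by a first-stage ranker (e.g., BM25) are
# re-ranked in descending order of this score.
```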
The authors train and evaluate their models on two passage-ranking datasets, MS MARCO and TREC-CAR. They fine-tune on a TPU v3-8 with a batch size of 128 for 100k iterations, which takes approximately 30 hours. They use Adam with a learning rate of 3e-6 and a dropout probability of 0.1 on all layers.
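A hedged sketch of a single fine-tuning step with these optimizer settings is shown below. The batch construction and labeling of relevant versus non-relevant passages are simplified placeholders, and the original code runs in TensorFlow on TPUs, so this PyTorch version is illustrative only:

```python
import torch
from torch.optim import Adam
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-large-uncased", num_labels=2)
optimizer = Adam(model.parameters(), lr=3e-6)    # dropout of 0.1 is BERT's default
loss_fn = torch.nn.CrossEntropyLoss()

def train_step(batch):
    """batch: tokenized (query, passage) pairs with 0/1 relevance labels."""
    model.train()
    optimizer.zero_grad()
    outputs = model(input_ids=batch["input_ids"],
                    attention_mask=batch["attention_mask"],
                    token_type_ids=batch["token_type_ids"])
    # Cross-entropy over the two classes {not relevant, relevant}.
    loss = loss_fn(outputs.logits, batch["labels"])
    loss.backward()
    optimizer.step()
    return loss.item()
```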
For TREC-CAR, the authors follow the same procedure as for MS MARCO, but pre-train BERT only on the half of Wikipedia used by TREC-CAR's training set, since the official pre-trained models were trained on all of Wikipedia and would otherwise have seen content from the test set. They generate query-passage pairs by retrieving the top ten passages from the entire TREC-CAR corpus using BM25. They train the model for 400k iterations, or 12.8M examples, which corresponds to only 40% of the training set.
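As a rough illustration of this candidate-generation step, the sketch below retrieves the top ten BM25 passages for a query. The original work used a Lucene-based BM25 index over the full corpus; the rank_bm25 package and the toy corpus here are stand-ins:

```python
from rank_bm25 import BM25Okapi

# Placeholder corpus; in practice this would be the full set of TREC-CAR passages.
corpus = ["first passage text ...", "second passage text ..."]
tokenized_corpus = [p.lower().split() for p in corpus]
bm25 = BM25Okapi(tokenized_corpus)

def top_k_passages(query: str, k: int = 10):
    """Retrieve the top-k BM25 passages for a query (k = 10 in the setup above)."""
    return bm25.get_top_n(query.lower().split(), corpus, n=k)

# Each retrieved passage is paired with its query and labeled relevant or not,
# yielding the (query, passage) training examples for the re-ranker.
```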
The results show that the proposed BERT-based models surpass the previous state-of-the-art models by a large margin on both tasks. The pretrained models used in this work require few training examples from the end task to achieve good performance.