19 Feb 2018 | Adina Williams, Nikita Nangia, Samuel R. Bowman
This paper introduces the Multi-Genre Natural Language Inference (MultiNLI) corpus, a dataset designed for evaluating machine learning models in sentence understanding. With 433k examples, MultiNLI is one of the largest corpora for natural language inference, offering a wide range of linguistic phenomena and complexity. Unlike the Stanford NLI Corpus (SNLI), which is limited to image captions, MultiNLI includes ten distinct genres of written and spoken English, enhancing its coverage and difficulty. The corpus is constructed to facilitate domain adaptation and cross-domain transfer learning, making it suitable for evaluating models' ability to handle unfamiliar domains. The paper details the data collection methodology, validation process, and baseline models used to assess the corpus's difficulty. The results show that MultiNLI is significantly more challenging than SNLI, with lower baseline model performance and comparable inter-annotator agreement, indicating substantial room for future research. The corpus is freely available and has been used to demonstrate the effectiveness of pre-training and transfer learning in sentence-to-vector models.
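Since the corpus is freely distributed, a quick way to see the premise/hypothesis/label format and the matched (in-genre) versus mismatched (out-of-genre) evaluation splits described in the paper is to load it programmatically. The sketch below assumes the Hugging Face `datasets` library and its `multi_nli` loader, neither of which is part of the original release.

```python
# Minimal sketch: inspect MultiNLI, assuming the Hugging Face
# `datasets` library and its community "multi_nli" loader.
from datasets import load_dataset

mnli = load_dataset("multi_nli")

# The corpus ships a training set plus matched (same genres as training)
# and mismatched (held-out genres) validation sets, supporting the
# cross-domain evaluation the paper describes.
print(mnli)  # splits: train, validation_matched, validation_mismatched

example = mnli["train"][0]
label_names = ["entailment", "neutral", "contradiction"]
print(example["genre"])                 # source genre of this pair
print(example["premise"])               # premise sentence
print(example["hypothesis"])            # crowdworker-written hypothesis
print(label_names[example["label"]])    # gold inference label
```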