How Much Knowledge Can You Pack Into the Parameters of a Language Model?

EMNLP 2020 (November 16–20, 2020) | Adam Roberts*, Colin Raffel*, Noam Shazeer
This paper examines how well pre-trained language models can answer open-domain questions without access to any external knowledge source. The authors fine-tune the Text-to-Text Transfer Transformer (T5) to answer questions using only the knowledge stored in its parameters, a setting the paper calls closed-book question answering. They find that performance scales with model size, with the 11-billion-parameter T5 achieving state-of-the-art results on several open-domain QA benchmarks. Continued pre-training with salient span masking (SSM), which masks named entities and dates rather than random spans, further improves accuracy. The authors conclude that this closed-book approach is competitive with systems that explicitly retrieve evidence, and they point to open questions around model efficiency, interpretability, and reasoning.
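
The closed-book setup is straightforward to try in spirit: give the model only the question text and let it generate an answer from its parameters. The sketch below shows how this might look with the Hugging Face Transformers library; the checkpoint name `google/t5-small-ssm-nq` (a small SSM-pre-trained T5 fine-tuned on Natural Questions) is an assumption for illustration, not something specified in this summary.

```python
# Minimal closed-book QA sketch with a T5 checkpoint via Hugging Face Transformers.
# The checkpoint name is an assumed example; substitute whichever
# closed-book T5 checkpoint you intend to evaluate.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/t5-small-ssm-nq"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# The model sees only the question -- no retrieved passages -- so any
# correct answer must come from knowledge stored in its parameters.
question = "When was Franklin D. Roosevelt born?"
inputs = tokenizer(question, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

In the paper, answers produced this way are scored with standard open-domain QA exact-match metrics against the benchmark's reference answers; larger checkpoints and SSM pre-training both raise that score.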