3 May 2020 | Zhengbao Jiang*, Frank F. Xu*, Jun Araki, Graham Neubig
The paper addresses the challenge of accurately estimating the knowledge contained in language models (LMs) by proposing methods to generate more effective prompts. Traditional approaches rely on manually created prompts, which may not be optimal and can underestimate the LM's knowledge. The authors propose two automatic methods, mining-based and paraphrasing-based, to generate diverse and high-quality prompts. They also introduce ensemble methods to combine answers from different prompts. Extensive experiments on the LAMA benchmark demonstrate that their methods improve accuracy from 31.1% to 39.6%, providing a tighter lower bound on the LM's knowledge. The paper concludes with insights into how to better query LMs and potential directions for incorporating knowledge into LMs. The code and the LM Prompt and Query Archive (LPAQA) are released to facilitate future research.
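To make the querying and ensembling idea concrete, here is a minimal sketch (not the authors' LPAQA code) of probing a masked LM with several prompts for one relation and averaging the answer distributions. The specific prompt strings, the BERT checkpoint, and the uniform-average ensemble are illustrative assumptions; the paper mines and paraphrases prompts automatically and also learns weighted ensembles.

```python
# Sketch: query a masked LM with multiple prompts for one relation and
# ensemble the per-prompt answer distributions (illustrative, not LPAQA itself).
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased")
model.eval()

def answer_distribution(prompt: str, subject: str) -> torch.Tensor:
    """Fill [X] with the subject, mask [Y], and return the LM's
    probability distribution over the vocabulary at the mask position."""
    text = prompt.replace("[X]", subject).replace("[Y]", tokenizer.mask_token)
    inputs = tokenizer(text, return_tensors="pt")
    mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits[0, mask_index].softmax(dim=-1)

# Hand-written example prompts for a "place of birth"-style relation;
# the paper would generate such variants by mining and paraphrasing.
prompts = [
    "[X] was born in [Y].",
    "[X] is a native of [Y].",
    "The birthplace of [X] is [Y].",
]

# Uniform-average ensemble over prompts, then take the top-1 token.
avg = torch.stack([answer_distribution(p, "Barack Obama") for p in prompts]).mean(dim=0)
print(tokenizer.decode([avg.argmax().item()]))
```

The intuition is that any single prompt may fail to elicit a fact the model actually stores, so combining several phrasings gives a tighter lower bound on what the LM knows.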