26 Nov 2019 | Yonatan Bisk, Rowan Zellers, Ronan Le Bras, Jianfeng Gao, Yejin Choi
The paper introduces PIQA (Physical Interaction: Question Answering), a benchmark dataset and task designed to evaluate and advance physical commonsense understanding in natural language processing (NLP). Physical commonsense is knowledge of how everyday objects can be manipulated and what their physical properties are, which is crucial for everyday tasks and problem-solving. While large pretrained models like BERT have made significant progress in abstract domains, they struggle with physical commonsense because such knowledge is rarely stated explicitly in text. The PIQA dataset, built from how-to instructions on instructables.com, contains over 16,000 training pairs, each consisting of a physical goal and two candidate solutions, only one of which is correct. Humans solve the task easily (about 95% accuracy), while large pretrained models lag well behind: RoBERTa, the strongest baseline, reaches only about 77%. The paper analyzes the models' errors, showing that they struggle with basic physical properties and relations. The analysis suggests that future research should focus on capturing more detailed and realistic physical knowledge in language models, bridging the gap between symbolic and grounded reasoning.
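To make the task format concrete, here is a minimal sketch of scoring a single PIQA-style instance as a two-way multiple-choice problem with a pretrained transformer. The example instance, the `roberta-base` checkpoint, and the scoring harness are illustrative assumptions, not the authors' evaluation code; in particular, the multiple-choice head is randomly initialized until fine-tuned on the training pairs, so predictions here are only meant to show the input/output shapes.

```python
# Sketch of a PIQA-style evaluation step (illustrative, not the paper's code).
import torch
from transformers import AutoTokenizer, AutoModelForMultipleChoice

# A hypothetical instance in the goal / two-solutions format described above.
instance = {
    "goal": "Make the outside of a glass opaque.",
    "sol1": "Coat the glass with a thin layer of wax.",
    "sol2": "Coat the glass with a thin layer of water.",
    "label": 0,  # index of the correct solution
}

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMultipleChoice.from_pretrained("roberta-base")

# Encode each (goal, solution) pair as one choice of the same example.
encoding = tokenizer(
    [instance["goal"], instance["goal"]],
    [instance["sol1"], instance["sol2"]],
    return_tensors="pt",
    padding=True,
)
# The multiple-choice model expects tensors of shape (batch, num_choices, seq_len).
inputs = {k: v.unsqueeze(0) for k, v in encoding.items()}

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, 2), one score per solution

prediction = logits.argmax(dim=-1).item()
print(f"predicted solution: {prediction}, gold: {instance['label']}")
```

Framing each instance as scoring the goal paired with each candidate solution, then taking the higher-scoring pair, is the standard multiple-choice recipe that the reported BERT/RoBERTa baselines follow; accuracy over the test set is simply the fraction of instances where the prediction matches the gold label.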