Learning Surface Text Patterns for a Question Answering System

July 2002 | Deepak Ravichandran and Eduard Hovy
This paper explores surface text patterns for open-domain question answering and presents a method for learning such patterns automatically from the web. Starting from a few hand-crafted examples of each question type, a bootstrapping process builds a tagged corpus from web search results. Patterns are extracted from the returned documents and standardized, with suffix trees used to find substrings of optimal length. The method treats each sentence as a simple sequence of words and takes repeated word orderings as evidence of useful answer phrases. The precision of each pattern and the average precision for each question type are then calculated.

To answer a new question, the system identifies the question type, extracts the question term, and searches the text for each learned pattern of that type. The patterns were evaluated on new questions from the TREC-10 set, covering six question types: BIRTHDATE, LOCATION, INVENTOR, DISCOVERER, DEFINITION, and WHY-FAMOUS. The system performs better on web data than on the TREC corpus. The abundance of data on the web allows it to find answers with high precision.

The approach has limitations: it does not handle long-distance dependencies, it does not distinguish between upper and lower case letters, and it struggles with questions that require multiple words from the question to appear in the answer. Because the web results outperform the TREC results, the authors see a need to integrate web and TREC data. They also consider the system suitable for multilingual QA, since it is simple and relies on surface text patterns rather than complex tools.
The system can be adapted to new languages with minimal effort, assuming the web search engine is appropriately switched.
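
The learn-score-answer loop summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. A small in-memory list of sentences stands in for web search results. The <NAME> and <ANSWER> placeholder tags and all function names are hypothetical. A brute-force enumeration of word n-grams replaces the suffix tree mentioned in the summary. Precision is assumed here to mean the fraction of a pattern's matches whose answer slot holds the known correct answer.

```python
import re
from collections import Counter

# Hypothetical placeholder tags for the question term and the answer.
NAME, ANSWER = "<NAME>", "<ANSWER>"

# Toy stand-in for documents returned by a web search (assumption).
TRAINING = [
    "Mozart was born in 1756 in Salzburg",
    "The composer Mozart was born in 1756",
    "Mozart was born in Salzburg",
]


def candidate_patterns(sentences, term, answer, max_words=8):
    """Count word n-grams that contain both the tagged term and the tagged answer.

    Brute-force substitute for suffix-tree substring extraction.
    Each pattern is counted at most once per sentence.
    """
    counts = Counter()
    for sentence in sentences:
        if term not in sentence or answer not in sentence:
            continue
        words = sentence.replace(term, NAME).replace(answer, ANSWER).split()
        seen = set()
        for i in range(len(words)):
            for j in range(i + 2, min(len(words), i + max_words) + 1):
                text = " ".join(words[i:j])
                if NAME in text and ANSWER in text:
                    seen.add(text)
        counts.update(seen)
    return counts


def pattern_to_regex(pattern, term):
    """Instantiate a pattern: the term is matched literally, the answer slot captures one word."""
    escaped = re.escape(pattern)
    escaped = escaped.replace(re.escape(NAME), re.escape(term))
    escaped = escaped.replace(re.escape(ANSWER), r"(\w+)")
    return re.compile(escaped)


def pattern_precision(pattern, sentences, term, answer):
    """Share of the pattern's matches whose captured word equals the known answer."""
    regex = pattern_to_regex(pattern, term)
    captured = [m.group(1) for s in sentences for m in regex.finditer(s)]
    if not captured:
        return 0.0
    return sum(word == answer for word in captured) / len(captured)


def answer_question(scored_patterns, sentences, term):
    """Try patterns from highest to lowest precision and return the first captured word."""
    ranked = sorted(scored_patterns.items(), key=lambda item: item[1], reverse=True)
    for pattern, _precision in ranked:
        regex = pattern_to_regex(pattern, term)
        for sentence in sentences:
            match = regex.search(sentence)
            if match:
                return match.group(1)
    return None


if __name__ == "__main__":
    counts = candidate_patterns(TRAINING, "Mozart", "1756")
    # Keep only patterns seen in more than one sentence (repeated word orderings).
    repeated = [p for p, c in counts.items() if c >= 2]
    scored = {p: pattern_precision(p, TRAINING, "Mozart", "1756") for p in repeated}
    print(scored)
    print(answer_question(scored, ["Gandhi was born in 1869"], "Gandhi"))
```

In this toy data the pattern "<NAME> was born in <ANSWER>" occurs in two training sentences and matches three times, once with "Salzburg" in the answer slot, so its precision under the assumed definition is 2/3. The example shows why scoring matters: a surface pattern can match text where the slot holds something other than the intended answer type.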