This paper presents a system for automatically labeling semantic roles in sentences. The system uses statistical classifiers trained on hand-annotated data from the FrameNet database, which contains semantic roles defined for various verbs, nouns, and adjectives. The system identifies semantic roles by analyzing syntactic and lexical features derived from parse trees. These features include phrase type, grammatical function, position, voice, and head word. The system combines probabilities from multiple distributions to determine the most likely semantic role for each constituent.
The system was tested on a subset of the FrameNet corpus, achieving an accuracy of 80.4%. This performance was compared to a baseline method that always selected the most probable role, which achieved only 40.9% accuracy. The system's performance was further improved by incorporating lexical clustering, which allowed the system to generalize better when training data was sparse. The system also demonstrated the effectiveness of using head words as indicators of semantic roles.
The paper also addresses the challenge of automatically identifying frame element boundaries in parse trees. Experiments showed that the system could identify frame elements with a high degree of accuracy, even when the boundaries did not match exactly. The system used features such as the path from the target word through the parse tree to the constituent in question, as well as the identity of the target word and the head word of the constituent.
The results indicate that the system is promising for applications in natural language processing, including information extraction, word sense disambiguation, and statistical machine translation. The system's ability to automatically label semantic roles with high accuracy suggests that it could be a valuable tool for a wide range of NLP tasks. The paper concludes that further research is needed to improve the system's performance and to integrate semantic role identification with parsing and other NLP tasks.This paper presents a system for automatically labeling semantic roles in sentences. The system uses statistical classifiers trained on hand-annotated data from the FrameNet database, which contains semantic roles defined for various verbs, nouns, and adjectives. The system identifies semantic roles by analyzing syntactic and lexical features derived from parse trees. These features include phrase type, grammatical function, position, voice, and head word. The system combines probabilities from multiple distributions to determine the most likely semantic role for each constituent.
The system was tested on a subset of the FrameNet corpus, achieving an accuracy of 80.4%. This performance was compared to a baseline method that always selected the most probable role, which achieved only 40.9% accuracy. The system's performance was further improved by incorporating lexical clustering, which allowed the system to generalize better when training data was sparse. The system also demonstrated the effectiveness of using head words as indicators of semantic roles.
The paper also addresses the challenge of automatically identifying frame element boundaries in parse trees. Experiments showed that the system could identify frame elements with a high degree of accuracy, even when the boundaries did not match exactly. The system used features such as the path from the target word through the parse tree to the constituent in question, as well as the identity of the target word and the head word of the constituent.
The results indicate that the system is promising for applications in natural language processing, including information extraction, word sense disambiguation, and statistical machine translation. The system's ability to automatically label semantic roles with high accuracy suggests that it could be a valuable tool for a wide range of NLP tasks. The paper concludes that further research is needed to improve the system's performance and to integrate semantic role identification with parsing and other NLP tasks.