23 May 1995 | Lance A. Ramshaw, Mitchell P. Marcus
This paper introduces a transformation-based learning approach for text chunking, building on Eric Brill's earlier work on part-of-speech tagging. The method involves encoding chunk structure in new tags attached to each word, allowing the system to learn transformational rules that iteratively improve predictions. The approach is tested on data derived from the Penn Treebank corpus, achieving high recall and precision rates for both baseNP chunks and more complex sentence-partitioning chunks.
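The idea of encoding chunk structure as one tag per word can be illustrated with a minimal sketch. It assumes a three-tag inventory (I for a word inside a chunk, O for a word outside any chunk, B for the first word of a chunk that directly follows another chunk); this summary does not spell out the tag inventory, so the tag names, the half-open span representation, and the function names `encode_chunks` and `decode_chunks` are illustrative assumptions rather than the paper's exact scheme.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open (start, end) word indices; hypothetical representation


def encode_chunks(n_words: int, chunks: List[Span]) -> List[str]:
    """Turn sorted, non-overlapping chunk spans into one tag per word."""
    tags = ["O"] * n_words
    prev_end = None
    for start, end in chunks:
        for i in range(start, end):
            tags[i] = "I"
        # Mark the boundary only when two chunks are directly adjacent.
        if prev_end == start:
            tags[start] = "B"
        prev_end = end
    return tags


def decode_chunks(tags: List[str]) -> List[Span]:
    """Recover chunk spans from a tag sequence produced by encode_chunks."""
    chunks: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "O":
            if start is not None:
                chunks.append((start, i))
                start = None
        elif tag == "B" or start is None:
            # A new chunk begins here; close the previous one if it is still open.
            if start is not None:
                chunks.append((start, i))
            start = i
    if start is not None:
        chunks.append((start, len(tags)))
    return chunks


if __name__ == "__main__":
    spans = [(0, 2), (2, 3)]
    tags = encode_chunks(5, spans)
    print(tags)
    print(decode_chunks(tags))
```

Because every word carries exactly one tag, chunking becomes a tagging problem, and the round trip through `decode_chunks` shows that no bracket information is lost for non-overlapping chunks.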
The paper discusses the application of transformation-based learning to text chunking, highlighting differences from part-of-speech tagging, such as the smaller tagset and the fixed nature of part-of-speech assignments. It describes the encoding of chunk structure using additional tags, and the use of rule templates that combine word and part-of-speech information to improve chunking performance.
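The combination of rule templates and iterative learning described above can be sketched as a greedy loop. This is a simplified illustration, not the paper's system: it assumes just two hypothetical templates (one keyed on the current word's part-of-speech tag, one on the current word itself), a trivial all-O baseline, and the names `apply_rule` and `learn_rules`, none of which come from the source.

```python
from typing import List, Tuple

# A rule reads: "if the current word's <feature> equals <value> and its
# chunk tag is <src>, change the chunk tag to <dst>". Hypothetical format.
Rule = Tuple[str, str, str, str]


def apply_rule(rule: Rule, words: List[str], pos: List[str], tags: List[str]) -> List[str]:
    feature, value, src, dst = rule
    keys = words if feature == "word" else pos
    return [dst if (k == value and t == src) else t for k, t in zip(keys, tags)]


def count_errors(tags: List[str], gold: List[str]) -> int:
    return sum(t != g for t, g in zip(tags, gold))


def learn_rules(words, pos, baseline, gold, max_rules=10):
    """Greedily pick the rule with the largest net error reduction, apply it, repeat."""
    tags = list(baseline)
    learned: List[Rule] = []
    for _ in range(max_rules):
        # Instantiate both templates at every position that is currently wrong.
        candidates = set()
        for w, p, t, g in zip(words, pos, tags, gold):
            if t != g:
                candidates.add(("pos", p, t, g))
                candidates.add(("word", w, t, g))
        best, best_gain = None, 0
        current = count_errors(tags, gold)
        for rule in sorted(candidates):
            # Net gain: fixes minus any new errors the rule introduces.
            gain = current - count_errors(apply_rule(rule, words, pos, tags), gold)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:
            break  # no remaining rule improves the training data
        tags = apply_rule(best, words, pos, tags)
        learned.append(best)
    return learned, tags


if __name__ == "__main__":
    words = ["the", "dog", "saw", "a", "cat"]
    pos = ["DT", "NN", "VBD", "DT", "NN"]
    gold = ["I", "I", "O", "I", "I"]
    rules, final = learn_rules(words, pos, ["O"] * 5, gold)
    print(rules)
    print(final)
```

Note that the part-of-speech tags are only read, never rewritten, while the chunk tags change with each learned rule; this mirrors the fixed part-of-speech assignments mentioned above. On this toy input the part-of-speech template generalizes across both determiners and both nouns, so it beats the word-specific rules, which each fix only one position.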
The paper also presents experimental results for both chunk types: recall and precision reach roughly 92% for baseNP chunks and roughly 88% for partitioning chunks. The results suggest that lexical rule templates contribute significantly to performance, particularly for partitioning chunks.
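To make the recall and precision figures concrete, here is a minimal sketch of how such scores can be computed at the chunk level. It assumes that a predicted chunk counts as correct only when both of its boundaries match a gold-standard chunk exactly; the function name `precision_recall`, the span representation, and the example spans are all hypothetical.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int]  # half-open (start, end) word indices; hypothetical representation


def precision_recall(predicted: Iterable[Span], gold: Iterable[Span]) -> Tuple[float, float]:
    """Chunk-level precision and recall under exact boundary matching."""
    pred_set, gold_set = set(predicted), set(gold)
    correct = len(pred_set & gold_set)
    # Precision: share of predicted chunks that are correct.
    precision = correct / len(pred_set) if pred_set else 0.0
    # Recall: share of gold chunks that were found.
    recall = correct / len(gold_set) if gold_set else 0.0
    return precision, recall


if __name__ == "__main__":
    predicted = [(0, 2), (3, 5), (6, 7)]
    gold = [(0, 2), (3, 4), (6, 7), (8, 9)]
    print(precision_recall(predicted, gold))
```

Under this definition a chunk with one misplaced boundary counts as both a precision error and a recall error, which is stricter than scoring individual word tags.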
The paper concludes that transformation-based learning provides a useful and feasible method for text chunking, offering a foundation for further linguistic analysis and serving as a basis for larger-scale grouping and direct extraction of subunits like index terms. The approach is also shown to be adaptable to other tasks, such as prepositional phrase attachment disambiguation and noun phrase parsing.