Text Chunking using Transformation-Based Learning

23 May 1995 | Lance A. Ramshaw, Mitchell P. Marcus
The paper "Text Chunking using Transformation-Based Learning" by Lance A. Ramshaw and Mitchell P. Marcus explores the application of transformation-based learning to text chunking, a process that divides sentences into non-overlapping segments based on superficial analysis. The authors use Eric Brill's transformation-based learning mechanism, which has been successful in part-of-speech tagging, to identify chunks in tagged text. They encode chunk structure as tags attached to each word, avoiding the complexities of unbalanced bracketing. The Penn Treebank corpus is used to derive training and test sets marked with two types of chunk structures: non-recursive "baseNP" chunks and partitions of sentences into non-overlapping N-type and V-type chunks.
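The idea of encoding chunk structure as per-word tags, so that no balanced bracketing ever needs to be maintained, can be illustrated with a minimal sketch. This is not the authors' code; the tag names follow the common IOB convention ("I" = inside a chunk, "O" = outside, "B" = first word of a chunk that immediately follows another chunk), and the helper functions and example sentence are assumptions for illustration.

```python
# Sketch of chunk structure encoded as one tag per word (IOB-style).
# "I" = inside a chunk, "O" = outside, and "B" marks the first word of a
# chunk that directly follows another chunk, so adjacent chunks stay distinct.

def chunks_to_tags(words, chunks):
    """words: list of tokens; chunks: sorted, non-overlapping (start, end)
    spans with end exclusive. Returns one tag per word."""
    tags = ["O"] * len(words)
    prev_end = None
    for start, end in chunks:
        for i in range(start, end):
            tags[i] = "I"
        if start == prev_end:          # chunk abuts the previous chunk
            tags[start] = "B"
        prev_end = end
    return tags

def tags_to_chunks(tags):
    """Invert the encoding: recover (start, end) spans from the tags."""
    chunks, start = [], None
    for i, tag in enumerate(tags):
        if tag == "O":
            if start is not None:
                chunks.append((start, i))
                start = None
        elif tag == "B" or start is None:
            if start is not None:       # "B" closes the previous chunk
                chunks.append((start, i))
            start = i
    if start is not None:
        chunks.append((start, len(tags)))
    return chunks

words = ["The", "dog", "the", "cat", "chased", "barked"]
spans = [(0, 2), (2, 4)]               # two adjacent baseNP-style chunks
tags = chunks_to_tags(words, spans)    # ["I", "I", "B", "I", "O", "O"]
assert tags_to_chunks(tags) == spans   # the encoding round-trips
```

Because every word carries exactly one tag, a learner can treat chunking as a tagging problem, which is what makes Brill-style rule learning applicable.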
The transformation-based learning approach is adapted to this task, with optimizations such as a smaller tag set and a baseline heuristic based on part-of-speech tags. The results show that the method achieves recall and precision rates of about 92% for baseNP chunks and 88% for more complex chunks. The paper also discusses the contributions of lexical rule templates and identifies frequent error classes, suggesting future directions for improving the system's performance.
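The transformation-based learning loop described above can be sketched as error-driven greedy rule selection over chunk tags. This is a simplified illustration, not the paper's actual configuration: the baseline heuristic, the single rule template (condition on the current word's POS only), and the toy data are all assumptions, and the "B" tag is omitted for brevity.

```python
# Hedged sketch of a Brill-style transformation-based learning loop applied
# to chunk tags: start from a POS-based baseline, then greedily pick the rule
# that fixes the most remaining errors on the training data.
from collections import Counter

def baseline(pos_tags):
    # Assumed baseline heuristic: tag a word "I" (inside a chunk) if its
    # POS is a rough noun-phrase indicator, else "O".
    np_pos = {"DT", "JJ", "NN", "NNS", "PRP", "PRP$"}
    return ["I" if p in np_pos else "O" for p in pos_tags]

def learn_rules(pos_tags, gold, max_rules=10):
    """Greedy TBL: each rule is (pos, from_tag, to_tag). Repeatedly choose
    the rule with the best net gain (errors fixed minus correct tags broken);
    stop when no rule improves the training tagging."""
    current = baseline(pos_tags)
    rules = []
    while len(rules) < max_rules:
        score = Counter()
        for p, cur, g in zip(pos_tags, current, gold):
            if cur != g:
                score[(p, cur, g)] += 1                       # would fix an error
            else:
                other = "I" if cur == "O" else "O"
                score[(p, cur, other)] -= 1                   # would break a correct tag
        if not score:
            break
        rule, gain = score.most_common(1)[0]
        if gain <= 0:
            break
        rules.append(rule)
        pos_r, from_t, to_t = rule
        current = [to_t if (p == pos_r and c == from_t) else c
                   for p, c in zip(pos_tags, current)]
    return rules, current

# Toy run: the baseline misses the cardinal number inside "the 3 dogs",
# so one corrective rule is learned.
rules, out = learn_rules(["DT", "CD", "NNS", "VBD"], ["I", "I", "I", "O"])
```

On this toy input the learner acquires the single rule ("CD", "O", "I"), i.e. "change O to I when the word's POS is CD", after which the training tags match the gold tags exactly. The real system scores many rule templates over word and tag contexts, but the greedy, error-driven structure of the loop is the same.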