TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages

1997 | Marti A. Hearst
TextTiling is a technique for segmenting texts into multi-paragraph units that represent subtopics, based on patterns of lexical co-occurrence and distribution. The algorithm aims to identify major shifts in discourse by detecting changes in the vocabulary used as one subtopic discussion gives way to the next. The method is fully implemented and has been shown to align well with human judgments of subtopic boundaries in 12 texts. Multi-paragraph subtopic segmentation is useful for text analysis tasks including information retrieval and summarization. The article discusses the need for multi-paragraph units in hypertext display and information retrieval, and presents the TextTiling algorithm in detail, including its three main components: tokenization, lexical score determination, and boundary identification.
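The three components named above can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a block-comparison score (cosine similarity between the word counts of adjacent blocks of token-sequences) and a depth-score cutoff of mean minus half a standard deviation, and it omits stemming and smoothing. The function names, the tiny stopword list, and the defaults `w` (tokens per token-sequence) and `k` (token-sequences per block) are illustrative choices. As a further simplification, stopwords are dropped before grouping tokens into sequences.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the paper's list).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "that"}

def tokenize(text, w=20):
    """Step 1: lowercase, drop stopwords, group into token-sequences of w tokens."""
    words = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    return [words[i:i + w] for i in range(0, len(words), w)]

def cosine(a, b):
    """Cosine similarity between two Counter term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def gap_scores(seqs, k=2):
    """Step 2: score each gap by comparing the k sequences before and after it."""
    scores = []
    for gap in range(1, len(seqs)):
        left = Counter(t for s in seqs[max(0, gap - k):gap] for t in s)
        right = Counter(t for s in seqs[gap:gap + k] for t in s)
        scores.append(cosine(left, right))
    return scores

def depth_scores(scores):
    """Depth of each valley: climb to the nearest peak on each side."""
    depths = []
    for i, s in enumerate(scores):
        l = i
        while l > 0 and scores[l - 1] >= scores[l]:
            l -= 1
        r = i
        while r < len(scores) - 1 and scores[r + 1] >= scores[r]:
            r += 1
        depths.append((scores[l] - s) + (scores[r] - s))
    return depths

def boundaries(depths):
    """Step 3: keep gaps whose depth exceeds mean - std/2 (assumed cutoff)."""
    if not depths:
        return []
    mean = sum(depths) / len(depths)
    std = math.sqrt(sum((d - mean) ** 2 for d in depths) / len(depths))
    cutoff = mean - std / 2
    # Gap i + 1 lies between token-sequences i and i + 1 (0-based).
    return [i + 1 for i, d in enumerate(depths) if d > 0 and d > cutoff]

def text_tiling(text, w=20, k=2):
    """Return gap indices (in token-sequence units) of proposed subtopic boundaries."""
    return boundaries(depth_scores(gap_scores(tokenize(text, w), k)))
```

A returned index `g` means a proposed boundary between token-sequence `g - 1` and token-sequence `g`; mapping these gaps back to the nearest paragraph break is left out of the sketch.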