The paper "Using Lexical Chains for Text Summarization" by Regina Barzilay explores a method to generate summaries of original texts without requiring full semantic interpretation. The approach relies on lexical chains, which are derived from a model of topic progression in the text. The authors present an algorithm to compute lexical chains using various knowledge sources, including WordNet, part-of-speech tagging, shallow parsing, and a segmentation algorithm. The summarization process involves three steps: segmenting the text, constructing lexical chains, identifying strong chains, and extracting significant sentences.
The paper discusses the importance of lexical cohesion and coherence in text representation, emphasizing that lexical chains capture the cohesive structure of the text. The chain-construction algorithm it introduces differs from previous methods in that it uses a non-greedy disambiguation heuristic to select appropriate senses for chain members, extends the set of candidate words to include noun compounds, and defines text units using Hearst's segmentation algorithm.
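The kind of WordNet relatedness test such a chainer depends on can be sketched with NLTK's WordNet interface: two noun senses may be linked when they share a synset, stand in a direct hypernym/hyponym relation, or are siblings under a common hypernym. The paper grades relations by strength; the relation set below is only an illustrative approximation, and `related_senses` is a hypothetical helper, not the authors' code.

```python
# Sketch of a WordNet relatedness test between candidate chain members.
# Requires: pip install nltk, plus nltk.download("wordnet").
from nltk.corpus import wordnet as wn

def related_senses(word_a, word_b):
    """Yield (sense_a, sense_b, relation) triples for related noun senses."""
    for sa in wn.synsets(word_a, pos=wn.NOUN):
        for sb in wn.synsets(word_b, pos=wn.NOUN):
            if sa == sb:
                yield sa, sb, "same synset"
            elif sb in sa.hypernyms() or sa in sb.hypernyms():
                yield sa, sb, "hypernym/hyponym"
            elif set(sa.hypernyms()) & set(sb.hypernyms()):
                yield sa, sb, "siblings under a common hypernym"

# A non-greedy chainer keeps every consistent sense interpretation alive and
# commits to one sense per word only after all candidate words have been inserted.
for sa, sb, rel in related_senses("machine", "device"):
    print(sa.name(), sb.name(), rel)
```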
Empirical results are presented on the identification of strong chains and significant sentences. The authors propose a scoring function for chains based on their length and homogeneity, and evaluate three heuristics for extracting significant sentences. The first selects, for each strong chain, the sentence containing the first appearance of a chain member; the second restricts this to representative chain members; and the third identifies central text units with a high density of chain members.
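One plausible reading of that scheme is sketched below: a chain is scored as its length times a homogeneity term, chains scoring well above the mean are kept as strong, and sentences are extracted with the first two heuristics. The two-standard-deviation cutoff and the definition of a representative member are assumptions here and should be checked against the paper.

```python
# Sketch of chain scoring, strong-chain selection, and two extraction heuristics.
# Formula details (homogeneity term, cutoff, "representative" members) are assumed,
# not quoted from the paper.
from statistics import mean, pstdev

def chain_score(members):
    """members: list of word occurrences in one chain, repeats included."""
    length = len(members)
    homogeneity = 1.0 - len(set(members)) / length  # penalize chains of many distinct words
    return length * homogeneity

def strong_chains(chains):
    """Keep chains whose score exceeds mean + 2 * std of all chain scores (assumed cutoff)."""
    scores = [chain_score(c) for c in chains]
    cutoff = mean(scores) + 2 * pstdev(scores)
    return [c for c, s in zip(chains, scores) if s > cutoff]

def extract_first_appearance(sentences, members):
    """Heuristic 1: the sentence containing the first appearance of any chain member."""
    for sent in sentences:
        if any(m in sent.lower().split() for m in members):
            return sent
    return None

def extract_representative(sentences, chain):
    """Heuristic 2: as above, but restricted to 'representative' members, taken here
    to be members whose frequency in the chain is at least the average member frequency."""
    freq = {m: chain.count(m) for m in set(chain)}
    avg = mean(freq.values())
    representatives = {m for m, f in freq.items() if f >= avg}
    return extract_first_appearance(sentences, representatives)
```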
The paper also discusses limitations and future work, including issues related to sentence granularity, anaphora links, and the lack of control over summary length and detail. Despite these limitations, the method shows promise, with initial evaluations suggesting quality superior to that of commercial summarization systems.