A Trainable Document Summarizer

1995 | Julian Kupiec, Jan Pedersen and Francine Chen
This paper presents a trainable document summarization program that aims to generate concise, informative summaries from original documents. It focuses on document extracts: because extracts can be as informative as the full text of a document, even shorter extracts may be useful. The authors develop a statistical framework that estimates the probability of a sentence being included in an extract, using a training corpus of document/extract pairs. They evaluate the method with two measures: the fraction of manually selected summary sentences that the summarizer correctly reproduces, and the fraction of matchable sentences that it correctly identifies. The method achieves an average precision of 42%, and the results show that combining location-based heuristics, fixed phrases, and sentence length features yields the best performance. The paper also discusses implementation issues and concludes that, for summaries 25% of the size of the average test document, the program selects 84% of the sentences chosen by professionals.
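To make the statistical framework concrete, here is a minimal sketch of how sentence-inclusion probabilities could be estimated from labeled sentences. It assumes a naive Bayes formulation with independent binary features; the feature names (`in_first_paragraph`, `has_fixed_phrase`, `long_enough`), the add-one smoothing, and the toy training data are illustrative assumptions, not details taken from this summary.

```python
def train(examples):
    """Estimate naive Bayes statistics from (features, in_extract) pairs.

    `features` is a dict mapping a feature name to True/False.
    All names and data used here are hypothetical.
    """
    n = len(examples)
    n_in = sum(1 for _, in_extract in examples if in_extract)
    names = sorted({name for feats, _ in examples for name in feats})
    model = {"prior": n_in / n, "features": {}}
    for name in names:
        for value in (True, False):
            in_count = sum(
                1 for f, y in examples if y and f.get(name, False) == value
            )
            all_count = sum(
                1 for f, _ in examples if f.get(name, False) == value
            )
            # Add-one smoothing (an assumption) avoids zero probabilities.
            p_given_in = (in_count + 1) / (n_in + 2)
            p_overall = (all_count + 1) / (n + 2)
            model["features"][(name, value)] = (p_given_in, p_overall)
    return model


def score(model, feats):
    """Score proportional to P(in extract | features) under independence.

    Because of smoothing this is a ranking score, not a calibrated probability.
    """
    s = model["prior"]
    for (name, value), (p_given_in, p_overall) in model["features"].items():
        if feats.get(name, False) == value:
            s *= p_given_in / p_overall
    return s


def select(model, sentences, k):
    """Return the indices of the k highest-scoring sentences, in document order."""
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score(model, sentences[i]),
        reverse=True,
    )
    return sorted(ranked[:k])


# Toy training corpus: one feature dict per sentence, plus its label.
training_data = [
    ({"in_first_paragraph": True, "has_fixed_phrase": True, "long_enough": True}, True),
    ({"in_first_paragraph": True, "has_fixed_phrase": False, "long_enough": True}, True),
    ({"in_first_paragraph": False, "has_fixed_phrase": False, "long_enough": True}, False),
    ({"in_first_paragraph": False, "has_fixed_phrase": False, "long_enough": False}, False),
    ({"in_first_paragraph": False, "has_fixed_phrase": True, "long_enough": True}, False),
    ({"in_first_paragraph": False, "has_fixed_phrase": False, "long_enough": False}, False),
]
model = train(training_data)

# Feature dicts for the sentences of a new, unlabeled document.
document = [
    {"in_first_paragraph": True, "has_fixed_phrase": False, "long_enough": True},
    {"in_first_paragraph": False, "has_fixed_phrase": False, "long_enough": False},
    {"in_first_paragraph": False, "has_fixed_phrase": True, "long_enough": True},
    {"in_first_paragraph": False, "has_fixed_phrase": False, "long_enough": True},
]
print(select(model, document, k=2))
```

Ranking by score and keeping the top `k` sentences lets the summary length be chosen independently of the model, which is how an extract can be held to a fixed fraction of the document's size.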