New Methods in Automatic Extracting

Vol. 16, No. 2, April 1969 | H. P. EDMUNDSON
This paper introduces new methods for automatically extracting sentences from documents for screening purposes, focusing on selecting sentences that convey the substance of the document effectively. Unlike previous methods that primarily relied on high-frequency content words (key words), the new methods also consider pragmatic words (cue words), title and heading words, and structural indicators (sentence location).
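
As a rough illustration of how the four kinds of evidence (cue words, key words, title words, and sentence location) might be combined, the sketch below scores each sentence with a weighted sum and keeps the top-scoring ones. It is a minimal sketch under stated assumptions, not the paper's implementation: the word lists, the scoring rule for each component, and the names `score_sentences` and `extract` are all hypothetical and chosen only for this example.

```python
from collections import Counter

# Hypothetical toy word lists; the paper compiles its own dictionaries.
CUE_WORDS = {"results", "significant", "conclude"}
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is", "we",
              "that", "this", "for", "was", "at"}


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    stripped = (w.strip(".,;:()!?\"'").lower() for w in text.split())
    return [w for w in stripped if w]


def score_sentences(sentences, title, weights):
    """Return one weighted score per sentence (assumed linear combination)."""
    title_words = set(tokenize(title)) - STOP_WORDS
    # Document-wide frequency of non-stop words, used by the key component.
    freq = Counter(w for s in sentences for w in tokenize(s)
                   if w not in STOP_WORDS)
    last = len(sentences) - 1
    scores = []
    for i, sentence in enumerate(sentences):
        words = tokenize(sentence)
        cue = sum(w in CUE_WORDS for w in words)            # pragmatic words
        key = sum(freq[w] >= 2 for w in words)              # repeated content words
        title_hits = sum(w in title_words for w in words)   # title/heading words
        location = 1 if i in (0, last) else 0               # first or last sentence
        scores.append(weights["cue"] * cue
                      + weights["key"] * key
                      + weights["title"] * title_hits
                      + weights["location"] * location)
    return scores


def extract(scores, n):
    """Indices of the n highest-scoring sentences, in document order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n])


# Toy document, invented for this example.
TITLE = "Automatic Extracting"
DOC = [
    "We present a method for automatic extracting.",
    "The weather was pleasant that day.",
    "Results show that cue words improve extracting.",
    "Lunch was served at noon.",
]

ALL_ON = {"cue": 1, "key": 1, "title": 1, "location": 1}
LOCATION_ONLY = {"cue": 0, "key": 0, "title": 0, "location": 1}

print(extract(score_sentences(DOC, TITLE, ALL_ON), 2))
print(extract(score_sentences(DOC, TITLE, LOCATION_ONLY), 2))
```

Changing the weights changes which sentences are selected, which is the sense in which such a system lets the influence of each component be varied.
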
The research produced a working extracting system and a research methodology, including procedures for compiling dictionaries, setting control parameters, and evaluating the quality of automatic extracts against manually produced target extracts. The results indicate that the three additional components (pragmatic words, title and heading words, and structural indicators) outweigh the frequency component in producing better extracts. The extracting system can be parameterized to control and vary the influence of each component. In an evaluation on 40 documents, automatic and target extracts showed a mean coselection of 44% and a mean similarity rating of 66%. The paper also discusses the research methodology, the four basic methods (Cue, Key, Title, and Location), and the experimental cycles used to refine the system. The final extracting system is designed to be flexible and efficient, and it can handle different document lengths and extract lengths.
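
The coselection figure can be read as the share of sentences that the automatic extract and the target extract have in common. The sketch below shows one way to compute it and average it over a set of documents. It assumes the percentage is taken relative to the size of the target extract, which is an assumption made for this example and may differ from the paper's exact definition. The function names are hypothetical.

```python
def coselection_percent(automatic, target):
    """Percent of target-extract sentences also chosen automatically.

    Both arguments are collections of sentence indices. The denominator
    (size of the target extract) is an assumption made for this sketch.
    """
    target_set = set(target)
    if not target_set:
        raise ValueError("target extract must contain at least one sentence")
    shared = set(automatic) & target_set
    return 100.0 * len(shared) / len(target_set)


def mean_coselection(pairs):
    """Average coselection over (automatic, target) pairs, one per document."""
    values = [coselection_percent(a, t) for a, t in pairs]
    if not values:
        raise ValueError("need at least one document")
    return sum(values) / len(values)


# Invented example: two of the four target sentences were also selected.
print(coselection_percent([0, 2, 5, 7], [0, 3, 5, 9]))
```

The similarity rating reported alongside coselection is not reproduced here, since the summary does not say how it is computed.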