New Methods in Automatic Extracting

April 1969 | H. P. EDMUNDSON
This paper presents new methods for automatically extracting documents for screening purposes, that is, for having a computer select the sentences most likely to convey the substance of a document to the reader. Previous methods relied on one clue, the presence of high-frequency content words (key words). The new methods also use pragmatic words (cue words), title and heading words, and structural indicators (sentence location). The research resulted in an operating system and a research methodology. The extracting system is parameterized to control and vary the influence of these four components, and its word lists and weights can be modified. The methodology includes procedures for compiling the required dictionaries, setting the control parameters, and evaluating automatic extracts against manually produced ones. The results show that the three new components dominate the frequency (key word) component in producing better extracts.

To build the methodology, the research studied human abstracting behavior, formulated the abstracting problem, and assigned numerical weights to sentences. The extracting system uses four methods, each based on a different clue for selecting sentences. The Cue method uses a dictionary of bonus words such as "significant", which raise a sentence's weight, and stigma words such as "impossible", which lower it. The Key method uses high-frequency content words. The Title method uses words from the title and headings. The Location method uses sentence position and heading information. The four scores C, K, T, and L are combined linearly into a sentence weight, a1C + a2K + a3T + a4L, where the parameters a1 to a4 set the influence of each method, and the highest-weighted sentences form the extract.

The system was tested on 40 documents: 44% of the automatically selected sentences matched those in the manual extracts, and the mean similarity rating was 66%. An evaluation of the selected sentences found 84% of them to be extract-worthy. The system proved efficient and effective for screening large document collections. The research highlights the importance of considering multiple factors in automatic extracting, including content, structure, and language characteristics.
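The four-part weighting can be made concrete with a small program. The following is a minimal Python sketch, not Edmundson's system: it scores each sentence as a1*C + a2*K + a3*T + a4*L and keeps the top-weighted fraction of sentences in document order. Everything beyond that linear form is an assumption for illustration: the tiny bonus, stigma, and null word lists; the rule that a key word is any non-null, non-cue word occurring at least twice and contributes its frequency; the title weights (2 per title word, 1 per heading word); the location rule (+1 under an "Introduction", "Conclusions", or "Summary" heading, +1 for the first or last sentence of a paragraph); the 25% default extract length; and the names score_sentences, extract, TOY_TITLE, and TOY_SECTIONS are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy dictionaries. Edmundson compiled his from a corpus;
# these short lists only illustrate how each kind of word is used.
BONUS_WORDS = {"significant", "greatest", "important"}  # raise a sentence's weight
STIGMA_WORDS = {"impossible", "hardly"}                  # lower it
NULL_WORDS = {"a", "an", "the", "of", "and", "or", "in", "on", "to", "for",
              "is", "are", "was", "be", "this", "that", "it", "we", "with", "by"}
LOCATION_HEADINGS = {"introduction", "conclusions", "summary"}  # assumed favored headings


def tokenize(text):
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def content_words(words):
    """Drop null (function) words."""
    return [w for w in words if w not in NULL_WORDS]


def flatten(sections):
    """[(heading, [[sentence, ...], ...]), ...] -> one record per sentence."""
    records = []
    for heading, paragraphs in sections:
        for para in paragraphs:
            for i, text in enumerate(para):
                records.append({"text": text, "heading": heading,
                                "edge": i in (0, len(para) - 1)})  # first/last in paragraph
    return records


def score_sentences(doc_title, sections, a=(1, 1, 1, 1), min_count=2):
    """Return (records, weights) with weight = a1*C + a2*K + a3*T + a4*L."""
    records = flatten(sections)
    cue_words = BONUS_WORDS | STIGMA_WORDS
    # Key (assumed rule): non-null, non-cue words seen at least min_count times,
    # each contributing its in-document frequency.
    counts = Counter(w for r in records for w in content_words(tokenize(r["text"]))
                     if w not in cue_words)
    key_freq = {w: c for w, c in counts.items() if c >= min_count}
    title_words = set(content_words(tokenize(doc_title)))
    heading_words = {w for h, _ in sections for w in content_words(tokenize(h))}

    weights = []
    for r in records:
        words = tokenize(r["text"])
        c = sum(w in BONUS_WORDS for w in words) - sum(w in STIGMA_WORDS for w in words)
        k = sum(key_freq.get(w, 0) for w in words)
        # Title (assumed): 2 per title word, 1 per heading word.
        t = sum(2 if w in title_words else 1 if w in heading_words else 0 for w in words)
        # Location (assumed): +1 under a favored heading, +1 at a paragraph boundary.
        loc = int(r["heading"].lower() in LOCATION_HEADINGS) + int(r["edge"])
        weights.append(a[0] * c + a[1] * k + a[2] * t + a[3] * loc)
    return records, weights


def extract(doc_title, sections, a=(1, 1, 1, 1), ratio=0.25):
    """Keep the highest-weighted sentences (ties: earlier first), in document order."""
    records, weights = score_sentences(doc_title, sections, a)
    n_keep = max(1, math.ceil(ratio * len(records)))
    ranked = sorted(range(len(records)), key=lambda i: (-weights[i], i))[:n_keep]
    return [records[i]["text"] for i in sorted(ranked)]


# Hypothetical toy document: (heading, paragraphs), each paragraph a list of sentences.
TOY_TITLE = "Automatic extracting of documents"
TOY_SECTIONS = [
    ("Introduction", [["Automatic extracting selects sentences from documents.",
                       "The weather was mild.",
                       "This significant result guides extracting."]]),
    ("Method", [["Sentences are weighted by cue words.",
                 "It is hardly possible to read every document.",
                 "Extracting uses key words and location."]]),
]

if __name__ == "__main__":
    print(extract(TOY_TITLE, TOY_SECTIONS))
```

On the toy document the sentence weights come out as 13, 1, 8, 5, -1, 8, so the default 25% extract (2 of 6 sentences) keeps the first and third sentences. Setting a = (0, 1, 0, 0), which leaves only the Key method active, keeps the first and last sentences instead, which shows how the control parameters shift what the extract contains.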
The findings suggest that future research should focus on improving the system and exploring new methods for automatic extracting.