Vol. 12, No. 3, July 1994 | CHIDANAND APTÉ, FRED DAMERAU, and SHOLOM M. WEISS
The paper presents extensive experiments using optimized rule-based induction methods on large document collections to discover automatic classification patterns for general document categorization or personalized filtering of free text. The authors compare machine-generated decision rules with human-engineered rule-based systems and show that the machine-generated rules can achieve comparable performance. On a key benchmark from the Reuters collection, they report an improvement in the recall/precision breakeven point from a previously reported 67% to 80.5%. The study examines several methodological alternatives, including universal versus local dictionaries and binary versus frequency-related features. The results suggest that optimized rule induction is competitive with other machine-learning techniques and comparable to human-engineered systems for document classification. The authors also discuss the potential for combining machine-learning and human-developed systems for document classification and information retrieval services.
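
For readers unfamiliar with the recall/precision breakeven point used as the headline measure above, the following is a minimal sketch of one common way to compute it for a single category. It assumes the classifier yields a confidence score per document, which is an assumption made for illustration; the summary does not describe how the authors located the breakeven for their rule sets. The function name `breakeven_point` and the sample data are hypothetical. The sketch relies on the fact that when exactly as many documents are assigned to a category as truly belong to it, precision and recall share the same denominator and are therefore equal.

```python
def breakeven_point(scores, labels):
    """Precision (= recall) when the top-k documents are assigned, k = number of positives.

    scores: per-document confidence for the category (hypothetical input)
    labels: 1 if the document truly belongs to the category, else 0
    Tied scores are broken by input order; this is a simplification.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive document")
    # Rank documents from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    # Assign the category to exactly n_pos documents, so that
    # precision = true_pos / n_pos and recall = true_pos / n_pos coincide.
    true_pos = sum(label for _, label in ranked[:n_pos])
    return true_pos / n_pos


# Illustrative data only, not taken from the paper.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 1, 0]
print(breakeven_point(scores, labels))
```

In this toy example two of the three top-ranked documents are true members of the category, so precision and recall are both 2/3 at the breakeven. A reported figure such as 80.5% is typically an aggregate of this kind of measurement over many categories.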