KEA: Practical Automatic Keyphrase Extraction

1999 | Ian H. Witten, Gordon W. Paynter, Eibe Frank, Carl Gutwin and Craig G. Nevill-Manning
KEA (Keyphrase Extraction Algorithm) automatically extracts keyphrases from text documents. Keyphrases are metadata that summarize and characterize a document, which makes them useful for information retrieval, search indexes, browsing, and document clustering. The algorithm has two stages: training and extraction. In the training stage, it builds a model from a set of training documents whose keyphrases are already known. In the extraction stage, it identifies candidate phrases in a new document, calculates feature values for each candidate, and uses the trained machine-learning model to predict which candidates are keyphrases.
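The candidate-identification step and the labelling of training examples can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three-word phrase limit, the tiny stopword list, the rule that a candidate may not begin or end with a stopword, and the function names `tokenize`, `candidate_phrases`, and `label_candidates` are all assumptions made for illustration, and stemming is omitted.

```python
import re

# Toy stopword list (an assumption; a real system would use a much fuller one).
STOPWORDS = {"a", "an", "and", "are", "for", "from", "in", "is", "it",
             "of", "on", "the", "to", "with"}

def tokenize(text):
    """Lowercase the text and return one word list per punctuation-delimited
    chunk, so that candidate phrases never span a punctuation mark."""
    chunks = re.split(r"[.,;:!?()\[\]\"]+", text.lower())
    return [re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", chunk) for chunk in chunks]

def candidate_phrases(text, max_len=3):
    """Return all word n-grams of 1..max_len words that neither begin
    nor end with a stopword."""
    candidates = set()
    for words in tokenize(text):
        for start in range(len(words)):
            for length in range(1, max_len + 1):
                gram = words[start:start + length]
                if len(gram) < length:
                    break  # ran off the end of this chunk
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                candidates.add(" ".join(gram))
    return candidates

def label_candidates(text, known_keyphrases):
    """Training-stage helper: mark each candidate as a positive example
    (it is a known keyphrase) or a negative one."""
    known = {k.lower() for k in known_keyphrases}
    return {phrase: phrase in known for phrase in candidate_phrases(text)}

# Hypothetical training document with two known keyphrases.
doc = "Automatic keyphrase extraction is useful for search indexes."
labels = label_candidates(doc, ["keyphrase extraction", "search indexes"])
positives = sorted(p for p, is_key in labels.items() if is_key)
```

In this sketch the labelled candidates would be the input to whatever learning scheme builds the model. At extraction time the same `candidate_phrases` function would run on a new document, and the model would score each candidate.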
The features used are TF×IDF (term frequency times inverse document frequency) and the position of a phrase's first occurrence in the document. In an evaluation on documents from the New Zealand Digital Library, KEA matched, on average, between one and two of the five keyphrases chosen by the author, which is good performance given the large number of candidate phrases. KEA is effective even with a small training set and works best on full text rather than on titles or abstracts alone. The authors plan to evaluate KEA further with human expert judges and to compare it with other document summarization methods.
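The two features can be computed along the following lines. This is a hedged sketch rather than the paper's code: it assumes one common formulation of TF×IDF (relative frequency in the document times the negative base-2 logarithm of the fraction of corpus documents containing the phrase) and measures first occurrence as the fraction of the document that precedes the phrase. The function and parameter names are hypothetical.

```python
import math

def tf_idf(phrase_count, doc_length, doc_freq, num_docs):
    """TF×IDF under the assumed formulation:
    (phrase_count / doc_length) * -log2(doc_freq / num_docs).
    doc_freq must be at least 1."""
    tf = phrase_count / doc_length
    idf = -math.log2(doc_freq / num_docs)
    return tf * idf

def first_occurrence(words, phrase_words):
    """Fraction of the document's words that precede the first appearance
    of the phrase (0.0 means the very start). Returns None if absent."""
    n = len(phrase_words)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == phrase_words:
            return i / len(words)
    return None

# Hypothetical ten-word document and made-up corpus statistics.
words = "kea extracts keyphrases from text and ranks keyphrases by score".split()
score = tf_idf(phrase_count=2, doc_length=len(words), doc_freq=1, num_docs=8)
position = first_occurrence(words, ["keyphrases"])
```

Under these definitions, a phrase that is frequent in the document but rare in the corpus gets a high TF×IDF value, and a phrase that first appears early gets a small first-occurrence value. A phrase found in every corpus document gets a TF×IDF of zero.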