This paper explores how automatic keyword extraction from abstracts can be improved using a supervised machine learning algorithm. The author, Anette Hulth of Stockholm University, argues that incorporating linguistic knowledge, such as syntactic features (e.g., noun phrase chunks and part-of-speech tags), into the data representation can improve keyword extraction accuracy compared with relying solely on statistical measures like term frequency and n-grams. The study compares three term selection approaches (n-grams, noun phrase chunks, and terms matching specific POS tag sequences) and four features (term frequency, collection frequency, relative position, and POS tags). The results show that extracting noun phrase chunks gives better precision, while using POS tag patterns increases recall. The highest F-score is achieved by the n-gram approach with the addition of POS tags. The paper also discusses limitations of the current approach, such as the lack of semantic relationships between POS tag features, and suggests future directions for improving keyword extraction, including more sophisticated evaluation methods and the generation of keywords rather than just extraction.
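
To make the n-gram term selection approach and two of the four features more concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes whitespace tokenization, n-grams of up to three tokens, and a relative position defined as the index of a term's first occurrence divided by the number of tokens; the function names `ngram_candidates` and `candidate_features` are hypothetical. Collection frequency and POS tags are left out because they would require a document collection and a tagger.

```python
from collections import Counter


def ngram_candidates(tokens, max_n=3):
    """Yield (start_index, term) for every n-gram of length 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield i, " ".join(tokens[i:i + n])


def candidate_features(text, max_n=3):
    """Map each candidate term to two illustrative features.

    Assumption: relative position is the token index of the first
    occurrence divided by the total number of tokens.
    """
    tokens = text.lower().split()  # naive whitespace tokenization
    counts = Counter()
    first_pos = {}
    for i, term in ngram_candidates(tokens, max_n):
        counts[term] += 1
        first_pos.setdefault(term, i)  # keep only the first occurrence
    return {
        term: {
            "term_frequency": counts[term],
            "relative_position": first_pos[term] / len(tokens),
        }
        for term in counts
    }


features = candidate_features(
    "keyword extraction improves when keyword extraction uses syntax"
)
print(features["keyword extraction"])
```

In a supervised setup of the kind summarized above, each candidate's feature values would form one training example, labeled by whether the candidate matches a manually assigned keyword.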