Received: 9 July 2007 / Revised: 28 September 2007 / Accepted: 8 October 2007 | Xindong Wu · Vipin Kumar · J. Ross Quinlan · Joydeep Ghosh · Qiang Yang · Hiroshi Motoda · Geoffrey J. McLachlan · Angus Ng · Bing Liu · Philip S. Yu · Zhi-Hua Zhou · Michael Steinbach · David J. Hand · Dan Steinberg
This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These algorithms are among the most influential in the research community, covering classification, clustering, statistical learning, association analysis, and link mining. For each algorithm, the paper provides a description, discusses its impact, and reviews current and further research. For example, C4.5 is a decision tree learner that can also generate comprehensible rulesets, while k-Means is an iterative method for partitioning a dataset into a user-specified number k of clusters. SVM is a robust and accurate classification method, and Apriori is a seminal algorithm for finding frequent itemsets in transaction datasets. The paper also discusses limitations and extensions of these algorithms, such as the need for stable trees in C4.5 and the use of kernel functions in SVM.
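The abstract describes k-Means as an iterative method for partitioning data into clusters. As a hedged illustration (not the paper's own code), the sketch below implements the common Lloyd-style iteration in plain Python. It alternates an assignment step and a centroid-update step until the assignments stop changing. The function name `kmeans`, the use of Euclidean distance, and the choice of the first k points as initial centroids are assumptions made for illustration only.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Hypothetical Lloyd-style k-means sketch: alternate assignment and update steps."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    # Assumed initialization for illustration: the first k points become the starting centroids.
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no assignment changed, so the iteration has converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # if a cluster becomes empty, keep its previous centroid
                centroids[j] = tuple(sum(coords) / len(members) for coords in zip(*members))
    return centroids, labels


pts = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, labels = kmeans(pts, 2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

On this toy input the points near (1, 1) and those near (8.5, 9) separate after one update, and the second pass leaves the assignments unchanged, which ends the loop. The result depends on the initial centroids; in practice, randomized or k-means++ style seeding and several restarts are commonly used instead of the fixed choice assumed here.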