This paper provides a comprehensive survey and comparison of algorithms for association rule mining. Association rules, which express dependencies between items in transaction data, have gained significant attention since their introduction in 1993. The authors outline the fundamental concepts of association rule mining, including the definition of rules, their confidence, and the challenges of algorithmic complexity and rule selection.
The paper is structured into several sections. The first section introduces the basic principles of association rule mining, detailing the formal problem description, the search space, and methods for determining itemset supports. The second section discusses common algorithms, categorizing them based on their strategies for traversing the search space and determining itemset supports. These algorithms include Apriori, AprioriTID, DIC, Partition, Eclat, and others, each with unique features and optimizations.
The third section compares the performance of these algorithms through runtime experiments on synthetic and real-world datasets. The experiments reveal that while there are fundamental differences in their strategies, the algorithms exhibit similar runtime behavior. The advantages and disadvantages of different approaches, such as counting occurrences versus intersecting tidlists, are balanced out, leading to no single algorithm consistently outperforming the others.
The paper concludes by highlighting the findings and suggesting future research directions, including the development of a hybrid approach that combines counting occurrences and tidlist intersections.This paper provides a comprehensive survey and comparison of algorithms for association rule mining. Association rules, which express dependencies between items in transaction data, have gained significant attention since their introduction in 1993. The authors outline the fundamental concepts of association rule mining, including the definition of rules, their confidence, and the challenges of algorithmic complexity and rule selection.
The paper is structured into several sections. The first section introduces the basic principles of association rule mining, detailing the formal problem description, the search space, and methods for determining itemset supports. The second section discusses common algorithms, categorizing them based on their strategies for traversing the search space and determining itemset supports. These algorithms include Apriori, AprioriTID, DIC, Partition, Eclat, and others, each with unique features and optimizations.
The third section compares the performance of these algorithms through runtime experiments on synthetic and real-world datasets. The experiments reveal that while there are fundamental differences in their strategies, the algorithms exhibit similar runtime behavior. The advantages and disadvantages of different approaches, such as counting occurrences versus intersecting tidlists, are balanced out, leading to no single algorithm consistently outperforming the others.
The paper concludes by highlighting the findings and suggesting future research directions, including the development of a hybrid approach that combines counting occurrences and tidlist intersections.