1993 | Rakesh Agrawal, Tomasz Imieliński, Arun Swami
This paper introduces an efficient algorithm for mining association rules between sets of items in large databases. Association rules describe relationships between items in transactional data, such as "90% of transactions that purchase bread and butter also purchase milk." The algorithm incorporates buffer management, estimation, and pruning techniques to efficiently generate all significant association rules that meet specified confidence and support thresholds. The algorithm is tested on sales data from a large retailing company, demonstrating its effectiveness.
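To make the two thresholds concrete, here is a minimal sketch of how support and confidence can be computed for a rule such as "bread, butter ⇒ milk". The function names and the toy transactions are illustrative assumptions, not taken from the paper.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Transaction]) -> float:
    """Among transactions containing the antecedent, the fraction that
    also contain the consequent."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_both = sum(1 for t in transactions if both <= t)
    return n_both / n_a if n_a else 0.0


# Hypothetical toy data (not from the paper's retail dataset).
transactions = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk"}),
]

rule_support = support({"bread", "butter", "milk"}, transactions)
rule_confidence = confidence({"bread", "butter"}, {"milk"}, transactions)
```

In this toy data the rule holds in 2 of the 5 transactions (its support) and in 2 of the 3 transactions that contain both bread and butter (its confidence).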
The problem of association rule mining is decomposed into two subproblems: (1) identifying large itemsets, that is, item combinations whose support (the fraction of transactions containing them) meets a minimum threshold, and (2) generating association rules from these large itemsets. The first subproblem is addressed by making repeated passes over the database: the search starts from the empty set, candidate itemsets are extended with items found in the database tuples, and the support of each candidate is measured. The second subproblem is solved by taking subsets of each large itemset as antecedents and the remaining items as consequents, keeping the rules that meet the confidence threshold.
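The two-step decomposition can be sketched as follows. This is a deliberately naive level-by-level enumeration under assumed names (`large_itemsets`, `rules_from`) and toy data. It illustrates the decomposition only and does not reproduce the paper's frontier-based passes, estimation, or buffer management.

```python
from itertools import combinations


def large_itemsets(transactions, min_support):
    """Step 1: return {itemset: support} for every itemset whose support
    meets min_support, growing the itemset size one level at a time."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    result = {}
    size = 1
    while items:
        level = {}
        for combo in combinations(items, size):
            s = frozenset(combo)
            count = sum(1 for t in transactions if s <= t)
            if count / n >= min_support:
                level[s] = count / n
        if not level:
            break
        result.update(level)
        # Every item of a larger large itemset must appear in this level.
        items = sorted({i for s in level for i in s})
        size += 1
    return result


def rules_from(large, min_confidence):
    """Step 2: split each large itemset into antecedent and consequent,
    keeping rules whose confidence meets min_confidence."""
    rules = []
    for itemset, supp in large.items():
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                a = frozenset(ante)
                # Subsets of a large itemset are large, so large[a] exists.
                conf = supp / large[a]
                if conf >= min_confidence:
                    rules.append((a, itemset - a, supp, conf))
    return rules


# Hypothetical toy data and thresholds.
transactions = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk"}),
]
large = large_itemsets(transactions, min_support=0.4)
rules = rules_from(large, min_confidence=0.6)
```

Because every subset of a large itemset is itself large, step 2 can reuse the supports recorded in step 1 and never needs to rescan the transactions.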
To optimize performance, the algorithm uses estimation techniques to predict which itemsets are likely to be large, reducing the number of itemsets that need to be measured. Pruning techniques are also employed to eliminate itemsets that are unlikely to be large, based on remaining tuples in the current pass or synthesized pruning functions. These techniques help reduce memory usage and computational effort.
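One way to picture pruning based on the remaining tuples is the bound below: if an itemset's count so far, plus every tuple still unread in the pass, cannot reach the minimum support, the itemset can be dropped immediately. This is a simplified sketch with assumed names (`can_still_be_large`, `count_with_pruning`), not the paper's exact formulation, and it does not cover the estimation step or the synthesized pruning functions.

```python
def can_still_be_large(count_so_far, tuples_remaining, total_tuples, min_support):
    """True if the itemset could still reach min_support even if it
    appeared in every remaining tuple of the pass."""
    best_case = count_so_far + tuples_remaining
    return best_case / total_tuples >= min_support


def count_with_pruning(candidates, transactions, min_support):
    """Count candidates in one pass, discarding any candidate as soon as
    the bound above shows it cannot become large."""
    n = len(transactions)
    counts = {c: 0 for c in candidates}
    for seen, t in enumerate(transactions, start=1):
        remaining = n - seen
        for c in list(counts):
            if c <= t:
                counts[c] += 1
            if not can_still_be_large(counts[c], remaining, n, min_support):
                del counts[c]  # frees the memory held for this counter
    return counts


# Hypothetical toy data and candidates.
transactions = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk"}),
]
candidates = [frozenset({"butter", "milk"}), frozenset({"bread", "butter"})]
survivors = count_with_pruning(candidates, transactions, min_support=0.6)
```

After the final tuple no tuples remain, so the same check also removes any candidate whose final count falls short, and the returned dictionary holds only large itemsets.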
The algorithm is tested on a dataset of 46,873 customer transactions from a large retailing company. It successfully identifies association rules between departments, such as "Tires ⇒ Automotive Services" with high confidence and support. The algorithm's performance is evaluated using both estimation and pruning techniques, showing high accuracy and efficiency.
The work is part of the Quest project at IBM Almaden Research Center, which explores various aspects of database mining. The paper also discusses related work in AI and database research, highlighting the importance of association rule mining as a new application area for databases.