1995 | Jong Soo Park*, Ming-Syan Chen and Philip S. Yu
This paper addresses the problem of mining association rules from large databases of sales transactions. The key step is identifying large itemsets, that is, sets of items that appear in at least a minimum number of transactions (the minimum support). The authors propose DHP (Direct Hashing and Pruning), an efficient hash-based algorithm for generating candidate itemsets. DHP generates far fewer candidate 2-itemsets than previous methods, which matters because generating candidate 2-itemsets is a key performance bottleneck. The algorithm also prunes the transaction database, shrinking it at an early stage of the iterations and further improving efficiency. Extensive simulations show that DHP outperforms the Apriori algorithm, particularly in the early iterations, with shorter execution times and a smaller database to scan in later passes. The paper concludes with a discussion of DHP's scalability and its advantages for large datasets.
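The two ideas summarized above, hash filtering of candidate 2-itemsets and early trimming of the database, can be illustrated with a minimal in-memory sketch. It assumes transactions are sets of integer item ids and support is an absolute transaction count. The function names `bucket` and `dhp_first_two_passes`, the bucket count, and the hash function are hypothetical choices for illustration, not the paper's exact definitions. The trimming step is a simplified rule in the spirit of DHP: an item can belong to a large 3-itemset in a transaction only if it appears in at least two candidate 2-itemsets contained in that transaction.

```python
from collections import Counter
from itertools import combinations


def bucket(pair, num_buckets):
    """Illustrative hash for a sorted pair of integer item ids (an assumption)."""
    x, y = pair
    return (x * 10 + y) % num_buckets


def dhp_first_two_passes(transactions, min_support, num_buckets=7):
    """Sketch of DHP-style hash filtering for 2-itemsets plus transaction trimming.

    transactions: list of sets of integer item ids
    min_support: minimum number of transactions an itemset must appear in
    Returns (L1, C2, L2, trimmed_db).
    """
    # Pass 1: count single items and hash every 2-subset of each transaction.
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    for t in transactions:
        items = sorted(t)
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket(pair, num_buckets)] += 1

    large_1 = {i for i, c in item_counts.items() if c >= min_support}

    # Candidate 2-itemsets: both items are large AND their bucket count reaches
    # min_support. A bucket count is an upper bound on the support of every pair
    # hashed into it, so this filter never discards a truly large pair.
    c2 = {
        pair
        for pair in combinations(sorted(large_1), 2)
        if bucket_counts[bucket(pair, num_buckets)] >= min_support
    }

    # Pass 2: count candidates and trim each transaction as it is scanned.
    pair_counts = Counter()
    trimmed_db = []
    for t in transactions:
        items = sorted(t)
        contained = [p for p in combinations(items, 2) if p in c2]
        pair_counts.update(contained)
        # Simplified trimming: keep an item only if it occurs in >= 2 candidate
        # 2-itemsets contained in this transaction.
        hits = Counter(i for p in contained for i in p)
        kept = {i for i in items if hits[i] >= 2}
        # A transaction needs at least 3 items to contain any 3-itemset.
        if len(kept) >= 3:
            trimmed_db.append(kept)

    large_2 = {p for p, c in pair_counts.items() if c >= min_support}
    return large_1, c2, large_2, trimmed_db
```

For instance, with the transactions `{1, 3, 4}`, `{2, 3, 5}`, `{1, 2, 3, 5}`, `{2, 5}` and `min_support=2`, this sketch keeps 4 candidate 2-itemsets, `(1, 3)`, `(2, 3)`, `(2, 5)` and `(3, 5)`, instead of all 6 pairs of the large items, and the trimmed database shrinks to two transactions, both `{2, 3, 5}`.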