1995 | Jong Soo Park; Ming-Syan Chen and Philip S. Yu
This paper presents an effective hash-based algorithm, DHP, for mining association rules in large transaction databases. The goal is to efficiently discover large itemsets, which are sets of items that appear frequently in transactions, and then generate association rules based on these itemsets. The key challenge is to reduce the computational cost of generating candidate itemsets, especially in early iterations, where the number of candidate itemsets can be very large.
DHP uses a hash-based approach to generate candidate itemsets, which significantly reduces the number of candidate 2-itemsets compared to previous methods. This reduction allows for earlier trimming of the transaction database, thereby decreasing the computational cost in later iterations. Additionally, DHP employs pruning techniques to further reduce the size of the transaction database by eliminating unnecessary items and transactions.
The algorithm works by first generating a hash table for candidate itemsets, which is used to filter out itemsets that are unlikely to be large. This process is repeated iteratively, with each iteration generating a new set of candidate itemsets based on the previous iteration's results. The algorithm also reduces the size of the transaction database by eliminating transactions that do not contribute to the generation of large itemsets.
Extensive experiments were conducted to evaluate the performance of DHP. The results show that DHP significantly outperforms the Apriori algorithm, especially in later iterations, due to its efficient generation of candidate itemsets and effective pruning of the transaction database. The algorithm is particularly effective in reducing the number of transactions and items in each transaction, leading to faster execution times.
The paper also discusses the impact of various parameters on the performance of DHP, including the size of the hash table and the minimum support threshold. The results indicate that a larger hash table size can lead to a smaller number of candidate itemsets, which is beneficial for performance. However, the trade-off between memory usage and computational efficiency must be considered when choosing the hash table size.
Overall, DHP provides an efficient and effective solution for mining association rules in large transaction databases, with significant improvements in performance compared to traditional algorithms like Apriori. The algorithm's ability to reduce the size of the transaction database and generate fewer candidate itemsets makes it particularly suitable for large-scale data mining tasks.This paper presents an effective hash-based algorithm, DHP, for mining association rules in large transaction databases. The goal is to efficiently discover large itemsets, which are sets of items that appear frequently in transactions, and then generate association rules based on these itemsets. The key challenge is to reduce the computational cost of generating candidate itemsets, especially in early iterations, where the number of candidate itemsets can be very large.
DHP uses a hash-based approach to generate candidate itemsets, which significantly reduces the number of candidate 2-itemsets compared to previous methods. This reduction allows for earlier trimming of the transaction database, thereby decreasing the computational cost in later iterations. Additionally, DHP employs pruning techniques to further reduce the size of the transaction database by eliminating unnecessary items and transactions.
The algorithm works by first generating a hash table for candidate itemsets, which is used to filter out itemsets that are unlikely to be large. This process is repeated iteratively, with each iteration generating a new set of candidate itemsets based on the previous iteration's results. The algorithm also reduces the size of the transaction database by eliminating transactions that do not contribute to the generation of large itemsets.
Extensive experiments were conducted to evaluate the performance of DHP. The results show that DHP significantly outperforms the Apriori algorithm, especially in later iterations, due to its efficient generation of candidate itemsets and effective pruning of the transaction database. The algorithm is particularly effective in reducing the number of transactions and items in each transaction, leading to faster execution times.
The paper also discusses the impact of various parameters on the performance of DHP, including the size of the hash table and the minimum support threshold. The results indicate that a larger hash table size can lead to a smaller number of candidate itemsets, which is beneficial for performance. However, the trade-off between memory usage and computational efficiency must be considered when choosing the hash table size.
Overall, DHP provides an efficient and effective solution for mining association rules in large transaction databases, with significant improvements in performance compared to traditional algorithms like Apriori. The algorithm's ability to reduce the size of the transaction database and generate fewer candidate itemsets makes it particularly suitable for large-scale data mining tasks.