[slides and audio] Efficiently mining long patterns from databases

The paper introduces Max-Miner, an algorithm designed to efficiently extract maximal frequent itemsets from databases, particularly those with long patterns. Unlike Apriori, which scales exponentially with the length of the longest pattern, Max-Miner scales linearly in the number of maximal patterns, regardless of their length. The algorithm uses a set-enumeration tree search strategy, incorporating pruning techniques based on subset and superset frequencies to reduce the search space. Key features include: 1. **Pruning Techniques**: Max-Miner employs subset and superset frequency pruning to eliminate infrequent candidate groups early in the search. 2. **Heuristic Item Ordering**: Items are ordered based on their frequency to prioritize frequent items, enhancing pruning efficiency. 3. **Support Lower-Bounding**: A technique to compute lower bounds on itemset supports using subset information, further reducing the search space. 4. **Additional Constraints**: The algorithm can incorporate additional constraints, such as finding only the longest frequent itemsets, to improve efficiency. Experiments on various datasets show that Max-Miner outperforms Apriori and Apriori-LB (an enhanced version of Apriori) by several orders of magnitude, especially for datasets with long patterns. The paper also discusses the scalability of Max-Miner, demonstrating that its performance remains relatively constant as the number of maximal frequent itemsets increases. Additionally, the paper compares Max-Miner with Pincer-Search, another algorithm for mining long maximal frequent itemsets, highlighting their complementary nature.The paper introduces Max-Miner, an algorithm designed to efficiently extract maximal frequent itemsets from databases, particularly those with long patterns. Unlike Apriori, which scales exponentially with the length of the longest pattern, Max-Miner scales linearly in the number of maximal patterns, regardless of their length. The algorithm uses a set-enumeration tree search strategy, incorporating pruning techniques based on subset and superset frequencies to reduce the search space. Key features include: 1. **Pruning Techniques**: Max-Miner employs subset and superset frequency pruning to eliminate infrequent candidate groups early in the search. 2. **Heuristic Item Ordering**: Items are ordered based on their frequency to prioritize frequent items, enhancing pruning efficiency. 3. **Support Lower-Bounding**: A technique to compute lower bounds on itemset supports using subset information, further reducing the search space. 4. **Additional Constraints**: The algorithm can incorporate additional constraints, such as finding only the longest frequent itemsets, to improve efficiency. Experiments on various datasets show that Max-Miner outperforms Apriori and Apriori-LB (an enhanced version of Apriori) by several orders of magnitude, especially for datasets with long patterns. The paper also discusses the scalability of Max-Miner, demonstrating that its performance remains relatively constant as the number of maximal frequent itemsets increases. Additionally, the paper compares Max-Miner with Pincer-Search, another algorithm for mining long maximal frequent itemsets, highlighting their complementary nature.

Efficiently Mining Long Patterns from Databases

1998 | Roberto J. Bayardo Jr.