[slides and audio] Mining frequent patterns without candidate generation

This paper introduces a novel data structure called the Frequent Pattern Tree (FP-tree) and an efficient mining method called FP-growth for discovering frequent patterns in transaction databases. The FP-tree is an extended prefix-tree structure that stores compressed, crucial information about frequent patterns, significantly reducing the size of the database and avoiding costly candidate set generation. FP-growth employs a pattern fragment growth method, which starts from frequent 1-itemsets and recursively constructs conditional FP-trees to mine longer patterns. The method also uses a partitioning-based, divide-and-conquer approach to decompose the mining task into smaller, more manageable sub-tasks, further reducing the search space. The performance study shows that FP-growth is significantly faster than the Apriori algorithm and other recent methods, especially for long and short frequent patterns. The paper discusses the scalability and potential improvements of the FP-growth method, including handling large databases, materializing FP-trees, and incremental updates. Overall, FP-growth offers a highly efficient and scalable solution for frequent pattern mining.This paper introduces a novel data structure called the Frequent Pattern Tree (FP-tree) and an efficient mining method called FP-growth for discovering frequent patterns in transaction databases. The FP-tree is an extended prefix-tree structure that stores compressed, crucial information about frequent patterns, significantly reducing the size of the database and avoiding costly candidate set generation. FP-growth employs a pattern fragment growth method, which starts from frequent 1-itemsets and recursively constructs conditional FP-trees to mine longer patterns. The method also uses a partitioning-based, divide-and-conquer approach to decompose the mining task into smaller, more manageable sub-tasks, further reducing the search space. The performance study shows that FP-growth is significantly faster than the Apriori algorithm and other recent methods, especially for long and short frequent patterns. The paper discusses the scalability and potential improvements of the FP-growth method, including handling large databases, materializing FP-trees, and incremental updates. Overall, FP-growth offers a highly efficient and scalable solution for frequent pattern mining.

Mining Frequent Patterns without Candidate Generation

2000 | Jiawei Han, Jian Pei, and Yiwen Yin