Mining Frequent Patterns without Candidate Generation

Mining Frequent Patterns without Candidate Generation

2000 | Jiawei Han, Jian Pei, and Yiwen Yin
This paper introduces a novel approach for mining frequent patterns in transaction databases without the need for candidate generation. The authors propose the frequent pattern tree (FP-tree), an extended prefix-tree structure that stores compressed, crucial information about frequent patterns. They develop an efficient FP-tree-based mining method called FP-growth, which uses pattern fragment growth to mine the complete set of frequent patterns. The FP-growth method achieves efficiency through three key techniques: (1) compressing large databases into a highly condensed data structure, avoiding costly database scans; (2) using a pattern fragment growth method to avoid generating a large number of candidate sets; and (3) using a partitioning-based, divide-and-conquer method to decompose the mining task into smaller tasks, reducing the search space. The authors' performance study shows that FP-growth is significantly faster than the Apriori algorithm and other recent frequent pattern mining methods, especially for long and short frequent patterns. The FP-tree structure is compact and efficient, allowing for the storage of frequent pattern information in a way that reduces the need for expensive candidate generation and testing. The paper also discusses the construction of FP-trees, the completeness and compactness of FP-trees, and the efficiency of the FP-growth algorithm. The authors conclude that FP-growth is a highly efficient method for mining frequent patterns in large databases and has been implemented in the DBMiner system for industrial applications.This paper introduces a novel approach for mining frequent patterns in transaction databases without the need for candidate generation. The authors propose the frequent pattern tree (FP-tree), an extended prefix-tree structure that stores compressed, crucial information about frequent patterns. They develop an efficient FP-tree-based mining method called FP-growth, which uses pattern fragment growth to mine the complete set of frequent patterns. The FP-growth method achieves efficiency through three key techniques: (1) compressing large databases into a highly condensed data structure, avoiding costly database scans; (2) using a pattern fragment growth method to avoid generating a large number of candidate sets; and (3) using a partitioning-based, divide-and-conquer method to decompose the mining task into smaller tasks, reducing the search space. The authors' performance study shows that FP-growth is significantly faster than the Apriori algorithm and other recent frequent pattern mining methods, especially for long and short frequent patterns. The FP-tree structure is compact and efficient, allowing for the storage of frequent pattern information in a way that reduces the need for expensive candidate generation and testing. The paper also discusses the construction of FP-trees, the completeness and compactness of FP-trees, and the efficiency of the FP-growth algorithm. The authors conclude that FP-growth is a highly efficient method for mining frequent patterns in large databases and has been implemented in the DBMiner system for industrial applications.
Reach us at info@study.space