Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach

NOVEMBER 2004 | Jian Pei, Member, IEEE Computer Society, Jiawei Han, Senior Member, IEEE, Behzad Mortazavi-Asl, Jianyong Wang, Helen Pinto, Qiming Chen, Umeshwar Dayal, Member, IEEE Computer Society, and Mei-Chun Hsu
The paper "Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach" by Jian Pei, Jiawei Han, Behzad Mortazavi-Asl, Jianyong Wang, Helen Pinto, Qiming Chen, Umeshwar Dayal, and Mei-Chun Hsu introduces a new approach to sequential pattern mining, an important data mining problem with broad applications. The authors propose a projection-based, sequential pattern-growth approach for efficiently mining sequential patterns in large sequence databases. The approach recursively projects the sequence database into a set of smaller projected databases and grows sequential patterns in each projected database by exploring only locally frequent fragments. The paper presents FreeSpan, an initial pattern-growth method, and PrefixSpan, which grows patterns in prefix order (projecting only the postfixes of each frequent prefix) and therefore works with smaller projected databases. To further improve performance, a pseudoprojection technique is developed for PrefixSpan. Comprehensive performance studies show that PrefixSpan outperforms the Apriori-based algorithm GSP as well as FreeSpan and SPADE, and that PrefixSpan integrated with pseudoprojection is the fastest of all the tested algorithms. The method can also be extended to mining sequential patterns with user-specified constraints and applied to other types of frequent patterns, such as frequent substructures. The paper includes a detailed problem definition, algorithm descriptions, and experimental results supporting the effectiveness and efficiency of the proposed approach.
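To make the idea of prefix projection with pseudoprojection concrete, here is a minimal Python sketch. It is not the authors' implementation and makes simplifying assumptions: every sequence element is a single item (the paper handles elements that are itemsets), support is an absolute count of sequences, and the function name `prefixspan` and the `(sequence index, start offset)` pointer representation of a pseudo-projected postfix are illustrative choices. Each recursive call counts locally frequent items in its projected database, extends the prefix only with those items, and builds the next projection by moving pointers rather than copying postfixes.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def prefixspan(db: Sequence[Sequence[str]], min_support: int) -> Dict[Pattern, int]:
    """Mine all sequential patterns whose support is at least min_support.

    Hypothetical, simplified sketch: every sequence element is a single item
    (no itemsets), and support is an absolute count of sequences.
    """
    results: Dict[Pattern, int] = {}

    def mine(prefix: Pattern, projected: List[Tuple[int, int]]) -> None:
        # `projected` is a pseudo-projected database: each entry (sid, start)
        # stands for the postfix of db[sid] beginning at index `start`.
        support: Dict[str, int] = defaultdict(int)
        for sid, start in projected:
            seq = db[sid]
            # A sequence adds at most 1 to an item's support.
            for item in {seq[pos] for pos in range(start, len(seq))}:
                support[item] += 1

        # Grow the prefix only with locally frequent items.
        for item in sorted(support):
            count = support[item]
            if count < min_support:
                continue
            new_prefix = prefix + (item,)
            results[new_prefix] = count

            # Project on `item`: move each pointer just past the first
            # occurrence of `item` in its postfix; no sequence data is copied.
            next_projected: List[Tuple[int, int]] = []
            for sid, start in projected:
                seq = db[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        next_projected.append((sid, pos + 1))
                        break
            mine(new_prefix, next_projected)

    # The initial "projection" is the whole database.
    mine((), [(sid, 0) for sid in range(len(db))])
    return results


if __name__ == "__main__":
    example_db = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]
    for pattern, count in sorted(prefixspan(example_db, min_support=2).items()):
        print("<" + ",".join(pattern) + ">", count)
    # Prints, one per line: <a> 3, <a,b> 2, <a,c> 2, <b> 3, <b,c> 2, <c> 3
```

With `min_support=2`, the pattern `<a,b,c>` is not reported because only the first sequence contains it; the search never explores it beyond the `<a,b>` projection, where `c` is no longer locally frequent.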