Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach

Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach

November 2004 | Jian Pei, Member, IEEE Computer Society, Jiawei Han, Senior Member, IEEE, Behzad Mortazavi-Asl, Jianyong Wang, Helen Pinto, Qiming Chen, Umeshwar Dayal, Member, IEEE Computer Society, and Mei-Chun Hsu
PrefixSpan is a projection-based sequential pattern-growth approach for efficient mining of sequential patterns. It recursively projects a sequence database into smaller projected databases and grows sequential patterns by exploring only locally frequent fragments. Based on FreeSpan, PrefixSpan offers ordered growth and reduced projected databases. A pseudoprojection technique is developed in PrefixSpan to further improve performance. Comprehensive performance studies show that PrefixSpan outperforms GSP, FreeSpan, and SPADE in most cases, with PrefixSpan integrated with pseudoprojection being the fastest. PrefixSpan can also be extended to mining sequential patterns with user-specified constraints. The pattern-growth approach shows promise for efficient mining of other frequent patterns. The paper defines the sequential pattern mining problem and illustrates the GSP algorithm. It then introduces the projection-based sequential pattern growth approach, including FreeSpan and PrefixSpan. Experimental results show that PrefixSpan is more efficient than GSP and FreeSpan, with pseudoprojection further improving performance. The paper also discusses pseudoprojection, which reduces the cost of projection by using pointers and offsets instead of physically copying suffixes. The experimental results and performance analysis demonstrate that PrefixSpan is more efficient than other sequential pattern mining algorithms.PrefixSpan is a projection-based sequential pattern-growth approach for efficient mining of sequential patterns. It recursively projects a sequence database into smaller projected databases and grows sequential patterns by exploring only locally frequent fragments. Based on FreeSpan, PrefixSpan offers ordered growth and reduced projected databases. A pseudoprojection technique is developed in PrefixSpan to further improve performance. Comprehensive performance studies show that PrefixSpan outperforms GSP, FreeSpan, and SPADE in most cases, with PrefixSpan integrated with pseudoprojection being the fastest. PrefixSpan can also be extended to mining sequential patterns with user-specified constraints. The pattern-growth approach shows promise for efficient mining of other frequent patterns. The paper defines the sequential pattern mining problem and illustrates the GSP algorithm. It then introduces the projection-based sequential pattern growth approach, including FreeSpan and PrefixSpan. Experimental results show that PrefixSpan is more efficient than GSP and FreeSpan, with pseudoprojection further improving performance. The paper also discusses pseudoprojection, which reduces the cost of projection by using pointers and offsets instead of physically copying suffixes. The experimental results and performance analysis demonstrate that PrefixSpan is more efficient than other sequential pattern mining algorithms.
Reach us at info@study.space
[slides] Mining sequential patterns by pattern-growth%3A the PrefixSpan approach | StudySpace