Sequential PAtern Mining using A Bitmap Representation

Sequential PAtern Mining using A Bitmap Representation

2002 | Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick
The paper introduces a new algorithm called SPAM (Sequential PAttern Mining) for efficiently mining sequential patterns in large transaction databases. SPAM is designed to handle very long sequential patterns and employs a depth-first search strategy combined with effective pruning mechanisms. The algorithm uses a vertical bitmap representation of the database to enable efficient support counting. Key features of SPAM include: 1. **Depth-First Search Strategy**: SPAM traverses the sequence tree in a depth-first manner, testing the support of each sequence-extended and itemset-extended child. 2. **Pruning Techniques**: The algorithm uses Apriori-based pruning to minimize the search space, including S-step and I-step pruning. 3. **Vertical Bitmap Representation**: Efficient counting is achieved by partitioning the bitmap such that all transactions of a sequence appear together, allowing for quick support counting. 4. **Online Outputting**: SPAM incrementally outputs new frequent itemsets in an online fashion. Experimental results show that SPAM outperforms previous algorithms like SPADE and PrefixSpan by up to an order of magnitude, especially on large datasets. The performance improvement is attributed to the efficient counting process and the effective pruning techniques. However, SPAM is less space-efficient compared to SPADE due to its bitmap representation. The paper also discusses the trade-offs between space and time efficiency and suggests potential bitmap compression methods to address this issue.The paper introduces a new algorithm called SPAM (Sequential PAttern Mining) for efficiently mining sequential patterns in large transaction databases. SPAM is designed to handle very long sequential patterns and employs a depth-first search strategy combined with effective pruning mechanisms. The algorithm uses a vertical bitmap representation of the database to enable efficient support counting. Key features of SPAM include: 1. **Depth-First Search Strategy**: SPAM traverses the sequence tree in a depth-first manner, testing the support of each sequence-extended and itemset-extended child. 2. **Pruning Techniques**: The algorithm uses Apriori-based pruning to minimize the search space, including S-step and I-step pruning. 3. **Vertical Bitmap Representation**: Efficient counting is achieved by partitioning the bitmap such that all transactions of a sequence appear together, allowing for quick support counting. 4. **Online Outputting**: SPAM incrementally outputs new frequent itemsets in an online fashion. Experimental results show that SPAM outperforms previous algorithms like SPADE and PrefixSpan by up to an order of magnitude, especially on large datasets. The performance improvement is attributed to the efficient counting process and the effective pruning techniques. However, SPAM is less space-efficient compared to SPADE due to its bitmap representation. The paper also discusses the trade-offs between space and time efficiency and suggests potential bitmap compression methods to address this issue.
Reach us at info@study.space