31 Mar 2024 | Kashob Kumar Roy, Md Hasibul Haque Moon, Md Mahmudur Rahman, Chowdhury Farhan Ahmed, Carson K. Leung
This paper addresses the challenge of mining sequential patterns in uncertain databases, which are prevalent in modern applications. The authors propose a novel framework that includes multiple theoretically tightened pruning upper bounds to reduce the search space for mining potential candidate patterns. They introduce a hierarchical index structure, *USeq-Trie*, to maintain these patterns and develop an efficient method, *SupCalc*, for calculating expected support. The framework also includes an algorithm, *FUSP*, for mining weighted sequential patterns and an incremental mining approach, *InUSP*, for handling dynamic databases. Extensive experiments on real-life datasets demonstrate the effectiveness and efficiency of the proposed methods, showing superior performance compared to existing techniques in terms of false-positive generation, runtime, and completeness. The paper concludes by highlighting the potential of the proposed techniques in various real-life applications and the broader implications for related research areas.