The PrefixSpan Approach to Mining Sequential Patterns by Pattern Growth
Keywords:
sequential pattern, frequent pattern, sequence database, scalability, performance analysis, transaction database.Abstract
The PrefixSpan approach to mining sequential patterns is a pattern growth method in which a pattern is extended one item at a time. The data is first scanned, and frequent patterns are identified. Then, the identified patterns are extended to discover longer patterns. The PrefixSpan algorithm uses a prefix tree to store the candidate patterns, which are then extended by adding one item at a time. The process is repeated until no more patterns can be found. The PrefixSpan approach is useful in uncovering complex patterns in large datasets, such as those found in biological and logistic processes. This approach is also suitable for streaming data, as the patterns can be updated in real time. Additionally, the prefix tree can be used to efficiently store and access the patterns and their extensions. Sequential pattern mining is a significant data mining problem with a wide range of applications. It is, however, a difficult problem because mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. To reduce the number of candidates to be examined, most previously developed sequential pattern mining methods, such as GSP, employ a candidate generation-and-test approach. However, this method may be inefficient when mining large sequence databases with many patterns and/or long patterns. In this paper, we propose a sequential pattern-growth approach based on projections for efficient mining of sequential patterns.