PrefixSpan
PrefixSpan is a sequential pattern mining algorithm described in Pei et al., Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach. We refer the reader to that paper for a formal definition of the sequential pattern mining problem.
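As an informal illustration of what "support" means here, the following pure-Python sketch counts how many sequences in a toy database contain a given pattern. A sequence is an ordered list of itemsets, and a pattern is contained in a sequence if each of its itemsets is a subset of a distinct itemset of the sequence, in the same order. The helper names `contains` and `support` and the toy data are assumptions made for illustration; they are not part of spark.ml, and this brute-force check is not the PrefixSpan algorithm itself.

```python
def contains(sequence, pattern):
    """Return True if `pattern` is a subsequence of `sequence`.

    Both arguments are lists of itemsets (lists of items).
    """
    pos = 0
    for itemset in pattern:
        # Greedily advance to the earliest remaining itemset that is a superset.
        while pos < len(sequence) and not set(itemset) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # later pattern itemsets must match strictly later positions
    return True


def support(database, pattern):
    """Count the sequences in `database` that contain `pattern`."""
    return sum(1 for sequence in database if contains(sequence, pattern))


# Hypothetical toy database of four sequences.
database = [
    [[1, 2], [3]],
    [[1], [3, 2], [1, 2]],
    [[1, 2], [5]],
    [[6]],
]

print(support(database, [[1, 2]]))    # items 1 and 2 occurring together
print(support(database, [[1], [3]]))  # item 1 followed later by item 3
```

Dividing a count by the number of sequences gives a relative support, which is the kind of quantity a minimum-support threshold is compared against.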
spark.ml's PrefixSpan implementation takes the following parameters:
minSupport: the minimum support required for a pattern to be considered a frequent sequential pattern, expressed as a fraction of the sequences in the dataset (default 0.1).
maxPatternLength: the maximum length of a frequent sequential pattern. Any frequent pattern exceeding this length will not be included in the results.
maxLocalProjDBSize: the maximum number of items allowed in a prefix-projected database before local iterative processing of the projected database begins. This parameter should be tuned with respect to the size of your executors.
sequenceCol: the name of the sequence column in the dataset (default "sequence"). Rows with nulls in this column are ignored.