Package org.apache.spark.ml.fpm
Class PrefixSpan
Object
org.apache.spark.ml.fpm.PrefixSpan
- All Implemented Interfaces:
Serializable, PrefixSpanParams, Params, Identifiable
A parallel PrefixSpan algorithm to mine frequent sequential patterns.
The PrefixSpan algorithm is described in J. Pei, et al., PrefixSpan: Mining Sequential Patterns
Efficiently by Prefix-Projected Pattern Growth.
This class is not yet an Estimator/Transformer; use the
findFrequentSequentialPatterns method to
run the PrefixSpan algorithm.
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method, Description
copy: Creates a copy of this instance with the same UID and some extra params.
findFrequentSequentialPatterns(Dataset<?> dataset): Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
maxLocalProjDBSize: Param for the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing (default: 32000000).
maxPatternLength: Param for the maximal pattern length (default: 10).
minSupport: Param for the minimal support level (default: 0.1).
Param<?>[] params(): Returns all params sorted by their names.
sequenceCol: Param for the name of the sequence column in dataset (default "sequence"); rows with nulls in this column are ignored.
setMaxLocalProjDBSize(long value)
setMaxPatternLength(int value)
setMinSupport(double value)
setSequenceCol(String value)
uid(): An immutable unique ID for the object and its derivatives.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.ml.util.Identifiable
toString
Methods inherited from interface org.apache.spark.ml.param.Params
clear, copyValues, defaultCopy, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, set, set, set, setDefault, setDefault, shouldOwn
Methods inherited from interface org.apache.spark.ml.fpm.PrefixSpanParams
getMaxLocalProjDBSize, getMaxPatternLength, getMinSupport, getSequenceCol
-
Constructor Details
-
PrefixSpan
-
PrefixSpan
public PrefixSpan()
-
-
Method Details
-
copy
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
-
findFrequentSequentialPatterns
Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
- Parameters:
dataset - A dataset or a dataframe containing a sequence column of type ArrayType(ArrayType(T)), where T is the item type for the input dataset.
- Returns:
A DataFrame that contains columns of sequence and corresponding frequency. Its schema is:
- sequence: ArrayType(ArrayType(T)) (T is the item type)
- freq: Long
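To make the input shape and the meaning of a pattern's frequency concrete, here is a minimal plain-Java sketch that does not use Spark. It assumes the usual definition of sequential-pattern containment: a pattern occurs in a sequence when each pattern itemset is a subset of a distinct sequence itemset and the matches keep the pattern's order. The class SequenceSupportSketch, its methods, and the sample data are hypothetical illustrations, not part of this API, and the sketch only counts one given pattern; it does not mine patterns.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Spark.
final class SequenceSupportSketch {

    // True if `pattern` occurs in `sequence`: each pattern itemset must be a subset
    // of a distinct sequence itemset, and the matches must keep the pattern's order.
    static boolean contains(List<Set<Integer>> sequence, List<Set<Integer>> pattern) {
        int next = 0; // index of the next pattern itemset still to be matched
        for (Set<Integer> itemset : sequence) {
            if (next < pattern.size() && itemset.containsAll(pattern.get(next))) {
                next++;
            }
        }
        return next == pattern.size();
    }

    // Number of sequences in `dataset` that contain `pattern`
    // (assumed here to correspond to the freq column of the result).
    static long freq(List<List<Set<Integer>>> dataset, List<Set<Integer>> pattern) {
        return dataset.stream().filter(seq -> contains(seq, pattern)).count();
    }

    // Four sequences of itemsets, mirroring the nested ArrayType(ArrayType(T)) shape
    // with T = Integer. The values are made up for this sketch.
    static List<List<Set<Integer>>> sampleDataset() {
        return List.of(
            List.of(Set.of(1, 2), Set.of(3)),
            List.of(Set.of(1), Set.of(3, 2), Set.of(1, 2)),
            List.of(Set.of(1, 2), Set.of(5)),
            List.of(Set.of(6)));
    }

    public static void main(String[] args) {
        List<List<Set<Integer>>> dataset = sampleDataset();
        // Pattern <{1}, {3}>: item 1 in some itemset, followed later by item 3.
        System.out.println(freq(dataset, List.of(Set.of(1), Set.of(3))));
        // Pattern <{1, 2}>: items 1 and 2 together in a single itemset.
        System.out.println(freq(dataset, List.of(Set.of(1, 2))));
    }
}
```

Scanning the sequence once and matching each pattern itemset at the earliest possible position is enough here, because matching earlier never rules out a later match.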
-
maxLocalProjDBSize
Description copied from interface: PrefixSpanParams
Param for the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing (default: 32000000). If a projected database exceeds this size, another iteration of distributed prefix growth is run.
- Specified by:
maxLocalProjDBSize in interface PrefixSpanParams
- Returns:
- (undocumented)
-
maxPatternLength
Description copied from interface: PrefixSpanParams
Param for the maximal pattern length (default: 10).
- Specified by:
maxPatternLength in interface PrefixSpanParams
- Returns:
- (undocumented)
-
minSupport
Description copied from interface: PrefixSpanParams
Param for the minimal support level (default: 0.1). Sequential patterns that appear more than (minSupport * size-of-the-dataset) times are identified as frequent sequential patterns.
- Specified by:
minSupport in interface PrefixSpanParams
- Returns:
- (undocumented)
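The threshold arithmetic in the description can be sketched in plain Java as follows. The class MinSupportThresholdSketch and its methods are hypothetical and not part of this API. The comparison follows the sentence above literally ("more than"); how the actual implementation treats a count exactly equal to the threshold is not verified here, so the usage lines avoid that boundary.

```java
// Hypothetical helper for illustration only; not part of Spark.
final class MinSupportThresholdSketch {

    // The count threshold named in the description: minSupport * size-of-the-dataset.
    static double threshold(double minSupport, long datasetSize) {
        return minSupport * datasetSize;
    }

    // Literal reading of the description: a pattern is frequent when it appears
    // more than the threshold number of times. Treatment of a count exactly equal
    // to the threshold is an assumption of this sketch.
    static boolean clearsThreshold(long patternCount, double minSupport, long datasetSize) {
        return patternCount > threshold(minSupport, datasetSize);
    }

    public static void main(String[] args) {
        // With minSupport = 0.5 and 4 input sequences, the threshold is 0.5 * 4.
        System.out.println(threshold(0.5, 4));
        // A pattern found in 3 of the 4 sequences is above the threshold.
        System.out.println(clearsThreshold(3, 0.5, 4));
        // A pattern found in only 1 of the 4 sequences is below it.
        System.out.println(clearsThreshold(1, 0.5, 4));
    }
}
```

Because the threshold scales with the dataset size, the same minSupport value demands a higher absolute count on a larger dataset.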
-
params
Description copied from interface: Params
Returns all params sorted by their names. The default implementation uses Java reflection to list all public methods that have no arguments and return Param.
-
sequenceCol
Description copied from interface: PrefixSpanParams
Param for the name of the sequence column in dataset (default "sequence"); rows with nulls in this column are ignored.
- Specified by:
sequenceCol in interface PrefixSpanParams
- Returns:
- (undocumented)
-
setMaxLocalProjDBSize
-
setMaxPatternLength
-
setMinSupport
-
setSequenceCol
-
uid
Description copied from interface: Identifiable
An immutable unique ID for the object and its derivatives.
- Specified by:
uid in interface Identifiable
- Returns:
- (undocumented)
-