Package org.apache.spark.ml.fpm
Class PrefixSpan
Object
org.apache.spark.ml.fpm.PrefixSpan
All Implemented Interfaces:
Serializable, PrefixSpanParams, Params, Identifiable

A parallel PrefixSpan algorithm to mine frequent sequential patterns.
The PrefixSpan algorithm is described in J. Pei, et al., PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth.
This class is not yet an Estimator/Transformer; use the findFrequentSequentialPatterns method to run the PrefixSpan algorithm.

Constructor Summary
PrefixSpan()

Method Summary
copy
    Creates a copy of this instance with the same UID and some extra params.
findFrequentSequentialPatterns(Dataset<?> dataset)
    Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
maxLocalProjDBSize
    Param for the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing (default: 32000000).
maxPatternLength
    Param for the maximal pattern length (default: 10).
minSupport
    Param for the minimal support level (default: 0.1).
Param<?>[] params()
    Returns all params sorted by their names.
sequenceCol
    Param for the name of the sequence column in dataset (default "sequence"); rows with nulls in this column are ignored.
setMaxLocalProjDBSize(long value)
setMaxPatternLength(int value)
setMinSupport(double value)
setSequenceCol(String value)
uid()
    An immutable unique ID for the object and its derivatives.

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.spark.ml.util.Identifiable
toString

Methods inherited from interface org.apache.spark.ml.param.Params
clear, copyValues, defaultCopy, estimateMatadataSize, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, set, set, set, setDefault, setDefault, shouldOwn

Methods inherited from interface org.apache.spark.ml.fpm.PrefixSpanParams
getMaxLocalProjDBSize, getMaxPatternLength, getMinSupport, getSequenceCol

Constructor Details
PrefixSpan
public PrefixSpan()

Method Details
copy
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().

findFrequentSequentialPatterns
Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
Parameters:
dataset - A dataset or a dataframe containing a sequence column of type `ArrayType(ArrayType(T))`, where T is the item type for the input dataset.
Returns:
A `DataFrame` that contains columns of sequence and corresponding frequency. Its schema will be:
- `sequence: ArrayType(ArrayType(T))` (T is the item type)
- `freq: Long`
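
To make the meaning of "frequency" for a sequential pattern over sequences of itemsets concrete, here is a minimal in-memory sketch in plain Java. It is not Spark's distributed implementation and does not call the Spark API; the class and method names (`SequenceSupportSketch`, `containsPattern`, `support`) are hypothetical and exist only for illustration. It assumes the usual definition of containment: a pattern occurs in a sequence if each pattern itemset is a subset of some itemset of the sequence, with the matching itemsets appearing in the same order.

```java
import java.util.List;
import java.util.Set;

// Illustrative sketch only: counts pattern support in memory.
// This is NOT how Spark's PrefixSpan computes results.
final class SequenceSupportSketch {

    // Returns true if `pattern` occurs in `sequence`: each pattern itemset must be
    // a subset of some sequence itemset, matched in strictly increasing positions.
    // Greedy earliest matching is sufficient for this subsequence check.
    static boolean containsPattern(List<Set<Integer>> sequence, List<Set<Integer>> pattern) {
        int next = 0; // index of the next pattern itemset still to be matched
        for (Set<Integer> itemset : sequence) {
            if (next < pattern.size() && itemset.containsAll(pattern.get(next))) {
                next++;
            }
        }
        return next == pattern.size();
    }

    // Number of input sequences that contain the pattern at least once.
    static long support(List<List<Set<Integer>>> sequences, List<Set<Integer>> pattern) {
        return sequences.stream().filter(s -> containsPattern(s, pattern)).count();
    }

    public static void main(String[] args) {
        // Four input sequences; each sequence is an ordered list of itemsets.
        List<List<Set<Integer>>> sequences = List.of(
            List.of(Set.of(1, 2), Set.of(3)),
            List.of(Set.of(1), Set.of(3, 2), Set.of(1, 2)),
            List.of(Set.of(1, 2), Set.of(5)),
            List.of(Set.of(6)));

        System.out.println(support(sequences, List.of(Set.of(1, 2))));        // prints 3
        System.out.println(support(sequences, List.of(Set.of(1), Set.of(3)))); // prints 2
    }
}
```

In this sketch each returned count plays the role of the `freq` value described above, and the nested `List<Set<Integer>>` stands in for a `sequence` column of type `ArrayType(ArrayType(T))`.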
 
maxLocalProjDBSize
Description copied from interface: PrefixSpanParams
Param for the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing (default: 32000000). If a projected database exceeds this size, another iteration of distributed prefix growth is run.
Specified by:
maxLocalProjDBSize in interface PrefixSpanParams
Returns:
(undocumented)
 
maxPatternLength
Description copied from interface: PrefixSpanParams
Param for the maximal pattern length (default: 10).
Specified by:
maxPatternLength in interface PrefixSpanParams
Returns:
(undocumented)
 
minSupport
Description copied from interface: PrefixSpanParams
Param for the minimal support level (default: 0.1). Sequential patterns that appear more than (minSupport * size-of-the-dataset) times are identified as frequent sequential patterns.
Specified by:
minSupport in interface PrefixSpanParams
Returns:
(undocumented)
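
The arithmetic behind the support threshold can be sketched as follows in plain Java. The class and method names (`MinSupportSketch`, `minCount`, `isFrequent`) are hypothetical. The sketch assumes that the fractional threshold is rounded up to a whole number of sequences and that a pattern qualifies when its count reaches that number; the description above only states the product (minSupport * size-of-the-dataset), so the rounding and the inclusive comparison are assumptions here, not documented behavior.

```java
// Illustrative sketch only: turns a fractional minSupport into an absolute count.
final class MinSupportSketch {

    // Assumption: the product minSupport * numSequences is rounded up
    // to the next whole number of sequences.
    static long minCount(double minSupport, long numSequences) {
        return (long) Math.ceil(minSupport * numSequences);
    }

    // Assumption: a pattern is treated as frequent once its count reaches minCount.
    static boolean isFrequent(long count, double minSupport, long numSequences) {
        return count >= minCount(minSupport, numSequences);
    }

    public static void main(String[] args) {
        long numSequences = 4;
        System.out.println(minCount(0.5, numSequences));      // prints 2
        System.out.println(minCount(0.1, numSequences));      // prints 1
        System.out.println(isFrequent(2, 0.5, numSequences)); // prints true
        System.out.println(isFrequent(1, 0.5, numSequences)); // prints false
    }
}
```

Under these assumptions, the default minSupport of 0.1 on a dataset of four sequences admits any pattern that occurs in at least one sequence, while 0.5 requires at least two.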
 
params
Description copied from interface: Params
Returns all params sorted by their names. The default implementation uses Java reflection to list all public methods that have no arguments and return Param.

sequenceCol
Description copied from interface: PrefixSpanParams
Param for the name of the sequence column in dataset (default "sequence"); rows with nulls in this column are ignored.
Specified by:
sequenceCol in interface PrefixSpanParams
Returns:
(undocumented)
 
setMaxLocalProjDBSize

setMaxPatternLength

setMinSupport

setSequenceCol

uid
Description copied from interface: Identifiable
An immutable unique ID for the object and its derivatives.
Specified by:
uid in interface Identifiable
Returns:
(undocumented)