Package org.apache.spark.ml.fpm
Class PrefixSpan
Object
org.apache.spark.ml.fpm.PrefixSpan
All Implemented Interfaces:
Serializable, Params, Identifiable
A parallel PrefixSpan algorithm to mine frequent sequential patterns.
The PrefixSpan algorithm is described in J. Pei, et al., "PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth".
This class is not yet an Estimator/Transformer; use the findFrequentSequentialPatterns method to run the PrefixSpan algorithm.
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
PrefixSpan | copy(ParamMap extra) | Creates a copy of this instance with the same UID and some extra params.
Dataset<Row> | findFrequentSequentialPatterns(Dataset<?> dataset) | Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
long | getMaxLocalProjDBSize() |
int | getMaxPatternLength() |
double | getMinSupport() |
String | getSequenceCol() |
LongParam | maxLocalProjDBSize() | Param for the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing (default: 32000000).
IntParam | maxPatternLength() | Param for the maximal pattern length (default: 10).
DoubleParam | minSupport() | Param for the minimal support level (default: 0.1).
Param<?>[] | params() | Returns all params sorted by their names.
Param<String> | sequenceCol() | Param for the name of the sequence column in dataset (default "sequence"). Rows with nulls in this column are ignored.
PrefixSpan | setMaxLocalProjDBSize(long value) |
PrefixSpan | setMaxPatternLength(int value) |
PrefixSpan | setMinSupport(double value) |
PrefixSpan | setSequenceCol(String value) |
String | uid() | An immutable unique ID for the object and its derivatives.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.ml.util.Identifiable
toString
Methods inherited from interface org.apache.spark.ml.param.Params
clear, copyValues, defaultCopy, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, set, set, set, setDefault, setDefault, shouldOwn
-
Constructor Details
-
PrefixSpan
-
PrefixSpan
public PrefixSpan()
-
-
Method Details
-
copy
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
-
findFrequentSequentialPatterns
Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
Parameters:
dataset - A dataset or a dataframe containing a sequence column of type ArrayType(ArrayType(T)), where T is the item type of the input dataset.
Returns:
A DataFrame that contains the sequence column and the corresponding frequency. Its schema is:
- sequence: ArrayType(ArrayType(T)) (T is the item type)
- freq: Long
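To make the terms "sequence of itemsets" and "frequency" concrete, the following is a minimal brute-force sketch in plain Java. It does not use Spark and is not how the distributed algorithm works; the class name SequenceSupportSketch and its methods are hypothetical. It assumes that a pattern occurs in a sequence when each pattern itemset is a subset of a distinct itemset of the sequence, in the same order, and that freq counts the sequences in which the pattern occurs.

```java
import java.util.List;
import java.util.Set;

/** Illustrative only: brute-force frequency counting for one candidate pattern. */
final class SequenceSupportSketch {

    /**
     * Returns true if every itemset of the pattern is a subset of some itemset
     * of the sequence, with the matched itemsets appearing in pattern order.
     * Matching each pattern itemset at the earliest possible position is enough.
     */
    static <T> boolean contains(List<Set<T>> sequence, List<Set<T>> pattern) {
        int next = 0; // index of the next pattern itemset still to be matched
        for (Set<T> itemset : sequence) {
            if (next < pattern.size() && itemset.containsAll(pattern.get(next))) {
                next++;
            }
        }
        return next == pattern.size();
    }

    /** Counts the sequences in the dataset that contain the pattern. */
    static <T> long freq(List<List<Set<T>>> dataset, List<Set<T>> pattern) {
        return dataset.stream().filter(seq -> contains(seq, pattern)).count();
    }

    public static void main(String[] args) {
        // Four input sequences; each sequence is a list of itemsets.
        List<List<Set<Integer>>> dataset = List.of(
            List.of(Set.of(1, 2), Set.of(3)),
            List.of(Set.of(1), Set.of(3, 2), Set.of(1, 2)),
            List.of(Set.of(1, 2), Set.of(5)),
            List.of(Set.of(6)));

        System.out.println(freq(dataset, List.of(Set.of(1), Set.of(3)))); // prints 2
        System.out.println(freq(dataset, List.of(Set.of(1, 2))));         // prints 3
    }
}
```

In the returned DataFrame, each such pattern would correspond to one row, with the pattern in the sequence column and the count in the freq column.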
-
getMaxLocalProjDBSize
public long getMaxLocalProjDBSize()
-
getMaxPatternLength
public int getMaxPatternLength()
-
getMinSupport
public double getMinSupport()
-
getSequenceCol
-
maxLocalProjDBSize
Param for the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing (default: 32000000). If a projected database exceeds this size, another iteration of distributed prefix growth is run.
Returns:
(undocumented)
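The phrase "including delimiters" can be illustrated with a small plain-Java sketch. It rests on an assumption that this page does not state: that the internal storage format flattens a sequence into a single array with one delimiter before each itemset and one trailing delimiter, so that <(1 2)(3)> would occupy six slots. The class ProjectedDbSizeSketch and its method names are hypothetical and are not part of the Spark API.

```java
import java.util.List;

/** Illustrative only: size accounting under an ASSUMED delimiter-based encoding. */
final class ProjectedDbSizeSketch {

    /**
     * Size of one sequence under the assumed encoding: every item, plus one
     * delimiter per itemset, plus one trailing delimiter.
     */
    static long encodedSize(List<List<Integer>> sequence) {
        long items = 0;
        for (List<Integer> itemset : sequence) {
            items += itemset.size();
        }
        return items + sequence.size() + 1;
    }

    /** Total size of a (projected) database: the sum over its sequences. */
    static long databaseSize(List<List<List<Integer>>> database) {
        long total = 0;
        for (List<List<Integer>> sequence : database) {
            total += encodedSize(sequence);
        }
        return total;
    }

    /** "Exceeds this size" read as a strict comparison against the parameter. */
    static boolean needsDistributedIteration(long projectedDbSize, long maxLocalProjDBSize) {
        return projectedDbSize > maxLocalProjDBSize;
    }

    public static void main(String[] args) {
        List<List<Integer>> sequence = List.of(List.of(1, 2), List.of(3));
        System.out.println(encodedSize(sequence)); // prints 6 under the assumed encoding
        System.out.println(needsDistributedIteration(6L, 32000000L)); // prints false
    }
}
```

Under this reading, a smaller maxLocalProjDBSize keeps more of the work in distributed iterations, while a larger value hands projected databases to local processing sooner.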
-
maxPatternLength
Param for the maximal pattern length (default: 10).
Returns:
(undocumented)
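This page does not define how pattern length is measured. The sketch below assumes that length counts the total number of items across all itemsets of a pattern, so that <(1 2)(3)> has length 3, and that patterns up to and including maxPatternLength are kept. Both points are assumptions, and the class PatternLengthSketch is hypothetical rather than part of the Spark API.

```java
import java.util.List;

/** Illustrative only: pattern length under the ASSUMPTION that it counts items. */
final class PatternLengthSketch {

    /** Total number of items over all itemsets of the pattern. */
    static int patternLength(List<List<Integer>> pattern) {
        int length = 0;
        for (List<Integer> itemset : pattern) {
            length += itemset.size();
        }
        return length;
    }

    /** Assumed inclusive limit: a pattern of exactly maxPatternLength items is kept. */
    static boolean withinLimit(List<List<Integer>> pattern, int maxPatternLength) {
        return patternLength(pattern) <= maxPatternLength;
    }

    public static void main(String[] args) {
        List<List<Integer>> pattern = List.of(List.of(1, 2), List.of(3));
        System.out.println(patternLength(pattern));  // prints 3
        System.out.println(withinLimit(pattern, 2)); // prints false
    }
}
```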
-
minSupport
Param for the minimal support level (default: 0.1). Sequential patterns that appear more than (minSupport * size-of-the-dataset) times are identified as frequent sequential patterns.
Returns:
(undocumented)
-
params
Description copied from interface: Params
Returns all params sorted by their names. The default implementation uses Java reflection to list all public methods that have no arguments and return Param.
-
sequenceCol
Param for the name of the sequence column in dataset (default "sequence"). Rows with nulls in this column are ignored.
Returns:
(undocumented)
-
setMaxLocalProjDBSize
-
setMaxPatternLength
-
setMinSupport
-
setSequenceCol
-
uid
Description copied from interface: Identifiable
An immutable unique ID for the object and its derivatives.
Specified by:
uid in interface Identifiable
Returns:
(undocumented)
-