public final class PrefixSpan extends Object implements Params
Use the findFrequentSequentialPatterns method to run the PrefixSpan algorithm.
Constructor and Description |
---|
PrefixSpan() |
PrefixSpan(String uid) |
Modifier and Type | Method and Description |
---|---|
PrefixSpan | copy(ParamMap extra): Creates a copy of this instance with the same UID and some extra params. |
Dataset<Row> | findFrequentSequentialPatterns(Dataset<?> dataset): Finds the complete set of frequent sequential patterns in the input sequences of itemsets. |
long | getMaxLocalProjDBSize() |
int | getMaxPatternLength() |
double | getMinSupport() |
String | getSequenceCol() |
LongParam | maxLocalProjDBSize(): Param for the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing (default: 32000000). |
IntParam | maxPatternLength(): Param for the maximal pattern length (default: 10). |
DoubleParam | minSupport(): Param for the minimal support level (default: 0.1). |
Param<?>[] | params(): Returns all params sorted by their names. |
Param<String> | sequenceCol(): Param for the name of the sequence column in dataset (default: "sequence"). Rows with nulls in this column are ignored. |
PrefixSpan | setMaxLocalProjDBSize(long value) |
PrefixSpan | setMaxPatternLength(int value) |
PrefixSpan | setMinSupport(double value) |
PrefixSpan | setSequenceCol(String value) |
String | uid(): An immutable unique ID for the object and its derivatives. |
Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Params: clear, copyValues, defaultCopy, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, set, set, set, setDefault, setDefault, shouldOwn
Methods inherited from interface Identifiable: toString
public PrefixSpan copy(ParamMap extra)
Creates a copy of this instance with the same UID and some extra params. See defaultCopy().
Specified by: copy in interface Params

public Dataset<Row> findFrequentSequentialPatterns(Dataset<?> dataset)
Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
Parameters:
dataset - A dataset or a DataFrame containing a sequence column of type ArrayType(ArrayType(T)), where T is the item type of the input dataset.
Returns:
A `DataFrame` that contains a sequence column and the corresponding frequency column. Its schema is:
- `sequence: ArrayType(ArrayType(T))` (T is the item type)
- `freq: Long`

public long getMaxLocalProjDBSize()
public int getMaxPatternLength()
public double getMinSupport()
public String getSequenceCol()
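public LongParam maxLocalProjDBSize()
Param for the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing (default: 32000000). If a projected database exceeds this size, another iteration of distributed prefix growth is run.

public IntParam maxPatternLength()
Param for the maximal pattern length (default: 10).

public DoubleParam minSupport()
Param for the minimal support level (default: 0.1). Sequential patterns that appear more than (minSupport * size-of-the-dataset) times are identified as frequent sequential patterns.

public Param<?>[] params()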
Returns all params sorted by their names.
Specified by: params in interface Params
public Param<String> sequenceCol()
public PrefixSpan setMaxLocalProjDBSize(long value)
public PrefixSpan setMaxPatternLength(int value)
public PrefixSpan setMinSupport(double value)
public PrefixSpan setSequenceCol(String value)
public String uid()
An immutable unique ID for the object and its derivatives.
Specified by: uid in interface Identifiable
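
The minSupport description above compares a pattern's occurrence count against minSupport * size-of-the-dataset. The following is a minimal plain-Java sketch of that arithmetic on a tiny in-memory database of sequences of itemsets, which mirrors the ArrayType(ArrayType(T)) shape expected in the sequence column. It does not use Spark and is not how PrefixSpan is implemented; the class name SupportSketch and its helper methods are hypothetical, and the sample data is chosen so that no count falls exactly on the threshold.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: a brute-force support check, not Spark's PrefixSpan implementation.
class SupportSketch {

    // True if every itemset of `pattern` is a subset of a distinct itemset of
    // `sequence`, with the matched itemsets appearing in the same order.
    static boolean containsPattern(List<List<Integer>> sequence, List<List<Integer>> pattern) {
        int next = 0; // index of the next pattern itemset still to be matched
        for (List<Integer> itemset : sequence) {
            if (next < pattern.size() && itemset.containsAll(pattern.get(next))) {
                next++;
            }
        }
        return next == pattern.size();
    }

    // Number of sequences in the database that contain the pattern.
    static long countSupport(List<List<List<Integer>>> database, List<List<Integer>> pattern) {
        long count = 0;
        for (List<List<Integer>> sequence : database) {
            if (containsPattern(sequence, pattern)) {
                count++;
            }
        }
        return count;
    }

    // The quantity named in the minSupport description: minSupport * size-of-the-dataset.
    static double threshold(double minSupport, long datasetSize) {
        return minSupport * datasetSize;
    }

    // Hypothetical sample data: four sequences, each a list of itemsets.
    static List<List<List<Integer>>> sampleDatabase() {
        return Arrays.asList(
            Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3)),
            Arrays.asList(Arrays.asList(1), Arrays.asList(3, 2), Arrays.asList(1, 2)),
            Arrays.asList(Arrays.asList(1, 2), Arrays.asList(5)),
            Arrays.asList(Arrays.asList(6)));
    }

    public static void main(String[] args) {
        List<List<List<Integer>>> database = sampleDatabase();
        double minSupport = 0.5; // assumed value for this example; the documented default is 0.1
        double limit = threshold(minSupport, database.size());

        List<List<Integer>> common = Arrays.asList(Arrays.asList(1, 2)); // the itemset {1, 2}
        List<List<Integer>> rare = Arrays.asList(Arrays.asList(5));      // the itemset {5}

        System.out.println("threshold = " + limit);
        System.out.println("support of [[1, 2]] = " + countSupport(database, common));
        System.out.println("support of [[5]] = " + countSupport(database, rare));
    }
}
```

With these inputs the pattern [[1, 2]] occurs in three of the four sequences and clears the threshold, while [[5]] occurs in only one and does not. How a count exactly equal to the threshold is treated is not exercised by this sketch.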