Package org.apache.spark.mllib.fpm
Class PrefixSpan
Object
org.apache.spark.mllib.fpm.PrefixSpan
All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging
A parallel PrefixSpan algorithm to mine frequent sequential patterns.
The PrefixSpan algorithm is described in J. Pei, et al., "PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth".
param: minSupport the minimal support level of the sequential pattern; any pattern that appears more than (minSupport * size-of-the-dataset) times will be output
param: maxPatternLength the maximal length of the sequential pattern
param: maxLocalProjDBSize the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing. If a projected database exceeds this size, another iteration of distributed prefix growth is run.
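The minSupport rule can be made concrete with a little arithmetic. The sketch below is plain Java and does not use Spark; the class and method names are hypothetical. It applies the rule exactly as worded above (a strict "more than" comparison). How the library rounds the product or treats a count that lands exactly on the threshold is not stated on this page, so that boundary behavior is an assumption.

```java
/** Hypothetical helper, not part of Spark: illustrates the documented minSupport rule. */
final class SupportThreshold {

    /** Occurrence threshold implied by a relative support level and a dataset size. */
    static double threshold(double minSupport, long datasetSize) {
        return minSupport * datasetSize;
    }

    /**
     * True if a pattern with the given occurrence count passes the rule as worded
     * in the class description. The strict comparison is an assumption taken from
     * that wording, not from the library's source.
     */
    static boolean isFrequent(long occurrences, double minSupport, long datasetSize) {
        return occurrences > threshold(minSupport, datasetSize);
    }

    public static void main(String[] args) {
        long datasetSize = 4;     // number of input sequences
        double minSupport = 0.5;  // relative support level
        System.out.println("threshold = " + threshold(minSupport, datasetSize));
        System.out.println("3 occurrences frequent? " + isFrequent(3, minSupport, datasetSize));
        System.out.println("2 occurrences frequent? " + isFrequent(2, minSupport, datasetSize));
    }
}
```

Because minSupport is relative, the same setting implies a larger absolute count as the dataset grows.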
-
Nested Class Summary
Modifier and Type    Class    Description
static class
    Represents a frequent sequence.
static class
static class
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
-
Constructor Summary
Constructor    Description
PrefixSpan()
    Constructs a default instance with default parameters {minSupport: 0.1, maxPatternLength: 10, maxLocalProjDBSize: 32000000L}.
-
Method Summary
Modifier and Type    Method    Description
long getMaxLocalProjDBSize()
    Gets the maximum number of items allowed in a projected database before local processing.
int getMaxPatternLength()
    Gets the maximal pattern length (i.e. the length of the longest sequential pattern to consider).
double getMinSupport()
    Gets the minimal support (i.e. the frequency of occurrence before a pattern is considered frequent).
static org.apache.spark.internal.Logging.LogStringContext LogStringContext(scala.StringContext sc)
static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()
static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)
<Item, Itemset extends Iterable<Item>, Sequence extends Iterable<Itemset>> PrefixSpanModel<Item> run(JavaRDD<Sequence> data)
    A Java-friendly version of run() that reads sequences from a JavaRDD and returns frequent sequences in a PrefixSpanModel.
<Item> PrefixSpanModel<Item> run(RDD<Object[]> data, scala.reflect.ClassTag<Item> evidence$1)
    Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
PrefixSpan setMaxLocalProjDBSize(long maxLocalProjDBSize)
    Sets the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing (default: 32000000L).
PrefixSpan setMaxPatternLength(int maxPatternLength)
    Sets the maximal pattern length (default: 10).
PrefixSpan setMinSupport(double minSupport)
    Sets the minimal support level (default: 0.1).
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
-
Constructor Details
-
PrefixSpan
public PrefixSpan()
Constructs a default instance with default parameters {minSupport: 0.1, maxPatternLength: 10, maxLocalProjDBSize: 32000000L}.
-
-
Method Details
-
org$apache$spark$internal$Logging$$log_
public static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()
-
org$apache$spark$internal$Logging$$log__$eq
public static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)
-
LogStringContext
public static org.apache.spark.internal.Logging.LogStringContext LogStringContext(scala.StringContext sc)
-
getMinSupport
public double getMinSupport()
Gets the minimal support (i.e. the frequency of occurrence before a pattern is considered frequent).
- Returns:
(undocumented)
-
setMinSupport
public PrefixSpan setMinSupport(double minSupport)
Sets the minimal support level (default: 0.1).
- Parameters:
minSupport - (undocumented)
- Returns:
(undocumented)
-
getMaxPatternLength
public int getMaxPatternLength()
Gets the maximal pattern length (i.e. the length of the longest sequential pattern to consider).
- Returns:
(undocumented)
-
setMaxPatternLength
public PrefixSpan setMaxPatternLength(int maxPatternLength)
Sets the maximal pattern length (default: 10).
- Parameters:
maxPatternLength - (undocumented)
- Returns:
(undocumented)
-
getMaxLocalProjDBSize
public long getMaxLocalProjDBSize()
Gets the maximum number of items allowed in a projected database before local processing.
- Returns:
(undocumented)
-
setMaxLocalProjDBSize
public PrefixSpan setMaxLocalProjDBSize(long maxLocalProjDBSize)
Sets the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing (default: 32000000L).
- Parameters:
maxLocalProjDBSize - (undocumented)
- Returns:
(undocumented)
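The role of maxLocalProjDBSize can be pictured as a simple size check. This is a minimal sketch in plain Java with hypothetical names (ProjectionPlanner, Step, nextStep). It only encodes the rule stated in the class description, that a projected database exceeding the limit triggers another iteration of distributed prefix growth, and it makes no claim about how the library measures the size internally.

```java
/** Hypothetical illustration, not part of Spark: the documented size check. */
final class ProjectionPlanner {

    /** The two outcomes named in the class description. */
    enum Step { LOCAL_PROCESSING, DISTRIBUTED_PREFIX_GROWTH }

    /**
     * Chooses the next step for one projected database.
     *
     * @param projectedDbSize    number of items in the projected database,
     *                           including delimiters (assumed to be already counted)
     * @param maxLocalProjDBSize the configured limit (default 32000000L)
     */
    static Step nextStep(long projectedDbSize, long maxLocalProjDBSize) {
        // "Exceeds" is read as strictly greater than the limit.
        return projectedDbSize > maxLocalProjDBSize
                ? Step.DISTRIBUTED_PREFIX_GROWTH
                : Step.LOCAL_PROCESSING;
    }

    public static void main(String[] args) {
        long limit = 32000000L; // default from the constructor description
        System.out.println(nextStep(1000L, limit));
        System.out.println(nextStep(limit + 1, limit));
    }
}
```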
-
run
public <Item> PrefixSpanModel<Item> run(RDD<Object[]> data, scala.reflect.ClassTag<Item> evidence$1)
Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
- Parameters:
data - sequences of itemsets.
evidence$1 - (undocumented)
- Returns:
a PrefixSpanModel that contains the frequent patterns
-
run
public <Item, Itemset extends Iterable<Item>, Sequence extends Iterable<Itemset>> PrefixSpanModel<Item> run(JavaRDD<Sequence> data)
A Java-friendly version of run() that reads sequences from a JavaRDD and returns frequent sequences in a PrefixSpanModel.
- Parameters:
data - ordered sequences of itemsets stored as Java Iterable of Iterables
- Returns:
a PrefixSpanModel that contains the frequent sequential patterns
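To make the input shape concrete ("ordered sequences of itemsets"), the following plain-Java sketch counts how many sequences contain a given sequential pattern. It is a brute-force check of the usual definition of containment (each pattern itemset is a subset of some sequence itemset, in order). It is not the prefix-projection procedure that PrefixSpan uses, and the class and method names are hypothetical. The sample data is invented for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

/** Hypothetical illustration, not part of Spark: support counting by definition. */
final class PatternSupport {

    /**
     * True if the pattern occurs in the sequence: every pattern itemset is a
     * subset of a distinct sequence itemset, and the matches keep their order.
     * Matching each pattern itemset at the earliest possible position is enough.
     */
    static <T> boolean contains(List<List<T>> sequence, List<List<T>> pattern) {
        int next = 0; // index of the next pattern itemset still to be matched
        for (List<T> itemset : sequence) {
            if (next < pattern.size()
                    && new HashSet<>(itemset).containsAll(pattern.get(next))) {
                next++;
            }
        }
        return next == pattern.size();
    }

    /** Number of sequences in the dataset that contain the pattern. */
    static <T> long support(List<List<List<T>>> sequences, List<List<T>> pattern) {
        long count = 0;
        for (List<List<T>> sequence : sequences) {
            if (contains(sequence, pattern)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Four sequences; each sequence is an ordered list of itemsets.
        List<List<List<Integer>>> sequences = Arrays.asList(
                Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3)),
                Arrays.asList(Arrays.asList(1), Arrays.asList(3, 2), Arrays.asList(1, 2)),
                Arrays.asList(Arrays.asList(1, 2), Arrays.asList(5)),
                Arrays.asList(Arrays.asList(6)));

        // Pattern: an itemset containing 1, later followed by an itemset containing 3.
        List<List<Integer>> pattern = Arrays.asList(Arrays.asList(1), Arrays.asList(3));
        System.out.println("support = " + support(sequences, pattern));
    }
}
```

The nested List structure mirrors the Iterable of Iterables that the Java-friendly run() accepts, with the outermost collection playing the role of the JavaRDD.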
-