Package org.apache.spark.mllib.fpm
Class PrefixSpan
Object
org.apache.spark.mllib.fpm.PrefixSpan
- All Implemented Interfaces:
- Serializable, org.apache.spark.internal.Logging
A parallel PrefixSpan algorithm to mine frequent sequential patterns.
The PrefixSpan algorithm is described in J. Pei, et al., "PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth".
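A pattern can only be frequent if it is "contained in" enough input sequences. The plain-Java sketch below illustrates that containment test and the resulting support count. It is not Spark code. The class `SequenceSupport` and its methods are hypothetical, and sequences are modeled as `List<Set<T>>`. It assumes the standard definition: each itemset of the pattern must be a subset of a distinct itemset of the sequence, in the same order, and each sequence counts at most once toward support.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Spark.
final class SequenceSupport {
    private SequenceSupport() {}

    // True if each pattern itemset is a subset of some sequence itemset,
    // with the matched itemsets appearing in the same order as in the pattern.
    static <T> boolean contains(List<Set<T>> sequence, List<Set<T>> pattern) {
        int next = 0; // index of the next pattern itemset still to be matched
        for (Set<T> itemset : sequence) {
            if (next == pattern.size()) {
                break;
            }
            // Greedily matching the earliest possible itemset never rules out a later match.
            if (itemset.containsAll(pattern.get(next))) {
                next++;
            }
        }
        return next == pattern.size();
    }

    // Support = number of input sequences containing the pattern (each sequence counts once).
    static <T> long support(List<List<Set<T>>> data, List<Set<T>> pattern) {
        return data.stream().filter(sequence -> contains(sequence, pattern)).count();
    }

    public static void main(String[] args) {
        List<List<Set<Integer>>> data = List.of(
                List.of(Set.of(1, 2), Set.of(3)),
                List.of(Set.of(1), Set.of(3, 2), Set.of(1, 2)),
                List.of(Set.of(1, 2), Set.of(5)),
                List.of(Set.of(6)));
        System.out.println(support(data, List.of(Set.of(1), Set.of(3)))); // 2
        System.out.println(support(data, List.of(Set.of(1, 2))));         // 3
    }
}
```

Note that <(2 3)> does not contain <(2)(3)>: the two pattern itemsets must match two different itemsets of the sequence.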
 
- Parameters:
- minSupport - the minimal support level of a sequential pattern; any pattern that appears in more than (minSupport * size-of-the-dataset) input sequences is output
- maxPatternLength - the maximal length of a sequential pattern
- maxLocalProjDBSize - the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing; if a projected database exceeds this size, another iteration of distributed prefix growth is run
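The relative minSupport is compared against absolute counts. The sketch below converts one into the other. The names `MinSupportThreshold`, `minCount`, and `isFrequent` are hypothetical. Two details are assumptions of this sketch: the threshold is ceil(minSupport * number of sequences), and a pattern is kept when its support reaches that threshold (`>=`). The parameter text phrases the rule as "more than", so check the Spark source for the exact boundary behavior.

```java
// Sketch only: converts minSupport into an absolute count threshold.
// Assumption: threshold = ceil(minSupport * numSequences), and a pattern counts as
// frequent when its support reaches the threshold.
final class MinSupportThreshold {
    private MinSupportThreshold() {}

    static long minCount(double minSupport, long numSequences) {
        return (long) Math.ceil(minSupport * numSequences);
    }

    static boolean isFrequent(long support, double minSupport, long numSequences) {
        return support >= minCount(minSupport, numSequences);
    }

    public static void main(String[] args) {
        System.out.println(minCount(0.5, 4));      // 2
        System.out.println(minCount(0.5, 5));      // 3 (2.5 rounded up)
        System.out.println(isFrequent(2, 0.5, 5)); // false
    }
}
```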
Nested Class Summary
- static class: Represents a frequent sequence.
- static class
- static class
- Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging: org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
Constructor Summary
- PrefixSpan(): Constructs a default instance with default parameters {minSupport:0.1, maxPatternLength:10, maxLocalProjDBSize:32000000L}.
Method Summary
- long getMaxLocalProjDBSize(): Gets the maximum number of items allowed in a projected database before local processing.
- int getMaxPatternLength(): Gets the maximal pattern length (i.e. the length of the longest sequential pattern to consider).
- double getMinSupport(): Gets the minimal support (i.e. the frequency of occurrence before a pattern is considered frequent).
- static org.apache.spark.internal.Logging.LogStringContext LogStringContext(scala.StringContext sc)
- static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()
- static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)
- <Item, Itemset extends Iterable<Item>, Sequence extends Iterable<Itemset>> PrefixSpanModel<Item> run(JavaRDD<Sequence> data): A Java-friendly version of run() that reads sequences from a JavaRDD and returns frequent sequences in a PrefixSpanModel.
- <Item> PrefixSpanModel<Item> run(RDD<Object[]> data, scala.reflect.ClassTag<Item> evidence$1): Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
- setMaxLocalProjDBSize(long maxLocalProjDBSize): Sets the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing (default: 32000000L).
- setMaxPatternLength(int maxPatternLength): Sets the maximal pattern length (default: 10).
- setMinSupport(double minSupport): Sets the minimal support level (default: 0.1).
- Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
- Methods inherited from interface org.apache.spark.internal.Logging: initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logBasedOnLevel, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, MDC, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
Constructor Details
- PrefixSpan
public PrefixSpan()
Constructs a default instance with default parameters {minSupport:0.1, maxPatternLength:10, maxLocalProjDBSize:32000000L}.
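The constructor sets the defaults, and the setters return the instance, so configuration can be written as a chain of calls. With the real class this looks like `new PrefixSpan().setMinSupport(0.5).setMaxPatternLength(5)`. The sketch below mimics that pattern with a hypothetical `PrefixSpanParamsSketch` class, which is not part of Spark. Its range checks are assumptions added for illustration.

```java
// Hypothetical stand-in for illustration; not a Spark class.
// Mirrors the documented defaults and the chained-setter style.
// The range checks below are assumptions of this sketch.
final class PrefixSpanParamsSketch {
    private double minSupport = 0.1;
    private int maxPatternLength = 10;
    private long maxLocalProjDBSize = 32000000L;

    PrefixSpanParamsSketch setMinSupport(double minSupport) {
        if (minSupport < 0.0 || minSupport > 1.0) {
            throw new IllegalArgumentException("minSupport must be in [0, 1], got " + minSupport);
        }
        this.minSupport = minSupport;
        return this; // returning this is what makes chaining possible
    }

    PrefixSpanParamsSketch setMaxPatternLength(int maxPatternLength) {
        if (maxPatternLength < 1) {
            throw new IllegalArgumentException("maxPatternLength must be >= 1, got " + maxPatternLength);
        }
        this.maxPatternLength = maxPatternLength;
        return this;
    }

    PrefixSpanParamsSketch setMaxLocalProjDBSize(long maxLocalProjDBSize) {
        if (maxLocalProjDBSize < 0) {
            throw new IllegalArgumentException("maxLocalProjDBSize must be >= 0, got " + maxLocalProjDBSize);
        }
        this.maxLocalProjDBSize = maxLocalProjDBSize;
        return this;
    }

    double getMinSupport() { return minSupport; }
    int getMaxPatternLength() { return maxPatternLength; }
    long getMaxLocalProjDBSize() { return maxLocalProjDBSize; }

    public static void main(String[] args) {
        PrefixSpanParamsSketch params = new PrefixSpanParamsSketch()
                .setMinSupport(0.5)
                .setMaxPatternLength(5);
        // Prints "0.5 5 32000000": the untouched parameter keeps its default.
        System.out.println(params.getMinSupport() + " " + params.getMaxPatternLength()
                + " " + params.getMaxLocalProjDBSize());
    }
}
```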
 
Method Details
- org$apache$spark$internal$Logging$$log_
public static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()
- org$apache$spark$internal$Logging$$log__$eq
public static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)
- LogStringContext
public static org.apache.spark.internal.Logging.LogStringContext LogStringContext(scala.StringContext sc)
- getMinSupport
public double getMinSupport()
Gets the minimal support (i.e. the frequency of occurrence before a pattern is considered frequent).
- Returns:
- the minimal support level
 
- setMinSupport
Sets the minimal support level (default: 0.1).
- Parameters:
- minSupport - the minimal support level
- Returns:
- this PrefixSpan instance, so setter calls can be chained
 
- getMaxPatternLength
public int getMaxPatternLength()
Gets the maximal pattern length (i.e. the length of the longest sequential pattern to consider).
- Returns:
- the maximal pattern length
 
- setMaxPatternLength
Sets the maximal pattern length (default: 10).
- Parameters:
- maxPatternLength - the maximal pattern length
- Returns:
- this PrefixSpan instance, so setter calls can be chained
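This page does not define what counts toward a pattern's length. The sketch below assumes length means the total number of items across all itemsets of the pattern, so <(1 2)(3)> has length 3. That definition and the names `PatternLength`, `length`, and `keepUpTo` are assumptions for illustration. The filter only pictures the effect of the limit and says nothing about how Spark enforces it internally.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of Spark.
// Assumption: a pattern's length is the total number of items across its itemsets.
final class PatternLength {
    private PatternLength() {}

    static <T> int length(List<Set<T>> pattern) {
        return pattern.stream().mapToInt(Set::size).sum();
    }

    // Keeps only patterns whose length does not exceed maxPatternLength.
    static <T> List<List<Set<T>>> keepUpTo(List<List<Set<T>>> patterns, int maxPatternLength) {
        return patterns.stream()
                .filter(pattern -> length(pattern) <= maxPatternLength)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Set<Integer>> p = List.of(Set.of(1, 2), Set.of(3));
        System.out.println(length(p)); // 3: two itemsets, three items
    }
}
```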
 
- getMaxLocalProjDBSize
public long getMaxLocalProjDBSize()
Gets the maximum number of items allowed in a projected database before local processing.
- Returns:
- the maximum local projected database size
 
- setMaxLocalProjDBSize
Sets the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing (default: 32000000L).
- Parameters:
- maxLocalProjDBSize - the maximum number of items allowed in a projected database before local processing
- Returns:
- this PrefixSpan instance, so setter calls can be chained
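The size limit counts delimiters as well as items. The sketch below shows why that matters, using a hypothetical encoding: item ids are positive ints and 0 separates itemsets, so <(1 2)(3)> becomes [0, 1, 2, 0, 3, 0]. That encoding and the names `ProjectedDbSize`, `encode`, `size`, and `fitsLocally` are assumptions. Spark's actual internal format may differ in detail. Following the description, a projected database is processed locally only when its size does not exceed the limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; Spark's actual internal format may differ in detail.
// Assumed encoding: item ids are positive ints and 0 delimits itemsets, so
// <(1 2)(3)> becomes [0, 1, 2, 0, 3, 0].
final class ProjectedDbSize {
    private ProjectedDbSize() {}

    static int[] encode(List<List<Integer>> sequence) {
        List<Integer> out = new ArrayList<>();
        out.add(0); // leading delimiter
        for (List<Integer> itemset : sequence) {
            for (int item : itemset) {
                if (item <= 0) {
                    throw new IllegalArgumentException("item ids must be positive: " + item);
                }
                out.add(item);
            }
            out.add(0); // delimiter after every itemset
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    // Size of a projected database = total entries, items and delimiters alike.
    static long size(List<int[]> projectedDb) {
        long total = 0;
        for (int[] postfix : projectedDb) {
            total += postfix.length;
        }
        return total;
    }

    // Local processing is allowed only while the size stays within the limit.
    static boolean fitsLocally(List<int[]> projectedDb, long maxLocalProjDBSize) {
        return size(projectedDb) <= maxLocalProjDBSize;
    }

    public static void main(String[] args) {
        List<int[]> db = List.of(
                encode(List.of(List.of(1, 2), List.of(3))), // 6 entries
                encode(List.of(List.of(4))));               // 3 entries
        System.out.println(size(db));           // 9
        System.out.println(fitsLocally(db, 8)); // false
    }
}
```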
 
- run
public <Item> PrefixSpanModel<Item> run(RDD<Object[]> data, scala.reflect.ClassTag<Item> evidence$1)
Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
- Parameters:
- data - sequences of itemsets
- evidence$1 - the ClassTag for Item (the implicit context-bound evidence generated by the Scala compiler)
- Returns:
- a PrefixSpanModel that contains the frequent patterns
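The sketch below makes "prefix-projected pattern growth" concrete on a single machine. It is restricted to sequences whose itemsets each hold exactly one item, so a sequence is just a list of items. That restriction, the class `MiniPrefixSpan`, and its methods are simplifications for illustration. They do not reproduce the distributed, general-itemset logic behind `run`. The sketch does three things:

1. Count the items in the current projected database, once per sequence.
2. Keep the items that reach `minCount`.
3. Recurse on the postfixes that follow each kept item's first occurrence.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Simplified, single-machine PrefixSpan over sequences of single items.
// Illustration only; not Spark's distributed implementation.
final class MiniPrefixSpan {
    private MiniPrefixSpan() {}

    // Returns each frequent pattern (a list of items) mapped to its support.
    static Map<List<Integer>, Long> mine(List<List<Integer>> db, long minCount, int maxPatternLength) {
        Map<List<Integer>, Long> result = new LinkedHashMap<>();
        // A postfix is stored as {sequenceIndex, startPosition} (pseudo-projection).
        List<int[]> projected = new ArrayList<>();
        for (int i = 0; i < db.size(); i++) {
            projected.add(new int[] {i, 0});
        }
        grow(db, new ArrayList<>(), projected, minCount, maxPatternLength, result);
        return result;
    }

    private static void grow(List<List<Integer>> db, List<Integer> prefix, List<int[]> projected,
                             long minCount, int maxPatternLength, Map<List<Integer>, Long> out) {
        if (prefix.size() >= maxPatternLength) {
            return;
        }
        // Count each item at most once per postfix, i.e. once per sequence.
        Map<Integer, Long> counts = new TreeMap<>();
        for (int[] p : projected) {
            List<Integer> seq = db.get(p[0]);
            Set<Integer> seen = new HashSet<>(seq.subList(p[1], seq.size()));
            for (Integer item : seen) {
                counts.merge(item, 1L, Long::sum);
            }
        }
        for (Map.Entry<Integer, Long> entry : counts.entrySet()) {
            if (entry.getValue() < minCount) {
                continue;
            }
            List<Integer> extended = new ArrayList<>(prefix);
            extended.add(entry.getKey());
            out.put(extended, entry.getValue());
            // Project: keep what follows the first occurrence of the item in each postfix.
            List<int[]> next = new ArrayList<>();
            for (int[] p : projected) {
                List<Integer> seq = db.get(p[0]);
                for (int j = p[1]; j < seq.size(); j++) {
                    if (seq.get(j).equals(entry.getKey())) {
                        next.add(new int[] {p[0], j + 1});
                        break;
                    }
                }
            }
            grow(db, extended, next, minCount, maxPatternLength, out);
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> db = List.of(List.of(1, 2, 3), List.of(1, 3), List.of(2, 3), List.of(4));
        // Prints {[1]=2, [1, 3]=2, [2]=2, [2, 3]=2, [3]=3}
        System.out.println(mine(db, 2, 10));
    }
}
```

Storing postfixes as (sequence, offset) pairs avoids copying sequences at every projection step, which is the pseudo-projection idea from the PrefixSpan paper.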
 
- run
public <Item, Itemset extends Iterable<Item>, Sequence extends Iterable<Itemset>> PrefixSpanModel<Item> run(JavaRDD<Sequence> data)
A Java-friendly version of run() that reads sequences from a JavaRDD and returns frequent sequences in a PrefixSpanModel.
- Parameters:
- data - ordered sequences of itemsets stored as Java Iterable of Iterables
- Returns:
- a PrefixSpanModel that contains the frequent sequential patterns
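The bounded type parameters of the Java-friendly signature let the compiler infer `Item`, `Itemset`, and `Sequence` from any nested `Iterable` input. For example, each sequence can be a `List<List<Integer>>`. The hypothetical helper `NestedIterables.materialize` below is not part of Spark. It declares the same three type parameters on a plain `Iterable` instead of a `JavaRDD`, so the inference can be seen working without a Spark cluster.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark: it accepts the same nested-Iterable shape
// as the Java-friendly run method and copies it into plain lists.
final class NestedIterables {
    private NestedIterables() {}

    static <Item, Itemset extends Iterable<Item>, Sequence extends Iterable<Itemset>>
            List<List<List<Item>>> materialize(Iterable<Sequence> data) {
        List<List<List<Item>>> sequences = new ArrayList<>();
        for (Sequence sequence : data) {
            List<List<Item>> itemsets = new ArrayList<>();
            for (Itemset itemset : sequence) {
                List<Item> items = new ArrayList<>();
                for (Item item : itemset) {
                    items.add(item);
                }
                itemsets.add(items);
            }
            sequences.add(itemsets);
        }
        return sequences;
    }

    public static void main(String[] args) {
        // Each sequence is a List of itemsets; each itemset is a List of items.
        List<List<List<Integer>>> data = List.of(
                List.of(List.of(1, 2), List.of(3)),
                List.of(List.of(6)));
        List<List<List<Integer>>> copy = materialize(data);
        System.out.println(copy); // [[[1, 2], [3]], [[6]]]
    }
}
```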
 
 