Class PrefixSpan

Object
org.apache.spark.mllib.fpm.PrefixSpan
All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging

public class PrefixSpan extends Object implements org.apache.spark.internal.Logging, Serializable
A parallel PrefixSpan algorithm to mine frequent sequential patterns. The PrefixSpan algorithm is described in J. Pei, et al., PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth.

param: minSupport the minimal support level of the sequential pattern; any pattern that appears more than (minSupport * size-of-the-dataset) times will be output
param: maxPatternLength the maximal length of the sequential pattern
param: maxLocalProjDBSize the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing. If a projected database exceeds this size, another iteration of distributed prefix growth is run.
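The relationship between minSupport and an absolute occurrence count can be sketched in plain Java. This is an illustration only and does not call Spark: the class SupportThreshold and its methods are hypothetical names. It also assumes that a pattern whose count exactly equals the rounded-up product (minSupport * size-of-the-dataset) qualifies as frequent; that boundary behavior is an assumption, since the wording above could also be read as a strict comparison.

```java
/**
 * Hypothetical helper (not part of Spark) that turns a relative
 * minSupport into an absolute occurrence count.
 */
final class SupportThreshold {
    private SupportThreshold() {}

    /**
     * Smallest whole-number count that reaches minSupport * datasetSize.
     * Assumption: the boundary case (count equal to the product) is inclusive.
     */
    static long minCount(double minSupport, long datasetSize) {
        if (minSupport < 0.0 || minSupport > 1.0) {
            throw new IllegalArgumentException("minSupport must be in [0, 1]");
        }
        if (datasetSize < 0) {
            throw new IllegalArgumentException("datasetSize must be non-negative");
        }
        return (long) Math.ceil(minSupport * datasetSize);
    }

    /** True if a pattern seen in `occurrences` sequences reaches the cutoff. */
    static boolean isFrequent(long occurrences, double minSupport, long datasetSize) {
        return occurrences >= minCount(minSupport, datasetSize);
    }
}
```

Under these assumptions, minSupport = 0.5 over four input sequences gives a cutoff of 2 occurrences.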

  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static class 
    Represents a frequent sequence.
    static class 
     
    static class 
     

    Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging

    org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
  • Constructor Summary

    Constructors
    Constructor
    Description
    PrefixSpan()
    Constructs a default instance with default parameters {minSupport: 0.1, maxPatternLength: 10, maxLocalProjDBSize: 32000000L}.
  • Method Summary

    Modifier and Type
    Method
    Description
    long
    getMaxLocalProjDBSize()
    Gets the maximum number of items allowed in a projected database before local processing.
    int
    getMaxPatternLength()
    Gets the maximal pattern length (i.e. the length of the longest sequential pattern to consider).
    double
    getMinSupport()
    Gets the minimal support (i.e. the frequency of occurrence before a pattern is considered frequent).
    static org.apache.spark.internal.Logging.LogStringContext
    LogStringContext(scala.StringContext sc)
     
    static org.slf4j.Logger
    org$apache$spark$internal$Logging$$log_()
     
    static void
    org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)
     
    <Item, Itemset extends Iterable<Item>, Sequence extends Iterable<Itemset>> PrefixSpanModel<Item>
    run(JavaRDD<Sequence> data)
    A Java-friendly version of run() that reads sequences from a JavaRDD and returns frequent sequences in a PrefixSpanModel.
    <Item> PrefixSpanModel<Item>
    run(RDD<Object[]> data, scala.reflect.ClassTag<Item> evidence$1)
    Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
    PrefixSpan
    setMaxLocalProjDBSize(long maxLocalProjDBSize)
    Sets the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing (default: 32000000L).
    PrefixSpan
    setMaxPatternLength(int maxPatternLength)
    Sets the maximal pattern length (default: 10).
    PrefixSpan
    setMinSupport(double minSupport)
    Sets the minimal support level (default: 0.1).

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface org.apache.spark.internal.Logging

    initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
  • Constructor Details

    • PrefixSpan

      public PrefixSpan()
      Constructs a default instance with default parameters {minSupport: 0.1, maxPatternLength: 10, maxLocalProjDBSize: 32000000L}.
  • Method Details

    • org$apache$spark$internal$Logging$$log_

      public static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()
    • org$apache$spark$internal$Logging$$log__$eq

      public static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)
    • LogStringContext

      public static org.apache.spark.internal.Logging.LogStringContext LogStringContext(scala.StringContext sc)
    • getMinSupport

      public double getMinSupport()
      Gets the minimal support (i.e. the frequency of occurrence before a pattern is considered frequent).
      Returns:
      the minimal support level
    • setMinSupport

      public PrefixSpan setMinSupport(double minSupport)
      Sets the minimal support level (default: 0.1).
      Parameters:
      minSupport - the minimal support level, as a fraction of the dataset size
      Returns:
      this PrefixSpan instance
    • getMaxPatternLength

      public int getMaxPatternLength()
      Gets the maximal pattern length (i.e. the length of the longest sequential pattern to consider).
      Returns:
      the maximal pattern length
    • setMaxPatternLength

      public PrefixSpan setMaxPatternLength(int maxPatternLength)
      Sets the maximal pattern length (default: 10).
      Parameters:
      maxPatternLength - the maximal length of the sequential pattern
      Returns:
      this PrefixSpan instance
    • getMaxLocalProjDBSize

      public long getMaxLocalProjDBSize()
      Gets the maximum number of items allowed in a projected database before local processing.
      Returns:
      the maximum number of items allowed in a projected database before local processing
    • setMaxLocalProjDBSize

      public PrefixSpan setMaxLocalProjDBSize(long maxLocalProjDBSize)
      Sets the maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing (default: 32000000L).
      Parameters:
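      maxLocalProjDBSize - the maximum number of items (including delimiters) allowed in a projected database before local processing
      Returns:
      this PrefixSpan instance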
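To make the size limit concrete, here is a rough plain-Java sketch of how a projected database might be measured against maxLocalProjDBSize. It does not call Spark. The class ProjectedDbSizing and its methods are hypothetical, and the delimiter accounting (one delimiter before each itemset plus one closing delimiter per sequence) is an assumption about the internal storage format, used here only for illustration.

```java
import java.util.List;

/** Hypothetical sizing helper (not part of Spark). */
final class ProjectedDbSizing {
    private ProjectedDbSizing() {}

    /**
     * Number of stored entries for one sequence: all items, plus one
     * delimiter per itemset, plus one closing delimiter (assumed layout).
     */
    static long encodedLength(List<List<Integer>> sequence) {
        long items = 0;
        for (List<Integer> itemset : sequence) {
            items += itemset.size();
        }
        return items + sequence.size() + 1;
    }

    /** Total stored entries across every sequence of a projected database. */
    static long encodedSize(List<List<List<Integer>>> projectedDb) {
        long total = 0;
        for (List<List<Integer>> sequence : projectedDb) {
            total += encodedLength(sequence);
        }
        return total;
    }

    /**
     * True if the database is small enough for local processing; a database
     * that exceeds the limit would need another distributed iteration.
     */
    static boolean fitsLocally(long encodedSize, long maxLocalProjDBSize) {
        return encodedSize <= maxLocalProjDBSize;
    }
}
```

Under this assumed layout, a sequence of three itemsets holding five items in total occupies nine entries.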
    • run

      public <Item> PrefixSpanModel<Item> run(RDD<Object[]> data, scala.reflect.ClassTag<Item> evidence$1)
      Finds the complete set of frequent sequential patterns in the input sequences of itemsets.
      Parameters:
      data - sequences of itemsets.
      evidence$1 - the ClassTag for Item (passed implicitly when called from Scala)
      Returns:
      a PrefixSpanModel that contains the frequent patterns
    • run

      public <Item, Itemset extends Iterable<Item>, Sequence extends Iterable<Itemset>> PrefixSpanModel<Item> run(JavaRDD<Sequence> data)
      A Java-friendly version of run() that reads sequences from a JavaRDD and returns frequent sequences in a PrefixSpanModel.
      Parameters:
      data - ordered sequences of itemsets stored as Java Iterable of Iterables
      Returns:
      a PrefixSpanModel that contains the frequent sequential patterns
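
The "Iterable of Iterables" input shape, and what it means for a pattern to occur in a sequence, can be sketched in plain Java. This sketch does not call Spark and is not its implementation: SequenceSupport, sampleData, contains and support are hypothetical names, and the brute-force counting is only meant to show the semantics of ordered itemsets.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

/** Hypothetical brute-force support counter (not part of Spark). */
final class SequenceSupport {
    private SequenceSupport() {}

    /** Four small sequences; each sequence is an ordered list of itemsets. */
    static List<List<List<Integer>>> sampleData() {
        return Arrays.asList(
            Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3)),
            Arrays.asList(Arrays.asList(1), Arrays.asList(3, 2), Arrays.asList(1, 2)),
            Arrays.asList(Arrays.asList(1, 2), Arrays.asList(5)),
            Arrays.asList(Arrays.asList(6)));
    }

    /**
     * True if each itemset of the pattern is a subset of some itemset of the
     * sequence, with the matched itemsets appearing in the same order.
     */
    static <T> boolean contains(List<List<T>> sequence, List<List<T>> pattern) {
        int next = 0; // index of the next pattern itemset still to be matched
        for (List<T> itemset : sequence) {
            if (next < pattern.size()
                    && new HashSet<>(itemset).containsAll(pattern.get(next))) {
                next++; // greedy earliest match is sufficient for containment
            }
        }
        return next == pattern.size();
    }

    /** Number of sequences in the data that contain the pattern. */
    static <T> long support(List<List<List<T>>> data, List<List<T>> pattern) {
        long count = 0;
        for (List<List<T>> sequence : data) {
            if (contains(sequence, pattern)) {
                count++;
            }
        }
        return count;
    }
}
```

In the sample data, the pattern "item 1, later item 3" occurs in two of the four sequences, while the single itemset {1, 2} occurs in three.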