Class FPGrowth

java.lang.Object
    org.apache.spark.mllib.fpm.FPGrowth
All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging

public class FPGrowth extends Object implements org.apache.spark.internal.Logging, Serializable
A parallel FP-growth algorithm to mine frequent itemsets. The algorithm is described in Li et al., PFP: Parallel FP-Growth for Query Recommendation. PFP distributes computation in such a way that each worker executes an independent group of mining tasks. The FP-Growth algorithm is described in Han et al., Mining frequent patterns without candidate generation.

param: minSupport the minimal support level of a frequent pattern; any pattern that appears more than (minSupport * size-of-the-dataset) times will be output
param: numPartitions number of partitions used by parallel FP-growth
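A minimal end-to-end usage sketch (illustrative only, not part of the generated API documentation): it builds a small transaction RDD, mines frequent itemsets with a 50% support threshold, and prints them. The application name, master setting, data, and thresholds are placeholders.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.fpm.FPGrowth
    import org.apache.spark.rdd.RDD

    object FPGrowthExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("FPGrowthExample").setMaster("local[2]"))

        // Each RDD element is one transaction; items within a transaction must be distinct.
        val transactions: RDD[Array[String]] = sc.parallelize(Seq(
          Array("a", "b", "c"),
          Array("a", "b", "d"),
          Array("b", "c", "e"),
          Array("a", "b", "c")))

        val fpg = new FPGrowth()
          .setMinSupport(0.5)   // keep itemsets appearing in at least half of the transactions
          .setNumPartitions(4)  // distribute the mining tasks over 4 groups

        val model = fpg.run(transactions)

        // Each FreqItemset holds the items and their absolute frequency (count).
        model.freqItemsets.collect().foreach { itemset =>
          println(itemset.items.mkString("[", ",", "]") + ", " + itemset.freq)
        }

        sc.stop()
      }
    }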

  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description

    static class
    FPGrowth.FreqItemset<Item>
    Frequent itemset.

    Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging

    org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
  • Constructor Summary

    Constructors
    Constructor
    Description

    FPGrowth()
    Constructs a default instance with default parameters {minSupport: 0.3, numPartitions: same as the input data}.
  • Method Summary

    Modifier and Type
    Method
    Description

    <Item, Basket extends Iterable<Item>> FPGrowthModel<Item>
    run(JavaRDD<Basket> data)
    Java-friendly version of run.

    <Item> FPGrowthModel<Item>
    run(RDD<Object> data, scala.reflect.ClassTag<Item> evidence$4)
    Computes an FP-Growth model that contains frequent itemsets.

    FPGrowth
    setMinSupport(double minSupport)
    Sets the minimal support level (default: 0.3).

    FPGrowth
    setNumPartitions(int numPartitions)
    Sets the number of partitions used by parallel FP-growth (default: same as input data).

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface org.apache.spark.internal.Logging

    initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
  • Constructor Details

    • FPGrowth

      public FPGrowth()
      Constructs a default instance with default parameters {minSupport: 0.3, numPartitions: same as the input data}.

  • Method Details

    • setMinSupport

      public FPGrowth setMinSupport(double minSupport)
      Sets the minimal support level (default: 0.3).

      Parameters:
      minSupport - the minimal support level; any pattern that appears more than (minSupport * size-of-the-dataset) times will be output
      Returns:
      this FPGrowth instance, allowing setter calls to be chained
    • setNumPartitions

      public FPGrowth setNumPartitions(int numPartitions)
      Sets the number of partitions used by parallel FP-growth (default: same as input data).

      Parameters:
      numPartitions - the number of partitions used to distribute the parallel FP-growth mining tasks
      Returns:
      this FPGrowth instance, allowing setter calls to be chained
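      Both setters return this FPGrowth instance, so the configuration can be chained before calling run; a brief sketch (the threshold and partition values are illustrative):

          import org.apache.spark.mllib.fpm.FPGrowth

          val fpg = new FPGrowth()  // defaults: minSupport = 0.3, numPartitions = same as the input data
            .setMinSupport(0.2)     // output itemsets appearing in more than 20% of the transactions
            .setNumPartitions(10)   // spread the parallel mining work over 10 groups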
    • run

      public <Item> FPGrowthModel<Item> run(RDD<Object> data, scala.reflect.ClassTag<Item> evidence$4)
      Computes an FP-Growth model that contains frequent itemsets.
      Parameters:
      data - input data set; each element is one transaction, represented as an array of distinct items
      evidence$4 - the ClassTag for the item type Item (supplied implicitly when called from Scala)
      Returns:
      an FPGrowthModel containing the frequent itemsets
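
      When called from Scala, the ClassTag evidence parameter is supplied implicitly by the compiler; a minimal sketch (assumes an existing SparkContext named sc; the data and threshold are illustrative):

          import org.apache.spark.mllib.fpm.{FPGrowth, FPGrowthModel}
          import org.apache.spark.rdd.RDD

          // Each element is one transaction, represented as an Array of distinct items.
          val transactions: RDD[Array[String]] =
            sc.parallelize(Seq(Array("a", "b"), Array("a", "c"), Array("a", "b", "c")))

          // evidence$4 (the ClassTag[String]) is filled in implicitly by the Scala compiler.
          val model: FPGrowthModel[String] = new FPGrowth().setMinSupport(0.3).run(transactions)

          model.freqItemsets.collect().foreach { is =>
            println(is.items.mkString(", ") + " -> " + is.freq)
          }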

    • run

      public <Item, Basket extends Iterable<Item>> FPGrowthModel<Item> run(JavaRDD<Basket> data)
      Java-friendly version of run.
      Parameters:
      data - input data set; each Basket is one transaction, given as a java.lang.Iterable of items
      Returns:
      an FPGrowthModel containing the frequent itemsets
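
      From Java this overload is the natural entry point, with the type parameters inferred from a JavaRDD<List<String>>. The sketch below instead invokes it from Scala with explicit type arguments; it assumes an existing SparkContext named sc and is illustrative only:

          import java.util.{Arrays => JArrays, List => JList}
          import org.apache.spark.api.java.JavaRDD
          import org.apache.spark.mllib.fpm.FPGrowth

          // Each basket is one transaction, given as a java.lang.Iterable of items.
          val baskets: JavaRDD[JList[String]] = sc.parallelize(Seq(
            JArrays.asList("a", "b", "c"),
            JArrays.asList("a", "b"))).toJavaRDD()

          // Explicit type arguments are shown for clarity on the Scala side; Java infers them.
          val model = new FPGrowth()
            .setMinSupport(0.5)
            .run[String, JList[String]](baskets)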