org.apache.spark.mllib.fpm
Class FPGrowth

Object
  extended by org.apache.spark.mllib.fpm.FPGrowth
All Implemented Interfaces:
java.io.Serializable, Logging

public class FPGrowth
extends Object
implements Logging, scala.Serializable

:: Experimental ::

A parallel FP-growth algorithm to mine frequent itemsets. The algorithm is described in Li et al., PFP: Parallel FP-Growth for Query Recommendation. PFP distributes computation in such a way that each worker executes an independent group of mining tasks. The FP-Growth algorithm is described in Han et al., Mining frequent patterns without candidate generation.

param: minSupport the minimal support level of a frequent pattern; any pattern that appears more than (minSupport * size-of-the-dataset) times will be output
param: numPartitions number of partitions used by parallel FP-growth

See Also:
Association rule learning (Wikipedia, http://en.wikipedia.org/wiki/Association_rule_learning), Serialized Form

Nested Class Summary
static class FPGrowth.FreqItemset<Item>
          Frequent itemset.
 
Constructor Summary
FPGrowth()
          Constructs an instance with the default parameters {minSupport: 0.3, numPartitions: same as the input data}.
 
Method Summary
<Item,Basket extends Iterable<Item>>
FPGrowthModel<Item>
run(JavaRDD<Basket> data)
          Java-friendly version of run.
<Item> FPGrowthModel<Item>
run(RDD<Object> data, scala.reflect.ClassTag<Item> evidence$2)
          Computes an FP-Growth model that contains frequent itemsets.
 FPGrowth setMinSupport(double minSupport)
          Sets the minimal support level (default: 0.3).
 FPGrowth setNumPartitions(int numPartitions)
          Sets the number of partitions used by parallel FP-growth (default: same as input data).
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.spark.Logging
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
 

Constructor Detail

FPGrowth

public FPGrowth()
Constructs an instance with the default parameters {minSupport: 0.3, numPartitions: same as the input data}.

Method Detail

setMinSupport

public FPGrowth setMinSupport(double minSupport)
Sets the minimal support level (default: 0.3).

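Parameters:
minSupport - the minimal support level, expressed as a fraction of the number of transactions in the dataset
Returns:
this FPGrowth instance, so that setter calls can be chained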
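
The relationship between minSupport and an absolute occurrence count can be sketched in plain Java, without Spark. This is only an illustration: the class MinSupportSketch and its methods are hypothetical, and the boundary rule (a count of at least ceil(minSupport * n) qualifies) is an assumption, since the description above says "more than" without stating how the boundary case is treated. Only single items are counted here; the real algorithm mines itemsets of any size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class; not part of the Spark API.
final class MinSupportSketch {

    // Assumption: an item qualifies when its count reaches ceil(minSupport * n).
    static long minCount(double minSupport, long numTransactions) {
        return (long) Math.ceil(minSupport * numTransactions);
    }

    // Counts the number of transactions containing each item.
    // An item repeated inside one transaction is counted once.
    static Map<String, Long> itemCounts(List<List<String>> transactions) {
        Map<String, Long> counts = new TreeMap<>();
        for (List<String> transaction : transactions) {
            for (String item : new HashSet<>(transaction)) {
                counts.merge(item, 1L, Long::sum);
            }
        }
        return counts;
    }

    // Returns the single items whose count meets the threshold, in sorted order.
    static List<String> frequentItems(List<List<String>> transactions, double minSupport) {
        long threshold = minCount(minSupport, transactions.size());
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : itemCounts(transactions).entrySet()) {
            if (entry.getValue() >= threshold) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> transactions = Arrays.asList(
            Arrays.asList("a", "b", "c"),
            Arrays.asList("a", "b"),
            Arrays.asList("a", "d"),
            Arrays.asList("b", "c"));
        // With 4 transactions and minSupport 0.5 the threshold is 2 occurrences.
        System.out.println(minCount(0.5, transactions.size()));
        System.out.println(frequentItems(transactions, 0.5));
    }
}
```

A lower minSupport admits more patterns and typically increases both the output size and the mining cost.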

setNumPartitions

public FPGrowth setNumPartitions(int numPartitions)
Sets the number of partitions used by parallel FP-growth (default: same as input data).

Parameters:
numPartitions - the number of partitions used by parallel FP-growth
Returns:
this FPGrowth instance, so that setter calls can be chained
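
To illustrate how a partition count can translate into independent groups of mining tasks, here is a minimal plain-Java sketch of the grouping idea from the PFP paper. It is not the library's implementation: the class PfpGroupingSketch is hypothetical, and assigning an item to a group by taking its frequency rank modulo the number of groups is an assumption made for the example. Each transaction is given as item ranks sorted in ascending order (rank 0 is the most frequent item), and each group receives the longest prefix of the transaction that ends in one of its own items.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; not part of the Spark API.
final class PfpGroupingSketch {

    // Assumed group assignment: frequency rank modulo the number of groups.
    static int groupOf(int itemRank, int numGroups) {
        return itemRank % numGroups;
    }

    // Scans the transaction from its least frequent item backwards and, for each
    // group seen for the first time, emits the prefix ending at that item.
    static Map<Integer, List<Integer>> conditionalTransactions(
            List<Integer> sortedRanks, int numGroups) {
        Map<Integer, List<Integer>> output = new LinkedHashMap<>();
        for (int i = sortedRanks.size() - 1; i >= 0; i--) {
            int group = groupOf(sortedRanks.get(i), numGroups);
            if (!output.containsKey(group)) {
                output.put(group, new ArrayList<>(sortedRanks.subList(0, i + 1)));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // Ranks 0 and 4 fall into group 0; ranks 1 and 3 fall into group 1.
        System.out.println(conditionalTransactions(Arrays.asList(0, 1, 3, 4), 2));
        // prints {0=[0, 1, 3, 4], 1=[0, 1, 3]}
    }
}
```

Because every group receives all the prefixes it needs, each group can build and mine its own FP-tree without communicating with the others. More groups mean more parallelism, but also more duplicated transaction prefixes.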

run

public <Item> FPGrowthModel<Item> run(RDD<Object> data,
                                      scala.reflect.ClassTag<Item> evidence$2)
Computes an FP-Growth model that contains frequent itemsets.

Parameters:
data - input data set; each element contains a transaction
evidence$2 - the ClassTag of the Item type (an implicit parameter in the Scala API)
Returns:
an FPGrowthModel

run

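public <Item,Basket extends Iterable<Item>> FPGrowthModel<Item> run(JavaRDD<Basket> data)
Java-friendly version of run.

Parameters:
data - input data set; each element contains a transaction
Returns:
an FPGrowthModel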