org.apache.spark.mllib.tree.configuration
Class Strategy

Object
  extended by org.apache.spark.mllib.tree.configuration.Strategy
All Implemented Interfaces:
java.io.Serializable

public class Strategy
extends Object
implements scala.Serializable

:: Experimental :: Stores all the configuration options for tree construction.

Parameters:
algo - Learning goal. Supported: org.apache.spark.mllib.tree.configuration.Algo.Classification, org.apache.spark.mllib.tree.configuration.Algo.Regression.
impurity - Criterion used for information gain calculation. Supported for Classification: Gini, Entropy. Supported for Regression: Variance.
maxDepth - Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes.
numClasses - Number of classes for classification. (Ignored for regression.) Default value is 2 (binary classification).
maxBins - Maximum number of bins used for discretizing continuous features and for choosing how to split on features at each node. More bins give higher granularity.
quantileCalculationStrategy - Algorithm for calculating quantiles. Supported: org.apache.spark.mllib.tree.configuration.QuantileStrategy.Sort.
categoricalFeaturesInfo - A map storing information about the categorical variables and the number of discrete values they take. For example, an entry (n -> k) implies that feature n is categorical with k categories 0, 1, 2, ..., k-1. Features are zero-indexed.
minInstancesPerNode - Minimum number of instances each child must have after a split. Default value is 1. If a split causes the left or right child to have fewer than minInstancesPerNode instances, the split is not considered valid.
minInfoGain - Minimum information gain a split must achieve. Default value is 0.0. If a split has less information gain than minInfoGain, the split is not considered valid.
maxMemoryInMB - Maximum memory in MB allocated to histogram aggregation. Default value is 256 MB.
subsamplingRate - Fraction of the training data used for learning the decision tree.
useNodeIdCache - If true, instead of passing trees to executors, the algorithm maintains a separate RDD of node Id cache for each row.
checkpointInterval - How often to checkpoint when the node Id cache gets updated. E.g., 10 means that the cache is checkpointed every 10 updates. If the checkpoint directory is not set in SparkContext, this setting is ignored.
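
The two split-validity rules described for minInstancesPerNode and minInfoGain can be read as a single predicate. The following is a minimal sketch of that reading in plain Java, not Spark's implementation: the class SplitValidity, the method isValidSplit, and its parameters leftCount, rightCount, and infoGain are hypothetical names introduced only for illustration.

```java
/** Hypothetical helper (not part of Spark) illustrating the documented split-validity rules. */
final class SplitValidity {
    private SplitValidity() {}

    /**
     * Returns true when a candidate split satisfies both documented constraints:
     * each child keeps at least minInstancesPerNode instances, and the split's
     * information gain is not less than minInfoGain.
     */
    static boolean isValidSplit(long leftCount,
                                long rightCount,
                                double infoGain,
                                int minInstancesPerNode,
                                double minInfoGain) {
        // A child with fewer than minInstancesPerNode instances invalidates the split.
        if (leftCount < minInstancesPerNode || rightCount < minInstancesPerNode) {
            return false;
        }
        // A gain strictly below minInfoGain invalidates the split.
        return infoGain >= minInfoGain;
    }
}
```

With the documented defaults (minInstancesPerNode = 1, minInfoGain = 0.0), this predicate rejects only splits that leave one child empty or that have negative gain. Raising either value prunes more candidate splits.
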

See Also:
Serialized Form

Constructor Summary
Strategy(scala.Enumeration.Value algo, Impurity impurity, int maxDepth, int numClasses, int maxBins, scala.Enumeration.Value quantileCalculationStrategy, scala.collection.immutable.Map<Object,Object> categoricalFeaturesInfo, int minInstancesPerNode, double minInfoGain, int maxMemoryInMB, double subsamplingRate, boolean useNodeIdCache, int checkpointInterval)
           
Strategy(scala.Enumeration.Value algo, Impurity impurity, int maxDepth, int numClasses, int maxBins, java.util.Map<Integer,Integer> categoricalFeaturesInfo)
          Java-friendly constructor for Strategy
 
Method Summary
 scala.Enumeration.Value algo()
           
 scala.collection.immutable.Map<Object,Object> categoricalFeaturesInfo()
           
 int checkpointInterval()
           
 Strategy copy()
          Returns a shallow copy of this instance.
static Strategy defaultStategy(scala.Enumeration.Value algo)
          Construct a default set of parameters for DecisionTree
static Strategy defaultStrategy(String algo)
          Construct a default set of parameters for DecisionTree
 scala.Enumeration.Value getAlgo()
           
 scala.collection.immutable.Map<Object,Object> getCategoricalFeaturesInfo()
           
 int getCheckpointInterval()
           
 Impurity getImpurity()
           
 int getMaxBins()
           
 int getMaxDepth()
           
 int getMaxMemoryInMB()
           
 double getMinInfoGain()
           
 int getMinInstancesPerNode()
           
 int getNumClasses()
           
 scala.Enumeration.Value getQuantileCalculationStrategy()
           
 double getSubsamplingRate()
           
 boolean getUseNodeIdCache()
           
 Impurity impurity()
           
 boolean isMulticlassClassification()
           
 boolean isMulticlassWithCategoricalFeatures()
           
 int maxBins()
           
 int maxDepth()
           
 int maxMemoryInMB()
           
 double minInfoGain()
           
 int minInstancesPerNode()
           
 int numClasses()
           
 scala.Enumeration.Value quantileCalculationStrategy()
           
 void setAlgo(String algo)
          Sets Algorithm using a String.
 void setCategoricalFeaturesInfo(java.util.Map<Integer,Integer> categoricalFeaturesInfo)
          Sets categoricalFeaturesInfo using a Java Map.
 void setCheckpointInterval(int x$1)
           
 void setImpurity(Impurity x$1)
           
 void setMaxBins(int x$1)
           
 void setMaxDepth(int x$1)
           
 void setMaxMemoryInMB(int x$1)
           
 void setMinInfoGain(double x$1)
           
 void setMinInstancesPerNode(int x$1)
           
 void setNumClasses(int x$1)
           
 void setQuantileCalculationStrategy(scala.Enumeration.Value x$1)
           
 void setSubsamplingRate(double x$1)
           
 void setUseNodeIdCache(boolean x$1)
           
 double subsamplingRate()
           
 boolean useNodeIdCache()
           
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Strategy

public Strategy(scala.Enumeration.Value algo,
                Impurity impurity,
                int maxDepth,
                int numClasses,
                int maxBins,
                scala.Enumeration.Value quantileCalculationStrategy,
                scala.collection.immutable.Map<Object,Object> categoricalFeaturesInfo,
                int minInstancesPerNode,
                double minInfoGain,
                int maxMemoryInMB,
                double subsamplingRate,
                boolean useNodeIdCache,
                int checkpointInterval)

Strategy

public Strategy(scala.Enumeration.Value algo,
                Impurity impurity,
                int maxDepth,
                int numClasses,
                int maxBins,
                java.util.Map<Integer,Integer> categoricalFeaturesInfo)
Java-friendly constructor for Strategy

Parameters:
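algo - Learning goal: Algo.Classification or Algo.Regression
impurity - Criterion used for information gain calculation (Gini or Entropy for classification, Variance for regression)
maxDepth - Maximum depth of the tree
numClasses - Number of classes for classification (ignored for regression)
maxBins - Maximum number of bins used for discretizing continuous features
categoricalFeaturesInfo - Map from zero-indexed feature index n to the number of categories k of that feature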
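
The class description explains that an entry (n -> k) in categoricalFeaturesInfo marks feature n as categorical with categories 0, 1, ..., k-1. The sketch below builds such a java.util.Map for an invented dataset and checks whether a feature value is a legal category index. The class CategoricalFeaturesExample, its methods, and the feature layout are assumptions made for illustration, and the code deliberately has no Spark dependency.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of Spark) showing the shape of categoricalFeaturesInfo. */
final class CategoricalFeaturesExample {
    private CategoricalFeaturesExample() {}

    /** Invented layout: feature 0 is binary, feature 4 has 10 categories, all others are continuous. */
    static Map<Integer, Integer> buildInfo() {
        Map<Integer, Integer> info = new HashMap<>();
        info.put(0, 2);   // feature 0 takes the values 0 and 1
        info.put(4, 10);  // feature 4 takes the values 0 through 9
        return info;
    }

    /**
     * Returns true if value is acceptable for the feature: any value for a
     * continuous feature (absent from the map), or an integer in [0, k) for a
     * categorical feature with k categories.
     */
    static boolean isLegalValue(Map<Integer, Integer> info, int featureIndex, double value) {
        Integer k = info.get(featureIndex);
        if (k == null) {
            return true; // not listed, so treated as continuous
        }
        return value == Math.floor(value) && value >= 0 && value < k;
    }
}
```

With Spark on the classpath, a map of this shape is what the Java-friendly constructor and setCategoricalFeaturesInfo accept as their categoricalFeaturesInfo argument.
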
Method Detail

defaultStrategy

public static Strategy defaultStrategy(String algo)
Construct a default set of parameters for DecisionTree

Parameters:
algo - "Classification" or "Regression"
Returns:
a Strategy holding the default parameters for the given algo

defaultStategy

public static Strategy defaultStategy(scala.Enumeration.Value algo)
Construct a default set of parameters for DecisionTree

Parameters:
algo - Algo.Classification or Algo.Regression
Returns:
a Strategy holding the default parameters for the given algo

algo

public scala.Enumeration.Value algo()

impurity

public Impurity impurity()

setImpurity

public void setImpurity(Impurity x$1)

maxDepth

public int maxDepth()
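
The class description gives two data points for maxDepth: depth 0 means 1 leaf node, and depth 1 means 1 internal node plus 2 leaf nodes. Assuming every split is binary, these generalize to upper bounds of 2^maxDepth leaves and 2^(maxDepth+1) - 1 nodes in total. The sketch below computes those bounds; TreeSizeBounds and its methods are hypothetical names, not part of Spark.

```java
/** Hypothetical helper (not part of Spark): size bounds for a binary tree of a given maxDepth. */
final class TreeSizeBounds {
    private TreeSizeBounds() {}

    /** Upper bound on the number of leaf nodes: 2^maxDepth. */
    static long maxLeaves(int maxDepth) {
        checkDepth(maxDepth);
        return 1L << maxDepth;
    }

    /** Upper bound on the total number of nodes (internal + leaf): 2^(maxDepth + 1) - 1. */
    static long maxNodes(int maxDepth) {
        checkDepth(maxDepth);
        return (1L << (maxDepth + 1)) - 1;
    }

    /** Keeps the shifts above within the range of a signed 64-bit long. */
    private static void checkDepth(int maxDepth) {
        if (maxDepth < 0 || maxDepth > 61) {
            throw new IllegalArgumentException("maxDepth must be in [0, 61], got " + maxDepth);
        }
    }
}
```

These are only upper bounds: a trained tree can be smaller when splits are rejected or nodes become pure before maxDepth is reached.
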

setMaxDepth

public void setMaxDepth(int x$1)

numClasses

public int numClasses()

setNumClasses

public void setNumClasses(int x$1)

maxBins

public int maxBins()

setMaxBins

public void setMaxBins(int x$1)

quantileCalculationStrategy

public scala.Enumeration.Value quantileCalculationStrategy()

setQuantileCalculationStrategy

public void setQuantileCalculationStrategy(scala.Enumeration.Value x$1)

categoricalFeaturesInfo

public scala.collection.immutable.Map<Object,Object> categoricalFeaturesInfo()

minInstancesPerNode

public int minInstancesPerNode()

setMinInstancesPerNode

public void setMinInstancesPerNode(int x$1)

minInfoGain

public double minInfoGain()

setMinInfoGain

public void setMinInfoGain(double x$1)

maxMemoryInMB

public int maxMemoryInMB()

setMaxMemoryInMB

public void setMaxMemoryInMB(int x$1)

subsamplingRate

public double subsamplingRate()

setSubsamplingRate

public void setSubsamplingRate(double x$1)

useNodeIdCache

public boolean useNodeIdCache()

setUseNodeIdCache

public void setUseNodeIdCache(boolean x$1)

checkpointInterval

public int checkpointInterval()
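
The class description says that a checkpointInterval of 10 checkpoints the node Id cache every 10 updates, and that the setting is ignored when no checkpoint directory is set in SparkContext. A minimal sketch of that schedule follows. CheckpointSchedule, shouldCheckpoint, and the one-based update counter are assumptions for illustration, and Spark's internal counting may differ in detail.

```java
/** Hypothetical helper (not part of Spark) modelling the documented checkpoint schedule. */
final class CheckpointSchedule {
    private CheckpointSchedule() {}

    /**
     * @param updateCount        number of node Id cache updates so far (1 for the first update)
     * @param checkpointInterval checkpoint every this many updates; must be positive
     * @param checkpointDirSet   whether a checkpoint directory is configured
     * @return true if the cache would be checkpointed right after this update
     */
    static boolean shouldCheckpoint(int updateCount, int checkpointInterval, boolean checkpointDirSet) {
        if (checkpointInterval <= 0) {
            throw new IllegalArgumentException("checkpointInterval must be positive");
        }
        if (!checkpointDirSet) {
            return false; // without a checkpoint directory the setting is ignored
        }
        return updateCount > 0 && updateCount % checkpointInterval == 0;
    }
}
```
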

setCheckpointInterval

public void setCheckpointInterval(int x$1)

isMulticlassClassification

public boolean isMulticlassClassification()

isMulticlassWithCategoricalFeatures

public boolean isMulticlassWithCategoricalFeatures()

setAlgo

public void setAlgo(String algo)
Sets Algorithm using a String.

Parameters:
algo - "Classification" or "Regression"

setCategoricalFeaturesInfo

public void setCategoricalFeaturesInfo(java.util.Map<Integer,Integer> categoricalFeaturesInfo)
Sets categoricalFeaturesInfo using a Java Map.

Parameters:
categoricalFeaturesInfo - Map from zero-indexed feature index n to the number of categories k of that feature

copy

public Strategy copy()
Returns a shallow copy of this instance.


getAlgo

public scala.Enumeration.Value getAlgo()

getImpurity

public Impurity getImpurity()

getMaxDepth

public int getMaxDepth()

getNumClasses

public int getNumClasses()

getMaxBins

public int getMaxBins()

getQuantileCalculationStrategy

public scala.Enumeration.Value getQuantileCalculationStrategy()

getCategoricalFeaturesInfo

public scala.collection.immutable.Map<Object,Object> getCategoricalFeaturesInfo()

getMinInstancesPerNode

public int getMinInstancesPerNode()

getMinInfoGain

public double getMinInfoGain()

getMaxMemoryInMB

public int getMaxMemoryInMB()

getSubsamplingRate

public double getSubsamplingRate()

getUseNodeIdCache

public boolean getUseNodeIdCache()

getCheckpointInterval

public int getCheckpointInterval()