All Superinterfaces: HasCheckpointInterval, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasSeed, HasWeightCol, Identifiable, Params, PredictorParams, Serializable, scala.Serializable

All Known Subinterfaces: DecisionTreeClassifierParams, DecisionTreeRegressorParams, GBTClassifierParams, GBTParams, GBTRegressorParams, RandomForestClassifierParams, RandomForestParams, RandomForestRegressorParams, TreeEnsembleClassifierParams, TreeEnsembleParams, TreeEnsembleRegressorParams

All Known Implementing Classes: DecisionTreeClassificationModel, DecisionTreeClassifier, DecisionTreeRegressionModel, DecisionTreeRegressor, GBTClassificationModel, GBTClassifier, GBTRegressionModel, GBTRegressor, RandomForestClassificationModel, RandomForestClassifier, RandomForestRegressionModel, RandomForestRegressor

public interface DecisionTreeParams extends PredictorParams, HasCheckpointInterval, HasSeed, HasWeightCol

Parameters for Decision Tree-based algorithms.

Note: Marked as private since this may be made public in the future.

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| BooleanParam | cacheNodeIds() | If false, the algorithm will pass trees to executors to match instances with nodes. |
| boolean | getCacheNodeIds() | |
| String | getLeafCol() | |
| int | getMaxBins() | |
| int | getMaxDepth() | |
| int | getMaxMemoryInMB() | |
| double | getMinInfoGain() | |
| int | getMinInstancesPerNode() | |
| double | getMinWeightFractionPerNode() | |
| Strategy | getOldStrategy(scala.collection.immutable.Map<Object,Object> categoricalFeatures, int numClasses, scala.Enumeration.Value oldAlgo, Impurity oldImpurity, double subsamplingRate) | (private[ml]) Create a Strategy instance to use with the old API. |
| Param<String> | leafCol() | Leaf indices column name. |
| IntParam | maxBins() | Maximum number of bins used for discretizing continuous features and for choosing how to split on features at each node. |
| IntParam | maxDepth() | Maximum depth of the tree (nonnegative). |
| IntParam | maxMemoryInMB() | Maximum memory in MB allocated to histogram aggregation. |
| DoubleParam | minInfoGain() | Minimum information gain for a split to be considered at a tree node. |
| IntParam | minInstancesPerNode() | Minimum number of instances each child must have after split. |
| DoubleParam | minWeightFractionPerNode() | Minimum fraction of the weighted sample count that each child must have after split. |
| DecisionTreeParams | setLeafCol(String value) | |

Methods inherited from interface org.apache.spark.ml.param.shared.HasCheckpointInterval
checkpointInterval, getCheckpointInterval

Methods inherited from interface org.apache.spark.ml.param.shared.HasFeaturesCol
featuresCol, getFeaturesCol

Methods inherited from interface org.apache.spark.ml.param.shared.HasLabelCol
getLabelCol, labelCol

Methods inherited from interface org.apache.spark.ml.param.shared.HasPredictionCol
getPredictionCol, predictionCol

Methods inherited from interface org.apache.spark.ml.param.shared.HasSeed
getSeed, seed

Methods inherited from interface org.apache.spark.ml.param.shared.HasWeightCol
getWeightCol, weightCol

Methods inherited from interface org.apache.spark.ml.util.Identifiable
toString, uid

Methods inherited from interface org.apache.spark.ml.param.Params
clear, copy, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn

Methods inherited from interface org.apache.spark.ml.PredictorParams
validateAndTransformSchema

Method Details
- cacheNodeIds
  
  BooleanParam cacheNodeIds()
  
  If false, the algorithm will pass trees to executors to match instances with nodes. If true, the algorithm will cache node IDs for each instance. Caching can speed up training of deeper trees. Users can set how often the cache is checkpointed, or disable checkpointing, by setting checkpointInterval. (default = false)
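  
  The difference between the two modes can be illustrated with a small conceptual sketch. It is not Spark's implementation: the class, the `Split` type, and the heap-style node numbering (root 1, children 2*i and 2*i + 1) are assumptions made for the example. Without a cache, each instance is routed from the root through the whole tree; with a cache, each stored node ID only moves down one level when new splits are added.

  ```java
  import java.util.HashMap;
  import java.util.Map;

  // Conceptual sketch only; all names here are hypothetical, not Spark APIs.
  final class NodeIdCacheSketch {
      // A binary split: go left when features[feature] <= threshold.
      static final class Split {
          final int feature;
          final double threshold;
          Split(int feature, double threshold) {
              this.feature = feature;
              this.threshold = threshold;
          }
      }

      // Assumed numbering: root is 1, children of node i are 2*i and 2*i + 1.
      static int childOf(int nodeId, Split split, double[] features) {
          return features[split.feature] <= split.threshold ? 2 * nodeId : 2 * nodeId + 1;
      }

      // No cache: re-route the instance from the root through the whole tree.
      static int locate(Map<Integer, Split> tree, double[] features) {
          int nodeId = 1;
          Split split;
          while ((split = tree.get(nodeId)) != null) {
              nodeId = childOf(nodeId, split, features);
          }
          return nodeId;
      }

      // With cache: move each cached node ID down one level using only the new splits.
      static void updateCache(int[] cachedIds, Map<Integer, Split> newSplits, double[][] data) {
          for (int i = 0; i < data.length; i++) {
              Split split = newSplits.get(cachedIds[i]);
              if (split != null) {
                  cachedIds[i] = childOf(cachedIds[i], split, data[i]);
              }
          }
      }

      public static void main(String[] args) {
          double[][] data = {{0.2, 1.0}, {0.9, 1.0}};
          int[] cache = {1, 1};                       // every instance starts at the root
          Map<Integer, Split> level1 = new HashMap<>();
          level1.put(1, new Split(0, 0.5));           // split the root on feature 0
          updateCache(cache, level1, data);
          for (int i = 0; i < data.length; i++) {
              System.out.println("cached=" + cache[i] + " located=" + locate(level1, data[i]));
          }
      }
  }
  ```

  The cached path touches only the splits added in the latest iteration, which is why the benefit grows with tree depth, while the uncached path repeats the full root-to-node walk every time.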
  
- getCacheNodeIds
  
  boolean getCacheNodeIds()
- getLeafCol
  
  String getLeafCol()
- getMaxBins
  
  int getMaxBins()
- getMaxDepth
  
  int getMaxDepth()
- getMaxMemoryInMB
  
  int getMaxMemoryInMB()
- getMinInfoGain
  
  double getMinInfoGain()
- getMinInstancesPerNode
  
  int getMinInstancesPerNode()
- getMinWeightFractionPerNode
  
  double getMinWeightFractionPerNode()
- getOldStrategy
  
  Strategy getOldStrategy(scala.collection.immutable.Map<Object,Object> categoricalFeatures, int numClasses, scala.Enumeration.Value oldAlgo, Impurity oldImpurity, double subsamplingRate)
  
  (private[ml]) Create a Strategy instance to use with the old API.
- leafCol
  
  Param<String> leafCol()
  
  Name of the leaf indices column, which holds the predicted leaf index of each instance in each tree, with leaves numbered in preorder. (default = "")
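  
  To make "leaf index by preorder" concrete, here is a minimal sketch that numbers the leaves of one tree in preorder and returns the index of the leaf an instance reaches. The `Node` type, the 0-based numbering, and the "go left when value <= threshold" rule are assumptions of the example, not Spark classes or guaranteed behavior.

  ```java
  import java.util.ArrayList;
  import java.util.List;

  // Illustrative sketch only; all names here are hypothetical, not Spark APIs.
  final class LeafIndexSketch {
      static final class Node {
          final int feature;
          final double threshold;
          final Node left;
          final Node right;
          Node(int feature, double threshold, Node left, Node right) {
              this.feature = feature;
              this.threshold = threshold;
              this.left = left;
              this.right = right;
          }
          boolean isLeaf() {
              return left == null && right == null;
          }
      }

      static Node leaf() {
          return new Node(-1, 0.0, null, null);
      }

      static Node split(int feature, double threshold, Node left, Node right) {
          return new Node(feature, threshold, left, right);
      }

      // Preorder traversal (node, then left subtree, then right subtree), keeping leaves only.
      static void collectLeaves(Node node, List<Node> out) {
          if (node.isLeaf()) {
              out.add(node);
              return;
          }
          collectLeaves(node.left, out);
          collectLeaves(node.right, out);
      }

      // Route the instance to a leaf, then report that leaf's position in preorder (0-based).
      static int predictLeaf(Node root, double[] features) {
          List<Node> leaves = new ArrayList<>();
          collectLeaves(root, leaves);
          Node node = root;
          while (!node.isLeaf()) {
              node = features[node.feature] <= node.threshold ? node.left : node.right;
          }
          return leaves.indexOf(node); // identity comparison, since Node does not override equals
      }
  }
  ```

  For an ensemble, the same lookup would be repeated per tree, giving one index per tree for each instance.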
  
- maxBins
  
  IntParam maxBins()
  
  Maximum number of bins used for discretizing continuous features and for choosing how to split on features at each node. More bins give higher granularity. Must be at least 2 and at least the number of categories in any categorical feature. (default = 32)
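  
  The two constraints on maxBins can be checked before training. The sketch below is a hypothetical helper, not a Spark API; it assumes the categorical features are described by a map from feature index to number of categories.

  ```java
  import java.util.HashMap;
  import java.util.Map;

  // Hypothetical pre-flight check, not part of Spark.
  final class MaxBinsSketch {
      // True when maxBins is at least 2 and at least the largest category count.
      static boolean isValidMaxBins(int maxBins, Map<Integer, Integer> categoriesPerFeature) {
          int largest = 0;
          for (int categories : categoriesPerFeature.values()) {
              largest = Math.max(largest, categories);
          }
          return maxBins >= 2 && maxBins >= largest;
      }

      public static void main(String[] args) {
          Map<Integer, Integer> categoriesPerFeature = new HashMap<>();
          categoriesPerFeature.put(0, 4);   // feature 0 has 4 categories
          categoriesPerFeature.put(3, 40);  // feature 3 has 40 categories
          System.out.println("maxBins=32 valid: " + isValidMaxBins(32, categoriesPerFeature));
          System.out.println("maxBins=64 valid: " + isValidMaxBins(64, categoriesPerFeature));
      }
  }
  ```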
  
- maxDepth
  
  IntParam maxDepth()
  
  Maximum depth of the tree (nonnegative). E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. (default = 5)
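  
  The depth examples generalize: a binary tree of depth d has at most 2^d leaf nodes and 2^(d+1) - 1 nodes in total. The helper below is a hypothetical illustration of that arithmetic, not a Spark API.

  ```java
  // Hypothetical helper, not part of Spark: upper bounds on tree size for a given maxDepth.
  final class MaxDepthSketch {
      // A binary tree of depth d has at most 2^d leaf nodes.
      static long maxLeaves(int maxDepth) {
          if (maxDepth < 0 || maxDepth > 61) {
              throw new IllegalArgumentException("maxDepth must be in [0, 61] for this sketch");
          }
          return 1L << maxDepth;
      }

      // Internal nodes plus leaves: at most 2^(d+1) - 1 nodes in total.
      static long maxNodes(int maxDepth) {
          return (maxLeaves(maxDepth) << 1) - 1;
      }

      public static void main(String[] args) {
          for (int depth : new int[] {0, 1, 5}) {
              System.out.println("maxDepth=" + depth + ": up to " + maxLeaves(depth)
                  + " leaves, " + maxNodes(depth) + " nodes");
          }
      }
  }
  ```

  These are upper bounds; the other stopping criteria on this page can end growth earlier.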
  
- maxMemoryInMB
  
  IntParam maxMemoryInMB()
  
  Maximum memory in MB allocated to histogram aggregation. If too small, then 1 node will be split per iteration, and its aggregates may exceed this size. (default = 256 MB)
  
- minInfoGain
  
  DoubleParam minInfoGain()
  
  Minimum information gain for a split to be considered at a tree node. Should be at least 0.0. (default = 0.0)
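  
  As an illustration of how such a threshold acts on a candidate split, the sketch below computes the gain as the parent impurity minus the size-weighted impurity of the two children, then compares it with minInfoGain. Gini impurity is chosen only for the example, and treating a gain equal to the threshold as acceptable is an assumption; the names are hypothetical and this is not Spark's code.

  ```java
  // Illustrative sketch only; all names here are hypothetical, not Spark APIs.
  final class MinInfoGainSketch {
      static long sum(long[] counts) {
          long total = 0;
          for (long c : counts) {
              total += c;
          }
          return total;
      }

      // Gini impurity of a node given its per-class instance counts.
      static double gini(long[] classCounts) {
          long total = sum(classCounts);
          if (total == 0) {
              return 0.0;
          }
          double sumOfSquares = 0.0;
          for (long c : classCounts) {
              double p = (double) c / total;
              sumOfSquares += p * p;
          }
          return 1.0 - sumOfSquares;
      }

      // Gain = parent impurity minus the size-weighted impurity of the two children.
      static double infoGain(long[] left, long[] right) {
          if (left.length != right.length) {
              throw new IllegalArgumentException("class count arrays must have equal length");
          }
          long nLeft = sum(left);
          long nRight = sum(right);
          long n = nLeft + nRight;
          if (n == 0) {
              return 0.0;
          }
          long[] parent = new long[left.length];
          for (int i = 0; i < left.length; i++) {
              parent[i] = left[i] + right[i];
          }
          return gini(parent)
              - ((double) nLeft / n) * gini(left)
              - ((double) nRight / n) * gini(right);
      }

      // Assumption of this sketch: a gain exactly equal to minInfoGain still passes.
      static boolean passesMinInfoGain(double gain, double minInfoGain) {
          return gain >= minInfoGain;
      }
  }
  ```

  With the default of 0.0, any split that does not make impurity worse passes this check; raising the value prunes splits that separate the classes only marginally.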
  
- minInstancesPerNode
  
  IntParam minInstancesPerNode()
  
  Minimum number of instances each child must have after split. If a split causes the left or right child to have fewer than minInstancesPerNode instances, the split will be discarded as invalid. Must be at least 1. (default = 1)
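  
  The rule amounts to a check on both children of a candidate split. A minimal sketch, with hypothetical names that are not part of Spark:

  ```java
  // Hypothetical helper, not part of Spark.
  final class MinInstancesSketch {
      // A split is kept only if both children hold at least minInstancesPerNode instances.
      static boolean isValidSplit(long leftCount, long rightCount, int minInstancesPerNode) {
          if (minInstancesPerNode < 1) {
              throw new IllegalArgumentException("minInstancesPerNode must be at least 1");
          }
          return leftCount >= minInstancesPerNode && rightCount >= minInstancesPerNode;
      }

      public static void main(String[] args) {
          System.out.println("98/2 split with min 5 kept: " + isValidSplit(98, 2, 5));
          System.out.println("60/40 split with min 5 kept: " + isValidSplit(60, 40, 5));
      }
  }
  ```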
  
- minWeightFractionPerNode
  
  DoubleParam minWeightFractionPerNode()
  
  Minimum fraction of the weighted sample count that each child must have after split. If a split causes the fraction of the total weight in the left or right child to be less than minWeightFractionPerNode, the split will be discarded as invalid. Should be in the interval [0.0, 0.5). (default = 0.0)
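  
  The weighted variant of the check can be sketched in the same way. The helper below is hypothetical and not part of Spark; it takes the total weight as an explicit argument rather than assuming how Spark derives it.

  ```java
  // Hypothetical helper, not part of Spark.
  final class MinWeightFractionSketch {
      // Both children must carry at least minWeightFractionPerNode of totalWeight.
      static boolean isValidSplit(double leftWeight, double rightWeight,
                                  double totalWeight, double minWeightFractionPerNode) {
          if (minWeightFractionPerNode < 0.0 || minWeightFractionPerNode >= 0.5) {
              throw new IllegalArgumentException("fraction must be in [0.0, 0.5)");
          }
          double minWeight = minWeightFractionPerNode * totalWeight;
          return leftWeight >= minWeight && rightWeight >= minWeight;
      }

      public static void main(String[] args) {
          System.out.println("10/90 of 100 at 0.25 kept: " + isValidSplit(10.0, 90.0, 100.0, 0.25));
          System.out.println("30/70 of 100 at 0.25 kept: " + isValidSplit(30.0, 70.0, 100.0, 0.25));
      }
  }
  ```

  The upper bound of 0.5 is exclusive because two children cannot both hold more than half of the weight.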
  
- setLeafCol
  
  DecisionTreeParams setLeafCol(String value)
