Package org.apache.spark.ml.tree
Interface DecisionTreeParams
- All Superinterfaces:
HasCheckpointInterval, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasSeed, HasWeightCol, Identifiable, Params, PredictorParams, Serializable
- All Known Subinterfaces:
DecisionTreeClassifierParams, DecisionTreeRegressorParams, GBTClassifierParams, GBTParams, GBTRegressorParams, RandomForestClassifierParams, RandomForestParams, RandomForestRegressorParams, TreeEnsembleClassifierParams, TreeEnsembleParams, TreeEnsembleRegressorParams
- All Known Implementing Classes:
DecisionTreeClassificationModel, DecisionTreeClassifier, DecisionTreeRegressionModel, DecisionTreeRegressor, GBTClassificationModel, GBTClassifier, GBTRegressionModel, GBTRegressor, RandomForestClassificationModel, RandomForestClassifier, RandomForestRegressionModel, RandomForestRegressor
public interface DecisionTreeParams
extends PredictorParams, HasCheckpointInterval, HasSeed, HasWeightCol
Parameters for Decision Tree-based algorithms.
Note: Marked as private since this may be made public in the future.
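These parameters are exposed through public setters on the concrete estimators listed above. A minimal sketch (the toy dataset and parameter values are illustrative, not part of this API page):

```scala
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("dt-params").getOrCreate()
import spark.implicits._

// Toy training data: (label, features); purely illustrative.
val training = Seq(
  (0.0, Vectors.dense(0.0, 1.0)),
  (1.0, Vectors.dense(1.0, 0.0)),
  (1.0, Vectors.dense(1.0, 1.0))
).toDF("label", "features")

val dt = new DecisionTreeClassifier()
  .setMaxDepth(5)            // documented default
  .setMaxBins(32)            // documented default
  .setMinInstancesPerNode(1)
  .setMinInfoGain(0.0)
  .setSeed(42L)

val model = dt.fit(training)
```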
-
Method Summary
Modifier and Type | Method | Description
BooleanParam | cacheNodeIds() | If false, the algorithm will pass trees to executors to match instances with nodes.
boolean | getCacheNodeIds() |
String | getLeafCol() |
int | getMaxBins() |
int | getMaxDepth() |
int | getMaxMemoryInMB() |
double | getMinInfoGain() |
int | getMinInstancesPerNode() |
double | getMinWeightFractionPerNode() |
Strategy | getOldStrategy(scala.collection.immutable.Map<Object, Object> categoricalFeatures, int numClasses, scala.Enumeration.Value oldAlgo, Impurity oldImpurity, double subsamplingRate) | (private[ml]) Create a Strategy instance to use with the old API.
Param<String> | leafCol() | Leaf indices column name.
IntParam | maxBins() | Maximum number of bins used for discretizing continuous features and for choosing how to split on features at each node.
IntParam | maxDepth() | Maximum depth of the tree (nonnegative).
IntParam | maxMemoryInMB() | Maximum memory in MB allocated to histogram aggregation.
DoubleParam | minInfoGain() | Minimum information gain for a split to be considered at a tree node.
IntParam | minInstancesPerNode() | Minimum number of instances each child must have after split.
DoubleParam | minWeightFractionPerNode() | Minimum fraction of the weighted sample count that each child must have after split.
 | setLeafCol(String value) |
Methods inherited from interface org.apache.spark.ml.param.shared.HasCheckpointInterval
checkpointInterval, getCheckpointInterval
Methods inherited from interface org.apache.spark.ml.param.shared.HasFeaturesCol
featuresCol, getFeaturesCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasLabelCol
getLabelCol, labelCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasPredictionCol
getPredictionCol, predictionCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasSeed
getSeed, seed
Methods inherited from interface org.apache.spark.ml.param.shared.HasWeightCol
getWeightCol, weightCol
Methods inherited from interface org.apache.spark.ml.util.Identifiable
toString, uid
Methods inherited from interface org.apache.spark.ml.param.Params
clear, copy, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
Methods inherited from interface org.apache.spark.ml.PredictorParams
validateAndTransformSchema
-
Method Details
-
cacheNodeIds
BooleanParam cacheNodeIds()
If false, the algorithm will pass trees to executors to match instances with nodes. If true, the algorithm will cache node IDs for each instance. Caching can speed up training of deeper trees. Users can set how often the cache is checkpointed, or disable checkpointing, by setting checkpointInterval. (default = false)
Returns: (undocumented)
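A sketch of how cacheNodeIds is typically paired with checkpointInterval on a public estimator (here RandomForestClassifier; the checkpoint directory, the SparkSession `spark`, and the DataFrame `training` are assumptions for illustration):

```scala
import org.apache.spark.ml.classification.RandomForestClassifier

// Assumed: an active SparkSession `spark` and a DataFrame `training`
// with "label" and "features" columns.
spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints") // illustrative path

val rf = new RandomForestClassifier()
  .setMaxDepth(15)           // node-ID caching mainly helps deeper trees
  .setCacheNodeIds(true)     // cache node IDs per instance instead of passing trees to executors
  .setCheckpointInterval(10) // checkpoint the cached node IDs every 10 iterations

val model = rf.fit(training)
```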
-
getCacheNodeIds
boolean getCacheNodeIds()
-
getLeafCol
String getLeafCol()
-
getMaxBins
int getMaxBins()
-
getMaxDepth
int getMaxDepth()
-
getMaxMemoryInMB
int getMaxMemoryInMB()
-
getMinInfoGain
double getMinInfoGain()
-
getMinInstancesPerNode
int getMinInstancesPerNode()
-
getMinWeightFractionPerNode
double getMinWeightFractionPerNode()
-
getOldStrategy
Strategy getOldStrategy(scala.collection.immutable.Map<Object, Object> categoricalFeatures, int numClasses, scala.Enumeration.Value oldAlgo, Impurity oldImpurity, double subsamplingRate)
(private[ml]) Create a Strategy instance to use with the old API.
-
leafCol
Param<String> leafCol()
Leaf indices column name. Predicted leaf index of each instance in each tree by preorder. (default = "")
Returns: (undocumented)
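A sketch of emitting leaf indices from a fitted model by setting a non-empty leafCol (the column name "leafId", the fitted `model`, and the DataFrame `data` are assumptions for illustration):

```scala
// Assumed: `model` is a fitted DecisionTreeClassificationModel and `data`
// is a DataFrame with a "features" column.
val withLeaves = model
  .setLeafCol("leafId")      // the default "" leaves the column out
  .transform(data)

withLeaves.select("prediction", "leafId").show()
```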
-
maxBins
IntParam maxBins()
Maximum number of bins used for discretizing continuous features and for choosing how to split on features at each node. More bins give higher granularity. Must be at least 2 and at least the number of categories in any categorical feature. (default = 32)
Returns: (undocumented)
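Because maxBins must cover the arity of every categorical feature, a common pattern is to align it with the indexer that marks features as categorical. A sketch using VectorIndexer (the column names and the DataFrame `df` are illustrative):

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.VectorIndexer
import org.apache.spark.ml.regression.DecisionTreeRegressor

// Assumed: a DataFrame `df` with "label" and vector-typed "features" columns.
val indexer = new VectorIndexer()
  .setInputCol("features")
  .setOutputCol("indexedFeatures")
  .setMaxCategories(20)        // features with > 20 distinct values stay continuous

val dt = new DecisionTreeRegressor()
  .setFeaturesCol("indexedFeatures")
  .setMaxBins(32)              // >= 20, so every indexed categorical feature fits

val model = new Pipeline().setStages(Array(indexer, dt)).fit(df)
```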
-
maxDepth
IntParam maxDepth()
Maximum depth of the tree (nonnegative). E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. (default = 5)
Returns: (undocumented)
-
maxMemoryInMB
IntParam maxMemoryInMB()
Maximum memory in MB allocated to histogram aggregation. If too small, then 1 node will be split per iteration, and its aggregates may exceed this size. (default = 256 MB)
Returns: (undocumented)
-
minInfoGain
DoubleParam minInfoGain()
Minimum information gain for a split to be considered at a tree node. Should be at least 0.0. (default = 0.0)
Returns: (undocumented)
-
minInstancesPerNode
IntParam minInstancesPerNode()
Minimum number of instances each child must have after a split. If a split causes the left or right child to have fewer than minInstancesPerNode instances, the split will be discarded as invalid. Must be at least 1. (default = 1)
Returns: (undocumented)
-
minWeightFractionPerNode
DoubleParam minWeightFractionPerNode()
Minimum fraction of the weighted sample count that each child must have after a split. If a split causes the fraction of the total weight in the left or right child to be less than minWeightFractionPerNode, the split will be discarded as invalid. Should be in the interval [0.0, 0.5). (default = 0.0)
Returns: (undocumented)
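A sketch combining the three split-acceptance thresholds; setWeightCol and setMinWeightFractionPerNode are assumed to be available on the estimator (they are exposed on the tree estimators in recent Spark releases), and the column names are illustrative:

```scala
import org.apache.spark.ml.classification.DecisionTreeClassifier

// Assumed: a DataFrame `training` with "label", "features", and a per-row "weight" column.
val dt = new DecisionTreeClassifier()
  .setWeightCol("weight")
  .setMinInfoGain(0.01)              // discard splits that barely reduce impurity
  .setMinInstancesPerNode(5)         // each child must keep at least 5 instances
  .setMinWeightFractionPerNode(0.05) // ...and at least 5% of the total instance weight

val model = dt.fit(training)
```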
-
setLeafCol
setLeafCol(String value)