object DecisionTree extends Serializable with Logging
- Annotations
- @Since( "1.0.0" )
- Source
- DecisionTree.scala
Value Members
-
def
train(input: RDD[LabeledPoint], algo: Algo, impurity: Impurity, maxDepth: Int, numClasses: Int, maxBins: Int, quantileCalculationStrategy: QuantileStrategy, categoricalFeaturesInfo: Map[Int, Int]): DecisionTreeModel
Method to train a decision tree model. The method supports binary and multiclass classification and regression.
- input
Training dataset: RDD of org.apache.spark.mllib.regression.LabeledPoint. For classification, labels should take values {0, 1, ..., numClasses-1}. For regression, labels are real numbers.
- algo
Type of decision tree, either classification or regression.
- impurity
Criterion used for information gain calculation.
- maxDepth
Maximum depth of the tree (e.g. depth 0 means 1 leaf node, depth 1 means 1 internal node + 2 leaf nodes).
- numClasses
Number of classes for classification. Default value of 2.
- maxBins
Maximum number of bins used for splitting features.
- quantileCalculationStrategy
Algorithm for calculating quantiles.
- categoricalFeaturesInfo
Map storing arity of categorical features. An entry (n to k) indicates that feature n is categorical with k categories indexed from 0: {0, 1, ..., k-1}.
- returns
DecisionTreeModel that can be used for prediction.
- Annotations
- @Since( "1.0.0" )
- Note
Using org.apache.spark.mllib.tree.DecisionTree.trainClassifier and org.apache.spark.mllib.tree.DecisionTree.trainRegressor is recommended to clearly separate classification and regression.
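As an illustration of how the eight arguments line up, here is a minimal sketch. The object name `TrainAllParamsSketch`, the helper `trainToyClassifier`, and the toy data are hypothetical; the `SparkContext` is assumed to be supplied by the caller, and the parameter values are arbitrary choices rather than recommendations.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.tree.configuration.{Algo, QuantileStrategy}
import org.apache.spark.mllib.tree.impurity.Gini
import org.apache.spark.mllib.tree.model.DecisionTreeModel

// Hypothetical helper object; only DecisionTree.train and its argument types come from MLlib.
object TrainAllParamsSketch {
  def trainToyClassifier(sc: SparkContext): DecisionTreeModel = {
    // Made-up toy data: labels in {0, 1}; feature 0 is categorical with
    // categories {0, 1}, feature 1 is continuous.
    val input = sc.parallelize(Seq(
      LabeledPoint(0.0, Vectors.dense(0.0, 1.0)),
      LabeledPoint(1.0, Vectors.dense(1.0, 2.0)),
      LabeledPoint(0.0, Vectors.dense(0.0, 3.0)),
      LabeledPoint(1.0, Vectors.dense(1.0, 4.0))
    ))

    DecisionTree.train(
      input,
      Algo.Classification,   // algo
      Gini,                  // impurity
      3,                     // maxDepth
      2,                     // numClasses
      32,                    // maxBins
      QuantileStrategy.Sort, // quantileCalculationStrategy
      Map(0 -> 2)            // categoricalFeaturesInfo: feature 0 has 2 categories
    )
  }
}
```

The returned model can then be queried with, for example, `model.predict(Vectors.dense(1.0, 2.5))`.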
-
def
train(input: RDD[LabeledPoint], algo: Algo, impurity: Impurity, maxDepth: Int, numClasses: Int): DecisionTreeModel
Method to train a decision tree model. The method supports binary and multiclass classification and regression.
- input
Training dataset: RDD of org.apache.spark.mllib.regression.LabeledPoint. For classification, labels should take values {0, 1, ..., numClasses-1}. For regression, labels are real numbers.
- algo
Type of decision tree, either classification or regression.
- impurity
Criterion used for information gain calculation.
- maxDepth
Maximum depth of the tree (e.g. depth 0 means 1 leaf node, depth 1 means 1 internal node + 2 leaf nodes).
- numClasses
Number of classes for classification. Default value of 2.
- returns
DecisionTreeModel that can be used for prediction.
- Annotations
- @Since( "1.2.0" )
- Note
Using org.apache.spark.mllib.tree.DecisionTree.trainClassifier and org.apache.spark.mllib.tree.DecisionTree.trainRegressor is recommended to clearly separate classification and regression.
-
def
train(input: RDD[LabeledPoint], algo: Algo, impurity: Impurity, maxDepth: Int): DecisionTreeModel
Method to train a decision tree model. The method supports binary and multiclass classification and regression.
- input
Training dataset: RDD of org.apache.spark.mllib.regression.LabeledPoint. For classification, labels should take values {0, 1, ..., numClasses-1}. For regression, labels are real numbers.
- algo
Type of decision tree, either classification or regression.
- impurity
Criterion used for information gain calculation.
- maxDepth
Maximum depth of the tree (e.g. depth 0 means 1 leaf node, depth 1 means 1 internal node + 2 leaf nodes).
- returns
DecisionTreeModel that can be used for prediction.
- Annotations
- @Since( "1.0.0" )
- Note
Using org.apache.spark.mllib.tree.DecisionTree.trainClassifier and org.apache.spark.mllib.tree.DecisionTree.trainRegressor is recommended to clearly separate classification and regression.
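The maxDepth examples (depth 0 means 1 leaf node, depth 1 means 1 internal node + 2 leaf nodes) can be generalized to an upper bound on tree size. The sketch below assumes every split is binary, as those examples imply; `TreeSizeSketch` and its methods are hypothetical names, not part of MLlib, and a trained tree may be smaller than these bounds.

```scala
// Hypothetical helper, not part of MLlib: upper bounds on the size of a
// binary tree whose depth is at most maxDepth.
object TreeSizeSketch {
  private def check(maxDepth: Int): Unit =
    require(maxDepth >= 0 && maxDepth <= 62, s"maxDepth out of range: $maxDepth")

  /** At most 2^maxDepth leaf nodes. */
  def maxLeafNodes(maxDepth: Int): Long = {
    check(maxDepth)
    1L << maxDepth
  }

  /** At most 2^maxDepth - 1 internal (split) nodes. */
  def maxInternalNodes(maxDepth: Int): Long = maxLeafNodes(maxDepth) - 1

  /** At most 2^(maxDepth + 1) - 1 nodes in total. */
  def maxTotalNodes(maxDepth: Int): Long =
    maxLeafNodes(maxDepth) + maxInternalNodes(maxDepth)
}
```

For maxDepth 0 this gives 1 leaf and no internal nodes; for maxDepth 1 it gives 1 internal node and 2 leaves, matching the parameter description.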
-
def
train(input: RDD[LabeledPoint], strategy: Strategy): DecisionTreeModel
Method to train a decision tree model. The method supports binary and multiclass classification and regression.
- input
Training dataset: RDD of org.apache.spark.mllib.regression.LabeledPoint. For classification, labels should take values {0, 1, ..., numClasses-1}. For regression, labels are real numbers.
- strategy
The configuration parameters for the tree algorithm which specify the type of decision tree (classification or regression), feature type (continuous, categorical), depth of the tree, quantile calculation strategy, etc.
- returns
DecisionTreeModel that can be used for prediction.
- Annotations
- @Since( "1.0.0" )
- Note
Using org.apache.spark.mllib.tree.DecisionTree.trainClassifier and org.apache.spark.mllib.tree.DecisionTree.trainRegressor is recommended to clearly separate classification and regression.
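A minimal sketch of the Strategy-based overload follows. It assumes the `Strategy` class from `org.apache.spark.mllib.tree.configuration` and its named constructor parameters; the object name `TrainWithStrategySketch`, the helper `trainToyMulticlass`, and the toy data are hypothetical, and the `SparkContext` is assumed to come from the caller.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.tree.configuration.{Algo, Strategy}
import org.apache.spark.mllib.tree.impurity.Entropy
import org.apache.spark.mllib.tree.model.DecisionTreeModel

// Hypothetical helper object illustrating train(input, strategy).
object TrainWithStrategySketch {
  def trainToyMulticlass(sc: SparkContext): DecisionTreeModel = {
    // Made-up toy data: three classes, so labels take values in {0, 1, 2}.
    val input = sc.parallelize(Seq(
      LabeledPoint(0.0, Vectors.dense(0.1, 1.0)),
      LabeledPoint(0.0, Vectors.dense(0.2, 1.5)),
      LabeledPoint(1.0, Vectors.dense(1.1, 0.5)),
      LabeledPoint(1.0, Vectors.dense(1.3, 0.4)),
      LabeledPoint(2.0, Vectors.dense(2.2, 2.0)),
      LabeledPoint(2.0, Vectors.dense(2.4, 2.5))
    ))

    // All configuration lives in one Strategy value; parameters that are
    // not set here keep the defaults declared by Strategy.
    val strategy = new Strategy(
      algo = Algo.Classification,
      impurity = Entropy,
      maxDepth = 4,
      numClasses = 3,
      maxBins = 16
    )

    DecisionTree.train(input, strategy)
  }
}
```

Bundling the settings in a Strategy keeps the call site short when the same configuration is reused across several training runs.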
-
def
trainClassifier(input: JavaRDD[LabeledPoint], numClasses: Int, categoricalFeaturesInfo: Map[Integer, Integer], impurity: String, maxDepth: Int, maxBins: Int): DecisionTreeModel
Java-friendly API for org.apache.spark.mllib.tree.DecisionTree.trainClassifier
- Annotations
- @Since( "1.1.0" )
-
def
trainClassifier(input: RDD[LabeledPoint], numClasses: Int, categoricalFeaturesInfo: Map[Int, Int], impurity: String, maxDepth: Int, maxBins: Int): DecisionTreeModel
Method to train a decision tree model for binary or multiclass classification.
- input
Training dataset: RDD of org.apache.spark.mllib.regression.LabeledPoint. Labels should take values {0, 1, ..., numClasses-1}.
- numClasses
Number of classes for classification.
- categoricalFeaturesInfo
Map storing arity of categorical features. An entry (n to k) indicates that feature n is categorical with k categories indexed from 0: {0, 1, ..., k-1}.
- impurity
Criterion used for information gain calculation. Supported values: "gini" (recommended) or "entropy".
- maxDepth
Maximum depth of the tree (e.g. depth 0 means 1 leaf node, depth 1 means 1 internal node + 2 leaf nodes). (suggested value: 5)
- maxBins
Maximum number of bins used for splitting features. (suggested value: 32)
- returns
DecisionTreeModel that can be used for prediction.
- Annotations
- @Since( "1.1.0" )
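A minimal sketch of a trainClassifier call using the suggested values for maxDepth and maxBins. The object name `TrainClassifierSketch`, the helper `trainToyClassifier`, and the toy data are hypothetical; the `SparkContext` is assumed to be provided by the caller.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.tree.model.DecisionTreeModel

// Hypothetical helper object illustrating trainClassifier.
object TrainClassifierSketch {
  def trainToyClassifier(sc: SparkContext): DecisionTreeModel = {
    // Made-up toy data: binary labels in {0, 1}, two continuous features.
    val input = sc.parallelize(Seq(
      LabeledPoint(0.0, Vectors.dense(0.5, 1.0)),
      LabeledPoint(0.0, Vectors.dense(0.7, 1.2)),
      LabeledPoint(1.0, Vectors.dense(2.5, 3.0)),
      LabeledPoint(1.0, Vectors.dense(2.9, 3.4))
    ))

    val numClasses = 2
    // An empty map means every feature is treated as continuous.
    val categoricalFeaturesInfo = Map[Int, Int]()
    val impurity = "gini" // or "entropy"
    val maxDepth = 5      // suggested value
    val maxBins = 32      // suggested value

    DecisionTree.trainClassifier(
      input, numClasses, categoricalFeaturesInfo, impurity, maxDepth, maxBins)
  }
}
```

Calling `model.predict` on a feature vector returns the predicted class label as a Double.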
-
def
trainRegressor(input: JavaRDD[LabeledPoint], categoricalFeaturesInfo: Map[Integer, Integer], impurity: String, maxDepth: Int, maxBins: Int): DecisionTreeModel
Java-friendly API for org.apache.spark.mllib.tree.DecisionTree.trainRegressor
- Annotations
- @Since( "1.1.0" )
-
def
trainRegressor(input: RDD[LabeledPoint], categoricalFeaturesInfo: Map[Int, Int], impurity: String, maxDepth: Int, maxBins: Int): DecisionTreeModel
Method to train a decision tree model for regression.
- input
Training dataset: RDD of org.apache.spark.mllib.regression.LabeledPoint. Labels are real numbers.
- categoricalFeaturesInfo
Map storing arity of categorical features. An entry (n to k) indicates that feature n is categorical with k categories indexed from 0: {0, 1, ..., k-1}.
- impurity
Criterion used for information gain calculation. The only supported value for regression is "variance".
- maxDepth
Maximum depth of the tree (e.g. depth 0 means 1 leaf node, depth 1 means 1 internal node + 2 leaf nodes). (suggested value: 5)
- maxBins
Maximum number of bins used for splitting features. (suggested value: 32)
- returns
DecisionTreeModel that can be used for prediction.
- Annotations
- @Since( "1.1.0" )
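A minimal sketch of a trainRegressor call with the only supported impurity, "variance". The object name `TrainRegressorSketch`, the helper `trainToyRegressor`, and the toy data are hypothetical; the `SparkContext` is assumed to be provided by the caller.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.tree.model.DecisionTreeModel

// Hypothetical helper object illustrating trainRegressor.
object TrainRegressorSketch {
  def trainToyRegressor(sc: SparkContext): DecisionTreeModel = {
    // Made-up toy data: real-valued labels, one continuous feature.
    val input = sc.parallelize(Seq(
      LabeledPoint(1.5, Vectors.dense(1.0)),
      LabeledPoint(1.7, Vectors.dense(1.2)),
      LabeledPoint(6.1, Vectors.dense(4.0)),
      LabeledPoint(6.4, Vectors.dense(4.3))
    ))

    // An empty map means every feature is treated as continuous.
    val categoricalFeaturesInfo = Map[Int, Int]()
    val impurity = "variance" // the only supported value for regression
    val maxDepth = 5          // suggested value
    val maxBins = 32          // suggested value

    DecisionTree.trainRegressor(
      input, categoricalFeaturesInfo, impurity, maxDepth, maxBins)
  }
}
```

Here `model.predict` returns a real-valued estimate rather than a class label.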