object RandomForest extends Serializable with Logging
- Annotations
- @Since( "1.2.0" )
- Source
- RandomForest.scala
Value Members
- val supportedFeatureSubsetStrategies: Array[String]
List of supported feature subset sampling strategies.
- Annotations
- @Since( "1.2.0" )
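Each strategy name stands for a number of features considered per node. The following is a minimal plain-Scala sketch of that mapping, not Spark's own code. It assumes ceiling rounding for "sqrt", "log2", and "onethird", which this reference does not state, and it follows the "auto" rule documented for the featureSubsetStrategy parameter. `FeatureSubsetSketch`, `numFeaturesPerNode`, and `isClassification` are hypothetical names introduced for illustration.

```scala
object FeatureSubsetSketch {
  // Hypothetical helper, not part of Spark's public API.
  // Returns how many features would be considered at each node.
  def numFeaturesPerNode(
      strategy: String,
      numFeatures: Int,
      numTrees: Int,
      isClassification: Boolean): Int = {
    // Resolve "auto": a single tree uses all features; a forest uses
    // "sqrt" for classification and "onethird" for regression.
    val resolved = strategy match {
      case "auto" =>
        if (numTrees == 1) "all"
        else if (isClassification) "sqrt"
        else "onethird"
      case other => other
    }
    // Assumption: fractional counts are rounded up.
    resolved match {
      case "all"      => numFeatures
      case "sqrt"     => math.ceil(math.sqrt(numFeatures.toDouble)).toInt
      case "log2"     => math.max(1, math.ceil(math.log(numFeatures.toDouble) / math.log(2.0)).toInt)
      case "onethird" => math.ceil(numFeatures / 3.0).toInt
      case unknown =>
        throw new IllegalArgumentException(s"Unsupported featureSubsetStrategy: $unknown")
    }
  }
}
```

For example, `FeatureSubsetSketch.numFeaturesPerNode("auto", 100, 10, isClassification = true)` resolves "auto" to "sqrt" because more than one tree is requested.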
- def trainClassifier(input: JavaRDD[LabeledPoint], numClasses: Int, categoricalFeaturesInfo: Map[Integer, Integer], numTrees: Int, featureSubsetStrategy: String, impurity: String, maxDepth: Int, maxBins: Int, seed: Int): RandomForestModel
Java-friendly API for org.apache.spark.mllib.tree.RandomForest.trainClassifier.
- Annotations
- @Since( "1.2.0" )
- def trainClassifier(input: RDD[LabeledPoint], numClasses: Int, categoricalFeaturesInfo: Map[Int, Int], numTrees: Int, featureSubsetStrategy: String, impurity: String, maxDepth: Int, maxBins: Int, seed: Int = Utils.random.nextInt()): RandomForestModel
Method to train a random forest model for binary or multiclass classification.
- input
Training dataset: RDD of org.apache.spark.mllib.regression.LabeledPoint. Labels should take values {0, 1, ..., numClasses-1}.
- numClasses
Number of classes for classification.
- categoricalFeaturesInfo
Map storing arity of categorical features. An entry (n to k) indicates that feature n is categorical with k categories indexed from 0: {0, 1, ..., k-1}.
- numTrees
Number of trees in the random forest.
- featureSubsetStrategy
Number of features to consider for splits at each node. Supported values: "auto", "all", "sqrt", "log2", "onethird". If "auto" is set, the value is chosen based on numTrees: "all" if numTrees == 1, "sqrt" if numTrees is greater than 1 (a forest).
- impurity
Criterion used for information gain calculation. Supported values: "gini" (recommended) or "entropy".
- maxDepth
Maximum depth of each tree (e.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes). Suggested value: 4.
- maxBins
Maximum number of bins used for splitting features. Suggested value: 100.
- seed
Random seed for bootstrapping and choosing feature subsets.
- returns
RandomForestModel that can be used for prediction.
- Annotations
- @Since( "1.2.0" )
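A minimal usage sketch of this overload, assuming Spark MLlib is on the classpath and a SparkContext is already available. The toy dataset, the parameter values, and the names `TrainClassifierSketch` and `train` are made up for illustration; they are not recommendations from this reference.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.model.RandomForestModel

object TrainClassifierSketch {
  // Hypothetical helper: trains a small forest on synthetic data.
  def train(sc: SparkContext): RandomForestModel = {
    // Synthetic data: feature 0 is continuous and determines the label;
    // feature 1 is categorical with 3 categories {0, 1, 2}.
    val points = (0 until 200).map { i =>
      val x = i / 200.0
      val label = if (x > 0.5) 1.0 else 0.0
      LabeledPoint(label, Vectors.dense(x, (i % 3).toDouble))
    }
    val data = sc.parallelize(points)

    val numClasses = 2
    // Entry (1 -> 3): feature 1 is categorical with 3 categories.
    val categoricalFeaturesInfo = Map[Int, Int](1 -> 3)
    val numTrees = 3
    val featureSubsetStrategy = "auto"
    val impurity = "gini"
    val maxDepth = 4
    val maxBins = 32
    val seed = 42 // fixed seed so repeated runs use the same bootstrapping

    RandomForest.trainClassifier(data, numClasses, categoricalFeaturesInfo,
      numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins, seed)
  }
}
```

The returned model can then score a single feature vector, for example `model.predict(Vectors.dense(0.9, 0.0))`, which yields the predicted class label as a Double.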
- def trainClassifier(input: RDD[LabeledPoint], strategy: Strategy, numTrees: Int, featureSubsetStrategy: String, seed: Int): RandomForestModel
Method to train a random forest model for binary or multiclass classification.
- input
Training dataset: RDD of org.apache.spark.mllib.regression.LabeledPoint. Labels should take values {0, 1, ..., numClasses-1}.
- strategy
Parameters for training each tree in the forest.
- numTrees
Number of trees in the random forest.
- featureSubsetStrategy
Number of features to consider for splits at each node. Supported values: "auto", "all", "sqrt", "log2", "onethird". If "auto" is set, the value is chosen based on numTrees: "all" if numTrees == 1, "sqrt" if numTrees is greater than 1 (a forest).
- seed
Random seed for bootstrapping and choosing feature subsets.
- returns
RandomForestModel that can be used for prediction.
- Annotations
- @Since( "1.2.0" )
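A sketch of the Strategy-based overload, assuming the per-tree parameters are set on a Strategy created with `Strategy.defaultStrategy`. The dataset, the chosen values, and the names `StrategyClassifierSketch` and `train` are illustrative assumptions.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.configuration.Strategy
import org.apache.spark.mllib.tree.model.RandomForestModel

object StrategyClassifierSketch {
  // Hypothetical helper: trains a classifier configured through a Strategy.
  def train(sc: SparkContext): RandomForestModel = {
    // Synthetic two-feature data; the label depends on the first feature.
    val data = sc.parallelize((0 until 200).map { i =>
      val x = i / 200.0
      LabeledPoint(if (x > 0.5) 1.0 else 0.0, Vectors.dense(x, 1.0 - x))
    })

    // Start from the classification defaults, then override per-tree settings.
    val strategy = Strategy.defaultStrategy("Classification")
    strategy.numClasses = 2
    strategy.maxDepth = 4
    strategy.maxBins = 32

    val numTrees = 5
    val seed = 7
    RandomForest.trainClassifier(data, strategy, numTrees, "auto", seed)
  }
}
```

Grouping the per-tree settings in one Strategy keeps the forest-level arguments (numTrees, featureSubsetStrategy, seed) separate from the tree-level ones.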
- def trainRegressor(input: JavaRDD[LabeledPoint], categoricalFeaturesInfo: Map[Integer, Integer], numTrees: Int, featureSubsetStrategy: String, impurity: String, maxDepth: Int, maxBins: Int, seed: Int): RandomForestModel
Java-friendly API for org.apache.spark.mllib.tree.RandomForest.trainRegressor.
- Annotations
- @Since( "1.2.0" )
- def trainRegressor(input: RDD[LabeledPoint], categoricalFeaturesInfo: Map[Int, Int], numTrees: Int, featureSubsetStrategy: String, impurity: String, maxDepth: Int, maxBins: Int, seed: Int = Utils.random.nextInt()): RandomForestModel
Method to train a random forest model for regression.
- input
Training dataset: RDD of org.apache.spark.mllib.regression.LabeledPoint. Labels are real numbers.
- categoricalFeaturesInfo
Map storing arity of categorical features. An entry (n to k) indicates that feature n is categorical with k categories indexed from 0: {0, 1, ..., k-1}.
- numTrees
Number of trees in the random forest.
- featureSubsetStrategy
Number of features to consider for splits at each node. Supported values: "auto", "all", "sqrt", "log2", "onethird". If "auto" is set, the value is chosen based on numTrees: "all" if numTrees == 1, "onethird" if numTrees is greater than 1 (a forest).
- impurity
Criterion used for information gain calculation. The only supported value for regression is "variance".
- maxDepth
Maximum depth of each tree (e.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes). Suggested value: 4.
- maxBins
Maximum number of bins used for splitting features. Suggested value: 100.
- seed
Random seed for bootstrapping and choosing feature subsets.
- returns
RandomForestModel that can be used for prediction.
- Annotations
- @Since( "1.2.0" )
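A minimal regression sketch for this overload, assuming Spark MLlib is on the classpath and a SparkContext is available. The linear toy data, the parameter values, and the names `TrainRegressorSketch` and `train` are assumptions made for illustration.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.model.RandomForestModel

object TrainRegressorSketch {
  // Hypothetical helper: trains a small regression forest on synthetic data.
  def train(sc: SparkContext): RandomForestModel = {
    // Synthetic data: real-valued label 2 * x over a single continuous feature.
    val data = sc.parallelize((0 until 200).map { i =>
      val x = i / 200.0
      LabeledPoint(2.0 * x, Vectors.dense(x))
    })

    // An empty map means every feature is treated as continuous.
    val categoricalFeaturesInfo = Map[Int, Int]()
    val numTrees = 4
    val featureSubsetStrategy = "auto"
    val impurity = "variance" // the only impurity supported for regression
    val maxDepth = 4
    val maxBins = 100
    val seed = 11

    RandomForest.trainRegressor(data, categoricalFeaturesInfo, numTrees,
      featureSubsetStrategy, impurity, maxDepth, maxBins, seed)
  }
}
```

Calling `model.predict(Vectors.dense(0.25))` on the result returns a real-valued prediction as a Double.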
- def trainRegressor(input: RDD[LabeledPoint], strategy: Strategy, numTrees: Int, featureSubsetStrategy: String, seed: Int): RandomForestModel
Method to train a random forest model for regression.
- input
Training dataset: RDD of org.apache.spark.mllib.regression.LabeledPoint. Labels are real numbers.
- strategy
Parameters for training each tree in the forest.
- numTrees
Number of trees in the random forest.
- featureSubsetStrategy
Number of features to consider for splits at each node. Supported values: "auto", "all", "sqrt", "log2", "onethird". If "auto" is set, the value is chosen based on numTrees: "all" if numTrees == 1, "onethird" if numTrees is greater than 1 (a forest).
- seed
Random seed for bootstrapping and choosing feature subsets.
- returns
RandomForestModel that can be used for prediction.
- Annotations
- @Since( "1.2.0" )
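A sketch of the Strategy-based regression overload that also computes the mean squared error on the training data, as one possible sanity check. The data, the settings, and the names `StrategyRegressorSketch` and `trainAndScore` are illustrative assumptions.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.configuration.Strategy
import org.apache.spark.mllib.tree.model.RandomForestModel

object StrategyRegressorSketch {
  // Hypothetical helper: returns the trained model and its training MSE.
  def trainAndScore(sc: SparkContext): (RandomForestModel, Double) = {
    // Synthetic data: label 3 * x + 1 over a single continuous feature.
    val data = sc.parallelize((0 until 200).map { i =>
      val x = i / 200.0
      LabeledPoint(3.0 * x + 1.0, Vectors.dense(x))
    })

    // Start from the regression defaults and limit the tree depth.
    val strategy = Strategy.defaultStrategy("Regression")
    strategy.maxDepth = 5

    val model = RandomForest.trainRegressor(data, strategy, 10, "auto", 3)

    // Mean of squared differences between predictions and labels.
    val mse = data
      .map(p => math.pow(model.predict(p.features) - p.label, 2))
      .mean()
    (model, mse)
  }
}
```

Training error tends to understate the error on unseen data, so a held-out split is the usual next step.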