public final class RandomForestClassificationModel extends ProbabilisticClassificationModel<Vector,RandomForestClassificationModel> implements scala.Serializable
Random Forest model for classification.
It supports both binary and multiclass labels, as well as both continuous and categorical features.
param: _trees Decision trees in the ensemble.
Warning: These have null parents.

Modifier and Type | Method and Description |
---|---|
RandomForestClassificationModel | copy(ParamMap extra): Creates a copy of this instance with the same UID and some extra params. |
Vector | featureImportances(): Estimate of the importance of each feature. |
Param<String> | featuresCol(): Param for features column name. |
static RandomForestClassificationModel | fromOld(RandomForestModel oldModel, RandomForestClassifier parent, scala.collection.immutable.Map<Object,Object> categoricalFeatures, int numClasses, int numFeatures): (private[ml]) Convert a model from the old API |
String | getFeaturesCol() |
String | getLabelCol() |
String | getPredictionCol() |
String | getRawPredictionCol() |
Param<String> | labelCol(): Param for label column name. |
int | numClasses(): Number of classes (values which the label can take). |
int | numFeatures(): Returns the number of features the model was trained on. |
Param<String> | predictionCol(): Param for prediction column name. |
Param<String> | rawPredictionCol(): Param for raw prediction (a.k.a. |
String | toString() |
org.apache.spark.ml.tree.DecisionTreeModel[] | trees() |
double[] | treeWeights() |
String | uid(): An immutable unique ID for the object and its derivatives. |
StructType | validateAndTransformSchema(StructType schema, boolean fitting, DataType featuresDataType) |
StructType | validateAndTransformSchema(StructType schema, boolean fitting, DataType featuresDataType): Validates and transforms the input schema with the provided param map. |
Methods inherited from class ProbabilisticClassificationModel: normalizeToProbabilitiesInPlace, setProbabilityCol, setThresholds, transform
Methods inherited from class ClassificationModel: setRawPredictionCol
Methods inherited from class PredictionModel: setFeaturesCol, setPredictionCol, transformSchema
Methods inherited from class Transformer: transform, transform, transform
Methods inherited from interface Params: clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn, validateParams
Methods inherited from interface Logging: initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public static RandomForestClassificationModel fromOld(RandomForestModel oldModel, RandomForestClassifier parent, scala.collection.immutable.Map<Object,Object> categoricalFeatures, int numClasses, int numFeatures)
public String uid()
An immutable unique ID for the object and its derivatives.
Declared as uid in interface Identifiable.
public int numFeatures()
Returns the number of features the model was trained on.
Declared as numFeatures in class PredictionModel<Vector,RandomForestClassificationModel>.
public int numClasses()
Number of classes (values which the label can take).
Declared as numClasses in class ClassificationModel<Vector,RandomForestClassificationModel>.
public org.apache.spark.ml.tree.DecisionTreeModel[] trees()
public double[] treeWeights()
public RandomForestClassificationModel copy(ParamMap extra)
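Creates a copy of this instance with the same UID and some extra params.
Declared as copy in interface Params and in class Model<RandomForestClassificationModel>.
Parameters: extra - (undocumented)
See also: defaultCopy()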
public String toString()
Declared as toString in interface Identifiable and in class Object.
public Vector featureImportances()
Estimate of the importance of each feature.
This generalizes the idea of "Gini" importance to other losses, following the explanation of Gini importance from "Random Forests" documentation by Leo Breiman and Adele Cutler, and following the implementation from scikit-learn.
This feature importance is calculated as follows:
- Average over trees:
  - importance(feature j) = sum (over nodes which split on feature j) of the gain, where gain is scaled by the number of instances passing through node
  - Normalize importances for tree based on total number of training instances used to build tree.
- Normalize feature importance vector to sum to 1.
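The calculation described for featureImportances() can be sketched in plain Java. This is only an illustration under stated assumptions, not Spark's implementation: the class FeatureImportanceSketch, the SplitNode type, and both method names are hypothetical, and "normalize based on total number of training instances" is read here as dividing each tree's sums by that tree's training instance count.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these types belong to the Spark API.
class FeatureImportanceSketch {

    /** One internal node: the feature it splits on, its gain, and how many instances reached it. */
    static final class SplitNode {
        final int feature;
        final double gain;
        final long instanceCount;

        SplitNode(int feature, double gain, long instanceCount) {
            this.feature = feature;
            this.gain = gain;
            this.instanceCount = instanceCount;
        }
    }

    /**
     * Per-tree importances: for each feature, sum gain * instanceCount over the nodes
     * that split on it, then divide by the tree's training instance count (assumed reading).
     */
    static double[] treeImportances(List<SplitNode> splits, long treeInstanceCount, int numFeatures) {
        double[] importances = new double[numFeatures];
        for (SplitNode node : splits) {
            importances[node.feature] += node.gain * node.instanceCount;
        }
        for (int j = 0; j < numFeatures; j++) {
            importances[j] /= treeInstanceCount;
        }
        return importances;
    }

    /** Averages the per-tree importances, then rescales the vector so it sums to 1. */
    static double[] forestImportances(List<List<SplitNode>> trees, long[] treeInstanceCounts, int numFeatures) {
        double[] total = new double[numFeatures];
        for (int t = 0; t < trees.size(); t++) {
            double[] perTree = treeImportances(trees.get(t), treeInstanceCounts[t], numFeatures);
            for (int j = 0; j < numFeatures; j++) {
                total[j] += perTree[j] / trees.size();
            }
        }
        double sum = Arrays.stream(total).sum();
        if (sum > 0.0) { // leave an all-zero vector untouched (no splits at all)
            for (int j = 0; j < numFeatures; j++) {
                total[j] /= sum;
            }
        }
        return total;
    }
}
```

For example, with two trees built on 100 instances each, where the first splits on feature 0 (gain 0.4, 100 instances) and feature 1 (gain 0.2, 50 instances) and the second splits only on feature 0 (gain 0.1, 100 instances), the sketch yields importances of roughly 0.833 and 0.167.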
public StructType validateAndTransformSchema(StructType schema, boolean fitting, DataType featuresDataType)
public Param<String> rawPredictionCol()
public String getRawPredictionCol()
public StructType validateAndTransformSchema(StructType schema, boolean fitting, DataType featuresDataType)
Validates and transforms the input schema with the provided param map.
Parameters:
schema - input schema
fitting - whether this is in fitting
featuresDataType - SQL DataType for FeaturesType. E.g., VectorUDT for vector features.
public Param<String> labelCol()
public String getLabelCol()
public Param<String> featuresCol()
public String getFeaturesCol()
public Param<String> predictionCol()
public String getPredictionCol()