java.lang.Object

org.apache.spark.ml.tree.impl.GradientBoostedTrees

public class GradientBoostedTrees extends Object

Constructor Summary

Constructors

Constructor

Description

GradientBoostedTrees()

Method Summary

Modifier and Type

Method

Description

static scala.Tuple2<DecisionTreeRegressionModel[],double[]>

boost(RDD<org.apache.spark.ml.feature.Instance> input, RDD<org.apache.spark.ml.feature.Instance> validationInput, BoostingStrategy boostingStrategy, boolean validate, long seed, String featureSubsetStrategy, scala.Option<org.apache.spark.ml.util.Instrumentation> instr)

Internal method for performing regression using trees as base learners.

static RDD<scala.Tuple2<Object,Object>>

computeInitialPredictionAndError(RDD<org.apache.spark.ml.tree.impl.TreePoint> data, double initTreeWeight, DecisionTreeRegressionModel initTree, Loss loss, Broadcast<Split[][]> bcSplits)

Compute the initial predictions and errors for a dataset for the first iteration of gradient boosting.

static double

computeWeightedError(RDD<org.apache.spark.ml.feature.Instance> data, DecisionTreeRegressionModel[] trees, double[] treeWeights, Loss loss)

Method to calculate error of the base learner for the gradient boosting calculation.

static double

computeWeightedError(RDD<org.apache.spark.ml.tree.impl.TreePoint> data, RDD<scala.Tuple2<Object,Object>> predError)

Method to calculate error of the base learner for the gradient boosting calculation.

static double[]

evaluateEachIteration(RDD<org.apache.spark.ml.feature.Instance> data, DecisionTreeRegressionModel[] trees, double[] treeWeights, Loss loss, scala.Enumeration.Value algo)

Method to compute error or loss for every iteration of gradient boosting.

static org.apache.spark.internal.Logging.LogStringContext

LogStringContext(scala.StringContext sc)

static org.slf4j.Logger

org$apache$spark$internal$Logging$$log_()

static void

org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)

static scala.Tuple2<DecisionTreeRegressionModel[],double[]>

run(RDD<org.apache.spark.ml.feature.Instance> input, BoostingStrategy boostingStrategy, long seed, String featureSubsetStrategy, scala.Option<org.apache.spark.ml.util.Instrumentation> instr)

Method to train a gradient boosting model.

static scala.Tuple2<DecisionTreeRegressionModel[],double[]>

runWithValidation(RDD<org.apache.spark.ml.feature.Instance> input, RDD<org.apache.spark.ml.feature.Instance> validationInput, BoostingStrategy boostingStrategy, long seed, String featureSubsetStrategy, scala.Option<org.apache.spark.ml.util.Instrumentation> instr)

Method to train a gradient boosting model, using a validation dataset to stop boosting early.

static double

updatePrediction(Vector features, double prediction, DecisionTreeRegressionModel tree, double weight)

Add prediction from a new boosting iteration to an existing prediction.

static double

updatePrediction(org.apache.spark.ml.tree.impl.TreePoint treePoint, double prediction, DecisionTreeRegressionModel tree, double weight, Split[][] splits)

Add prediction from a new boosting iteration to an existing prediction.

static RDD<scala.Tuple2<Object,Object>>

updatePredictionError(RDD<org.apache.spark.ml.tree.impl.TreePoint> data, RDD<scala.Tuple2<Object,Object>> predictionAndError, double treeWeight, DecisionTreeRegressionModel tree, Loss loss, Broadcast<Split[][]> bcSplits)

Update a zipped prediction-and-error RDD (as obtained from computeInitialPredictionAndError) with the output of a new tree.

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- GradientBoostedTrees
  
  public GradientBoostedTrees()
Method Details
- run
  
  public static scala.Tuple2<DecisionTreeRegressionModel[],double[]> run(RDD<org.apache.spark.ml.feature.Instance> input, BoostingStrategy boostingStrategy, long seed, String featureSubsetStrategy, scala.Option<org.apache.spark.ml.util.Instrumentation> instr)
  
  Method to train a gradient boosting model.
  
  Parameters:
  
  input - Training dataset: RDD of Instance.
  
  seed - Random seed.
  
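  boostingStrategy - boosting parameters.
  
  featureSubsetStrategy - number of features to consider as split candidates at each tree node (for example "all" or "sqrt").
  
  instr - optional Instrumentation used for logging during training.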
  
  Returns:
  
  tuple of ensemble models and weights: (array of decision tree models, array of model weights)
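The page does not say how run treats binary classification. As we read the Spark source (an assumption; check it against your Spark version), run remaps labels from {0, 1} to {-1, +1} before calling boost, so that classification can be handled as regression on the remapped labels. The Java sketch below shows only that remapping; LabelRemap, toSigned and remap are hypothetical names, not Spark API.

```java
import java.util.Arrays;

// Hypothetical helper (not Spark API): illustrates the {0, 1} -> {-1, +1} label
// remapping that, as we read the Spark source, run() applies for binary classification.
final class LabelRemap {
    private LabelRemap() {}

    /** Maps a binary label in {0, 1} to {-1, +1} using 2 * label - 1. */
    static double toSigned(double label) {
        if (label != 0.0 && label != 1.0) {
            throw new IllegalArgumentException("expected a label in {0, 1}, got " + label);
        }
        return 2.0 * label - 1.0;
    }

    /** Returns a remapped copy for classification; regression labels are copied unchanged. */
    static double[] remap(double[] labels, boolean classification) {
        double[] out = labels.clone();
        if (classification) {
            for (int i = 0; i < out.length; i++) {
                out[i] = toSigned(out[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [-1.0, 1.0, 1.0]
        System.out.println(Arrays.toString(remap(new double[] {0.0, 1.0, 1.0}, true)));
    }
}
```

Throwing on non-binary labels is a choice of this sketch, not documented Spark behavior.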
- runWithValidation
  
  public static scala.Tuple2<DecisionTreeRegressionModel[],double[]> runWithValidation(RDD<org.apache.spark.ml.feature.Instance> input, RDD<org.apache.spark.ml.feature.Instance> validationInput, BoostingStrategy boostingStrategy, long seed, String featureSubsetStrategy, scala.Option<org.apache.spark.ml.util.Instrumentation> instr)
  
  Method to train a gradient boosting model, using a validation dataset to stop boosting early.
  
  Parameters:
  
  input - Training dataset: RDD of Instance.
  
  validationInput - Validation dataset. This dataset should be different from the training dataset, but it should follow the same distribution. For example, both datasets could be created from an original dataset with org.apache.spark.rdd.RDD.randomSplit().
  
  seed - Random seed.
  
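  boostingStrategy - boosting parameters.
  
  featureSubsetStrategy - number of features to consider as split candidates at each tree node (for example "all" or "sqrt").
  
  instr - optional Instrumentation used for logging during training.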
  
  Returns:
  
  tuple of ensemble models and weights: (array of decision tree models, array of model weights)
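As we understand boost when validate is true (an assumption drawn from the Spark source and the validationTol field of BoostingStrategy, not from this page), training stops once a new tree improves the validation error by less than validationTol * max(currentError, 0.01), and only the trees up to the best validation error are returned. The Java sketch below replays that rule over a precomputed array of per-iteration validation errors; EarlyStopping, treesToKeep and validationErrors are hypothetical names.

```java
// Hypothetical sketch (not Spark API) of the early-stopping rule we believe boost()
// applies when validate is true. validationErrors[k] is the validation error of the
// ensemble made of the first k + 1 trees.
final class EarlyStopping {
    private EarlyStopping() {}

    /** Returns how many leading trees to keep. */
    static int treesToKeep(double[] validationErrors, double validationTol) {
        if (validationErrors.length == 0) {
            throw new IllegalArgumentException("need at least one iteration");
        }
        double bestError = validationErrors[0];
        int bestM = 1;
        for (int m = 1; m < validationErrors.length; m++) {
            double current = validationErrors[m];
            // Stop when the improvement is smaller than a tolerance relative to the
            // current error (floored at 0.01 so tiny errors do not shrink it to zero).
            if (bestError - current < validationTol * Math.max(current, 0.01)) {
                break;
            } else if (current < bestError) {
                bestError = current;
                bestM = m + 1;
            }
        }
        return bestM;
    }

    public static void main(String[] args) {
        double[] errors = {1.0, 0.5, 0.3, 0.299, 0.1};
        // The 0.3 -> 0.299 step is below tolerance, so this prints 3.
        System.out.println(treesToKeep(errors, 0.01));
    }
}
```

With the errors in main, the sketch keeps 3 trees even though a later iteration would have reached 0.1; in Spark, trees after the stopping point would simply never be trained.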
- computeInitialPredictionAndError
  
  public static RDD<scala.Tuple2<Object,Object>> computeInitialPredictionAndError(RDD<org.apache.spark.ml.tree.impl.TreePoint> data, double initTreeWeight, DecisionTreeRegressionModel initTree, Loss loss, Broadcast<Split[][]> bcSplits)
  
  Compute the initial predictions and errors for a dataset for the first iteration of gradient boosting.
  
  Parameters:
  
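  data - training data.
  
  initTreeWeight - learning rate assigned to the first tree.
  
  initTree - first tree of the ensemble (a DecisionTreeRegressionModel).
  
  loss - evaluation metric.
  
  bcSplits - broadcast split candidates for each feature, used to evaluate the tree on the binned TreePoint data.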
  
  Returns:
  
  an RDD with each element being a zip of the prediction and error corresponding to every sample.
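Per sample, the initial prediction is the first tree's output scaled by initTreeWeight, and the error is the loss evaluated at that prediction and the sample's label. The Java sketch below makes this concrete under stated assumptions: plain arrays replace RDD<TreePoint>, the tree's outputs are precomputed in firstTreeOutput, a DoubleBinaryOperator (prediction, label) stands in for Loss, and squared error is the example loss. InitialPredError and its methods are hypothetical names.

```java
import java.util.function.DoubleBinaryOperator;

// Hypothetical sketch (not Spark API): plain arrays stand in for RDD<TreePoint>,
// and a DoubleBinaryOperator (prediction, label) -> error stands in for Loss.
final class InitialPredError {
    private InitialPredError() {}

    /** Squared error (label - prediction)^2, used here as the example loss. */
    static double squaredError(double prediction, double label) {
        double err = label - prediction;
        return err * err;
    }

    /**
     * Returns one {prediction, error} pair per sample, where
     * prediction = initTreeWeight * firstTreeOutput[i] and error = loss(prediction, labels[i]).
     */
    static double[][] compute(double[] firstTreeOutput, double[] labels,
                              double initTreeWeight, DoubleBinaryOperator loss) {
        if (firstTreeOutput.length != labels.length) {
            throw new IllegalArgumentException("one tree output per label is required");
        }
        double[][] predAndError = new double[labels.length][];
        for (int i = 0; i < labels.length; i++) {
            double prediction = initTreeWeight * firstTreeOutput[i];
            predAndError[i] = new double[] {prediction, loss.applyAsDouble(prediction, labels[i])};
        }
        return predAndError;
    }
}
```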
- updatePredictionError
  
  public static RDD<scala.Tuple2<Object,Object>> updatePredictionError(RDD<org.apache.spark.ml.tree.impl.TreePoint> data, RDD<scala.Tuple2<Object,Object>> predictionAndError, double treeWeight, DecisionTreeRegressionModel tree, Loss loss, Broadcast<Split[][]> bcSplits)
  
  Update a zipped prediction-and-error RDD (as obtained from computeInitialPredictionAndError) with the output of a new tree.
  
  Parameters:
  
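  data - training data.
  
  predictionAndError - prediction-and-error RDD, as returned by computeInitialPredictionAndError or by a previous call to updatePredictionError.
  
  treeWeight - learning rate applied to the new tree.
  
  tree - tree whose output is used to update the prediction and error.
  
  loss - evaluation metric.
  
  bcSplits - broadcast split candidates for each feature, used to evaluate the tree on the binned TreePoint data.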
  
  Returns:
  
  an RDD with each element being a zip of the prediction and error corresponding to each sample.
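Each boosting iteration advances every (prediction, error) pair: the new tree's output, scaled by treeWeight, is added to the prediction, and the loss is re-evaluated at the new prediction. The minimal Java sketch below assumes plain arrays in place of the RDDs, precomputed outputs of the new tree in newTreeOutput, and a DoubleBinaryOperator (prediction, label) in place of Loss; PredErrorUpdate and update are hypothetical names.

```java
import java.util.function.DoubleBinaryOperator;

// Hypothetical sketch (not Spark API): advances per-sample {prediction, error} pairs
// by one boosting iteration. Arrays stand in for RDDs; DoubleBinaryOperator for Loss.
final class PredErrorUpdate {
    private PredErrorUpdate() {}

    /** Squared error (label - prediction)^2, used here as the example loss. */
    static double squaredError(double prediction, double label) {
        double err = label - prediction;
        return err * err;
    }

    /** Returns new pairs; the input array is left unchanged, much like an immutable RDD. */
    static double[][] update(double[][] predAndError, double[] newTreeOutput, double[] labels,
                             double treeWeight, DoubleBinaryOperator loss) {
        if (predAndError.length != labels.length || newTreeOutput.length != labels.length) {
            throw new IllegalArgumentException("all inputs must have one entry per sample");
        }
        double[][] updated = new double[labels.length][];
        for (int i = 0; i < labels.length; i++) {
            double prediction = predAndError[i][0] + treeWeight * newTreeOutput[i];
            updated[i] = new double[] {prediction, loss.applyAsDouble(prediction, labels[i])};
        }
        return updated;
    }
}
```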
- updatePrediction
  
  public static double updatePrediction(org.apache.spark.ml.tree.impl.TreePoint treePoint, double prediction, DecisionTreeRegressionModel tree, double weight, Split[][] splits)
  
  Add prediction from a new boosting iteration to an existing prediction.
  
  Parameters:
  
  treePoint - Binned vector of features representing a single data point.
  
  prediction - The existing prediction.
  
  tree - New Decision Tree model.
  
  weight - Tree weight.
  
  splits - split candidates for each feature, used to evaluate the tree on the binned treePoint.
  
  Returns:
  
  Updated prediction.
- updatePrediction
  
  public static double updatePrediction(Vector features, double prediction, DecisionTreeRegressionModel tree, double weight)
  
  Add prediction from a new boosting iteration to an existing prediction.
  
  Parameters:
  
  features - Vector of features representing a single data point.
  
  prediction - The existing prediction.
  
  tree - New Decision Tree model.
  
  weight - Tree weight.
  
  Returns:
  
  Updated prediction.
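Both updatePrediction overloads add the new tree's output, multiplied by weight, to the existing prediction; folding this from 0.0 over every (tree, weight) pair gives the ensemble's prediction. The Java sketch below illustrates the Vector overload with hypothetical stand-ins: a ToDoubleFunction<double[]> for DecisionTreeRegressionModel, a double[] for Vector, and a Stump class as the example tree. As we understand it, the TreePoint overload computes the same sum but evaluates the tree on binned features using splits.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch (not Spark API): a ToDoubleFunction<double[]> stands in for
// DecisionTreeRegressionModel, and a raw double[] for the feature Vector.
final class UpdatePredictionSketch {
    private UpdatePredictionSketch() {}

    /** Hypothetical depth-1 tree: goes left when x[feature] <= threshold. */
    static final class Stump implements ToDoubleFunction<double[]> {
        private final int feature;
        private final double threshold, left, right;

        Stump(int feature, double threshold, double left, double right) {
            this.feature = feature;
            this.threshold = threshold;
            this.left = left;
            this.right = right;
        }

        @Override
        public double applyAsDouble(double[] x) {
            return x[feature] <= threshold ? left : right;
        }
    }

    /** Adds one weighted tree output to an existing prediction. */
    static double updatePrediction(double[] features, double prediction,
                                   ToDoubleFunction<double[]> tree, double weight) {
        return prediction + tree.applyAsDouble(features) * weight;
    }

    /** Folds updatePrediction over the ensemble, starting from 0.0. */
    static double ensemblePrediction(double[] features,
                                     List<? extends ToDoubleFunction<double[]>> trees,
                                     double[] treeWeights) {
        double prediction = 0.0;
        for (int i = 0; i < trees.size(); i++) {
            prediction = updatePrediction(features, prediction, trees.get(i), treeWeights[i]);
        }
        return prediction;
    }
}
```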
- computeWeightedError
  
  public static double computeWeightedError(RDD<org.apache.spark.ml.feature.Instance> data, DecisionTreeRegressionModel[] trees, double[] treeWeights, Loss loss)
  
  Method to calculate error of the base learner for the gradient boosting calculation. Note: This method is not used by the gradient boosting algorithm but is useful for debugging purposes.
  
  Parameters:
  
  data - Training dataset: RDD of Instance.
  
  trees - Boosted Decision Tree models.
  
  treeWeights - Learning rates at each boosting iteration.
  
  loss - evaluation metric.
  
  Returns:
  
  Measure of model error on data.
- computeWeightedError
  
  public static double computeWeightedError(RDD<org.apache.spark.ml.tree.impl.TreePoint> data, RDD<scala.Tuple2<Object,Object>> predError)
  
  Method to calculate error of the base learner for the gradient boosting calculation.
  
  Parameters:
  
  data - Training dataset: RDD of TreePoint.
  
  predError - Prediction and error.
  
  Returns:
  
  Measure of model error on data.
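As we understand the Spark source, both computeWeightedError overloads return a weighted average of per-sample errors, sum_i w_i * err_i / sum_i w_i, where w_i is the instance weight: the TreePoint overload reads err_i from predError, and the Instance overload first computes each sample's ensemble prediction from trees and treeWeights. The Java sketch below shows both paths on plain arrays; WeightedError, fromErrors, fromEnsemble and treeOutputs are hypothetical names, and a DoubleBinaryOperator (prediction, label) stands in for Loss.

```java
import java.util.function.DoubleBinaryOperator;

// Hypothetical sketch (not Spark API) of the weighted average error that, as we
// understand it, both computeWeightedError overloads return.
final class WeightedError {
    private WeightedError() {}

    /** sum_i w_i * err_i / sum_i w_i, given precomputed per-sample errors. */
    static double fromErrors(double[] errors, double[] instanceWeights) {
        double errSum = 0.0;
        double weightSum = 0.0;
        for (int i = 0; i < errors.length; i++) {
            errSum += errors[i] * instanceWeights[i];
            weightSum += instanceWeights[i];
        }
        return errSum / weightSum;
    }

    /** Same average, computing each sample's ensemble prediction first; treeOutputs[i][t] is tree t on sample i. */
    static double fromEnsemble(double[][] treeOutputs, double[] treeWeights, double[] labels,
                               double[] instanceWeights, DoubleBinaryOperator loss) {
        double[] errors = new double[labels.length];
        for (int i = 0; i < labels.length; i++) {
            double prediction = 0.0;
            for (int t = 0; t < treeWeights.length; t++) {
                prediction += treeOutputs[i][t] * treeWeights[t];
            }
            errors[i] = loss.applyAsDouble(prediction, labels[i]);
        }
        return fromErrors(errors, instanceWeights);
    }
}
```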
- evaluateEachIteration
  
  public static double[] evaluateEachIteration(RDD<org.apache.spark.ml.feature.Instance> data, DecisionTreeRegressionModel[] trees, double[] treeWeights, Loss loss, scala.Enumeration.Value algo)
  
  Method to compute error or loss for every iteration of gradient boosting.
  
  Parameters:
  
  data - RDD of Instance.
  
  trees - Boosted Decision Tree models.
  
  treeWeights - Learning rates at each boosting iteration.
  
  loss - evaluation metric.
  
  algo - algorithm for the ensemble, either Classification or Regression.
  
  Returns:
  
  an array with index i having the losses or errors for the ensemble containing the first i+1 trees.
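Because entry i covers the first i+1 trees, the errors for every prefix can be collected in one pass per sample by keeping a running sum of weighted tree outputs. The Java sketch below does this on plain arrays, assuming precomputed per-tree outputs (treeOutputs[i][t]) and labels already in the encoding the loss expects; EachIteration and evaluate are hypothetical names, a DoubleBinaryOperator stands in for Loss, and the algo parameter is omitted (for classification, Spark is understood to remap labels to -1/+1 first).

```java
import java.util.function.DoubleBinaryOperator;

// Hypothetical sketch (not Spark API). treeOutputs[i][t] is tree t's raw output on
// sample i. Index t of the result is the weighted average error of the ensemble
// made of the first t + 1 trees.
final class EachIteration {
    private EachIteration() {}

    static double[] evaluate(double[][] treeOutputs, double[] treeWeights, double[] labels,
                             double[] instanceWeights, DoubleBinaryOperator loss) {
        int numTrees = treeWeights.length;
        double[] errSum = new double[numTrees];
        double weightSum = 0.0;
        for (int i = 0; i < labels.length; i++) {
            double prediction = 0.0;  // running (prefix) sum of weighted tree outputs
            for (int t = 0; t < numTrees; t++) {
                prediction += treeOutputs[i][t] * treeWeights[t];
                errSum[t] += loss.applyAsDouble(prediction, labels[i]) * instanceWeights[i];
            }
            weightSum += instanceWeights[i];
        }
        for (int t = 0; t < numTrees; t++) {
            errSum[t] /= weightSum;  // turn weighted sums into weighted averages
        }
        return errSum;
    }
}
```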
- boost
  
  public static scala.Tuple2<DecisionTreeRegressionModel[],double[]> boost(RDD<org.apache.spark.ml.feature.Instance> input, RDD<org.apache.spark.ml.feature.Instance> validationInput, BoostingStrategy boostingStrategy, boolean validate, long seed, String featureSubsetStrategy, scala.Option<org.apache.spark.ml.util.Instrumentation> instr)
  
  Internal method for performing regression using trees as base learners.
  
  Parameters:
  
  input - training dataset
  
  validationInput - validation dataset, ignored if validate is set to false.
  
  boostingStrategy - boosting parameters
  
  validate - whether or not to use the validation dataset.
  
  seed - Random seed.
  
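  featureSubsetStrategy - number of features to consider as split candidates at each tree node (for example "all" or "sqrt").
  
  instr - optional Instrumentation used for logging during training.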
  
  Returns:
  
  tuple of ensemble models and weights: (array of decision tree models, array of model weights)
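The core loop behind boost can be sketched as follows, assuming squared error loss, a single numeric feature and depth-1 regression trees (stumps) as base learners. As we read the Spark source, the first tree is fit to the labels with weight 1.0, and each later tree is fit to the pseudo-residuals -gradient(prediction, label) and weighted by the learning rate; validation, subsampling and distributed training are omitted. BoostSketch, Stump, Ensemble, fitStump and meanSquaredError are hypothetical names, not Spark API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Spark API): gradient boosting with squared error on one
// numeric feature, using depth-1 regression trees (stumps) as base learners.
final class BoostSketch {
    private BoostSketch() {}

    /** Stump: predicts left when x <= threshold, otherwise right. */
    static final class Stump {
        final double threshold, left, right;

        Stump(double threshold, double left, double right) {
            this.threshold = threshold;
            this.left = left;
            this.right = right;
        }

        double predict(double x) {
            return x <= threshold ? left : right;
        }
    }

    /** Trees plus weights, like the (models, weights) tuple that boost returns. */
    static final class Ensemble {
        final List<Stump> trees;
        final double[] weights;

        Ensemble(List<Stump> trees, double[] weights) {
            this.trees = trees;
            this.weights = weights;
        }

        double predict(double x) {
            double p = 0.0;
            for (int t = 0; t < trees.size(); t++) {
                p += trees.get(t).predict(x) * weights[t];
            }
            return p;
        }
    }

    /** Least-squares stump fit: tries every observed x as the split threshold. */
    static Stump fitStump(double[] x, double[] targets) {
        Stump best = null;
        double bestSse = Double.POSITIVE_INFINITY;
        for (double threshold : x) {
            double sumL = 0.0, sumR = 0.0;
            int nL = 0, nR = 0;
            for (int i = 0; i < x.length; i++) {
                if (x[i] <= threshold) { sumL += targets[i]; nL++; } else { sumR += targets[i]; nR++; }
            }
            double left = sumL / nL;  // nL >= 1 because the threshold point itself goes left
            double right = nR == 0 ? left : sumR / nR;
            double sse = 0.0;
            for (int i = 0; i < x.length; i++) {
                double d = targets[i] - (x[i] <= threshold ? left : right);
                sse += d * d;
            }
            if (sse < bestSse) {
                bestSse = sse;
                best = new Stump(threshold, left, right);
            }
        }
        return best;
    }

    /** Derivative of (label - prediction)^2 with respect to the prediction. */
    static double gradient(double prediction, double label) {
        return -2.0 * (label - prediction);
    }

    static Ensemble boost(double[] x, double[] y, int numIterations, double learningRate) {
        List<Stump> trees = new ArrayList<>();
        double[] weights = new double[numIterations];
        double[] prediction = new double[x.length];
        double[] targets = y.clone();  // iteration 0 fits the labels directly
        for (int m = 0; m < numIterations; m++) {
            Stump tree = fitStump(x, targets);
            weights[m] = (m == 0) ? 1.0 : learningRate;
            trees.add(tree);
            for (int i = 0; i < x.length; i++) {
                prediction[i] += tree.predict(x[i]) * weights[m];
                targets[i] = -gradient(prediction[i], y[i]);  // pseudo-residual for the next tree
            }
        }
        return new Ensemble(trees, weights);
    }

    static double meanSquaredError(Ensemble e, double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = y[i] - e.predict(x[i]);
            sum += d * d;
        }
        return sum / x.length;
    }
}
```

Because each stump is a least-squares fit to the scaled residuals, the training error in this sketch should not increase from one iteration to the next as long as the learning rate is small (below 1 here).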
- org$apache$spark$internal$Logging$$log_
  
  public static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()
- org$apache$spark$internal$Logging$$log__$eq
  
  public static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)
- LogStringContext
  
  public static org.apache.spark.internal.Logging.LogStringContext LogStringContext(scala.StringContext sc)
