org.apache.spark.mllib.optimization.GradientDescent

All Implemented Interfaces: Serializable, org.apache.spark.internal.Logging, Optimizer, scala.Serializable

public class GradientDescent extends Object implements Optimizer, org.apache.spark.internal.Logging

Class used to solve an optimization problem using gradient descent.

Parameters:

gradient - Gradient function to be used.

updater - Updater to be used to update weights after every iteration.

See Also:

Serialized Form

Nested Class Summary

Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.SparkShellLoggingFilter

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| Vector | optimize(RDD<scala.Tuple2<Object,Vector>> data, Vector initialWeights) | Runs gradient descent on the given training data. |
| scala.Tuple2<Vector,double[]> | optimizeWithLossReturned(RDD<scala.Tuple2<Object,Vector>> data, Vector initialWeights) | Runs gradient descent on the given training data. |
| static org.slf4j.Logger | org$apache$spark$internal$Logging$$log_() | |
| static void | org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1) | |
| static scala.Tuple2<Vector,double[]> | runMiniBatchSGD(RDD<scala.Tuple2<Object,Vector>> data, Gradient gradient, Updater updater, double stepSize, int numIterations, double regParam, double miniBatchFraction, Vector initialWeights) | Alias of runMiniBatchSGD with convergenceTol set to default value of 0.001. |
| static scala.Tuple2<Vector,double[]> | runMiniBatchSGD(RDD<scala.Tuple2<Object,Vector>> data, Gradient gradient, Updater updater, double stepSize, int numIterations, double regParam, double miniBatchFraction, Vector initialWeights, double convergenceTol) | Run stochastic gradient descent (SGD) in parallel using mini batches. |
| GradientDescent | setConvergenceTol(double tolerance) | Set the convergence tolerance. |
| GradientDescent | setGradient(Gradient gradient) | Set the gradient function (of the loss function of one single data example) to be used for SGD. |
| GradientDescent | setMiniBatchFraction(double fraction) | Set fraction of data to be used for each SGD iteration. |
| GradientDescent | setNumIterations(int iters) | Set the number of iterations for SGD. |
| GradientDescent | setRegParam(double regParam) | Set the regularization parameter. |
| GradientDescent | setStepSize(double step) | Set the initial step size of SGD for the first step. |
| GradientDescent | setUpdater(Updater updater) | Set the updater function to actually perform a gradient step in a given direction. |

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq

Method Details
- runMiniBatchSGD
  
  public static scala.Tuple2<Vector,double[]> runMiniBatchSGD(RDD<scala.Tuple2<Object,Vector>> data, Gradient gradient, Updater updater, double stepSize, int numIterations, double regParam, double miniBatchFraction, Vector initialWeights, double convergenceTol)
  
  Run stochastic gradient descent (SGD) in parallel using mini batches. In each iteration, a subset (fraction miniBatchFraction) of the total data is sampled in order to compute a gradient estimate. Sampling and averaging the subgradients over this subset are performed using one standard Spark map-reduce per iteration.
  
  Parameters:
  
  data - Input data for SGD. RDD of the set of data examples, each of the form (label, [feature values]).
  
  gradient - Gradient object (used to compute the gradient of the loss function of one single data example)
  
  updater - Updater function to actually perform a gradient step in a given direction.
  
  stepSize - initial step size for the first step
  
  numIterations - number of iterations that SGD should be run.
  
  regParam - regularization parameter
  
  miniBatchFraction - fraction of the input data set that should be used for one iteration of SGD. Default value 1.0.
  
  initialWeights - initial weights from which the optimization starts

  convergenceTol - Mini-batch iteration ends before numIterations if the relative difference between the current weights and the previous weights is less than this value. Convergence is measured using the L2 norm. Default value 0.001. Must be between 0.0 and 1.0 inclusive.
  
  Returns:
  
  A tuple containing two elements. The first element is a column matrix containing weights for every feature, and the second element is an array containing the stochastic loss computed for every iteration.
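The loop described above can be illustrated on a single machine without Spark. The following is a minimal sketch, not the library implementation: the class name `MiniBatchSgdSketch`, its `run` method, the least-squares loss, and the in-memory arrays standing in for the RDD are all assumptions made for illustration, and regularization and the convergence check are left out.

```java
import java.util.Random;

/** Single-machine sketch of mini-batch SGD for least squares. All names here are illustrative. */
final class MiniBatchSgdSketch {

    /**
     * Runs numIterations steps and returns the final weights.
     * lossHistory[t - 1] receives the average mini-batch loss seen in iteration t.
     */
    static double[] run(double[][] features, double[] labels, double stepSize,
                        int numIterations, double miniBatchFraction,
                        double[] initialWeights, double[] lossHistory, long seed) {
        double[] weights = initialWeights.clone();
        Random rng = new Random(seed);
        for (int t = 1; t <= numIterations; t++) {
            double[] gradientSum = new double[weights.length];
            double lossSum = 0.0;
            int batchSize = 0;
            for (int i = 0; i < labels.length; i++) {
                // Keep each example independently with probability miniBatchFraction.
                if (rng.nextDouble() >= miniBatchFraction) {
                    continue;
                }
                double prediction = 0.0;
                for (int j = 0; j < weights.length; j++) {
                    prediction += weights[j] * features[i][j];
                }
                double diff = prediction - labels[i];
                // The loss 0.5 * diff^2 has gradient diff * x with respect to the weights.
                for (int j = 0; j < weights.length; j++) {
                    gradientSum[j] += diff * features[i][j];
                }
                lossSum += 0.5 * diff * diff;
                batchSize++;
            }
            if (batchSize == 0) {
                // Nothing was sampled: record NaN and skip the update for this iteration.
                lossHistory[t - 1] = Double.NaN;
                continue;
            }
            lossHistory[t - 1] = lossSum / batchSize;
            // Average the summed gradient and take a step of size stepSize / sqrt(t).
            double step = stepSize / Math.sqrt(t);
            for (int j = 0; j < weights.length; j++) {
                weights[j] -= step * gradientSum[j] / batchSize;
            }
        }
        return weights;
    }
}
```

With miniBatchFraction set to 1.0 every example is used in every iteration, so the sketch reduces to ordinary full-batch gradient descent with a decaying step size.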
- runMiniBatchSGD
  
  public static scala.Tuple2<Vector,double[]> runMiniBatchSGD(RDD<scala.Tuple2<Object,Vector>> data, Gradient gradient, Updater updater, double stepSize, int numIterations, double regParam, double miniBatchFraction, Vector initialWeights)
  
  Alias of runMiniBatchSGD with convergenceTol set to default value of 0.001.
  
  Parameters:
  
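  data - Input data for SGD. RDD of the set of data examples, each of the form (label, [feature values]).
  
  gradient - Gradient object (used to compute the gradient of the loss function of one single data example)
  
  updater - Updater function to actually perform a gradient step in a given direction.
  
  stepSize - initial step size for the first step
  
  numIterations - number of iterations that SGD should be run.
  
  regParam - regularization parameter
  
  miniBatchFraction - fraction of the input data set that should be used for one iteration of SGD. Default value 1.0.
  
  initialWeights - initial weights from which the optimization starts
  
  Returns:
  
  A tuple containing two elements, as returned by the runMiniBatchSGD overload that takes convergenceTol: the weights for every feature, and an array containing the stochastic loss computed for every iteration.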
- org$apache$spark$internal$Logging$$log_
  
  public static org.slf4j.Logger org$apache$spark$internal$Logging$$log_()
- org$apache$spark$internal$Logging$$log__$eq
  
  public static void org$apache$spark$internal$Logging$$log__$eq(org.slf4j.Logger x$1)
- setStepSize
  
  public GradientDescent setStepSize(double step)
  
  Set the initial step size of SGD for the first step. Default 1.0. In subsequent steps, the step size decreases as stepSize/sqrt(t), where t is the iteration number.
  
  Parameters:
  
  step - initial step size
  
  Returns:
  
  this GradientDescent instance, so that setter calls can be chained
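The stepSize/sqrt(t) decay described for setStepSize can be written out directly. This is a small sketch under the assumption that t counts iterations starting at 1; the class `StepSizeSchedule` and its method `stepAt` are hypothetical names and not part of the Spark API.

```java
/** Sketch of the stepSize / sqrt(t) schedule. Names are illustrative only. */
final class StepSizeSchedule {

    /** Returns the step size used in the given 1-based iteration. */
    static double stepAt(double initialStepSize, int iteration) {
        if (iteration < 1) {
            throw new IllegalArgumentException("iteration must be >= 1, got " + iteration);
        }
        return initialStepSize / Math.sqrt(iteration);
    }
}
```

Under this schedule the first iteration uses the configured step size unchanged, and the fourth iteration uses half of it.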
- setMiniBatchFraction
  
  public GradientDescent setMiniBatchFraction(double fraction)
  
  Set the fraction of data to be used for each SGD iteration. Default 1.0 (corresponding to deterministic/classical gradient descent).
  
  Parameters:
  
  fraction - fraction of the input data used in each iteration
  
  Returns:
  
  this GradientDescent instance, so that setter calls can be chained
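One way to picture the effect of the mini-batch fraction is per-example sampling without replacement, where each example is kept independently with probability equal to the fraction. The sketch below assumes that sampling model; `MiniBatchSampler` and `sampleIndices` are hypothetical names, and the in-memory index list stands in for a distributed data set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Sketch of selecting a mini batch by independent per-example sampling. Names are illustrative. */
final class MiniBatchSampler {

    /** Returns the indices in [0, n) that were selected for this mini batch. */
    static List<Integer> sampleIndices(int n, double fraction, long seed) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1], got " + fraction);
        }
        Random rng = new Random(seed);
        List<Integer> picked = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            // nextDouble() is in [0, 1), so fraction 1.0 keeps every index and 0.0 keeps none.
            if (rng.nextDouble() < fraction) {
                picked.add(i);
            }
        }
        return picked;
    }
}
```

A fraction of 1.0 selects every example, which is why the default corresponds to classical gradient descent; smaller fractions give batches whose size varies from iteration to iteration around fraction times n.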
- setNumIterations
  
  public GradientDescent setNumIterations(int iters)
  
  Set the number of iterations for SGD. Default 100.
  
  Parameters:
  
  iters - number of iterations to run
  
  Returns:
  
  this GradientDescent instance, so that setter calls can be chained
- setRegParam
  
  public GradientDescent setRegParam(double regParam)
  
  Set the regularization parameter. Default 0.0.
  
  Parameters:
  
  regParam - regularization parameter
  
  Returns:
  
  this GradientDescent instance, so that setter calls can be chained
- setConvergenceTol
  
  public GradientDescent setConvergenceTol(double tolerance)
  
  Set the convergence tolerance. Default 0.001. convergenceTol is the condition that decides when iteration terminates, using the following logic:
  - If the norm of the new solution vector is greater than 1, the difference between the solution vectors is compared against a relative tolerance, that is, it is normalized by the norm of the new solution vector.
  - If the norm of the new solution vector is less than or equal to 1, the difference between the solution vectors is compared against an absolute tolerance, with no normalization.
  
  Must be between 0.0 and 1.0 inclusive.
  
  Parameters:
  
  tolerance - convergence tolerance, between 0.0 and 1.0 inclusive
  
  Returns:
  
  this GradientDescent instance, so that setter calls can be chained
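The two cases described for setConvergenceTol collapse into a single comparison: the L2 norm of the weight difference is tested against the tolerance multiplied by the larger of 1 and the norm of the new weights. The following is a minimal sketch of that test; `ConvergenceCheck`, `l2Norm`, and `isConverged` are illustrative names, and plain arrays stand in for Spark vectors.

```java
/** Sketch of the relative/absolute convergence test. Names are illustrative only. */
final class ConvergenceCheck {

    /** Euclidean (L2) norm of a vector. */
    static double l2Norm(double[] v) {
        double sumOfSquares = 0.0;
        for (double x : v) {
            sumOfSquares += x * x;
        }
        return Math.sqrt(sumOfSquares);
    }

    /**
     * Returns true when the change between two weight vectors is below the tolerance.
     * The tolerance is relative when the norm of the new weights exceeds 1 and absolute otherwise.
     */
    static boolean isConverged(double[] previous, double[] current, double convergenceTol) {
        double[] diff = new double[current.length];
        for (int i = 0; i < current.length; i++) {
            diff[i] = previous[i] - current[i];
        }
        return l2Norm(diff) < convergenceTol * Math.max(l2Norm(current), 1.0);
    }
}
```

For example, with a tolerance of 0.001 a change of 0.05 counts as converged when the new weights have norm around 100, but a change of 0.01 does not when the new weights have norm around 0.5.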
- setGradient
  
  public GradientDescent setGradient(Gradient gradient)
  
  Set the gradient function (of the loss function of one single data example) to be used for SGD.
  
  Parameters:
  
  gradient - gradient function of the loss for a single data example
  
  Returns:
  
  this GradientDescent instance, so that setter calls can be chained
- setUpdater
  
  public GradientDescent setUpdater(Updater updater)
  
  Set the updater function to actually perform a gradient step in a given direction. The updater is also responsible for performing the update from the regularization term, and therefore determines what kind of regularization is used, if any.
  
  Parameters:
  
  updater - updater that performs the gradient step and applies regularization
  
  Returns:
  
  this GradientDescent instance, so that setter calls can be chained
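To make the updater's role concrete, here is a sketch of one possible update rule that folds an L2 penalty into the gradient step. It is an illustration under stated assumptions rather than the behavior of any particular Spark Updater: the class `L2UpdaterSketch`, the method `update`, and the penalty form 0.5 * regParam * ||w||^2 are all chosen for this example.

```java
/** Sketch of an updater that applies an L2 penalty during the gradient step. Names are illustrative. */
final class L2UpdaterSketch {

    /**
     * Returns w - step * (gradient + regParam * w), where step = stepSize / sqrt(iteration).
     * The regParam * w term is the gradient of the assumed penalty 0.5 * regParam * ||w||^2.
     */
    static double[] update(double[] weights, double[] gradient,
                           double stepSize, int iteration, double regParam) {
        double step = stepSize / Math.sqrt(iteration);
        double[] updated = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            updated[i] = weights[i] - step * (gradient[i] + regParam * weights[i]);
        }
        return updated;
    }
}
```

Setting regParam to 0.0 turns this into a plain gradient step, which shows why the choice of updater, and not the gradient, decides whether and how the weights are regularized.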
- optimize
  
  public Vector optimize(RDD<scala.Tuple2<Object,Vector>> data, Vector initialWeights)
  
  Runs gradient descent on the given training data.
  
  Specified by:
  
  optimize in interface Optimizer
  
  Parameters:
  
  data - training data
  
  initialWeights - initial weights
  
  Returns:
  
  solution vector
- optimizeWithLossReturned
  
  public scala.Tuple2<Vector,double[]> optimizeWithLossReturned(RDD<scala.Tuple2<Object,Vector>> data, Vector initialWeights)
  
  Runs gradient descent on the given training data.
  
  Parameters:
  
  data - training data
  
  initialWeights - initial weights
  
  Returns:
  
  solution vector and loss value in an array
