org.apache.spark.mllib.regression
Class LassoWithSGD

Object
  extended by org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm<LassoModel>
      extended by org.apache.spark.mllib.regression.LassoWithSGD
All Implemented Interfaces:
java.io.Serializable, Logging

public class LassoWithSGD
extends GeneralizedLinearAlgorithm<LassoModel>
implements scala.Serializable

Train a regression model with L1-regularization using Stochastic Gradient Descent. This solves the L1-regularized least squares regression formulation

    f(weights) = 1/(2n) ||A weights - y||^2 + regParam ||weights||_1

Here the data matrix A has n rows, and the input RDD holds the set of rows of A, each with its corresponding right-hand-side label y. See also the documentation for the precise formulation.
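To make the objective concrete, the following is a minimal sketch that evaluates f(weights) on small in-memory arrays. It is not part of Spark: the class name LassoObjectiveSketch and its objective method are hypothetical, and plain double arrays stand in for the RDD of labeled points.

```java
// Illustrative only: evaluates the Lasso objective on in-memory arrays.
// The class and method names are hypothetical and are not part of Spark.
final class LassoObjectiveSketch {

    /** Returns 1/(2n) * ||A w - y||^2 + regParam * ||w||_1, with A given row by row. */
    static double objective(double[][] a, double[] y, double[] weights, double regParam) {
        int n = a.length;
        double squaredError = 0.0;
        for (int i = 0; i < n; i++) {
            // Prediction for row i is the dot product of the row with the weights.
            double prediction = 0.0;
            for (int j = 0; j < weights.length; j++) {
                prediction += a[i][j] * weights[j];
            }
            double residual = prediction - y[i];
            squaredError += residual * residual;
        }
        // The L1 penalty is the sum of absolute weight values.
        double l1Norm = 0.0;
        for (double w : weights) {
            l1Norm += Math.abs(w);
        }
        return squaredError / (2.0 * n) + regParam * l1Norm;
    }

    public static void main(String[] args) {
        double[][] a = {{1.0, 0.0}, {0.0, 1.0}};
        double[] y = {1.0, 2.0};
        double[] weights = {1.0, 1.0};
        System.out.println(objective(a, y, weights, 0.1));
    }
}
```

A larger regParam raises the cost of every nonzero weight, which is what pushes Lasso solutions toward sparsity.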

See Also:
Serialized Form

Constructor Summary
LassoWithSGD()
          Construct a Lasso object with default parameters: {stepSize: 1.0, numIterations: 100, regParam: 0.01, miniBatchFraction: 1.0}.
 
Method Summary
 GradientDescent optimizer()
          The optimizer to solve the problem.
static LassoModel train(RDD<LabeledPoint> input, int numIterations)
          Train a Lasso model given an RDD of (label, features) pairs.
static LassoModel train(RDD<LabeledPoint> input, int numIterations, double stepSize, double regParam)
          Train a Lasso model given an RDD of (label, features) pairs.
static LassoModel train(RDD<LabeledPoint> input, int numIterations, double stepSize, double regParam, double miniBatchFraction)
          Train a Lasso model given an RDD of (label, features) pairs.
static LassoModel train(RDD<LabeledPoint> input, int numIterations, double stepSize, double regParam, double miniBatchFraction, Vector initialWeights)
          Train a Lasso model given an RDD of (label, features) pairs.
 
Methods inherited from class org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm
getNumFeatures, isAddIntercept, run, run, setIntercept, setValidateData
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.spark.Logging
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
 

Constructor Detail

LassoWithSGD

public LassoWithSGD()
Construct a Lasso object with default parameters: {stepSize: 1.0, numIterations: 100, regParam: 0.01, miniBatchFraction: 1.0}.

Method Detail

train

public static LassoModel train(RDD<LabeledPoint> input,
                               int numIterations,
                               double stepSize,
                               double regParam,
                               double miniBatchFraction,
                               Vector initialWeights)
Train a Lasso model given an RDD of (label, features) pairs. We run a fixed number of iterations of gradient descent using the specified step size. Each iteration uses a miniBatchFraction fraction of the data to calculate a stochastic gradient. The weights used in gradient descent are initialized to the initial weights provided.

Parameters:
input - RDD of (label, array of features) pairs. Each pair describes a row of the data matrix A as well as the corresponding right-hand-side label y.
numIterations - Number of iterations of gradient descent to run.
stepSize - Step size scaling to be used for the iterations of gradient descent.
regParam - Regularization parameter.
miniBatchFraction - Fraction of data to be used per iteration.
initialWeights - Initial set of weights to be used. The vector should be equal in size to the number of features in the data.
Returns:
a LassoModel which has the weights and offset from training.
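The training loop described above can be sketched in plain Java. This page does not spell out the update rule, so the sketch makes several assumptions: the step size decays as stepSize / sqrt(iteration), the L1 penalty is applied by soft-thresholding after each gradient step, and each row joins the mini-batch independently with probability miniBatchFraction. The class LassoSgdSketch, its methods, and the seed parameter are hypothetical and operate on in-memory arrays rather than an RDD.

```java
import java.util.Random;

// Illustrative only: a single-machine sketch of mini-batch SGD for Lasso.
// All names here are hypothetical; this is not Spark's implementation.
final class LassoSgdSketch {

    /** Shrinks value toward zero by threshold, returning zero inside the threshold band. */
    static double softThreshold(double value, double threshold) {
        return Math.signum(value) * Math.max(0.0, Math.abs(value) - threshold);
    }

    static double[] train(double[][] a, double[] y, int numIterations, double stepSize,
                          double regParam, double miniBatchFraction,
                          double[] initialWeights, long seed) {
        double[] weights = initialWeights.clone(); // leave the caller's array untouched
        Random random = new Random(seed);
        for (int iter = 1; iter <= numIterations; iter++) {
            double[] gradient = new double[weights.length];
            int batchSize = 0;
            for (int i = 0; i < a.length; i++) {
                // Assumed sampling scheme: keep each row with probability miniBatchFraction.
                if (random.nextDouble() >= miniBatchFraction) {
                    continue;
                }
                double residual = -y[i];
                for (int j = 0; j < weights.length; j++) {
                    residual += a[i][j] * weights[j];
                }
                // Gradient of the squared-error term for this row.
                for (int j = 0; j < weights.length; j++) {
                    gradient[j] += residual * a[i][j];
                }
                batchSize++;
            }
            if (batchSize == 0) {
                continue; // empty sample: skip this iteration
            }
            // Assumed schedule: the step size decays with the square root of the iteration.
            double iterStep = stepSize / Math.sqrt(iter);
            for (int j = 0; j < weights.length; j++) {
                double stepped = weights[j] - iterStep * gradient[j] / batchSize;
                weights[j] = softThreshold(stepped, regParam * iterStep);
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        double[][] a = {{1.0}, {1.0}};
        double[] y = {2.0, 2.0};
        double[] w = train(a, y, 100, 1.0, 0.01, 1.0, new double[]{0.0}, 42L);
        System.out.println(w[0]);
    }
}
```

With miniBatchFraction set to 1.0 every row is sampled, so each iteration uses the full gradient, which matches the behavior the shorter overloads describe.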

train

public static LassoModel train(RDD<LabeledPoint> input,
                               int numIterations,
                               double stepSize,
                               double regParam,
                               double miniBatchFraction)
Train a Lasso model given an RDD of (label, features) pairs. We run a fixed number of iterations of gradient descent using the specified step size. Each iteration uses a miniBatchFraction fraction of the data to calculate a stochastic gradient.

Parameters:
input - RDD of (label, array of features) pairs. Each pair describes a row of the data matrix A as well as the corresponding right-hand-side label y.
numIterations - Number of iterations of gradient descent to run.
stepSize - Step size to be used for each iteration of gradient descent.
regParam - Regularization parameter.
miniBatchFraction - Fraction of data to be used per iteration.
Returns:
a LassoModel which has the weights and offset from training.

train

public static LassoModel train(RDD<LabeledPoint> input,
                               int numIterations,
                               double stepSize,
                               double regParam)
Train a Lasso model given an RDD of (label, features) pairs. We run a fixed number of iterations of gradient descent using the specified step size. We use the entire data set to compute the true gradient in each iteration.

Parameters:
input - RDD of (label, array of features) pairs. Each pair describes a row of the data matrix A as well as the corresponding right-hand-side label y.
numIterations - Number of iterations of gradient descent to run.
stepSize - Step size to be used for each iteration of gradient descent.
regParam - Regularization parameter.
Returns:
a LassoModel which has the weights and offset from training.

train

public static LassoModel train(RDD<LabeledPoint> input,
                               int numIterations)
Train a Lasso model given an RDD of (label, features) pairs. We run a fixed number of iterations of gradient descent using a step size of 1.0. We use the entire data set to compute the true gradient in each iteration.

Parameters:
input - RDD of (label, array of features) pairs. Each pair describes a row of the data matrix A as well as the corresponding right-hand-side label y.
numIterations - Number of iterations of gradient descent to run.
Returns:
a LassoModel which has the weights and offset from training.
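The train overloads differ only in which parameters the caller supplies. The sketch below shows one way the shorter overloads could resolve to the full parameter set. It assumes that omitted values fall back to the constructor defaults listed on this page (stepSize 1.0, regParam 0.01, miniBatchFraction 1.0). This page confirms the step size of 1.0 and the use of the entire data set for the shorter overloads; the regParam fallback is an assumption. The class TrainOverloadsSketch and its Params holder are hypothetical.

```java
// Illustrative only: models how shorter overloads could delegate to longer ones.
// The names and the regParam fallback are assumptions, not Spark's code.
final class TrainOverloadsSketch {

    /** Hypothetical holder for the fully resolved training parameters. */
    static final class Params {
        final int numIterations;
        final double stepSize;
        final double regParam;
        final double miniBatchFraction;

        Params(int numIterations, double stepSize, double regParam, double miniBatchFraction) {
            this.numIterations = numIterations;
            this.stepSize = stepSize;
            this.regParam = regParam;
            this.miniBatchFraction = miniBatchFraction;
        }
    }

    static Params resolve(int numIterations, double stepSize, double regParam,
                          double miniBatchFraction) {
        return new Params(numIterations, stepSize, regParam, miniBatchFraction);
    }

    /** Omitting miniBatchFraction means the entire data set is used: fraction 1.0. */
    static Params resolve(int numIterations, double stepSize, double regParam) {
        return resolve(numIterations, stepSize, regParam, 1.0);
    }

    /** Only numIterations given: step size 1.0, assumed regParam 0.01, full data set. */
    static Params resolve(int numIterations) {
        return resolve(numIterations, 1.0, 0.01, 1.0);
    }

    public static void main(String[] args) {
        Params p = resolve(100);
        System.out.println(p.stepSize + " " + p.regParam + " " + p.miniBatchFraction);
    }
}
```

Callers who need a different regularization strength or a true mini-batch should use the overload that takes the corresponding parameter explicitly rather than rely on these fallbacks.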

optimizer

public GradientDescent optimizer()
Description copied from class: GeneralizedLinearAlgorithm
The optimizer to solve the problem.

Specified by:
optimizer in class GeneralizedLinearAlgorithm<LassoModel>