org.apache.spark.mllib.regression
Class StreamingLinearRegressionWithSGD

Object
  extended by org.apache.spark.mllib.regression.StreamingLinearAlgorithm<LinearRegressionModel,LinearRegressionWithSGD>
      extended by org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
All Implemented Interfaces:
java.io.Serializable, Logging

public class StreamingLinearRegressionWithSGD
extends StreamingLinearAlgorithm<LinearRegressionModel,LinearRegressionWithSGD>
implements scala.Serializable

:: Experimental :: Train or make predictions with a linear regression model on streaming data. Training uses stochastic gradient descent (SGD) to update the model with each new batch of incoming data from a DStream (see LinearRegressionWithSGD for the model equation).
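LinearRegressionWithSGD solves a least-squares problem. The block below writes it out for one batch of n labeled points (x_i, y_i). It assumes no intercept and uses a factor of 1/2, which only rescales the step size η_t. The notation is ours, not the library's. Across batches, the weights w at the end of one batch become the starting weights for the next.

```latex
\begin{aligned}
\hat{y} &= w^{\top} x \\
f(w) &= \frac{1}{2n} \sum_{i=1}^{n} \left( w^{\top} x_i - y_i \right)^2 \\
\nabla f(w) &= \frac{1}{n} \sum_{i=1}^{n} \left( w^{\top} x_i - y_i \right) x_i \\
w_t &= w_{t-1} - \eta_t \, \nabla f(w_{t-1})
\end{aligned}
```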

Each batch of data is assumed to be an RDD of LabeledPoints. The number of data points per batch can vary, but the number of features must be constant. An initial weight vector must be provided.
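The following is a minimal plain-Java sketch of the streaming idea described above. It is not Spark's implementation, and the class and method names are hypothetical. The weights persist from batch to batch, and batch sizes may vary. The length of the initial weight vector fixes the feature count, and points with a different feature count are rejected. For simplicity, each batch gets a single averaged gradient step with a fixed step size.

```java
import java.util.Arrays;

// Hypothetical sketch, not Spark's implementation: a linear model whose
// weights persist across incoming batches, as in streaming training.
class StreamingSketch {
    private final double[] weights; // current model; its length fixes the feature count
    private final double stepSize;  // fixed step used for every batch in this sketch

    StreamingSketch(double[] initialWeights, double stepSize) {
        this.weights = Arrays.copyOf(initialWeights, initialWeights.length);
        this.stepSize = stepSize;
    }

    // Updates the model from one batch. Batches may differ in size, but every
    // point must have exactly weights.length features.
    void trainOnBatch(double[][] features, double[] labels) {
        if (features.length != labels.length) {
            throw new IllegalArgumentException("features and labels differ in length");
        }
        if (features.length == 0) {
            return; // an empty batch leaves the model unchanged
        }
        double[] grad = new double[weights.length];
        for (int i = 0; i < features.length; i++) {
            if (features[i].length != weights.length) {
                throw new IllegalArgumentException(
                        "expected " + weights.length + " features, got " + features[i].length);
            }
            double residual = predict(features[i]) - labels[i]; // squared-loss residual
            for (int j = 0; j < weights.length; j++) {
                grad[j] += residual * features[i][j];
            }
        }
        // One averaged gradient step; the updated weights seed the next batch.
        for (int j = 0; j < weights.length; j++) {
            weights[j] -= stepSize * grad[j] / features.length;
        }
    }

    double predict(double[] x) {
        double dot = 0.0;
        for (int j = 0; j < weights.length; j++) {
            dot += weights[j] * x[j];
        }
        return dot;
    }

    double[] weights() {
        return Arrays.copyOf(weights, weights.length);
    }
}
```

With data generated by y = 2x, repeated batches of one or two points drive the single weight toward 2.0.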

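Use the builder pattern to construct a streaming linear regression analysis in an application, for example:

    val model = new StreamingLinearRegressionWithSGD()
      .setStepSize(0.5)
      .setNumIterations(10)
      .setInitialWeights(Vectors.dense(...))
    model.trainOn(trainingData)

Here trainingData is a placeholder for the input DStream[LabeledPoint]. trainOn registers the updates on the stream and does not return a model, so the model object is bound before the call.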
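The chained calls work because each setter returns the object it was called on, as the method signatures in the summary below show. The following plain-Java sketch illustrates the same fluent-setter style. The class SgdSettings is hypothetical and is not the Spark class. Its defaults follow the documented values, and initial weights deliberately have no default.

```java
// Hypothetical sketch of the fluent-setter style; field names and defaults
// follow the documented parameters, but this is not the Spark class.
class SgdSettings {
    double stepSize = 0.1;          // documented default
    int numIterations = 50;         // documented default
    double miniBatchFraction = 1.0; // documented default
    double[] initialWeights;        // no default: must be set before training

    // Each setter mutates this object and returns it, so calls can be chained.
    SgdSettings setStepSize(double stepSize) {
        this.stepSize = stepSize;
        return this;
    }

    SgdSettings setNumIterations(int numIterations) {
        this.numIterations = numIterations;
        return this;
    }

    SgdSettings setMiniBatchFraction(double miniBatchFraction) {
        this.miniBatchFraction = miniBatchFraction;
        return this;
    }

    SgdSettings setInitialWeights(double... initialWeights) {
        this.initialWeights = initialWeights.clone();
        return this;
    }
}
```

For example, new SgdSettings().setStepSize(0.5).setNumIterations(10).setInitialWeights(0.0, 0.0) leaves miniBatchFraction at its default of 1.0.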

See Also:
Serialized Form

Constructor Summary
StreamingLinearRegressionWithSGD()
          Construct a StreamingLinearRegressionWithSGD object with default parameters: {stepSize: 0.1, numIterations: 50, miniBatchFraction: 1.0}.
 
Method Summary
 LinearRegressionWithSGD algorithm()
          The algorithm to use for updating.
 StreamingLinearRegressionWithSGD setInitialWeights(Vector initialWeights)
          Set the initial weights.
 StreamingLinearRegressionWithSGD setMiniBatchFraction(double miniBatchFraction)
          Set the fraction of each batch to use for updates.
 StreamingLinearRegressionWithSGD setNumIterations(int numIterations)
          Set the number of iterations of gradient descent to run per update.
 StreamingLinearRegressionWithSGD setStepSize(double stepSize)
          Set the step size for gradient descent.
 
Methods inherited from class org.apache.spark.mllib.regression.StreamingLinearAlgorithm
latestModel, predictOn, predictOn, predictOnValues, predictOnValues, trainOn, trainOn
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.spark.Logging
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
 

Constructor Detail

StreamingLinearRegressionWithSGD

public StreamingLinearRegressionWithSGD()
Construct a StreamingLinearRegressionWithSGD object with default parameters: {stepSize: 0.1, numIterations: 50, miniBatchFraction: 1.0}. Initial weights must be set before using trainOn or predictOn (see StreamingLinearAlgorithm).
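The requirement that initial weights be set first can be sketched as a guard that refuses to work until weights exist. GuardedModel is hypothetical. It throws IllegalStateException by its own choice, which is not necessarily the exception Spark raises. Only the prediction path is shown, and a training method would use the same guard.

```java
// Hypothetical sketch: prediction is refused until initial weights are
// supplied, mirroring the documented requirement. The exception type is this
// sketch's choice, not necessarily what Spark throws.
class GuardedModel {
    private double[] weights; // null until setInitialWeights is called

    GuardedModel setInitialWeights(double[] initialWeights) {
        this.weights = initialWeights.clone();
        return this;
    }

    double predict(double[] x) {
        requireWeights();
        if (x.length != weights.length) {
            throw new IllegalArgumentException("expected " + weights.length + " features");
        }
        double dot = 0.0;
        for (int j = 0; j < weights.length; j++) {
            dot += weights[j] * x[j];
        }
        return dot;
    }

    // Shared guard; a training method would call this first as well.
    private void requireWeights() {
        if (weights == null) {
            throw new IllegalStateException(
                    "Initial weights must be set before training or prediction");
        }
    }
}
```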

Method Detail

algorithm

public LinearRegressionWithSGD algorithm()
Description copied from class: StreamingLinearAlgorithm
The algorithm to use for updating.


setStepSize

public StreamingLinearRegressionWithSGD setStepSize(double stepSize)
Set the step size for gradient descent. Default: 0.1.
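To our understanding, MLlib's gradient descent with its simple (unregularized) updater does not apply stepSize unchanged. Instead, it scales the step as stepSize / sqrt(t) at iteration t = 1, 2, and so on. Treat that schedule as an assumption to verify against your Spark version. The sketch below shows a single update under that assumption. StepSizeSketch is a hypothetical name.

```java
// Sketch of one gradient-descent update whose effective step shrinks as
// stepSize / sqrt(t) for iteration t = 1, 2, ... (assumed to match MLlib's
// simple updater; verify against your Spark version).
class StepSizeSketch {
    static double effectiveStep(double stepSize, int iteration) {
        if (iteration < 1) {
            throw new IllegalArgumentException("iteration numbering starts at 1");
        }
        return stepSize / Math.sqrt(iteration);
    }

    // Returns w - eta_t * gradient without modifying the inputs.
    static double[] update(double[] weights, double[] gradient, double stepSize, int iteration) {
        double eta = effectiveStep(stepSize, iteration);
        double[] next = new double[weights.length];
        for (int j = 0; j < weights.length; j++) {
            next[j] = weights[j] - eta * gradient[j];
        }
        return next;
    }
}
```

With the default stepSize of 0.1, the effective step at the fourth iteration would be 0.1 / 2 = 0.05.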


setNumIterations

public StreamingLinearRegressionWithSGD setNumIterations(int numIterations)
Set the number of iterations of gradient descent to run per update. Default: 50.
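"Per update" means that each incoming batch gets numIterations gradient steps, all starting from the current weights. The sketch below is hypothetical and uses a fixed step with the full batch on every iteration. This is a deliberate simplification that leaves out any step decay or sampling the library may apply. It shows that more iterations fit the current batch more closely.

```java
// Hypothetical sketch: run numIterations full-batch gradient steps on a single
// batch, starting from the current weights. A fixed step is used for clarity;
// it omits the step decay and sampling that the library may apply.
class IterationsSketch {
    static double[] fitBatch(double[] weights, double[][] features, double[] labels,
                             double stepSize, int numIterations) {
        double[] w = weights.clone();
        int n = features.length;
        for (int iter = 0; iter < numIterations; iter++) {
            double[] grad = new double[w.length];
            for (int i = 0; i < n; i++) {
                double residual = -labels[i]; // becomes w.x - y below
                for (int j = 0; j < w.length; j++) {
                    residual += w[j] * features[i][j];
                }
                for (int j = 0; j < w.length; j++) {
                    grad[j] += residual * features[i][j] / n; // average gradient
                }
            }
            for (int j = 0; j < w.length; j++) {
                w[j] -= stepSize * grad[j];
            }
        }
        return w;
    }
}
```

On a batch drawn from y = 3x, one iteration from w = 0 moves the weight only part of the way. Fifty iterations bring it very close to 3.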


setMiniBatchFraction

public StreamingLinearRegressionWithSGD setMiniBatchFraction(double miniBatchFraction)
Set the fraction of each batch to use for updates. Default: 1.0.
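A fraction below 1.0 means that each gradient iteration uses only a random subset of the batch. The sketch below keeps each point independently with probability fraction, which is Bernoulli sampling without replacement. As far as we know, this is roughly how RDD.sample(false, fraction, seed) behaves, but the helper itself is hypothetical. Because java.util.Random.nextDouble() returns values in [0, 1), a fraction of 1.0 (the default) keeps every point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: choose the points of one batch that take part in an
// update by keeping each point independently with probability `fraction`
// (Bernoulli sampling without replacement).
class MiniBatchSketch {
    static List<Integer> sampleIndices(int batchSize, double fraction, long seed) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        Random rng = new Random(seed);
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < batchSize; i++) {
            // nextDouble() is in [0, 1), so fraction 1.0 keeps every point
            if (rng.nextDouble() < fraction) {
                kept.add(i);
            }
        }
        return kept;
    }
}
```

The sample size varies from call to call unless the seed is fixed, and so does the number of points behind each gradient step.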


setInitialWeights

public StreamingLinearRegressionWithSGD setInitialWeights(Vector initialWeights)
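Set the initial weights. The vector's length must equal the number of features in the data. There is no usable default: initial weights must be set before calling trainOn or predictOn.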
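The initial weights are only the starting point for SGD. A start closer to a good solution can therefore need fewer updates. The sketch below is hypothetical and single-feature, with a fixed step. It counts full-batch steps until the gradient magnitude falls below a tolerance.

```java
// Hypothetical sketch: initial weights act as the starting point for SGD, so
// a start closer to the solution needs fewer updates. Single-feature model,
// one full-batch step per iteration with a fixed step size.
class WarmStartSketch {
    // Counts gradient steps until |gradient| < tol on the batch (x, y),
    // giving up after maxSteps.
    static int stepsUntilFlat(double initialWeight, double[] x, double[] y,
                              double stepSize, double tol, int maxSteps) {
        double w = initialWeight;
        for (int step = 0; step < maxSteps; step++) {
            double grad = 0.0;
            for (int i = 0; i < x.length; i++) {
                grad += (w * x[i] - y[i]) * x[i] / x.length; // averaged squared-loss gradient
            }
            if (Math.abs(grad) < tol) {
                return step; // number of updates applied so far
            }
            w -= stepSize * grad;
        }
        return maxSteps;
    }
}
```

For the single point (1, 3) with stepSize 0.5, starting at 2.9 needs fewer steps than starting at 0.0.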