Class StreamingLinearRegressionWithSGD

Object
org.apache.spark.mllib.regression.StreamingLinearAlgorithm<LinearRegressionModel,LinearRegressionWithSGD>
org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging

public class StreamingLinearRegressionWithSGD extends StreamingLinearAlgorithm<LinearRegressionModel,LinearRegressionWithSGD> implements Serializable
Train or predict a linear regression model on streaming data. Training uses stochastic gradient descent (SGD) to update the model with each new batch of incoming data from a DStream (see LinearRegressionWithSGD for the model equation).

Each batch of data is assumed to be an RDD of LabeledPoints. The number of data points per batch can vary, but the number of features must be constant. An initial weight vector must be provided.
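The per-batch update described above can be sketched in plain Java without any Spark dependency. This is a minimal illustration, not MLlib's implementation. The class and method names are hypothetical, and the sketch assumes a least-squares gradient averaged over the batch, a step size that decays as stepSize / sqrt(t), no intercept, no regularization, and the whole batch used in every iteration.

```java
import java.util.Arrays;

/** Illustrative only: plain-Java least-squares SGD, not Spark's implementation. */
final class StreamingSgdSketch {

    /**
     * Runs numIterations gradient steps on one batch and returns the updated
     * weights. The input weights are not modified.
     */
    static double[] updateOnBatch(double[] weights, double[][] features,
                                  double[] labels, double stepSize,
                                  int numIterations) {
        double[] w = Arrays.copyOf(weights, weights.length);
        if (features.length != labels.length) {
            throw new IllegalArgumentException("one label is required per data point");
        }
        if (features.length == 0) {
            return w; // empty batch: the model stays unchanged
        }
        // The number of features must match the weight vector in every batch.
        for (double[] x : features) {
            if (x.length != w.length) {
                throw new IllegalArgumentException(
                    "expected " + w.length + " features, got " + x.length);
            }
        }
        for (int t = 1; t <= numIterations; t++) {
            double[] gradient = new double[w.length];
            for (int i = 0; i < features.length; i++) {
                double error = dot(w, features[i]) - labels[i];
                for (int j = 0; j < w.length; j++) {
                    gradient[j] += error * features[i][j];
                }
            }
            double step = stepSize / Math.sqrt(t); // assumed decay schedule
            for (int j = 0; j < w.length; j++) {
                w[j] -= step * gradient[j] / features.length;
            }
        }
        return w;
    }

    private static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            sum += a[j] * b[j];
        }
        return sum;
    }
}
```

Feeding the returned weights into the next call mimics how a streaming model carries its state from one batch to the next, and the explicit length check shows why the feature count has to stay constant while the batch size may vary.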

Use the builder pattern to configure and start a streaming linear regression in an application, for example:

val model = new StreamingLinearRegressionWithSGD()
  .setStepSize(0.5)
  .setNumIterations(10)
  .setInitialWeights(Vectors.dense(...))
model.trainOn(DStream)
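The chaining style works because every setter returns the object it was called on. A minimal plain-Java sketch of that idea follows. FluentConfigSketch and all of its members are hypothetical names introduced for illustration and are not part of Spark. The default values are the ones documented for the no-argument constructor, and the choice of IllegalStateException for missing weights is an assumption of this sketch.

```java
import java.util.Arrays;

/** Hypothetical fluent configuration holder, shown only to illustrate chaining. */
final class FluentConfigSketch {
    private double stepSize = 0.1;
    private int numIterations = 50;
    private double miniBatchFraction = 1.0;
    private double[] initialWeights; // stays null until set explicitly

    // Each setter returns this, which is what makes the calls chainable.
    FluentConfigSketch setStepSize(double stepSize) {
        this.stepSize = stepSize;
        return this;
    }

    FluentConfigSketch setNumIterations(int numIterations) {
        this.numIterations = numIterations;
        return this;
    }

    FluentConfigSketch setMiniBatchFraction(double miniBatchFraction) {
        this.miniBatchFraction = miniBatchFraction;
        return this;
    }

    FluentConfigSketch setInitialWeights(double... weights) {
        this.initialWeights = Arrays.copyOf(weights, weights.length);
        return this;
    }

    double stepSize() { return stepSize; }
    int numIterations() { return numIterations; }
    double miniBatchFraction() { return miniBatchFraction; }

    /** Fails fast when training or prediction is attempted without weights. */
    double[] requireInitialWeights() {
        if (initialWeights == null) {
            throw new IllegalStateException(
                "initial weights must be set before training or prediction");
        }
        return Arrays.copyOf(initialWeights, initialWeights.length);
    }
}
```

A call such as `new FluentConfigSketch().setStepSize(0.5).setNumIterations(10)` overrides two parameters and leaves the remaining defaults in place.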

  • Constructor Details

    • StreamingLinearRegressionWithSGD

      public StreamingLinearRegressionWithSGD()
      Construct a StreamingLinearRegressionWithSGD object with default parameters: {stepSize: 0.1, numIterations: 50, miniBatchFraction: 1.0}. Initial weights must be set before using trainOn or predictOn (see StreamingLinearAlgorithm).
  • Method Details

    • algorithm

      public LinearRegressionWithSGD algorithm()
    • setConvergenceTol

      public StreamingLinearRegressionWithSGD setConvergenceTol(double tolerance)
      Set the convergence tolerance. Default: 0.001.
      Parameters:
      tolerance - the convergence tolerance to use
      Returns:
      this instance, so that calls can be chained
    • setInitialWeights

      public StreamingLinearRegressionWithSGD setInitialWeights(Vector initialWeights)
      Set the initial weights.
      Parameters:
      initialWeights - the weight vector to start training from
      Returns:
      this instance, so that calls can be chained
    • setMiniBatchFraction

      public StreamingLinearRegressionWithSGD setMiniBatchFraction(double miniBatchFraction)
      Set the fraction of each batch to use for updates. Default: 1.0.
      Parameters:
      miniBatchFraction - the fraction of each batch to use for every update
      Returns:
      this instance, so that calls can be chained
    • setNumIterations

      public StreamingLinearRegressionWithSGD setNumIterations(int numIterations)
      Set the number of iterations of gradient descent to run per update. Default: 50.
      Parameters:
      numIterations - the number of gradient descent iterations to run per update
      Returns:
      this instance, so that calls can be chained
    • setRegParam

      public StreamingLinearRegressionWithSGD setRegParam(double regParam)
      Set the regularization parameter. Default: 0.0.
      Parameters:
      regParam - the regularization parameter to use
      Returns:
      this instance, so that calls can be chained
    • setStepSize

      public StreamingLinearRegressionWithSGD setStepSize(double stepSize)
      Set the step size for gradient descent. Default: 0.1.
      Parameters:
      stepSize - the step size for gradient descent
      Returns:
      this instance, so that calls can be chained
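
Two of the parameters above, the mini-batch fraction and the convergence tolerance, can be illustrated with a short plain-Java sketch. It is written under stated assumptions and is not taken from Spark: the class and method names are hypothetical, sampling is assumed to be an independent coin flip per data point, and convergence is assumed to compare the Euclidean norm of the weight change against the tolerance scaled by max(norm of the current weights, 1.0).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical helpers illustrating two SGD parameters; not Spark code. */
final class SgdParameterSketch {

    /**
     * Picks each index of a batch independently with probability
     * miniBatchFraction, so the sample size varies around
     * batchSize * miniBatchFraction.
     */
    static List<Integer> sampleIndices(int batchSize, double miniBatchFraction,
                                       long seed) {
        Random random = new Random(seed);
        List<Integer> picked = new ArrayList<>();
        for (int i = 0; i < batchSize; i++) {
            // nextDouble() is in [0, 1), so a fraction of 1.0 keeps every point.
            if (random.nextDouble() < miniBatchFraction) {
                picked.add(i);
            }
        }
        return picked;
    }

    /**
     * Reports convergence when the weights moved less than the tolerance,
     * relative to the size of the current weights (assumed criterion).
     */
    static boolean isConverged(double[] previous, double[] current,
                               double tolerance) {
        double changeSquared = 0.0;
        double normSquared = 0.0;
        for (int j = 0; j < current.length; j++) {
            double delta = current[j] - previous[j];
            changeSquared += delta * delta;
            normSquared += current[j] * current[j];
        }
        double scale = Math.max(Math.sqrt(normSquared), 1.0);
        return Math.sqrt(changeSquared) < tolerance * scale;
    }
}
```

With the default fraction of 1.0 every data point in a batch contributes to each update, while a smaller fraction trades gradient accuracy for less work per iteration. A convergence check of this kind lets the iterations for a batch stop before numIterations is reached.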