Class StreamingLinearRegressionWithSGD
Object
org.apache.spark.mllib.regression.StreamingLinearAlgorithm<LinearRegressionModel,LinearRegressionWithSGD>
org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging
public class StreamingLinearRegressionWithSGD
extends StreamingLinearAlgorithm<LinearRegressionModel,LinearRegressionWithSGD>
implements Serializable
Train or predict a linear regression model on streaming data. Training uses Stochastic Gradient Descent to update the model based on each new batch of incoming data from a DStream (see LinearRegressionWithSGD for the model equation).
Each batch of data is assumed to be an RDD of LabeledPoints. The number of data points per batch can vary, but the number of features must be constant. An initial weight vector must be provided.
Use the builder pattern to construct a streaming linear regression analysis in an application, for example:
  val model = new StreamingLinearRegressionWithSGD()
    .setStepSize(0.5)
    .setNumIterations(10)
    .setInitialWeights(Vectors.dense(...))
    .trainOn(DStream)
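The builder chain above can be expanded into a complete streaming job. The following is a minimal sketch, not a definitive implementation: the directory paths, the batch interval, the feature count, and the object name are hypothetical choices made for illustration, and input lines are assumed to be in the text form accepted by LabeledPoint.parse. The model is kept in its own value and trainOn is called separately, so that the same instance can also be used for prediction.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, StreamingLinearRegressionWithSGD}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingRegressionSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical locations and dimensions; replace with values for your data.
    val trainingDir = "/tmp/streaming/train"
    val testDir = "/tmp/streaming/test"
    val numFeatures = 3

    // local[2] is an assumption for a local trial run.
    val conf = new SparkConf().setAppName("StreamingRegressionSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Every batch becomes an RDD of LabeledPoints; the feature count must stay constant.
    val trainingData = ssc.textFileStream(trainingDir).map(LabeledPoint.parse).cache()
    val testData = ssc.textFileStream(testDir).map(LabeledPoint.parse)

    // Initial weights must be set before trainOn or predictOn is called.
    val model = new StreamingLinearRegressionWithSGD()
      .setStepSize(0.5)
      .setNumIterations(10)
      .setInitialWeights(Vectors.zeros(numFeatures))

    // Register training and prediction on the streams; nothing runs until start().
    model.trainOn(trainingData)
    model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Starting from a zero vector is only one possible initialization; any vector whose length equals the number of features can be passed to setInitialWeights.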
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
-
Constructor Summary
Constructor and Description
StreamingLinearRegressionWithSGD(): Construct a StreamingLinearRegression object with default parameters: {stepSize: 0.1, numIterations: 50, miniBatchFraction: 1.0}.
Method Summary
Method and Description
setConvergenceTol(double tolerance): Set the convergence tolerance.
setInitialWeights(Vector initialWeights): Set the initial weights.
setMiniBatchFraction(double miniBatchFraction): Set the fraction of each batch to use for updates.
setNumIterations(int numIterations): Set the number of iterations of gradient descent to run per update.
setRegParam(double regParam): Set the regularization parameter.
setStepSize(double stepSize): Set the step size for gradient descent.
Methods inherited from class org.apache.spark.mllib.regression.StreamingLinearAlgorithm
latestModel, predictOn, predictOn, predictOnValues, predictOnValues, trainOn, trainOn
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
-
Constructor Details
-
StreamingLinearRegressionWithSGD
public StreamingLinearRegressionWithSGD()
Construct a StreamingLinearRegression object with default parameters: {stepSize: 0.1, numIterations: 50, miniBatchFraction: 1.0}. Initial weights must be set before using trainOn or predictOn (see StreamingLinearAlgorithm).
-
-
Method Details
-
algorithm
-
setConvergenceTol
Set the convergence tolerance. Default: 0.001.
Parameters:
tolerance - (undocumented)
Returns:
(undocumented)
-
setInitialWeights
Set the initial weights.
Parameters:
initialWeights - (undocumented)
Returns:
(undocumented)
-
setMiniBatchFraction
Set the fraction of each batch to use for updates. Default: 1.0.
Parameters:
miniBatchFraction - (undocumented)
Returns:
(undocumented)
-
setNumIterations
Set the number of iterations of gradient descent to run per update. Default: 50.
Parameters:
numIterations - (undocumented)
Returns:
(undocumented)
-
setRegParam
Set the regularization parameter. Default: 0.0.
Parameters:
regParam - (undocumented)
Returns:
(undocumented)
-
setStepSize
Set the step size for gradient descent. Default: 0.1.
Parameters:
stepSize - (undocumented)
Returns:
(undocumented)
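To make the roles of stepSize and numIterations concrete, here is a plain-Scala sketch of what a single per-batch update might look like. It is an illustration under stated assumptions, not Spark's implementation: the object and the updateOnBatch helper are hypothetical, the whole batch is used on every step (miniBatchFraction of 1.0), there is no regularization (regParam of 0.0), no convergence-tolerance check, and the step is assumed to decay as stepSize / sqrt(iteration).

```scala
object SgdUpdateSketch {
  /** Hypothetical helper: refine `weights` with `numIterations` gradient steps on one batch.
    * Each batch element is a (label, features) pair. Not part of the Spark API. */
  def updateOnBatch(
      weights: Array[Double],
      batch: Seq[(Double, Array[Double])],
      stepSize: Double = 0.1,
      numIterations: Int = 50): Array[Double] = {
    // An empty batch leaves the model unchanged.
    if (batch.isEmpty) return weights.clone()

    var w = weights.clone()
    for (iter <- 1 to numIterations) {
      // Sum the least-squares gradient (prediction - label) * features over the batch.
      val grad = new Array[Double](w.length)
      for ((label, x) <- batch) {
        val err = w.indices.map(i => w(i) * x(i)).sum - label
        for (i <- w.indices) grad(i) += err * x(i)
      }
      // Assumed schedule: the step shrinks with the square root of the iteration count.
      val rate = stepSize / math.sqrt(iter.toDouble)
      // Step against the batch-averaged gradient.
      w = w.indices.map(i => w(i) - rate * grad(i) / batch.size).toArray
    }
    w
  }

  def main(args: Array[String]): Unit = {
    // Toy batch generated by label = 2 * feature, starting from a zero weight.
    val batch = Seq((2.0, Array(1.0)), (4.0, Array(2.0)), (6.0, Array(3.0)))
    val updated = updateOnBatch(Array(0.0), batch)
    println(updated.mkString(","))
  }
}
```

In a streaming setting the weights returned for one batch would serve as the starting point for the next, which is why the number of features has to remain constant across batches.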
-