StreamingLogisticRegressionWithSGD

class pyspark.mllib.classification.StreamingLogisticRegressionWithSGD(stepSize: float = 0.1, numIterations: int = 50, miniBatchFraction: float = 1.0, regParam: float = 0.0, convergenceTol: float = 0.001)[source]

Train or predict a logistic regression model on streaming data. Training uses Stochastic Gradient Descent to update the model based on each new batch of incoming data from a DStream.

Each batch of data is assumed to be an RDD of LabeledPoints. The number of data points per batch can vary, but the number of features must be constant. An initial weight vector must be provided.

New in version 1.5.0.

Parameters
stepSizefloat, optional

Step size for each iteration of gradient descent. (default: 0.1)

numIterationsint, optional

Number of iterations run for each batch of data. (default: 50)

miniBatchFractionfloat, optional

Fraction of each batch of data to use for updates. (default: 1.0)

regParamfloat, optional

L2 Regularization parameter. (default: 0.0)

convergenceTolfloat, optional

Value used to determine when to terminate iterations. (default: 0.001)

Methods

latestModel()

Returns the latest model.

predictOn(dstream)

Use the model to make predictions on batches of data from a DStream.

predictOnValues(dstream)

Use the model to make predictions on the values of a DStream and carry over its keys.

setInitialWeights(initialWeights)

Set the initial value of weights.

trainOn(dstream)

Train the model on the incoming dstream.

Methods Documentation

latestModel() → Optional[pyspark.mllib.regression.LinearModel]

Returns the latest model.

New in version 1.5.0.

predictOn(dstream: DStream[VectorLike]) → DStream[float]

Use the model to make predictions on batches of data from a DStream.

New in version 1.5.0.

Returns
pyspark.streaming.DStream

DStream containing predictions.

predictOnValues(dstream: DStream[Tuple[K, VectorLike]]) → DStream[Tuple[K, float]]

Use the model to make predictions on the values of a DStream and carry over its keys.

New in version 1.5.0.

Returns
pyspark.streaming.DStream

DStream containing predictions.

setInitialWeights(initialWeights: VectorLike) → StreamingLogisticRegressionWithSGD[source]

Set the initial value of weights.

This must be set before running trainOn and predictOn.

New in version 1.5.0.

trainOn(dstream: pyspark.streaming.dstream.DStream[pyspark.mllib.regression.LabeledPoint]) → None[source]

Train the model on the incoming dstream.

New in version 1.5.0.