RidgeRegressionWithSGD

class pyspark.mllib.regression.RidgeRegressionWithSGD

Train a regression model with L2-regularization using Stochastic Gradient Descent.

New in version 0.9.0.

Deprecated since version 2.0.0: Use pyspark.ml.regression.LinearRegression with elasticNetParam = 0.0. Note the default regParam is 0.01 for RidgeRegressionWithSGD, but is 0.0 for LinearRegression.

Methods

train(data[, iterations, step, regParam, …])

Train a regression model with L2-regularization using Stochastic Gradient Descent.

Methods Documentation

classmethod train(data, iterations=100, step=1.0, regParam=0.01, miniBatchFraction=1.0, initialWeights=None, intercept=False, validateData=True, convergenceTol=0.001)

Train a regression model with L2-regularization using Stochastic Gradient Descent. This solves the L2-regularized least squares regression formulation:

f(weights) = 1/(2n) ||A weights - y||^2 + regParam/2 ||weights||^2

Here the data matrix A has n rows, and the input RDD holds the rows of A, each paired with its corresponding right-hand-side label y. See also the documentation for the precise formulation.
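
As a rough illustration of the objective above, the following pure-Python sketch evaluates f(weights) for a small dense matrix held in local lists. The function name `ridge_objective` and the toy data are hypothetical and not part of the PySpark API; the sketch only mirrors the formula and does not touch an RDD.

```python
def ridge_objective(rows, labels, weights, reg_param=0.01):
    """Evaluate 1/(2n) ||A weights - y||^2 + regParam/2 ||weights||^2.

    rows is a list of the n rows of A, labels holds the matching y values.
    Hypothetical helper for illustration; not a PySpark function.
    """
    n = len(rows)
    # Sum of squared residuals (A weights - y) over all n rows.
    squared_error = sum(
        (sum(a * w for a, w in zip(row, weights)) - y) ** 2
        for row, y in zip(rows, labels)
    )
    # Squared L2 norm of the weight vector.
    squared_norm = sum(w * w for w in weights)
    return squared_error / (2.0 * n) + reg_param / 2.0 * squared_norm


# Toy data (assumed for illustration): A is the 2x2 identity matrix.
rows = [[1.0, 0.0], [0.0, 1.0]]
labels = [1.0, 2.0]

# Weights that fit the labels exactly leave only the penalty term.
loss_at_fit = ridge_objective(rows, labels, [1.0, 2.0], reg_param=0.01)
# Zero weights leave only the data-fit term.
loss_at_zero = ridge_objective(rows, labels, [0.0, 0.0], reg_param=0.01)
```

With weights that reproduce every label, only the regParam term contributes; with all-zero weights, only the least squares term does.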

New in version 0.9.0.

Parameters

data : pyspark.RDD
    The training data, an RDD of LabeledPoint.
iterations : int, optional
    The number of iterations. (default: 100)
step : float, optional
    The step parameter used in SGD. (default: 1.0)
regParam : float, optional
    The regularizer parameter. (default: 0.01)
miniBatchFraction : float, optional
    Fraction of data to be used for each SGD iteration. (default: 1.0)
initialWeights : pyspark.mllib.linalg.Vector or convertible, optional
    The initial weights. (default: None)
intercept : bool, optional
    Whether to use the augmented representation for the training data (i.e., whether bias features are activated). (default: False)
validateData : bool, optional
    Whether the algorithm should validate the data before training. (default: True)
convergenceTol : float, optional
    The convergence tolerance that decides when iteration terminates. (default: 0.001)
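
To make the roles of the parameters above more concrete, here is a minimal pure-Python sketch of mini-batch gradient descent on the L2-regularized least squares objective. It is not Spark's implementation: the function name `train_ridge_sketch`, the constant step size, the Bernoulli sampling of the mini-batch, and the stopping rule based on the size of the weight update are all assumptions made for illustration, and the intercept and validateData options are left out.

```python
import math
import random


def train_ridge_sketch(rows, labels, iterations=100, step=1.0, reg_param=0.01,
                       mini_batch_fraction=1.0, initial_weights=None,
                       convergence_tol=0.001, seed=0):
    """Hypothetical local analogue of train(); not part of PySpark."""
    rng = random.Random(seed)
    dim = len(rows[0])
    if initial_weights is None:
        weights = [0.0] * dim
    else:
        weights = list(initial_weights)
    for _ in range(iterations):
        # Pick the rows for this iteration; a fraction of 1.0 uses all of them.
        if mini_batch_fraction >= 1.0:
            batch = list(range(len(rows)))
        else:
            batch = [i for i in range(len(rows))
                     if rng.random() < mini_batch_fraction]
        if not batch:
            continue
        # Gradient of the penalty term regParam/2 ||weights||^2.
        grad = [reg_param * w for w in weights]
        # Gradient of the least squares term, averaged over the batch.
        for i in batch:
            residual = sum(a * w for a, w in zip(rows[i], weights)) - labels[i]
            for j in range(dim):
                grad[j] += residual * rows[i][j] / len(batch)
        new_weights = [w - step * g for w, g in zip(weights, grad)]
        # Assumed stopping rule: stop once the update becomes very small.
        change = math.sqrt(sum((nw - w) ** 2
                               for nw, w in zip(new_weights, weights)))
        weights = new_weights
        if change < convergence_tol:
            break
    return weights


# Toy data (assumed for illustration): labels follow y = 2 * x exactly.
rows = [[1.0], [2.0], [3.0]]
labels = [2.0, 4.0, 6.0]

fitted = train_ridge_sketch(rows, labels, iterations=200, step=0.1,
                            reg_param=0.0, convergence_tol=1e-9)
shrunk = train_ridge_sketch(rows, labels, iterations=200, step=0.1,
                            reg_param=0.01, convergence_tol=1e-9)
```

In this toy setup the unregularized run recovers a weight close to 2, while a positive reg_param pulls the weight slightly toward zero. The step is set to 0.1 here because a step of 1.0 makes this particular unscaled example diverge under the constant-step assumption.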
