class pyspark.mllib.classification.LogisticRegressionWithSGD

Train a binary logistic regression classification model using stochastic gradient descent (SGD).

New in version 0.9.0.

Deprecated since version 2.0.0: Use ml.classification.LogisticRegression or LogisticRegressionWithLBFGS.
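The following is a minimal pure-Python sketch of binary logistic regression trained by SGD. It is not Spark's implementation, and several details are assumptions:

- (label, features) tuples stand in for LabeledPoint.
- `sgd_logistic` and `predict` are hypothetical names.
- Each step uses a single randomly chosen point. By contrast, Spark's default miniBatchFraction of 1.0 averages the gradient over the whole RDD on every iteration.
- The step size decays as step / sqrt(t), and `predict` thresholds the probability at 0.5.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def sgd_logistic(points, iterations=100, step=1.0, seed=0):
    """Single-sample SGD on the logistic loss (hypothetical helper).

    points: list of (label, features) tuples with labels 0.0 or 1.0,
    a stand-in for an RDD of LabeledPoint.
    """
    rng = random.Random(seed)
    w = [0.0] * len(points[0][1])  # start from the zero vector
    for t in range(1, iterations + 1):
        label, x = rng.choice(points)
        # Gradient of the logistic loss w.r.t. w is (sigmoid(w.x) - label) * x
        g = sigmoid(dot(w, x)) - label
        lr = step / math.sqrt(t)  # decaying step size
        w = [wi - lr * g * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x, threshold=0.5):
    """Return 1.0 if the predicted probability exceeds threshold, else 0.0."""
    return 1.0 if sigmoid(dot(w, x)) > threshold else 0.0


# The first feature is a constant 1.0 acting as a bias term.
points = [(0.0, [1.0, -2.0]), (0.0, [1.0, -1.0]),
          (1.0, [1.0, 1.0]), (1.0, [1.0, 2.0])]
w = sgd_logistic(points, iterations=500)
print([predict(w, x) for _, x in points])  # → [0.0, 0.0, 1.0, 1.0]
```

These toy points are linearly separable. On such data the weight on the second feature keeps growing instead of settling, which is typical of unregularized logistic regression on separable data. A regularization term keeps the optimum finite.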


Methods

train(data[, iterations, step, …])

Train a logistic regression model on the given data.

Methods Documentation

classmethod train(data: pyspark.rdd.RDD[pyspark.mllib.regression.LabeledPoint], iterations: int = 100, step: float = 1.0, miniBatchFraction: float = 1.0, initialWeights: Optional[VectorLike] = None, regParam: float = 0.01, regType: str = 'l2', intercept: bool = False, validateData: bool = True, convergenceTol: float = 0.001) → pyspark.mllib.classification.LogisticRegressionModel

Train a logistic regression model on the given data.

New in version 0.9.0.


Parameters

data : pyspark.RDD

The training data, an RDD of pyspark.mllib.regression.LabeledPoint.

iterations : int, optional

The maximum number of iterations; training may stop earlier once convergenceTol is met. (default: 100)

step : float, optional

The initial step size for SGD; the step actually taken at iteration t is step / sqrt(t). (default: 1.0)
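The decaying step-size schedule can be sketched in a few lines. This assumes the step / sqrt(t) rule with t starting at 1, and `step_schedule` is a hypothetical helper.

```python
import math


def step_schedule(step, iterations):
    """Step size used at each iteration t = 1..iterations: step / sqrt(t)."""
    return [step / math.sqrt(t) for t in range(1, iterations + 1)]


sizes = step_schedule(1.0, 4)
print(sizes)  # first and last values are exactly 1.0 and 0.5
```

With the default step of 1.0, the step has halved by the fourth iteration. A large initial step therefore matters most in the early iterations.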

miniBatchFraction : float, optional

Fraction of data to be used for each SGD iteration. (default: 1.0)
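The following sketch shows what a mini-batch fraction means in practice. It assumes Bernoulli sampling without replacement, which is what RDD.sample(False, fraction) performs. Each point is kept independently with probability `fraction`, so the batch size only averages fraction × N, and the gradient is averaged over whatever was sampled. `minibatch_gradient` is a hypothetical name. Returning None for an empty sample models skipping that iteration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def minibatch_gradient(points, w, fraction, rng):
    """Average logistic-loss gradient over a Bernoulli sample of points.

    Each (label, features) pair is kept independently with probability
    `fraction`. Returns None when the sample is empty (iteration skipped).
    """
    if fraction >= 1.0:
        batch = list(points)  # full batch: use every point
    else:
        batch = [p for p in points if rng.random() < fraction]
    if not batch:
        return None
    grad = [0.0] * len(w)
    for label, x in batch:
        g = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - label
        for j, xj in enumerate(x):
            grad[j] += g * xj
    return [gj / len(batch) for gj in grad]


points = [(1.0, [1.0, 2.0]), (0.0, [1.0, -2.0])]
print(minibatch_gradient(points, [0.0, 0.0], 1.0, random.Random(42)))  # → [0.0, -1.0]
```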

initialWeights : pyspark.mllib.linalg.Vector or convertible, optional

The initial weights. If None, training starts from a zero vector. (default: None)

regParam : float, optional

The regularization parameter. (default: 0.01)

regType : str, optional

The type of regularizer used for training the model. Supported values:

  • “l1” for using L1 regularization

  • “l2” for using L2 regularization (default)

  • None for no regularization
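The effect of regType and regParam on a single update can be sketched as follows. The sketch uses the textbook updates:

- "l2": shrink the weights by (1 − lr·regParam), then take the gradient step. This is the gradient of (regParam/2)·‖w‖².
- "l1": take the gradient step, then soft-threshold each weight toward zero by lr·regParam.
- None: plain gradient step.

Spark's SquaredL2Updater, L1Updater and SimpleUpdater follow these ideas, but this is not their code. `regularized_step` and `soft_threshold` are hypothetical names.

```python
def soft_threshold(v, t):
    """Move v toward zero by t, clamping at zero instead of crossing it."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def regularized_step(w, grad, lr, reg_param=0.01, reg_type="l2"):
    """One SGD weight update with optional L1/L2 regularization."""
    if reg_type == "l2":
        # Equivalent to w - lr * (grad + reg_param * w)
        return [wi * (1.0 - lr * reg_param) - lr * gi for wi, gi in zip(w, grad)]
    stepped = [wi - lr * gi for wi, gi in zip(w, grad)]
    if reg_type == "l1":
        return [soft_threshold(v, lr * reg_param) for v in stepped]
    if reg_type is None:
        return stepped  # unregularized SGD step
    raise ValueError("reg_type must be 'l1', 'l2' or None")


w = [1.0, -0.05]
zero_grad = [0.0, 0.0]
print(regularized_step(w, zero_grad, lr=1.0, reg_param=0.1, reg_type="l1"))  # small weight becomes 0.0
print(regularized_step(w, zero_grad, lr=1.0, reg_param=0.1, reg_type="l2"))  # both weights scaled by 0.9
```

The contrast shows why L1 tends to produce sparse models: weights whose magnitude falls below lr·regParam are set exactly to zero. L2 only scales them down.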

intercept : bool, optional

Whether to train on the augmented representation of the data, that is, whether to add a constant bias feature and fit an intercept. (default: False)
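The augmented representation can be illustrated in plain Python. The sketch assumes the bias column is appended as the last feature, as Spark's MLUtils.appendBias does, so the last learned weight is the intercept. `append_bias`, `split_weights` and `margin` are hypothetical helpers.

```python
def append_bias(features):
    """Augmented representation: append a constant 1.0 feature."""
    return list(features) + [1.0]


def split_weights(augmented_weights):
    """Split augmented weights into (weights, intercept); the last entry is the bias."""
    return augmented_weights[:-1], augmented_weights[-1]


def margin(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


w_aug = [0.5, -1.0, 0.25]
x = [2.0, 3.0]
w, b = split_weights(w_aug)
# The augmented dot product equals w . x + b
print(margin(w_aug, append_bias(x)), margin(w, x) + b)  # → -1.75 -1.75
```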

validateData : bool, optional

Whether to validate the input data before training; for this binary classifier, every label must be 0 or 1. (default: True)
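A sketch of the binary-label check follows. `validate_binary_labels` is a hypothetical name, and the ValueError and its message are illustrative rather than what Spark raises. A practical consequence is that data labeled −1/+1, as some other libraries expect, must be remapped to 0/1 before training.

```python
def validate_binary_labels(points):
    """Raise ValueError unless every (label, features) pair has label 0.0 or 1.0."""
    bad = [label for label, _ in points if label not in (0.0, 1.0)]
    if bad:
        raise ValueError(f"{len(bad)} label(s) are not 0 or 1, e.g. {bad[0]!r}")


validate_binary_labels([(0.0, [1.0]), (1.0, [2.0])])  # passes silently
try:
    validate_binary_labels([(-1.0, [1.0]), (1.0, [2.0])])  # -1/+1 labels are rejected
except ValueError as err:
    print(err)  # → 1 label(s) are not 0 or 1, e.g. -1.0
```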

convergenceTol : float, optional

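Tolerance for early stopping: training ends before `iterations` is reached once the L2 norm of the change in the weight vector between two consecutive iterations is smaller than convergenceTol * max(‖w‖, 1), where w is the new weight vector. (default: 0.001)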
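The stopping test can be sketched as follows. It assumes the relative criterion used by Spark's gradient descent: the weight change is compared against convergenceTol scaled by the larger of the new weight norm and 1. `is_converged` and `l2_norm` are hypothetical names.

```python
import math


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def is_converged(w_old, w_new, convergence_tol=0.001):
    """Relative stopping test: ||w_new - w_old|| < tol * max(||w_new||, 1)."""
    diff = l2_norm([a - b for a, b in zip(w_new, w_old)])
    return diff < convergence_tol * max(l2_norm(w_new), 1.0)


print(is_converged([1.0, 0.0], [1.0, 0.0005]))  # → True (change 0.0005 vs. limit ~0.001)
print(is_converged([0.0], [0.01]))  # → False (small weights: limit is 0.001 * 1.0)
```

Clamping the norm at 1 means that, for small weight vectors, the test behaves like an absolute tolerance. For large weight vectors it becomes a relative one.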