LogisticRegressionModel

class pyspark.mllib.classification.LogisticRegressionModel(weights, intercept, numFeatures, numClasses)

Classification model trained using Multinomial/Binary Logistic Regression.

New in version 0.9.0.

Parameters:
weights : pyspark.mllib.linalg.Vector

Weights computed for every feature.

intercept : float

Intercept computed for this model. Only used in binary logistic regression: in multinomial logistic regression the intercepts are not a single value, so they are stored as part of weights instead.

numFeatures : int

The dimension of the features.

numClasses : int

The number of possible outcomes for a k-class classification problem in multinomial logistic regression. By default, training performs binary logistic regression, so numClasses is 2.
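
In the binary case, the parameters combine into one linear score (the margin) per data point: weights · x + intercept, where x has numFeatures entries. The sketch below illustrates that arithmetic with NumPy rather than the MLlib API; the helper name binary_margin and the example values are hypothetical, chosen only for illustration.

```python
import numpy as np


def binary_margin(weights, intercept, num_features, x):
    """Hypothetical helper: linear score w . x + b for one data point."""
    w = np.asarray(weights, dtype=float)
    x = np.asarray(x, dtype=float)
    # Both vectors must have the dimension recorded in numFeatures.
    if w.shape != (num_features,) or x.shape != (num_features,):
        raise ValueError("weights and x must both have length num_features")
    return float(w.dot(x) + intercept)


# Illustrative values only (not taken from a trained model).
print(binary_margin([2.0, -1.0], 0.5, 2, [1.0, 0.0]))  # 2.5
```

A positive margin favors class 1 and a negative margin favors class 0; the model turns the margin into a score and, if a threshold is set, into a 0/1 label.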

Examples

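>>> # Assumes an active SparkContext bound to the name sc.
>>> import numpy
>>> from pyspark.mllib.classification import (
...     LogisticRegressionModel, LogisticRegressionWithSGD, LogisticRegressionWithLBFGS)
>>> from pyspark.mllib.regression import LabeledPoint
>>> from pyspark.mllib.linalg import SparseVector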
>>> data = [
...     LabeledPoint(0.0, [0.0, 1.0]),
...     LabeledPoint(1.0, [1.0, 0.0]),
... ]
>>> lrm = LogisticRegressionWithSGD.train(sc.parallelize(data), iterations=10)
>>> lrm.predict([1.0, 0.0])
1
>>> lrm.predict([0.0, 1.0])
0
>>> lrm.predict(sc.parallelize([[1.0, 0.0], [0.0, 1.0]])).collect()
[1, 0]
>>> lrm.clearThreshold()
>>> lrm.predict([0.0, 1.0])
0.279...
>>> sparse_data = [
...     LabeledPoint(0.0, SparseVector(2, {0: 0.0})),
...     LabeledPoint(1.0, SparseVector(2, {1: 1.0})),
...     LabeledPoint(0.0, SparseVector(2, {0: 1.0})),
...     LabeledPoint(1.0, SparseVector(2, {1: 2.0}))
... ]
>>> lrm = LogisticRegressionWithSGD.train(sc.parallelize(sparse_data), iterations=10)
>>> lrm.predict(numpy.array([0.0, 1.0]))
1
>>> lrm.predict(numpy.array([1.0, 0.0]))
0
>>> lrm.predict(SparseVector(2, {1: 1.0}))
1
>>> lrm.predict(SparseVector(2, {0: 1.0}))
0
>>> import tempfile
>>> path = tempfile.mkdtemp()
>>> lrm.save(sc, path)
>>> sameModel = LogisticRegressionModel.load(sc, path)
>>> sameModel.predict(numpy.array([0.0, 1.0]))
1
>>> sameModel.predict(SparseVector(2, {0: 1.0}))
0
>>> from shutil import rmtree
>>> try:
...    rmtree(path)
... except OSError:
...    pass
>>> multi_class_data = [
...     LabeledPoint(0.0, [0.0, 1.0, 0.0]),
...     LabeledPoint(1.0, [1.0, 0.0, 0.0]),
...     LabeledPoint(2.0, [0.0, 0.0, 1.0])
... ]
>>> data = sc.parallelize(multi_class_data)
>>> mcm = LogisticRegressionWithLBFGS.train(data, iterations=10, numClasses=3)
>>> mcm.predict([0.0, 0.5, 0.0])
0
>>> mcm.predict([0.8, 0.0, 0.0])
1
>>> mcm.predict([0.0, 0.0, 0.3])
2

Methods

clearThreshold(): Clears the threshold so that predict will output raw prediction scores.
load(sc, path): Load a model from the given path.
predict(x): Predict values for a single data point or an RDD of points using the trained model.
save(sc, path): Save this model to the given path.
setThreshold(value): Sets the threshold that separates positive predictions from negative predictions.

Attributes

intercept: Intercept computed for this model.
numClasses: Number of possible outcomes for a k-class classification problem in multinomial logistic regression.
numFeatures: Dimension of the features.
threshold: Returns the threshold (if any) used for converting raw prediction scores into 0/1 predictions.
weights: Weights computed for every feature.

Methods Documentation

clearThreshold()

Clears the threshold so that predict will output raw prediction scores instead of 0/1 labels. Applies to binary classification only.

New in version 1.4.0.
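
With the threshold cleared, a binary model returns a probability-like score rather than a label (the 0.279... in the examples above). A minimal sketch of such a score, assuming it is the logistic (sigmoid) function of the margin weights · x + intercept; the function name raw_score is hypothetical, and the branch on the sign of the margin is just a standard way to avoid floating-point overflow.

```python
import math


def raw_score(margin):
    """Hypothetical helper: logistic function of a margin, in (0, 1)."""
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    # For negative margins, exp(-margin) could overflow, so use the
    # algebraically equivalent form exp(m) / (1 + exp(m)).
    e = math.exp(margin)
    return e / (1.0 + e)


print(raw_score(0.0))      # 0.5
print(raw_score(-1000.0))  # 0.0, with no OverflowError
```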

classmethod load(sc, path)

Load a model from the given path.

New in version 1.4.0.

predict(x)

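Predict values for a single data point or an RDD of points using the trained model. Returns a class label, or a raw prediction score for a binary model whose threshold has been cleared.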

New in version 0.9.0.
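
For a multinomial model, the intercepts live inside weights (see the intercept parameter above). The sketch below shows one way such a prediction can work, under stated assumptions: the flat weights hold numClasses - 1 rows of numFeatures coefficients followed by one intercept each, class 0 is a pivot class whose margin is fixed at 0, and the prediction is the class with the largest margin. Treat this layout and the name predict_multinomial as assumptions for illustration, not as the library's implementation.

```python
import numpy as np


def predict_multinomial(flat_weights, num_features, num_classes, x):
    """Hypothetical sketch: return the index of the class with the largest margin.

    Assumes flat_weights holds (num_classes - 1) rows, each made of
    num_features coefficients followed by one intercept, and that
    class 0 is the pivot class with margin 0.
    """
    rows = np.asarray(flat_weights, dtype=float).reshape(
        num_classes - 1, num_features + 1)
    x = np.asarray(x, dtype=float)
    # Margins for classes 1..k-1: coefficients . x + intercept.
    margins = rows[:, :num_features].dot(x) + rows[:, num_features]
    # Prepend the pivot margin of 0; argmax breaks ties toward the lower index.
    return int(np.argmax(np.concatenate(([0.0], margins))))


# Hand-picked weights (not a trained model): class 1 follows feature 0,
# class 2 follows feature 2, and both intercepts are 0.
w = [1.0, 0.0, 0.0, 0.0,
     0.0, 0.0, 1.0, 0.0]
print([predict_multinomial(w, 3, 3, p)
       for p in ([0.0, 0.5, 0.0], [0.8, 0.0, 0.0], [0.0, 0.0, 0.3])])  # [0, 1, 2]
```

For an RDD of points, the same rule is simply applied to each point.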

save(sc, path)

Save this model to the given path.

New in version 1.4.0.

setThreshold(value)

Sets the threshold that separates positive predictions from negative predictions. An example whose prediction score is greater than or equal to this threshold is classified as positive, and as negative otherwise. Applies to binary classification only.

New in version 1.4.0.
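
The thresholding rule above can be sketched in a few lines. This is a plain-Python illustration, not the MLlib code: apply_threshold is a hypothetical name, threshold=None stands in for a cleared threshold, and the 0.5 default and the sample scores are assumptions made for the example.

```python
def apply_threshold(score, threshold=0.5):
    """Hypothetical helper: map a raw binary score to a 0/1 label.

    threshold=None mimics a cleared threshold: the raw score is returned.
    Otherwise scores >= threshold map to 1 and the rest to 0, following
    the rule stated for setThreshold.
    """
    if threshold is None:
        return score
    return 1 if score >= threshold else 0


scores = [0.279, 0.6, 0.9]  # illustrative raw scores
print([apply_threshold(s) for s in scores])        # [0, 1, 1]
print([apply_threshold(s, 0.7) for s in scores])   # [0, 0, 1]
print([apply_threshold(s, None) for s in scores])  # [0.279, 0.6, 0.9]
```

Raising the threshold trades positive predictions for fewer false positives; lowering it does the opposite.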

Attributes Documentation

intercept

Intercept computed for this model.

New in version 1.0.0.

numClasses

Number of possible outcomes for a k-class classification problem in multinomial logistic regression.

New in version 1.4.0.

numFeatures

Dimension of the features.

New in version 1.4.0.

threshold

Returns the threshold, if any, used to convert raw prediction scores into 0/1 predictions; it is None after clearThreshold() has been called. Applies to binary classification only.

New in version 1.4.0.

weights

Weights computed for every feature.

New in version 1.0.0.