IsotonicRegressionModel

class pyspark.mllib.regression.IsotonicRegressionModel(boundaries, predictions, isotonic)

Regression model for isotonic regression.

New in version 1.4.0.

Parameters
boundaries : ndarray

Array of boundaries for which predictions are known. Boundaries must be sorted in increasing order.

predictions : ndarray

Array of predictions associated with the boundaries at the same index. These are the results of isotonic regression and are therefore monotone.

isotonic : bool

Indicates whether the model is isotonic (True) or antitonic (False).
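
The constraints on the parameters above (boundaries sorted in increasing order, predictions monotone in the direction given by isotonic) can be checked with a small helper. This is a minimal sketch in plain Python; check_model_arrays is a hypothetical name for illustration and is not part of pyspark.

```python
def check_model_arrays(boundaries, predictions, isotonic=True):
    """Hypothetical helper: return True if the arrays satisfy the documented constraints."""
    # Each boundary needs exactly one prediction at the same index.
    if len(boundaries) != len(predictions):
        return False
    # Boundaries must be sorted; equal neighbours are tolerated here because
    # the predict() documentation mentions repeated boundaries.
    sorted_ok = all(a <= b for a, b in zip(boundaries, boundaries[1:]))
    # Predictions must be non-decreasing (isotonic) or non-increasing (antitonic).
    if isotonic:
        monotone_ok = all(a <= b for a, b in zip(predictions, predictions[1:]))
    else:
        monotone_ok = all(a >= b for a, b in zip(predictions, predictions[1:]))
    return sorted_ok and monotone_ok
```

A check like this may be useful before constructing a model by hand from arrays produced elsewhere.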

Examples

>>> data = [(1, 0, 1), (2, 1, 1), (3, 2, 1), (1, 3, 1), (6, 4, 1), (17, 5, 1), (16, 6, 1)]
>>> irm = IsotonicRegression.train(sc.parallelize(data))
>>> irm.predict(3)
2.0
>>> irm.predict(5)
16.5
>>> irm.predict(sc.parallelize([3, 5])).collect()
[2.0, 16.5]
>>> import tempfile
>>> path = tempfile.mkdtemp()
>>> irm.save(sc, path)
>>> sameModel = IsotonicRegressionModel.load(sc, path)
>>> sameModel.predict(3)
2.0
>>> sameModel.predict(5)
16.5
>>> from shutil import rmtree
>>> try:
...     rmtree(path)
... except OSError:
...     pass

Methods

load(sc, path)

Load an IsotonicRegressionModel.

predict(x)

Predict labels for provided features.

save(sc, path)

Save an IsotonicRegressionModel.

Methods Documentation

classmethod load(sc, path)

Load an IsotonicRegressionModel.

New in version 1.4.0.

predict(x)

Predict labels for the provided features using a piecewise linear function.

1) If x exactly matches a boundary, the associated prediction is returned. If there are multiple predictions with the same boundary, one of them is returned; which one is undefined (same as java.util.Arrays.binarySearch).

2) If x is lower or higher than all boundaries, the first or last prediction is returned, respectively. If there are multiple predictions with the same boundary, the lowest or highest is returned, respectively.

3) If x falls between two values in the boundary array, the prediction is treated as a piecewise linear function and the interpolated value is returned. If there are multiple values with the same boundary, the same rules as in 2) are used.

New in version 1.4.0.

Parameters
x : pyspark.mllib.linalg.Vector or pyspark.RDD

Feature or RDD of Features to be labeled.
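
The three prediction rules described for predict(x) can be sketched for a single scalar feature as follows. This is an illustrative sketch in plain Python, not the library's implementation; predict_sketch is a hypothetical name, and the boundaries and predictions arrays are assumed values chosen for illustration.

```python
from bisect import bisect_left

# Assumed example arrays (not read from a trained model).
boundaries = [0.0, 1.0, 3.0, 4.0, 5.0, 6.0]
predictions = [1.0, 2.0, 2.0, 6.0, 16.5, 16.5]

def predict_sketch(boundaries, predictions, x):
    """Hypothetical piecewise-linear lookup following rules 1) to 3)."""
    # Rule 2: outside the boundary range, return the first or last prediction.
    if x <= boundaries[0]:
        return float(predictions[0])
    if x >= boundaries[-1]:
        return float(predictions[-1])
    # Index of the first boundary that is >= x.
    i = bisect_left(boundaries, x)
    # Rule 1: exact match on a boundary. With repeated boundaries this sketch
    # picks the leftmost one; the documented behaviour leaves the choice undefined.
    if boundaries[i] == x:
        return float(predictions[i])
    # Rule 3: linear interpolation between the two neighbouring boundaries.
    x0, x1 = boundaries[i - 1], boundaries[i]
    y0, y1 = predictions[i - 1], predictions[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
```

With these assumed arrays, predict_sketch(boundaries, predictions, 3.5) interpolates halfway between the predictions 2.0 and 6.0, while any x below 0.0 or above 6.0 is clamped to the first or last prediction.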

save(sc, path)

Save an IsotonicRegressionModel.

New in version 1.4.0.