IsotonicRegressionModel
class pyspark.mllib.regression.IsotonicRegressionModel(boundaries: numpy.ndarray, predictions: numpy.ndarray, isotonic: bool)
Regression model for isotonic regression.
New in version 1.4.0.
- Parameters
- boundaries : ndarray
Array of boundaries for which predictions are known. Boundaries must be sorted in increasing order.
- predictions : ndarray
Array of predictions associated with the boundaries at the same index. These are the results of isotonic regression and are therefore monotone.
- isotonic : bool
Indicates whether this model is isotonic (True) or antitonic (False).
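The constraints listed above can be checked before building a model from hand-made arrays. The helper below, check_model_arrays, is hypothetical and not part of PySpark; it does not assume the constructor validates anything. It treats equal neighbouring boundaries as allowed, since predict documents how duplicate boundaries are handled.

```python
from typing import Sequence


def check_model_arrays(boundaries: Sequence[float],
                       predictions: Sequence[float],
                       isotonic: bool = True) -> bool:
    """Hypothetical check of the documented boundaries/predictions constraints."""
    if len(boundaries) != len(predictions):
        raise ValueError("boundaries and predictions must have the same length")
    # Boundaries must be sorted in increasing order (ties are allowed).
    if any(b0 > b1 for b0, b1 in zip(boundaries, boundaries[1:])):
        raise ValueError("boundaries must be sorted in increasing order")
    # Predictions must be non-decreasing (isotonic) or non-increasing (antitonic).
    pairs = list(zip(predictions, predictions[1:]))
    if isotonic:
        monotone = all(p0 <= p1 for p0, p1 in pairs)
    else:
        monotone = all(p0 >= p1 for p0, p1 in pairs)
    if not monotone:
        raise ValueError("predictions are not monotone in the required direction")
    return True


print(check_model_arrays([0, 1, 3], [1.0, 2.0, 2.0]))                  # True
print(check_model_arrays([0, 1, 3], [5.0, 2.0, 2.0], isotonic=False))  # True
```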
Examples
The example assumes an active SparkContext named sc. Each tuple in data is (label, feature, weight).
>>> from pyspark.mllib.regression import IsotonicRegression, IsotonicRegressionModel
>>> data = [(1, 0, 1), (2, 1, 1), (3, 2, 1), (1, 3, 1), (6, 4, 1), (17, 5, 1), (16, 6, 1)]
>>> irm = IsotonicRegression.train(sc.parallelize(data))
>>> irm.predict(3)
2.0
>>> irm.predict(5)
16.5
>>> irm.predict(sc.parallelize([3, 5])).collect()
[2.0, 16.5]
>>> import tempfile
>>> path = tempfile.mkdtemp()
>>> irm.save(sc, path)
>>> sameModel = IsotonicRegressionModel.load(sc, path)
>>> sameModel.predict(3)
2.0
>>> sameModel.predict(5)
16.5
>>> from shutil import rmtree
>>> try:
...     rmtree(path)
... except OSError:
...     pass
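The predictions in the example can be reproduced by hand. Ordered by feature, the labels are 1, 2, 3, 1, 6, 17, 16. The sketch below applies the classic pool-adjacent-violators (PAV) algorithm to them locally; it is a minimal illustration, not Spark's distributed implementation, and the function name pool_adjacent_violators is hypothetical. The fitted values at features 3 and 5 come out as 2.0 and 16.5, matching irm.predict(3) and irm.predict(5).

```python
from typing import List, Optional, Sequence


def pool_adjacent_violators(labels: Sequence[float],
                            weights: Optional[Sequence[float]] = None,
                            isotonic: bool = True) -> List[float]:
    """Weighted least-squares monotone fit of labels already ordered by feature."""
    if weights is None:
        weights = [1.0] * len(labels)
    # An antitonic fit is an isotonic fit of the negated labels.
    sign = 1.0 if isotonic else -1.0
    # Each block holds [weighted mean, total weight, number of points].
    blocks: List[list] = []
    for y, w in zip(labels, weights):
        blocks.append([sign * y, float(w), 1])
        # Merge the last two blocks while they violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    # Expand each block back to one fitted value per input point.
    fitted: List[float] = []
    for mean, _, count in blocks:
        fitted.extend([sign * mean] * count)
    return fitted


# Labels from the example data, ordered by feature 0..6, all weights 1.
print(pool_adjacent_violators([1, 2, 3, 1, 6, 17, 16]))
# [1.0, 2.0, 2.0, 2.0, 6.0, 16.5, 16.5]
```

Only the violating runs (3, 1) and (17, 16) get pooled into their means, 2.0 and 16.5. Passing isotonic=False mirrors the antitonic case described by the isotonic parameter.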
Methods
- load(sc, path): Load an IsotonicRegressionModel.
- predict(x): Predict labels for provided features.
- save(sc, path): Save an IsotonicRegressionModel.
Methods Documentation
classmethod load(sc: pyspark.context.SparkContext, path: str) → pyspark.mllib.regression.IsotonicRegressionModel
Load an IsotonicRegressionModel.
New in version 1.4.0.
predict(x: Union[float, VectorLike, pyspark.rdd.RDD[float], pyspark.rdd.RDD[VectorLike]]) → Union[numpy.float64, numpy.ndarray, pyspark.rdd.RDD[numpy.float64], pyspark.rdd.RDD[numpy.ndarray]]
Predict labels for provided features using a piecewise linear function:
1) If x exactly matches a boundary, the associated prediction is returned. If several predictions share that boundary, one of them is returned; which one is undefined (as with java.util.Arrays.binarySearch).
2) If x is lower or higher than all boundaries, the first or last prediction is returned, respectively. If several predictions share that boundary, the lowest or highest is returned, respectively.
3) If x falls between two values in the boundary array, the prediction is treated as a piecewise linear function and the interpolated value is returned. If several predictions share the same boundary, the same rules as in 2) are used.
New in version 1.4.0.
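Rules 1) to 3) can be sketched for a single feature value in plain Python. The function piecewise_linear_predict is hypothetical, not PySpark code. It assumes boundaries sorted in non-decreasing order, with predictions[i] belonging to boundaries[i]. For sorted, distinct boundaries, numpy.interp(x, boundaries, predictions) applies the same clamping and interpolation. The arrays in the usage lines are the hand-computed isotonic fit of the example data at each training feature, not necessarily the exact arrays stored in irm.

```python
from bisect import bisect_right
from typing import Sequence


def piecewise_linear_predict(boundaries: Sequence[float],
                             predictions: Sequence[float],
                             x: float) -> float:
    """Hypothetical helper applying rules 1-3 to one feature value."""
    # Rule 2: outside the boundary range, clamp to the first or last prediction.
    if x <= boundaries[0]:
        return float(predictions[0])
    if x >= boundaries[-1]:
        return float(predictions[-1])
    # i is the index of the first boundary strictly greater than x,
    # so boundaries[i - 1] <= x < boundaries[i].
    i = bisect_right(boundaries, x)
    # With duplicate boundaries, lo is the last index at the left boundary
    # and hi is the first index at the right boundary.
    lo, hi = i - 1, i
    # Rule 1: exact match (with duplicates, this picks the last one).
    if boundaries[lo] == x:
        return float(predictions[lo])
    # Rule 3: linear interpolation between the neighbouring boundaries.
    x0, x1 = boundaries[lo], boundaries[hi]
    y0, y1 = predictions[lo], predictions[hi]
    return float(y0 + (y1 - y0) * (x - x0) / (x1 - x0))


boundaries = [0, 1, 2, 3, 4, 5, 6]
predictions = [1.0, 2.0, 2.0, 2.0, 6.0, 16.5, 16.5]
print(piecewise_linear_predict(boundaries, predictions, 3))    # 2.0
print(piecewise_linear_predict(boundaries, predictions, 5))    # 16.5
print(piecewise_linear_predict(boundaries, predictions, 3.5))  # 4.0
```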
- Parameters
- x : pyspark.mllib.linalg.Vector or pyspark.RDD
Feature or RDD of features to be labeled.
- x