LinearDataGenerator

class pyspark.mllib.util.LinearDataGenerator

Utils for generating linear data.

New in version 1.5.0.

Methods

generateLinearInput(intercept, weights, …)

New in version 1.5.0.

generateLinearRDD(sc, nexamples, nfeatures, eps)

Generate an RDD of LabeledPoints.

Methods Documentation

static generateLinearInput(intercept: float, weights: VectorLike, xMean: VectorLike, xVariance: VectorLike, nPoints: int, seed: int, eps: float) → List[LabeledPoint]

New in version 1.5.0.

Parameters
intercept : float

Bias factor, the term c in X’w + c.

weights : pyspark.mllib.linalg.Vector or convertible

Weight vector (one weight per feature), the term w in X’w + c.

xMean : pyspark.mllib.linalg.Vector or convertible

Point around which the data X is centered.

xVariance : pyspark.mllib.linalg.Vector or convertible

Variance of the data X.

nPoints : int

Number of points to be generated.

seed : int

Random seed.

eps : float

Scales the noise: the larger eps is, the more Gaussian noise is added.

Returns
list

A list of nPoints pyspark.mllib.regression.LabeledPoint objects.
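The parameters above describe a linear model with additive noise: each label is X’w + c plus Gaussian noise scaled by eps. The following is a minimal pure-Python sketch of that model, not the Spark implementation. The function name sketch_linear_input is hypothetical, plain (label, features) tuples stand in for LabeledPoint, and drawing each feature from a Gaussian with the given mean and variance is an assumption; the library may sample features differently.

```python
import random
from typing import List, Sequence, Tuple


def sketch_linear_input(
    intercept: float,
    weights: Sequence[float],
    x_mean: Sequence[float],
    x_variance: Sequence[float],
    n_points: int,
    seed: int,
    eps: float,
) -> List[Tuple[float, List[float]]]:
    """Hypothetical stand-in for generateLinearInput; returns (label, features) tuples."""
    rng = random.Random(seed)  # seeded so repeated calls give the same data
    points = []
    for _ in range(n_points):
        # Assumption: each feature is Gaussian with the given mean and variance.
        x = [rng.gauss(m, v ** 0.5) for m, v in zip(x_mean, x_variance)]
        # Label = x'w + c, plus standard Gaussian noise scaled by eps.
        signal = sum(xi * wi for xi, wi in zip(x, weights)) + intercept
        label = signal + eps * rng.gauss(0.0, 1.0)
        points.append((label, x))
    return points


# With eps = 0.0 the labels lie exactly on the plane x'w + c.
noiseless = sketch_linear_input(
    intercept=1.0,
    weights=[2.0, -1.0],
    x_mean=[0.0, 5.0],
    x_variance=[1.0, 4.0],
    n_points=4,
    seed=42,
    eps=0.0,
)
for label, features in noiseless:
    print(label, features)
```

Raising eps in this sketch moves the labels further from the plane while leaving the features unchanged for a fixed seed, which mirrors the role the eps parameter is documented to play.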

static generateLinearRDD(sc: pyspark.context.SparkContext, nexamples: int, nfeatures: int, eps: float, nParts: int = 2, intercept: float = 0.0) → pyspark.rdd.RDD[LabeledPoint]

Generate an RDD of LabeledPoints.

New in version 1.5.0.
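A short usage sketch for generateLinearRDD, based only on the signature shown above. The helper name summarize_linear_rdd and its default argument values are hypothetical, and the caller is assumed to supply an already running SparkContext.

```python
def summarize_linear_rdd(sc, nexamples=100, nfeatures=3, eps=0.1):
    """Generate a small RDD of LabeledPoints and return (row count, features per row).

    `sc` is an existing pyspark SparkContext. This helper is illustrative only.
    """
    # Imported lazily so this module still loads where PySpark is not installed.
    from pyspark.mllib.util import LinearDataGenerator

    rdd = LinearDataGenerator.generateLinearRDD(
        sc, nexamples, nfeatures, eps, nParts=2, intercept=0.0
    )
    first = rdd.first()  # a LabeledPoint exposing .label and .features
    return rdd.count(), len(first.features)
```

Called with a local context, for example one created as SparkContext("local[2]", "demo"), the helper should report the requested number of examples and features. The values passed for nParts and intercept simply restate the defaults from the signature.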