LinearDataGenerator (Spark 1.4.1 JavaDoc)

org.apache.spark.mllib.util
Class LinearDataGenerator

Object
  org.apache.spark.mllib.util.LinearDataGenerator

public class LinearDataGenerator
extends Object

:: DeveloperApi :: Generates sample data for linear models. This class generates uniformly random values for every feature and adds Gaussian noise scaled by eps to the response variable Y.

Constructor Summary
`LinearDataGenerator()`

Method Summary
`static scala.collection.Seq<LabeledPoint>`	`generateLinearInput(double intercept, double[] weights, double[] xMean, double[] xVariance, int nPoints, int seed, double eps)`
`static scala.collection.Seq<LabeledPoint>`	`generateLinearInput(double intercept, double[] weights, int nPoints, int seed, double eps)` For compatibility, data generated without specifying the mean and variance has zero mean and a variance of 1.0/3.0: the original output range is [-1, 1] with a uniform distribution, and the variance of a uniform distribution is (b - a)^2 / 12, which here is 1.0/3.0.
`static java.util.List<LabeledPoint>`	`generateLinearInputAsList(double intercept, double[] weights, int nPoints, int seed, double eps)` Return a Java List of synthetic data randomly generated according to a multi collinear model.
`static RDD<LabeledPoint>`	`generateLinearRDD(SparkContext sc, int nexamples, int nfeatures, double eps, int nparts, double intercept)` Generate an RDD containing sample data for Linear Regression models, including Ridge, Lasso, and unregularized variants.
`static void`	`main(String[] args)`

Methods inherited from class Object
`equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

LinearDataGenerator

public LinearDataGenerator()

Method Detail

generateLinearInputAsList

public static java.util.List<LabeledPoint> generateLinearInputAsList(double intercept,
                                                                     double[] weights,
                                                                     int nPoints,
                                                                     int seed,
                                                                     double eps)

Return a Java List of synthetic data randomly generated according to a multi collinear model.

Parameters:
intercept - Data intercept.
weights - Weights to be applied.
nPoints - Number of points in sample.
seed - Random seed.
eps - Epsilon scaling factor.
Returns:
Java List of input.

generateLinearInput

public static scala.collection.Seq<LabeledPoint> generateLinearInput(double intercept,
                                                                     double[] weights,
                                                                     int nPoints,
                                                                     int seed,
                                                                     double eps)

For compatibility, data generated without specifying the mean and variance has zero mean and a variance of 1.0/3.0: the original output range is [-1, 1] with a uniform distribution, and the variance of a uniform distribution is (b - a)^2 / 12, which here is 1.0/3.0.

Parameters:
intercept - Data intercept.
weights - Weights to be applied.
nPoints - Number of points in sample.
seed - Random seed.
eps - Epsilon scaling factor.
Returns:
Seq of input.
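
The 1.0/3.0 figure quoted for this overload can be checked by substituting the endpoints of the [-1, 1] range into the standard variance formula for a uniform distribution on [a, b]. This is a worked restatement of that arithmetic only:

```latex
\operatorname{Var}(X) = \frac{(b - a)^2}{12}
                      = \frac{\bigl(1 - (-1)\bigr)^2}{12}
                      = \frac{4}{12}
                      = \frac{1}{3}
```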

generateLinearInput

public static scala.collection.Seq<LabeledPoint> generateLinearInput(double intercept,
                                                                     double[] weights,
                                                                     double[] xMean,
                                                                     double[] xVariance,
                                                                     int nPoints,
                                                                     int seed,
                                                                     double eps)

Parameters:
intercept - Data intercept.
weights - Weights to be applied.
xMean - The mean of the generated features. If the features are not properly standardized, a poorly implemented algorithm will often have difficulty converging.
xVariance - The variance of the generated features.
nPoints - Number of points in sample.
seed - Random seed.
eps - Epsilon scaling factor.
Returns:
Seq of input.
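
To make the roles of xMean, xVariance and eps concrete, here is a minimal plain-Java sketch of one way data of this shape could be produced. It is not Spark's implementation and does not depend on Spark. The names `LinearDataSketch`, `Point` and `generate` are hypothetical, and the rescaling step is an assumption: it shifts and scales a uniform draw on [0, 1), which has mean 0.5 and variance 1/12, to the requested mean and variance.

```java
import java.util.Random;

// Hypothetical illustration only; not part of Spark.
final class LinearDataSketch {

    /** One generated example: a label and its feature vector. */
    static final class Point {
        public final double label;
        public final double[] features;

        Point(double label, double[] features) {
            this.label = label;
            this.features = features;
        }
    }

    /**
     * Draws nPoints examples with label = weights . x + intercept + eps * gaussianNoise.
     * Each feature is a uniform draw rescaled to mean xMean[i] and variance xVariance[i].
     */
    static Point[] generate(double intercept, double[] weights, double[] xMean,
                            double[] xVariance, int nPoints, int seed, double eps) {
        Random rnd = new Random(seed);
        Point[] points = new Point[nPoints];
        for (int p = 0; p < nPoints; p++) {
            double[] x = new double[weights.length];
            double dot = 0.0;
            for (int i = 0; i < x.length; i++) {
                // Centre the U[0, 1) draw at zero, then scale by sqrt(12 * variance)
                // so the result has the requested variance, and shift to the mean.
                x[i] = (rnd.nextDouble() - 0.5) * Math.sqrt(12.0 * xVariance[i]) + xMean[i];
                dot += weights[i] * x[i];
            }
            // eps scales standard Gaussian noise added to the response.
            double y = dot + intercept + eps * rnd.nextGaussian();
            points[p] = new Point(y, x);
        }
        return points;
    }

    public static void main(String[] args) {
        double[] weights = {2.0, -1.0};
        double[] xMean = {0.0, 0.0};
        double[] xVariance = {1.0 / 3.0, 1.0 / 3.0}; // matches a uniform range of [-1, 1]
        Point[] data = generate(1.5, weights, xMean, xVariance, 5, 42, 0.1);
        for (Point pt : data) {
            System.out.println(pt.label + " <- " + java.util.Arrays.toString(pt.features));
        }
    }
}
```

With zero mean and a variance of 1.0/3.0 for every feature, the sketch produces features in the [-1, 1] range described for the shorter overload.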

generateLinearRDD

public static RDD<LabeledPoint> generateLinearRDD(SparkContext sc,
                                                  int nexamples,
                                                  int nfeatures,
                                                  double eps,
                                                  int nparts,
                                                  double intercept)

Generate an RDD containing sample data for Linear Regression models, including Ridge, Lasso, and unregularized variants.

Parameters:
sc - SparkContext to be used for generating the RDD.
nexamples - Number of examples that will be contained in the RDD.
nfeatures - Number of features to generate for each example.
eps - Epsilon factor by which examples are scaled.
nparts - Number of partitions in the RDD. Default value is 2.
intercept - Data intercept.
Returns:
RDD of LabeledPoint containing sample data.

main

public static void main(String[] args)
