org.apache.spark.mllib.util
Class LogisticRegressionDataGenerator

Object
  extended by org.apache.spark.mllib.util.LogisticRegressionDataGenerator

public class LogisticRegressionDataGenerator
extends Object

:: DeveloperApi :: Generates test data for LogisticRegression. Each example is labeled positive (1) with probability probOne and negative (0) otherwise, and the features of positive examples are scaled by eps.


Constructor Summary
LogisticRegressionDataGenerator()
           
 
Method Summary
static RDD<LabeledPoint> generateLogisticRDD(SparkContext sc, int nexamples, int nfeatures, double eps, int nparts, double probOne)
          Generate an RDD containing test data for LogisticRegression.
static void main(String[] args)
           
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LogisticRegressionDataGenerator

public LogisticRegressionDataGenerator()
Method Detail

generateLogisticRDD

public static RDD<LabeledPoint> generateLogisticRDD(SparkContext sc,
                                                    int nexamples,
                                                    int nfeatures,
                                                    double eps,
                                                    int nparts,
                                                    double probOne)
Generate an RDD containing test data for LogisticRegression.

Parameters:
sc - SparkContext to use for creating the RDD.
nexamples - Number of examples that will be contained in the RDD.
nfeatures - Number of features to generate for each example.
eps - Epsilon factor by which positive examples are scaled.
nparts - Number of partitions of the generated RDD. Defaults to 2 in the Scala API; Java callers pass it explicitly.
probOne - Probability that a label is 1 (and not 0). Defaults to 0.5 in the Scala API; Java callers pass it explicitly.
Returns:
An RDD of LabeledPoint holding nexamples generated examples in nparts partitions.
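
The generation rule described above can be illustrated without a Spark cluster. The following is a minimal plain-Java sketch, not Spark's implementation: the class LogisticDataSketch, its Example type, and the seed parameter are hypothetical names introduced for illustration. This page does not say how eps enters the features, so the sketch assumes each feature is a standard-normal draw with eps added for positive examples; multiplying by eps would be another reading of "scaled".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for illustration only; not part of Spark.
final class LogisticDataSketch {

    /** One generated example: a 0/1 label plus its feature values. */
    static final class Example {
        final double label;
        final double[] features;

        Example(double label, double[] features) {
            this.label = label;
            this.features = features;
        }
    }

    /**
     * Generates nexamples labeled examples with nfeatures features each.
     * The seed parameter is an addition of this sketch so runs are repeatable.
     */
    static List<Example> generate(int nexamples, int nfeatures, double eps,
                                  double probOne, long seed) {
        Random rnd = new Random(seed);
        List<Example> out = new ArrayList<>(nexamples);
        for (int i = 0; i < nexamples; i++) {
            // Label is 1 with probability probOne, otherwise 0.
            double label = rnd.nextDouble() < probOne ? 1.0 : 0.0;
            double[] x = new double[nfeatures];
            for (int j = 0; j < nfeatures; j++) {
                // Assumption: eps is applied as an offset, and only to positive examples.
                x[j] = rnd.nextGaussian() + label * eps;
            }
            out.add(new Example(label, x));
        }
        return out;
    }
}
```

Under this assumption, a larger eps moves the positive examples further from the negative ones, which makes the generated problem easier for a classifier. For instance, LogisticDataSketch.generate(1000, 10, 3.0, 0.5, 42L) would produce 1000 examples with roughly half of them labeled 1.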

main

public static void main(String[] args)