Class LogisticRegressionDataGenerator

Object
org.apache.spark.mllib.util.LogisticRegressionDataGenerator

public class LogisticRegressionDataGenerator extends Object
Generate test data for LogisticRegression. This class chooses positive labels with probability probOne and scales features for positive examples by eps.
  • Constructor Details

    • LogisticRegressionDataGenerator

      public LogisticRegressionDataGenerator()
  • Method Details

    • generateLogisticRDD

      public static RDD<LabeledPoint> generateLogisticRDD(SparkContext sc, int nexamples, int nfeatures, double eps, int nparts, double probOne)
      Generate an RDD containing test data for LogisticRegression.

      Parameters:
      sc - SparkContext to use for creating the RDD.
      nexamples - Number of examples that will be contained in the RDD.
      nfeatures - Number of features to generate for each example.
      eps - Epsilon factor by which positive examples are scaled.
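      nparts - Number of partitions of the generated RDD. Default value is 2 in the Scala API; Java callers must pass this argument explicitly.
      probOne - Probability that a label is 1 (and not 0). Default value is 0.5 in the Scala API; Java callers must pass this argument explicitly.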
      Returns:
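      An RDD of LabeledPoint holding the nexamples generated examples, each with nfeatures features, split across nparts partitions.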
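
      The roles of eps and probOne can be illustrated with a small, Spark-free sketch. It reads the description above literally: a label is 1 with probability probOne, and the features of positive examples are multiplied by eps. The class LogisticDataSketch, its Example type, the seed parameter, and the choice of standard Gaussian base features are all assumptions made for illustration; they are not part of Spark, and the actual implementation may differ in detail.

```java
import java.util.Random;

/** Hypothetical, Spark-free illustration of the documented sampling scheme. */
public class LogisticDataSketch {

    /** One generated example: a 0/1 label plus its feature values. */
    public static final class Example {
        public final double label;
        public final double[] features;

        Example(double label, double[] features) {
            this.label = label;
            this.features = features;
        }
    }

    /**
     * Draws nexamples examples. A label is 1 with probability probOne,
     * and features of positive examples are multiplied by eps.
     * Assumption: base features are standard Gaussian draws.
     */
    public static Example[] generate(int nexamples, int nfeatures,
                                     double eps, double probOne, long seed) {
        Random rnd = new Random(seed);
        Example[] out = new Example[nexamples];
        for (int i = 0; i < nexamples; i++) {
            // nextDouble() is uniform on [0, 1), so this is true with probability probOne.
            double label = rnd.nextDouble() < probOne ? 1.0 : 0.0;
            double[] x = new double[nfeatures];
            for (int j = 0; j < nfeatures; j++) {
                double g = rnd.nextGaussian();
                // Only positive examples are scaled by eps.
                x[j] = (label == 1.0) ? g * eps : g;
            }
            out[i] = new Example(label, x);
        }
        return out;
    }

    public static void main(String[] args) {
        Example[] data = generate(1000, 10, 3.0, 0.5, 42L);
        int positives = 0;
        for (Example e : data) {
            if (e.label == 1.0) {
                positives++;
            }
        }
        System.out.println("positives: " + positives + " of " + data.length);
    }
}
```

      Under this reading, an eps further from 1 makes the two classes easier to tell apart by feature magnitude, while probOne controls the class balance.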
    • main

      public static void main(String[] args)