Class KMeansDataGenerator


public class KMeansDataGenerator extends Object
Generate test data for KMeans. This class first draws k cluster centers from a d-dimensional Gaussian distribution scaled by a factor r, and then generates a Gaussian cluster with scale 1 around each center.
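The two-step procedure described above can be sketched in plain Java without Spark. This is a minimal illustration, not the class's actual implementation: the names KMeansDataSketch, sampleCenters, and samplePoints are hypothetical, and assigning point idx to cluster idx % k is an assumption made here so that every cluster receives points.

```java
import java.util.Random;

// Hypothetical stand-alone sketch of the sampling scheme; not part of Spark.
final class KMeansDataSketch {

    // Step 1: draw k centers; each coordinate is r times a standard Gaussian sample.
    static double[][] sampleCenters(Random rand, int k, int d, double r) {
        double[][] centers = new double[k][d];
        for (int c = 0; c < k; c++) {
            for (int i = 0; i < d; i++) {
                centers[c][i] = r * rand.nextGaussian();
            }
        }
        return centers;
    }

    // Step 2: place each point around a center with unit-scale Gaussian noise.
    // Assumption: point idx is assigned to cluster idx % k (round-robin).
    static double[][] samplePoints(Random rand, double[][] centers, int numPoints) {
        int k = centers.length;
        int d = centers[0].length;
        double[][] points = new double[numPoints][d];
        for (int idx = 0; idx < numPoints; idx++) {
            double[] center = centers[idx % k];
            for (int i = 0; i < d; i++) {
                points[idx][i] = center[i] + rand.nextGaussian();
            }
        }
        return points;
    }

    public static void main(String[] args) {
        Random rand = new Random(42L); // fixed seed for reproducible output
        double[][] centers = sampleCenters(rand, 3, 2, 10.0);
        double[][] points = samplePoints(rand, centers, 1000);
        System.out.println("centers: " + centers.length + ", points: " + points.length);
    }
}
```

A larger r spreads the centers further apart relative to the unit-scale noise, so the clusters overlap less and are easier for KMeans to separate.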
  • Constructor Details

    • KMeansDataGenerator

      public KMeansDataGenerator()
  • Method Details

    • generateKMeansRDD

      public static RDD<double[]> generateKMeansRDD(SparkContext sc, int numPoints, int k, int d, double r, int numPartitions)
      Generate an RDD containing test data for KMeans.

      Parameters:
      sc - SparkContext to use for creating the RDD
      numPoints - Number of points that will be contained in the RDD
      k - Number of clusters
      d - Number of dimensions
      r - Scaling factor for the distribution of the initial centers
      numPartitions - Number of partitions of the generated RDD; default 2
    • main

      public static void main(String[] args)