Class KMeansDataGenerator

Object
org.apache.spark.mllib.util.KMeansDataGenerator

public class KMeansDataGenerator extends Object
Generate test data for KMeans. This class first chooses k cluster centers from a d-dimensional Gaussian distribution scaled by factor r and then creates a Gaussian cluster with scale 1 around each center.
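The two-step procedure described above can be sketched in plain Java, without Spark. This is a minimal illustration, not the class's actual implementation. The class name `KMeansDataSketch`, the `generate` method, the `seed` parameter, and the round-robin assignment of points to centers are all assumptions made for this example.

```java
import java.util.Random;

// Hypothetical stand-alone sketch; not part of Spark.
public class KMeansDataSketch {

    /** Returns numPoints points of dimension d scattered around k Gaussian centers. */
    public static double[][] generate(int numPoints, int k, int d, double r, long seed) {
        Random rand = new Random(seed);

        // Step 1: draw k centers from a d-dimensional standard Gaussian, scaled by r.
        double[][] centers = new double[k][d];
        for (int c = 0; c < k; c++) {
            for (int j = 0; j < d; j++) {
                centers[c][j] = rand.nextGaussian() * r;
            }
        }

        // Step 2: place each point near a center by adding Gaussian noise of scale 1.
        // Assumption: points are assigned to centers round-robin (i % k).
        double[][] points = new double[numPoints][d];
        for (int i = 0; i < numPoints; i++) {
            double[] center = centers[i % k];
            for (int j = 0; j < d; j++) {
                points[i][j] = center[j] + rand.nextGaussian();
            }
        }
        return points;
    }

    public static void main(String[] args) {
        double[][] points = generate(1000, 3, 2, 10.0, 42L);
        // Prints "1000 points of dimension 2"
        System.out.println(points.length + " points of dimension " + points[0].length);
    }
}
```

A larger r spreads the centers farther apart relative to the unit-scale noise, so the clusters overlap less.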
  • Constructor Details

    • KMeansDataGenerator

      public KMeansDataGenerator()
  • Method Details

    • generateKMeansRDD

      public static RDD<double[]> generateKMeansRDD(SparkContext sc, int numPoints, int k, int d, double r, int numPartitions)
      Generate an RDD containing test data for KMeans.

      Parameters:
      sc - SparkContext to use for creating the RDD
      numPoints - Number of points that will be contained in the RDD
      k - Number of clusters
      d - Number of dimensions
      r - Scaling factor for the distribution of the initial centers
      numPartitions - Number of partitions of the generated RDD; defaults to 2 when called from Scala (Java callers must pass it explicitly)
      Returns:
      An RDD of numPoints points, each a double[] of length d
    • main

      public static void main(String[] args)