org.apache.spark.mllib.util
Class KMeansDataGenerator

Object
  extended by org.apache.spark.mllib.util.KMeansDataGenerator

public class KMeansDataGenerator
extends Object

:: DeveloperApi :: Generates test data for KMeans. The generator first draws k cluster centers from a d-dimensional Gaussian distribution scaled by a factor r, then creates a Gaussian cluster with scale 1 around each center.


Constructor Summary
KMeansDataGenerator()
           
 
Method Summary
static RDD<double[]> generateKMeansRDD(SparkContext sc, int numPoints, int k, int d, double r, int numPartitions)
          Generate an RDD containing test data for KMeans.
static void main(String[] args)
           
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

KMeansDataGenerator

public KMeansDataGenerator()

Method Detail

generateKMeansRDD

public static RDD<double[]> generateKMeansRDD(SparkContext sc,
                                              int numPoints,
                                              int k,
                                              int d,
                                              double r,
                                              int numPartitions)
Generate an RDD containing test data for KMeans.

Parameters:
sc - SparkContext to use for creating the RDD
numPoints - Number of points that will be contained in the RDD
k - Number of clusters
d - Number of dimensions
r - Scaling factor for the distribution of the initial centers
numPartitions - Number of partitions of the generated RDD; defaults to 2 for Scala callers, while Java callers must pass it explicitly
Returns:
An RDD of numPoints test points, each represented as a double[] of length d
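
The two-step procedure in the class description (draw k centers scaled by r, then add unit-scale Gaussian noise around each center) can be sketched in plain Java without Spark. This is only an illustration under stated assumptions, not the library's implementation: the class name KMeansDataSketch, the generate helper, the explicit seed parameter, and the round-robin assignment of points to centers are all hypothetical choices made for the example.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical stand-alone sketch; it does not call Spark.
class KMeansDataSketch {

    /**
     * Returns numPoints points of dimension d, clustered around
     * k Gaussian centers whose coordinates are scaled by r.
     */
    public static double[][] generate(int numPoints, int k, int d, double r, long seed) {
        Random rand = new Random(seed);

        // Step 1: draw k centers from a d-dimensional standard Gaussian, scaled by r.
        double[][] centers = new double[k][d];
        for (int c = 0; c < k; c++) {
            for (int i = 0; i < d; i++) {
                centers[c][i] = rand.nextGaussian() * r;
            }
        }

        // Step 2: each point is a center plus Gaussian noise with scale 1.
        double[][] points = new double[numPoints][d];
        for (int p = 0; p < numPoints; p++) {
            // Assumption: points are assigned to centers round-robin.
            double[] center = centers[p % k];
            for (int i = 0; i < d; i++) {
                points[p][i] = center[i] + rand.nextGaussian();
            }
        }
        return points;
    }

    public static void main(String[] args) {
        // Six 3-dimensional points around two centers with scaling factor 10.
        double[][] data = generate(6, 2, 3, 10.0, 42L);
        for (double[] point : data) {
            System.out.println(Arrays.toString(point));
        }
    }
}
```

A larger r spreads the centers further apart relative to the fixed unit-scale noise, so the clusters become easier to separate. In the real method the points are returned as an RDD split across numPartitions partitions rather than as a local array.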

main

public static void main(String[] args)