Package org.apache.spark.mllib.util
Class KMeansDataGenerator
Object
org.apache.spark.mllib.util.KMeansDataGenerator
Generate test data for KMeans. This class first chooses k cluster centers
from a d-dimensional Gaussian distribution scaled by factor r and then creates a Gaussian
cluster with scale 1 around each center.
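The two-step procedure described above can be sketched in plain Java without Spark. This is an illustrative sketch only, not the library's implementation: the class name `KMeansDataSketch`, the `seed` parameter, and the rule that point i is drawn around center `i % k` are assumptions introduced for the example.

```java
import java.util.Random;

/** Hypothetical plain-Java sketch of the generation procedure; not Spark's code. */
public class KMeansDataSketch {

    /**
     * Draws k centers from a d-dimensional Gaussian scaled by r, then
     * generates numPoints points with unit-scale Gaussian noise around them.
     * Assumption: point i is assigned to center i % k.
     */
    public static double[][] generate(long seed, int numPoints, int k, int d, double r) {
        Random rand = new Random(seed);

        // Step 1: choose k cluster centers; each coordinate is a standard
        // Gaussian draw multiplied by the scaling factor r.
        double[][] centers = new double[k][d];
        for (int c = 0; c < k; c++) {
            for (int j = 0; j < d; j++) {
                centers[c][j] = rand.nextGaussian() * r;
            }
        }

        // Step 2: create a Gaussian cluster with scale 1 around each center.
        double[][] points = new double[numPoints][d];
        for (int i = 0; i < numPoints; i++) {
            double[] center = centers[i % k];
            for (int j = 0; j < d; j++) {
                points[i][j] = center[j] + rand.nextGaussian();
            }
        }
        return points;
    }

    public static void main(String[] args) {
        // Print a small sample: 10 points, 3 clusters, 2 dimensions, r = 5.
        double[][] points = generate(42L, 10, 3, 2, 5.0);
        for (double[] p : points) {
            System.out.println(java.util.Arrays.toString(p));
        }
    }
}
```

A larger r spreads the centers farther apart relative to the unit-scale noise, so the clusters overlap less and are easier to separate.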
Constructor Summary
Method Summary
Modifier and Type | Method | Description
static RDD<double[]> | generateKMeansRDD(SparkContext sc, int numPoints, int k, int d, double r, int numPartitions) | Generate an RDD containing test data for KMeans.
static void | main |
Constructor Details
KMeansDataGenerator
public KMeansDataGenerator()
Method Details
generateKMeansRDD
public static RDD<double[]> generateKMeansRDD(SparkContext sc, int numPoints, int k, int d, double r, int numPartitions)
Generate an RDD containing test data for KMeans.
Parameters:
sc - SparkContext to use for creating the RDD
numPoints - Number of points that will be contained in the RDD
k - Number of clusters
d - Number of dimensions
r - Scaling factor for the distribution of the initial centers
numPartitions - Number of partitions of the generated RDD; default 2
Returns:
RDD of numPoints points, each a double[] of length d
main