New in version 0.9.0.
train(rdd, k[, maxIterations, …])
Train a k-means clustering model.
Training points as an RDD of pyspark.mllib.linalg.Vector
or convertible sequence types.
Number of clusters to create.
Maximum number of iterations allowed.
The initialization algorithm. This can be either “random” or “k-means||”.
Random seed value for cluster initialization. Set to None to
generate a seed based on the system time.
Number of steps for the k-means|| initialization mode.
This is an advanced setting – the default of 2 is almost always enough.
Distance threshold within which a center will be considered to
have converged. If all centers move less than this Euclidean
distance, iterations are stopped.
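
The stopping rule described above can be illustrated in plain Python. This is only a sketch of the criterion as worded here, not Spark's internal implementation; the function name centers_converged and the sample points are hypothetical, and the 1e-4 threshold is just an example value.

```python
import math


def centers_converged(old_centers, new_centers, epsilon=1e-4):
    """Return True when every center moved less than `epsilon` (Euclidean distance)."""
    return all(
        math.dist(old, new) < epsilon
        for old, new in zip(old_centers, new_centers)
    )


old = [(0.0, 0.0), (5.0, 5.0)]

# Both centers move by about 1e-5, which is below the threshold.
small_move = centers_converged(old, [(0.00001, 0.0), (5.0, 5.00001)])

# The second center moves by 0.1, so iteration would continue.
large_move = centers_converged(old, [(0.0, 0.0), (5.0, 5.1)])

print(small_move, large_move)
```

A single center moving by the threshold or more is enough to keep iterating, which is why the check uses all() over the centers.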
Initial cluster centers can be provided as a KMeansModel object
rather than using the random or k-means|| initialization mode.
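
A call to train might look like the sketch below. It assumes an already running SparkContext passed in as sc, and it assumes the keyword names maxIterations, initializationMode, seed and epsilon, which follow PySpark's KMeans.train signature but are not spelled out in the text above. The helper name train_kmeans_demo and the sample points are hypothetical.

```python
def train_kmeans_demo(sc):
    """Cluster four 2-D points into two groups; `sc` is an existing SparkContext."""
    # Imported inside the function so the definition loads even where
    # PySpark is not installed; calling it does require PySpark.
    from pyspark.mllib.clustering import KMeans

    # Plain Python lists are used here as "convertible sequence types".
    points = sc.parallelize(
        [[0.0, 0.0], [1.0, 1.0], [9.0, 8.0], [8.0, 9.0]]
    )

    model = KMeans.train(
        points,
        2,                            # k: number of clusters to create
        maxIterations=10,
        initializationMode="random",  # the alternative is "k-means||"
        seed=50,                      # fixed seed for repeatable initialization
        epsilon=1e-4,                 # convergence distance threshold
    )
    return model.clusterCenters
```

Passing the optional settings by keyword keeps the call readable and avoids depending on their positional order.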