public class KMeans
implements scala.Serializable, Logging
K-means clustering with support for multiple parallel runs and a k-means++-like initialization
mode (the k-means|| algorithm by Bahmani et al.). When multiple concurrent runs are requested,
they are executed together, sharing each pass over the data for efficiency.
This is an iterative algorithm that will make multiple passes over the data, so any RDDs given
to it should be cached by the user.
public KMeans setMaxIterations(int maxIterations)
Set the maximum number of iterations to run. Default: 20.
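
To illustrate why the algorithm makes repeated passes over the data and how an iteration cap bounds them, here is a minimal single-machine sketch of the standard Lloyd iteration on one-dimensional points. It is not the Spark implementation: the class LloydSketch and its methods are hypothetical names, and stopping as soon as no center moves is an assumption about the convergence test.

```java
final class LloydSketch {
    /** Returns the index of the center closest to x. */
    static int nearest(double x, double[] centers) {
        int best = 0;
        for (int j = 1; j < centers.length; j++) {
            if (Math.abs(x - centers[j]) < Math.abs(x - centers[best])) {
                best = j;
            }
        }
        return best;
    }

    /** Runs Lloyd iterations from the given centers until nothing moves or the cap is hit. */
    static double[] fit(double[] data, double[] initial, int maxIterations) {
        double[] centers = initial.clone();
        for (int iter = 0; iter < maxIterations; iter++) {
            double[] sum = new double[centers.length];
            int[] count = new int[centers.length];
            for (double x : data) {               // one full pass over the data
                int j = nearest(x, centers);
                sum[j] += x;
                count[j]++;
            }
            boolean moved = false;
            for (int j = 0; j < centers.length; j++) {
                if (count[j] == 0) {
                    continue;                     // an empty cluster keeps its old center
                }
                double updated = sum[j] / count[j];
                if (updated != centers[j]) {
                    moved = true;
                }
                centers[j] = updated;
            }
            if (!moved) {
                break;                            // converged before reaching the cap
            }
        }
        return centers;
    }
}
```

Every iteration reads the whole data set once, which is why an uncached RDD would be recomputed on each pass.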
public KMeans setInitializationMode(String initializationMode)
Set the initialization algorithm. This can be either "random" to choose random points as
initial cluster centers, or "k-means||" to use a parallel variant of k-means++
(Bahmani et al., Scalable K-Means++, VLDB 2012). Default: k-means||.
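
The difference between the two modes can be sketched on a single machine. The example below contrasts uniform random seeding with sequential k-means++ seeding, where each new center is drawn with probability proportional to its squared distance from the centers chosen so far. This is the sequential k-means++ idea, not the parallel k-means|| variant that Spark uses; SeedingSketch and its method names are hypothetical, and the code assumes the data holds at least k distinct values.

```java
import java.util.Arrays;
import java.util.Random;

final class SeedingSketch {
    /** "random"-style seeding: k points at distinct positions, chosen uniformly. */
    static double[] randomSeeds(double[] data, int k, Random rnd) {
        double[] pool = data.clone();
        for (int i = 0; i < k; i++) {             // partial Fisher-Yates shuffle
            int j = i + rnd.nextInt(pool.length - i);
            double tmp = pool[i];
            pool[i] = pool[j];
            pool[j] = tmp;
        }
        return Arrays.copyOf(pool, k);
    }

    /** k-means++-style seeding: favors points far from the centers picked so far. */
    static double[] plusPlusSeeds(double[] data, int k, Random rnd) {
        double[] centers = new double[k];
        centers[0] = data[rnd.nextInt(data.length)];
        double[] d2 = new double[data.length];    // squared distance to nearest center
        Arrays.fill(d2, Double.POSITIVE_INFINITY);
        for (int c = 1; c < k; c++) {
            double total = 0.0;
            for (int i = 0; i < data.length; i++) {
                double diff = data[i] - centers[c - 1];
                d2[i] = Math.min(d2[i], diff * diff);
                total += d2[i];
            }
            double r = rnd.nextDouble() * total;
            double cumulative = 0.0;
            int chosen = -1;
            for (int i = 0; i < data.length; i++) {
                if (d2[i] == 0.0) {
                    continue;                     // already a center: zero probability
                }
                cumulative += d2[i];
                chosen = i;                       // fallback guards against rounding
                if (cumulative > r) {
                    break;
                }
            }
            centers[c] = data[chosen];
        }
        return centers;
    }
}
```

Because already-chosen positions have zero weight, the weighted variant never returns the same value twice, which tends to spread the initial centers across the data.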
public KMeans setRuns(int runs)
:: Experimental ::
Set the number of runs of the algorithm to execute in parallel. The algorithm is initialized
this many times with random starting conditions (configured by the initialization mode), and
the best clustering found across all runs is returned. Default: 1.
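
The "best clustering" selection can be sketched as follows: run the algorithm several times from different random starts and keep the result with the lowest within-cluster sum of squared distances. This is a sequential, single-machine illustration on one-dimensional points, whereas the class above executes its runs together; BestOfRunsSketch and its methods are hypothetical names, and using the sum of squared distances as the selection criterion is an assumption.

```java
import java.util.Random;

final class BestOfRunsSketch {
    /** Sum of squared distances from each point to its nearest center. */
    static double cost(double[] data, double[] centers) {
        double total = 0.0;
        for (double x : data) {
            double best = Double.POSITIVE_INFINITY;
            for (double c : centers) {
                best = Math.min(best, (x - c) * (x - c));
            }
            total += best;
        }
        return total;
    }

    /** One k-means run from randomly chosen data points, with a fixed iteration count. */
    static double[] singleRun(double[] data, int k, int maxIterations, Random rnd) {
        double[] centers = new double[k];
        for (int j = 0; j < k; j++) {
            centers[j] = data[rnd.nextInt(data.length)];
        }
        for (int iter = 0; iter < maxIterations; iter++) {
            double[] sum = new double[k];
            int[] count = new int[k];
            for (double x : data) {
                int best = 0;
                for (int j = 1; j < k; j++) {
                    if (Math.abs(x - centers[j]) < Math.abs(x - centers[best])) {
                        best = j;
                    }
                }
                sum[best] += x;
                count[best]++;
            }
            for (int j = 0; j < k; j++) {
                if (count[j] > 0) {
                    centers[j] = sum[j] / count[j];
                }
            }
        }
        return centers;
    }

    /** Repeats singleRun `runs` times (runs >= 1) and keeps the lowest-cost centers. */
    static double[] bestOf(double[] data, int k, int maxIterations, int runs, long seed) {
        Random rnd = new Random(seed);
        double[] best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int r = 0; r < runs; r++) {
            double[] candidate = singleRun(data, k, maxIterations, rnd);
            double candidateCost = cost(data, candidate);
            if (candidateCost < bestCost) {
                bestCost = candidateCost;
                best = candidate;
            }
        }
        return best;
    }
}
```

More runs can only lower or match the cost of the returned clustering, at the price of proportionally more work.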
public KMeans setInitializationSteps(int initializationSteps)
Set the number of steps for the k-means|| initialization mode. This is an advanced
setting; the default of 5 is almost always enough. Default: 5.
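
What a "step" means can be sketched from the k-means|| paper: in each round, every point is sampled independently with probability proportional to its squared distance from the current candidate centers, scaled by an oversampling factor. The single-machine sketch below follows that description on one-dimensional points; it is not the Spark code, OversamplingSketch and the oversampling parameter are hypothetical names, and the final step of reducing the candidates to k centers is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class OversamplingSketch {
    /** Squared distance from x to the closest candidate center. */
    static double sqDistToNearest(double x, List<Double> centers) {
        double best = Double.POSITIVE_INFINITY;
        for (double c : centers) {
            best = Math.min(best, (x - c) * (x - c));
        }
        return best;
    }

    /** Collects candidate centers over `steps` oversampling rounds. */
    static List<Double> candidates(double[] data, double oversampling, int steps, Random rnd) {
        List<Double> centers = new ArrayList<>();
        centers.add(data[rnd.nextInt(data.length)]);  // start from one random point
        for (int step = 0; step < steps; step++) {
            double[] d2 = new double[data.length];
            double cost = 0.0;
            for (int i = 0; i < data.length; i++) {
                d2[i] = sqDistToNearest(data[i], centers);
                cost += d2[i];
            }
            if (cost == 0.0) {
                break;                                // every point is already a candidate
            }
            for (int i = 0; i < data.length; i++) {
                // Far-away points are more likely to be added in this round.
                if (rnd.nextDouble() < oversampling * d2[i] / cost) {
                    centers.add(data[i]);
                }
            }
        }
        return centers;
    }
}
```

Each step costs one more pass over the data and usually adds several candidates at once, which is consistent with a small number of steps being enough in practice.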