class pyspark.mllib.clustering.GaussianMixture

Learning algorithm for Gaussian Mixtures using the expectation-maximization algorithm.

New in version 1.3.0.
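To illustrate the expectation-maximization loop named above, here is a simplified single-machine sketch for one-dimensional data using only the standard library. It is not Spark's implementation. The function name `em_gmm_1d`, its snake_case parameters, and the variance floor are assumptions made for this example; the parameters only mirror the roles of `convergenceTol`, `maxIterations`, and `seed`.

```python
import math
import random


def em_gmm_1d(data, k, convergence_tol=1e-3, max_iterations=100, seed=None):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    rng = random.Random(seed)
    # Initialization (an assumption of this sketch): k distinct data points as
    # means, the overall variance for every component, and uniform weights.
    means = rng.sample(data, k)
    mean_all = sum(data) / len(data)
    var_all = sum((x - mean_all) ** 2 for x in data) / len(data)
    variances = [var_all] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf

    for _ in range(max_iterations):
        # E-step: responsibility of each component for each point,
        # accumulating the log-likelihood along the way.
        resp = []
        ll = 0.0
        for x in data:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])

        # Stop once the log-likelihood changes by less than the tolerance.
        if abs(ll - prev_ll) < convergence_tol:
            break
        prev_ll = ll

        # M-step: re-estimate weights, means, and variances from responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                1e-6,  # variance floor (assumed) to keep a component from collapsing
            )

    return weights, means, variances
```

Calling `em_gmm_1d(data, 2, seed=10)` twice on the same list gives identical results, which is the role a fixed seed plays during initialization.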


train(rdd, k[, convergenceTol, …])

Train a Gaussian Mixture clustering model.

Methods Documentation

classmethod train(rdd: pyspark.rdd.RDD[VectorLike], k: int, convergenceTol: float = 0.001, maxIterations: int = 100, seed: Optional[int] = None, initialModel: Optional[pyspark.mllib.clustering.GaussianMixtureModel] = None) → pyspark.mllib.clustering.GaussianMixtureModel

Train a Gaussian Mixture clustering model.

New in version 1.3.0.


Parameters

rdd : pyspark.rdd.RDD

Training points as an RDD of pyspark.mllib.linalg.Vector or convertible sequence types.

k : int

Number of independent Gaussians in the mixture model.

convergenceTol : float, optional

Maximum change in log-likelihood at which convergence is considered to have occurred. (default: 1e-3)

maxIterations : int, optional

Maximum number of iterations allowed. (default: 100)

seed : int, optional

Random seed for the initial Gaussian distribution. Set to None to generate a seed based on the system time. (default: None)

initialModel : GaussianMixtureModel, optional

Initial GMM starting point, bypassing the random initialization. (default: None)
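
The parameters above can be combined as in the following minimal sketch. The helper name `train_gmm_example`, the toy data, and the chosen parameter values are illustrative assumptions; `sc` is assumed to be an existing SparkContext. The imports sit inside the function so that defining it does not require a running Spark installation.

```python
def train_gmm_example(sc):
    """Fit a two-component mixture on a toy RDD; `sc` is an existing SparkContext."""
    from pyspark.mllib.clustering import GaussianMixture
    from pyspark.mllib.linalg import Vectors

    # Toy 2-D training points (made up for illustration).
    points = sc.parallelize([
        Vectors.dense([-0.1, -0.05]),
        Vectors.dense([-0.01, -0.1]),
        Vectors.dense([0.0, 0.05]),
        Vectors.dense([0.9, 0.8]),
        Vectors.dense([0.75, 0.935]),
        Vectors.dense([0.85, 0.9]),
    ])

    # k=2 components, a tighter tolerance and a lower iteration cap than the
    # defaults, and a fixed seed so the random initialization is repeatable.
    model = GaussianMixture.train(
        points, 2, convergenceTol=1e-4, maxIterations=50, seed=10
    )
    return model
```

With a SparkContext available, `model = train_gmm_example(sc)` returns a GaussianMixtureModel; its `weights` and `gaussians` attributes hold the fitted mixing weights and component distributions, and such a model can be passed back as `initialModel` to continue training from that starting point.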