Class GaussianMixture
Object
org.apache.spark.mllib.clustering.GaussianMixture
All Implemented Interfaces:
Serializable, scala.Serializable
This class performs expectation maximization for multivariate Gaussian
Mixture Models (GMMs). A GMM represents a composite distribution of
independent Gaussian distributions with associated "mixing" weights
specifying each component's contribution to the composite.
Given a set of sample points, this class maximizes the log-likelihood for a mixture of k Gaussians, iterating until the log-likelihood changes by less than convergenceTol or until the maximum number of iterations is reached. While this process is generally guaranteed to converge, it is not guaranteed to find a global optimum.
param: k Number of independent Gaussians in the mixture model.
param: convergenceTol Maximum change in log-likelihood at which convergence is considered to have occurred.
param: maxIterations Maximum number of iterations allowed.
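
The stopping rule described above can be illustrated with a minimal, stdlib-only sketch of EM for a one-dimensional mixture. This is not Spark's implementation: the class `EmSketch`, its `fit` method, the equal starting weights, the unit starting variances, and the variance floor are all hypothetical choices made for the example. The sketch also does not guard against numerical underflow for points far from every component.

```java
import java.util.Arrays;

/** Hypothetical 1-D EM sketch; not Spark's implementation. */
final class EmSketch {

    /** Fitted parameters plus the log-likelihood recorded at each iteration. */
    static final class Fit {
        final double[] weights;
        final double[] means;
        final double[] variances;
        final double[] logLikelihoods;

        Fit(double[] weights, double[] means, double[] variances, double[] logLikelihoods) {
            this.weights = weights;
            this.means = means;
            this.variances = variances;
            this.logLikelihoods = logLikelihoods;
        }
    }

    static double gaussianPdf(double x, double mean, double variance) {
        double diff = x - mean;
        return Math.exp(-diff * diff / (2.0 * variance)) / Math.sqrt(2.0 * Math.PI * variance);
    }

    static Fit fit(double[] data, double[] initialMeans, double convergenceTol, int maxIterations) {
        int n = data.length;
        int k = initialMeans.length;
        double[] weights = new double[k];
        double[] means = initialMeans.clone();
        double[] variances = new double[k];
        Arrays.fill(weights, 1.0 / k);   // assumption: equal starting weights
        Arrays.fill(variances, 1.0);     // assumption: unit starting variances

        double[][] resp = new double[n][k];
        double[] history = new double[maxIterations];
        double previous = Double.NEGATIVE_INFINITY;
        int iterations = 0;

        while (iterations < maxIterations) {
            // E-step: responsibilities and log-likelihood under the current parameters.
            double logLikelihood = 0.0;
            for (int i = 0; i < n; i++) {
                double total = 0.0;
                for (int j = 0; j < k; j++) {
                    resp[i][j] = weights[j] * gaussianPdf(data[i], means[j], variances[j]);
                    total += resp[i][j];
                }
                for (int j = 0; j < k; j++) {
                    resp[i][j] /= total;
                }
                logLikelihood += Math.log(total);
            }

            // M-step: re-estimate weight, mean, and variance of each component.
            for (int j = 0; j < k; j++) {
                double mass = 0.0;
                double weightedSum = 0.0;
                for (int i = 0; i < n; i++) {
                    mass += resp[i][j];
                    weightedSum += resp[i][j] * data[i];
                }
                means[j] = weightedSum / mass;
                double weightedSquares = 0.0;
                for (int i = 0; i < n; i++) {
                    double diff = data[i] - means[j];
                    weightedSquares += resp[i][j] * diff * diff;
                }
                variances[j] = Math.max(weightedSquares / mass, 1e-6); // floor avoids collapse
                weights[j] = mass / n;
            }

            history[iterations++] = logLikelihood;
            // Stop once the log-likelihood changes by less than convergenceTol.
            if (Math.abs(logLikelihood - previous) < convergenceTol) {
                break;
            }
            previous = logLikelihood;
        }
        return new Fit(weights, means, variances, Arrays.copyOf(history, iterations));
    }
}
```

Calling `EmSketch.fit(data, new double[] {-1.0, 1.0}, 0.01, 100)` on two well-separated groups of points should stop after a handful of iterations, well before the iteration cap. A different choice of starting means can lead to a different local optimum, which is the caveat noted above.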
 Note:
 This algorithm is limited in its number of features since it requires storing a covariance matrix which has size quadratic in the number of features. Even when the number of features does not exceed this limit, this algorithm may perform poorly on high-dimensional data. This is due to high-dimensional data (a) making it difficult to cluster at all (based on statistical/theoretical arguments) and (b) causing numerical issues with Gaussian distributions.
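
To make the quadratic growth concrete, the following back-of-the-envelope sketch counts the entries in k dense d-by-d covariance matrices. The class `CovarianceFootprint` is hypothetical, and it assumes dense storage of 8-byte doubles with no object overhead, which may not match how any particular implementation stores its matrices.

```java
/** Hypothetical helper; assumes dense 8-byte doubles and ignores object overhead. */
final class CovarianceFootprint {

    /** Number of entries in k dense d-by-d covariance matrices. */
    static long entries(int k, int d) {
        return (long) k * d * d;
    }

    /** Approximate bytes needed to hold those entries as doubles. */
    static long bytes(int k, int d) {
        return entries(k, d) * Double.BYTES;
    }

    public static void main(String[] args) {
        // Each tenfold increase in d multiplies the footprint by one hundred.
        for (int d : new int[] {10, 100, 1_000, 10_000}) {
            System.out.printf("k=2, d=%d: %d bytes%n", d, bytes(2, d));
        }
    }
}
```

Under these assumptions, doubling the number of features quadruples the storage per Gaussian.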

Constructor Summary
| Constructor | Description |
| GaussianMixture() | Constructs a default instance. |

Method Summary
| Modifier and Type | Method | Description |
| double | getConvergenceTol() | Return the largest change in log-likelihood at which convergence is considered to have occurred. |
| scala.Option<GaussianMixtureModel> | getInitialModel() | Return the user-supplied initial GMM, if supplied. |
| int | getK() | Return the number of Gaussians in the mixture model. |
| int | getMaxIterations() | Return the maximum number of iterations allowed. |
| long | getSeed() | Return the random seed. |
|  | run(data) | Java-friendly version of run(). |
|  | run(data) | Perform expectation maximization. |
|  | setConvergenceTol(double convergenceTol) | Set the largest change in log-likelihood at which convergence is considered to have occurred. |
|  | setInitialModel(model) | Set the initial GMM starting point, bypassing the random initialization. |
|  | setK(int k) | Set the number of Gaussians in the mixture model. |
|  | setMaxIterations(int maxIterations) | Set the maximum number of iterations allowed. |
|  | setSeed(long seed) | Set the random seed. |
| static boolean | shouldDistributeGaussians(int k, int d) | Heuristic to distribute the computation of the MultivariateGaussians, approximately when d is greater than 25 except for when k is very small. |

Constructor Details

GaussianMixture
public GaussianMixture()
Constructs a default instance. The default parameters are {k: 2, convergenceTol: 0.01, maxIterations: 100, seed: random}.


Method Details

shouldDistributeGaussians
public static boolean shouldDistributeGaussians(int k, int d)
Heuristic to distribute the computation of the MultivariateGaussians, approximately when d is greater than 25 except for when k is very small.
Parameters:
k - Number of Gaussians in the mixture model
d - Number of features
Returns:
(undocumented)
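
One formula that matches this description is `((k - 1) / k) * d > 25`. The sketch below uses it for illustration only: this page does not state the formula, so treat it as an assumption, and note that `DistributeHeuristicSketch` and `shouldDistribute` are hypothetical names.

```java
/** Hypothetical re-statement of the heuristic; the formula is an assumption. */
final class DistributeHeuristicSketch {

    static boolean shouldDistribute(int k, int d) {
        // (k - 1) / k is 0 for k = 1 and approaches 1 as k grows,
        // so the test is close to "d > 25" unless k is very small.
        return ((k - 1.0) / k) * d > 25;
    }

    public static void main(String[] args) {
        System.out.println(shouldDistribute(1, 1000)); // a single Gaussian never qualifies
        System.out.println(shouldDistribute(10, 28));
    }
}
```

With this formula, a single Gaussian never triggers distribution regardless of d, and k = 2 requires d above 50.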

setInitialModel
Set the initial GMM starting point, bypassing the random initialization. You must call setK() prior to calling this method, and the condition (model.k == this.k) must be met; otherwise an IllegalArgumentException is thrown.
Parameters:
model - (undocumented)
Returns:
(undocumented)
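
The ordering requirement can be sketched with a stdlib-only stand-in. `InitialModelSketch` is hypothetical, an array of weights stands in for the initial model, and returning `this` from the setters is an assumption of the sketch rather than something this page documents.

```java
/** Hypothetical stand-in for the estimator; only the k-consistency check is modelled. */
final class InitialModelSketch {
    private int k = 2;                // default k from the constructor documentation
    private double[] initialWeights;  // stand-in for an initial GMM: one weight per Gaussian

    InitialModelSketch setK(int k) {
        this.k = k;
        return this; // chaining is an assumption of this sketch
    }

    InitialModelSketch setInitialModel(double[] modelWeights) {
        // Mirrors the documented condition (model.k == this.k).
        if (modelWeights.length != k) {
            throw new IllegalArgumentException(
                "Mismatched cluster count: model has " + modelWeights.length + ", k is " + k);
        }
        this.initialWeights = modelWeights.clone();
        return this;
    }

    boolean hasInitialModel() {
        return initialWeights != null;
    }
}
```

In this sketch, `new InitialModelSketch().setK(3).setInitialModel(w)` succeeds for a three-component `w`, while skipping `setK(3)` leaves k at its default of 2 and the same call throws.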

getInitialModel
Return the user-supplied initial GMM, if supplied.
Returns:
(undocumented)

setK
Set the number of Gaussians in the mixture model. Default: 2
Parameters:
k - (undocumented)
Returns:
(undocumented)

getK
public int getK()
Return the number of Gaussians in the mixture model.
Returns:
(undocumented)

setMaxIterations
Set the maximum number of iterations allowed. Default: 100
Parameters:
maxIterations - (undocumented)
Returns:
(undocumented)

getMaxIterations
public int getMaxIterations()
Return the maximum number of iterations allowed.
Returns:
(undocumented)

setConvergenceTol
Set the largest change in log-likelihood at which convergence is considered to have occurred.
Parameters:
convergenceTol - (undocumented)
Returns:
(undocumented)

getConvergenceTol
public double getConvergenceTol()
Return the largest change in log-likelihood at which convergence is considered to have occurred.
Returns:
(undocumented)

setSeed
Set the random seed.
Parameters:
seed - (undocumented)
Returns:
(undocumented)
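
A fixed seed makes a random starting point repeatable. The sketch below shows the general idea with `java.util.Random`; it does not reproduce how this class initializes its Gaussians, and `SeededInitSketch` and `pickInitialIndices` are hypothetical names.

```java
import java.util.Random;

/** Hypothetical illustration of seed-driven repeatability; not Spark's initialization. */
final class SeededInitSketch {

    /** Picks k sample indices in [0, n) pseudo-randomly; the same seed yields the same picks. */
    static int[] pickInitialIndices(int n, int k, long seed) {
        Random random = new Random(seed);
        int[] picks = new int[k];
        for (int i = 0; i < k; i++) {
            picks[i] = random.nextInt(n);
        }
        return picks;
    }
}
```

Two calls with the same arguments return identical index arrays, which is what makes a seeded run reproducible.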

getSeed
public long getSeed()
Return the random seed.
Returns:
(undocumented)

run
Perform expectation maximization.
Parameters:
data - (undocumented)
Returns:
(undocumented)

run
Java-friendly version of run().
Parameters:
data - (undocumented)
Returns:
(undocumented)
