Class GaussianMixture
Object
org.apache.spark.mllib.clustering.GaussianMixture
- All Implemented Interfaces:
- Serializable
This class performs expectation maximization for multivariate Gaussian Mixture Models (GMMs). A GMM represents a composite distribution of independent Gaussian distributions, with associated "mixing" weights specifying each component's contribution to the composite.
 
Given a set of sample points, this class maximizes the log-likelihood for a mixture of k Gaussians, iterating until the log-likelihood changes by less than convergenceTol or until it reaches the maximum number of iterations. While this process is generally guaranteed to converge, it is not guaranteed to find a global optimum.
- param: k - Number of independent Gaussians in the mixture model.
- param: convergenceTol - Maximum change in log-likelihood at which convergence is considered to have occurred.
- param: maxIterations - Maximum number of iterations allowed.
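The loop described above can be sketched in plain Java for the one-dimensional case. This is a minimal, standalone illustration, not Spark's implementation: it runs on a local array rather than an RDD, uses scalar variances instead of covariance matrices, takes its starting point as explicit arrays (the class name `Em1D` and method `fit` are hypothetical), and does no log-space underflow protection. It stops on the same two conditions described here: a log-likelihood change smaller than `convergenceTol`, or `maxIterations` iterations.

```java
public class Em1D {
    // Fitted parameters of a one-dimensional Gaussian mixture (hypothetical holder type).
    public static final class Fit {
        public final double[] weights, means, variances;
        public final int iterations;
        public final double logLikelihood;

        Fit(double[] weights, double[] means, double[] variances, int iterations, double logLikelihood) {
            this.weights = weights;
            this.means = means;
            this.variances = variances;
            this.iterations = iterations;
            this.logLikelihood = logLikelihood;
        }
    }

    static double normalPdf(double x, double mean, double variance) {
        double diff = x - mean;
        return Math.exp(-diff * diff / (2 * variance)) / Math.sqrt(2 * Math.PI * variance);
    }

    // Runs EM from the supplied starting point. Stops when the log-likelihood changes by
    // less than convergenceTol between iterations, or after maxIterations iterations.
    public static Fit fit(double[] xs, double[] w0, double[] mu0, double[] var0,
                          double convergenceTol, int maxIterations) {
        int n = xs.length, k = w0.length;
        double[] w = w0.clone(), mu = mu0.clone(), var = var0.clone();
        double[][] resp = new double[n][k];
        double llh = Double.NEGATIVE_INFINITY, prevLlh;
        int iter = 0;
        do {
            prevLlh = llh;
            // E-step: responsibilities and log-likelihood under the current parameters.
            llh = 0.0;
            for (int i = 0; i < n; i++) {
                double total = 0.0;
                for (int j = 0; j < k; j++) {
                    resp[i][j] = w[j] * normalPdf(xs[i], mu[j], var[j]);
                    total += resp[i][j];
                }
                for (int j = 0; j < k; j++) resp[i][j] /= total;
                llh += Math.log(total);
            }
            // M-step: re-estimate each component from its responsibility-weighted points.
            for (int j = 0; j < k; j++) {
                double nj = 0.0, sum = 0.0, sq = 0.0;
                for (int i = 0; i < n; i++) { nj += resp[i][j]; sum += resp[i][j] * xs[i]; }
                mu[j] = sum / nj;
                for (int i = 0; i < n; i++) { double d = xs[i] - mu[j]; sq += resp[i][j] * d * d; }
                var[j] = Math.max(sq / nj, 1e-9); // floor guards against a collapsing component
                w[j] = nj / n;
            }
            iter++;
        } while (iter < maxIterations && Math.abs(llh - prevLlh) >= convergenceTol);
        return new Fit(w, mu, var, iter, llh);
    }

    public static void main(String[] args) {
        double[] xs = {-5.1, -4.9, -5.0, -5.2, -4.8, 4.9, 5.1, 5.0, 4.8, 5.2};
        Fit f = fit(xs, new double[] {0.5, 0.5}, new double[] {-1, 1}, new double[] {1, 1}, 0.01, 100);
        System.out.printf("iterations=%d means=[%.3f, %.3f] weights=[%.3f, %.3f]%n",
                f.iterations, f.means[0], f.means[1], f.weights[0], f.weights[1]);
    }
}
```

Changing the initial means changes where the climb starts; since EM is only guaranteed to reach a local optimum, different starting points can converge to different fits.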
- Note:
- This algorithm is limited in its number of features because it must store a covariance matrix whose size is quadratic in the number of features. Even when the number of features does not exceed this limit, the algorithm may perform poorly on high-dimensional data, because (a) high-dimensional data is difficult to cluster at all (based on statistical/theoretical arguments) and (b) Gaussian distributions suffer from numerical issues in high dimensions.
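To make the quadratic growth concrete, the arithmetic below assumes each component's covariance is stored as a dense d × d matrix of 8-byte doubles (packed symmetric storage would roughly halve this but remain quadratic). The class and method names are illustrative only.

```java
public class CovarianceFootprint {
    // Bytes needed for k dense d-by-d covariance matrices of doubles (8 bytes per entry).
    // Assumes dense, unpacked storage; the cast to long avoids int overflow for large d.
    static long denseCovarianceBytes(int k, int d) {
        return (long) k * d * d * Double.BYTES;
    }

    public static void main(String[] args) {
        for (int d : new int[] {10, 100, 1000, 10000}) {
            System.out.printf("k=2, d=%d -> %d bytes%n", d, denseCovarianceBytes(2, d));
        }
    }
}
```

Each tenfold increase in d multiplies the footprint by 100: at k = 2, d = 1000 needs 16,000,000 bytes and d = 10000 needs 1,600,000,000 bytes.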
Constructor Summary
- GaussianMixture(): Constructs a default instance.

Method Summary (Modifier and Type, Method, Description)
- double getConvergenceTol(): Return the largest change in log-likelihood at which convergence is considered to have occurred.
- scala.Option<GaussianMixtureModel> getInitialModel(): Return the user-supplied initial GMM, if supplied.
- int getK(): Return the number of Gaussians in the mixture model.
- int getMaxIterations(): Return the maximum number of iterations allowed.
- long getSeed(): Return the random seed.
- GaussianMixtureModel run(JavaRDD<Vector> data): Java-friendly version of run(RDD<Vector>).
- GaussianMixtureModel run(RDD<Vector> data): Perform expectation maximization.
- GaussianMixture setConvergenceTol(double convergenceTol): Set the largest change in log-likelihood at which convergence is considered to have occurred.
- GaussianMixture setInitialModel(GaussianMixtureModel model): Set the initial GMM starting point, bypassing the random initialization.
- GaussianMixture setK(int k): Set the number of Gaussians in the mixture model.
- GaussianMixture setMaxIterations(int maxIterations): Set the maximum number of iterations allowed.
- GaussianMixture setSeed(long seed): Set the random seed.
- static boolean shouldDistributeGaussians(int k, int d): Heuristic to distribute the computation of the MultivariateGaussians, approximately when d is greater than 25 except when k is very small.
Constructor Details
GaussianMixture
public GaussianMixture()
Constructs a default instance. The default parameters are {k: 2, convergenceTol: 0.01, maxIterations: 100, seed: random}.
 
Method Details
shouldDistributeGaussians
public static boolean shouldDistributeGaussians(int k, int d)
Heuristic to distribute the computation of the MultivariateGaussians, approximately when d is greater than 25 except when k is very small.
- Parameters:
- k - Number of Gaussians in the mixture model
- d - Number of features
- Returns:
- true if the computation of the MultivariateGaussians should be distributed
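A sketch of the heuristic, assuming it scales d by (k - 1)/k and compares the result with 25. That form reproduces the documented behaviour (roughly d > 25, never for very small k), but the exact expression is an assumption here; check the Spark source before relying on the precise cut-off. The class name is illustrative.

```java
public class DistributeHeuristic {
    // Assumed form of the heuristic: distribute when ((k - 1) / k) * d > 25.
    // The factor (k - 1) / k approaches 1 as k grows, so the cut-off is roughly d > 25;
    // for k = 1 the factor is 0, so a single Gaussian is never distributed.
    static boolean shouldDistributeGaussians(int k, int d) {
        return ((k - 1.0) / k) * d > 25;
    }

    public static void main(String[] args) {
        int[][] cases = {{1, 1000}, {2, 50}, {2, 51}, {10, 27}, {10, 28}, {100, 26}};
        for (int[] c : cases) {
            System.out.printf("k=%d d=%d -> %b%n", c[0], c[1], shouldDistributeGaussians(c[0], c[1]));
        }
    }
}
```

Under this assumed form, k = 2 needs d > 50, while k = 10 switches over between d = 27 and d = 28.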
 
setInitialModel
public GaussianMixture setInitialModel(GaussianMixtureModel model)
Set the initial GMM starting point, bypassing the random initialization. You must call setK() prior to calling this method, and the condition (model.k == this.k) must be met; failure will result in an IllegalArgumentException.
- Parameters:
- model - the initial GaussianMixtureModel; its k must equal this.k
- Returns:
- this GaussianMixture instance, for method chaining
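The ordering requirement on setInitialModel can be illustrated with a small hypothetical estimator (`Model`, `Estimator` and their methods are stand-ins, not Spark classes). It keeps the documented default k = 2 and enforces the documented check model.k == this.k, so supplying a three-component model only succeeds after setK(3).

```java
public class InitialModelOrder {
    // Hypothetical stand-in for a fitted model; only its component count matters here.
    static final class Model {
        final int k;
        Model(int k) { this.k = k; }
    }

    // Hypothetical estimator holding k (default 2) and an optional initial model.
    static final class Estimator {
        private int k = 2;
        private Model initialModel; // null means "use random initialization"

        Estimator setK(int k) { this.k = k; return this; }

        Estimator setInitialModel(Model model) {
            // Mirrors the documented rule: model.k must equal this.k.
            if (model.k != this.k) {
                throw new IllegalArgumentException(
                        "Mismatched cluster count: model.k=" + model.k + ", k=" + this.k);
            }
            this.initialModel = model;
            return this;
        }

        boolean hasInitialModel() { return initialModel != null; }
    }

    public static void main(String[] args) {
        Model threeComponents = new Model(3);
        try {
            new Estimator().setInitialModel(threeComponents); // k is still the default 2
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        Estimator ok = new Estimator().setK(3).setInitialModel(threeComponents);
        System.out.println("accepted: " + ok.hasInitialModel());
    }
}
```

Calling setK first and setInitialModel second is what lets the check compare against the intended k rather than the default.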
 
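getInitialModel
public scala.Option<GaussianMixtureModel> getInitialModel()
Return the user-supplied initial GMM, if supplied.
- Returns:
- the initial model wrapped in a scala.Option, or None if none was supplied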
 
setK
public GaussianMixture setK(int k)
Set the number of Gaussians in the mixture model. Default: 2
- Parameters:
- k - number of Gaussians in the mixture model
- Returns:
- this GaussianMixture instance, for method chaining
 
getK
public int getK()
Return the number of Gaussians in the mixture model.
- Returns:
- the number of Gaussians in the mixture model (k)
 
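setMaxIterations
public GaussianMixture setMaxIterations(int maxIterations)
Set the maximum number of iterations allowed. Default: 100
- Parameters:
- maxIterations - maximum number of iterations allowed
- Returns:
- this GaussianMixture instance, for method chaining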
 
getMaxIterations
public int getMaxIterations()
Return the maximum number of iterations allowed.
- Returns:
- the maximum number of iterations allowed
 
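setConvergenceTol
public GaussianMixture setConvergenceTol(double convergenceTol)
Set the largest change in log-likelihood at which convergence is considered to have occurred. Default: 0.01
- Parameters:
- convergenceTol - largest change in log-likelihood at which convergence is considered to have occurred
- Returns:
- this GaussianMixture instance, for method chaining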
 
getConvergenceTol
public double getConvergenceTol()
Return the largest change in log-likelihood at which convergence is considered to have occurred.
- Returns:
- the convergence tolerance on the change in log-likelihood
 
setSeed
public GaussianMixture setSeed(long seed)
Set the random seed.
- Parameters:
- seed - the random seed
- Returns:
- this GaussianMixture instance, for method chaining
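A fixed seed matters because a seeded pseudo-random generator makes the random initialization, and therefore the EM starting point, repeatable across runs. The sketch below picks k starting data indices with `java.util.Random`; this selection scheme and the names `SeededInit` and `pickInitialIndices` are hypothetical illustrations, and Spark's actual initialization procedure may differ.

```java
import java.util.Arrays;
import java.util.Random;

public class SeededInit {
    // Hypothetical initialization: choose k distinct data indices at random.
    // A fixed seed makes the choice, and hence the EM starting point, repeatable.
    static int[] pickInitialIndices(int n, int k, long seed) {
        Random rng = new Random(seed);
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        // Partial Fisher-Yates shuffle: the first k slots become a uniform random k-subset.
        for (int i = 0; i < k; i++) {
            int j = i + rng.nextInt(n - i);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        return Arrays.copyOf(idx, k);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(pickInitialIndices(1000, 3, 42L)));
        System.out.println(Arrays.toString(pickInitialIndices(1000, 3, 42L))); // same seed, same picks
    }
}
```

Leaving the seed random (the documented default) trades this reproducibility for a different starting point on each run.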
 
getSeed
public long getSeed()
Return the random seed.
- Returns:
- the random seed
 
run
public GaussianMixtureModel run(RDD<Vector> data)
Perform expectation maximization.
- Parameters:
- data - the sample points to fit
- Returns:
- the fitted GaussianMixtureModel
 
run
public GaussianMixtureModel run(JavaRDD<Vector> data)
Java-friendly version of run(RDD<Vector>).
- Parameters:
- data - the sample points to fit
- Returns:
- the fitted GaussianMixtureModel
 
 