Class GaussianMixture
Object
org.apache.spark.mllib.clustering.GaussianMixture
- All Implemented Interfaces:
Serializable
This class performs expectation maximization for multivariate Gaussian
Mixture Models (GMMs). A GMM represents a composite distribution of
independent Gaussian distributions with associated "mixing" weights
specifying each component's contribution to the composite.
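As a rough illustration of what "composite distribution" means here, the sketch below evaluates a mixture density as a weighted sum of component densities. It is deliberately simplified to one dimension (the class itself handles multivariate data), and the class and method names (MixtureDensitySketch, gaussianPdf, mixturePdf) are hypothetical and not part of Spark.

```java
// Illustrative sketch only: a one-dimensional mixture density, not Spark's implementation.
final class MixtureDensitySketch {
    // Density of a univariate Gaussian with the given mean and variance, evaluated at x.
    static double gaussianPdf(double x, double mean, double variance) {
        double diff = x - mean;
        return Math.exp(-diff * diff / (2.0 * variance)) / Math.sqrt(2.0 * Math.PI * variance);
    }

    // Composite density: the weighted sum of the component densities.
    // The "mixing" weights are assumed to be non-negative and to sum to 1.
    static double mixturePdf(double x, double[] weights, double[] means, double[] variances) {
        double density = 0.0;
        for (int j = 0; j < weights.length; j++) {
            density += weights[j] * gaussianPdf(x, means[j], variances[j]);
        }
        return density;
    }

    public static void main(String[] args) {
        double[] weights = {0.3, 0.7};
        double[] means = {0.0, 5.0};
        double[] variances = {1.0, 4.0};
        System.out.println(mixturePdf(1.0, weights, means, variances));
    }
}
```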
Given a set of sample points, this class will maximize the log-likelihood for a mixture of k Gaussians, iterating until the log-likelihood changes by less than convergenceTol, or until it has reached the max number of iterations. While this process is generally guaranteed to converge, it is not guaranteed to find a global optimum.
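The stopping rule described above can be sketched with a small, single-machine EM loop. This is a simplified one-dimensional illustration under stated assumptions, not Spark's distributed implementation: the names (EmLoopSketch, fit, initialMeans) are hypothetical, the caller-supplied initial means stand in for the random initialization, and the variance floor is an arbitrary safeguard added for the sketch.

```java
// Illustrative sketch only: EM on 1-D data, not Spark's distributed implementation.
final class EmLoopSketch {
    double[] weights;
    double[] means;
    double[] variances;
    int iterations;        // number of EM iterations actually run
    double logLikelihood;  // log-likelihood measured in the last E-step

    static double pdf(double x, double mean, double variance) {
        double diff = x - mean;
        return Math.exp(-diff * diff / (2.0 * variance)) / Math.sqrt(2.0 * Math.PI * variance);
    }

    // initialMeans is a hypothetical stand-in for the random initialization.
    static EmLoopSketch fit(double[] xs, double[] initialMeans, double convergenceTol, int maxIterations) {
        int n = xs.length;
        int k = initialMeans.length;
        EmLoopSketch model = new EmLoopSketch();
        model.means = initialMeans.clone();
        model.weights = new double[k];
        model.variances = new double[k];
        for (int j = 0; j < k; j++) {
            model.weights[j] = 1.0 / k;
            model.variances[j] = 1.0;
        }
        double[][] resp = new double[n][k];
        double previous = Double.NEGATIVE_INFINITY;
        while (model.iterations < maxIterations) {
            // E-step: responsibilities and the log-likelihood under the current parameters.
            double current = 0.0;
            for (int i = 0; i < n; i++) {
                double total = 0.0;
                for (int j = 0; j < k; j++) {
                    resp[i][j] = model.weights[j] * pdf(xs[i], model.means[j], model.variances[j]);
                    total += resp[i][j];
                }
                for (int j = 0; j < k; j++) {
                    resp[i][j] /= total;
                }
                current += Math.log(total);
            }
            // M-step: re-estimate the weight, mean, and variance of each component.
            for (int j = 0; j < k; j++) {
                double nj = 0.0;
                double sum = 0.0;
                for (int i = 0; i < n; i++) {
                    nj += resp[i][j];
                    sum += resp[i][j] * xs[i];
                }
                double mean = sum / nj;
                double squares = 0.0;
                for (int i = 0; i < n; i++) {
                    double diff = xs[i] - mean;
                    squares += resp[i][j] * diff * diff;
                }
                model.weights[j] = nj / n;
                model.means[j] = mean;
                model.variances[j] = Math.max(squares / nj, 1e-6); // floor avoids a zero variance
            }
            model.iterations++;
            model.logLikelihood = current;
            // Stop once the log-likelihood changes by less than convergenceTol.
            if (Math.abs(current - previous) < convergenceTol) {
                break;
            }
            previous = current;
        }
        return model;
    }

    public static void main(String[] args) {
        double[] xs = {-2.1, -1.9, -2.0, -2.2, -1.8, 2.0, 2.1, 1.9, 2.2, 1.8};
        EmLoopSketch model = fit(xs, new double[]{-1.0, 1.0}, 0.01, 100);
        System.out.println("iterations=" + model.iterations + ", logLikelihood=" + model.logLikelihood);
    }
}
```

Because the result depends on the starting means, different starting points can end at different local optima, which is the caveat noted above.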
param: k Number of independent Gaussians in the mixture model.
param: convergenceTol Maximum change in log-likelihood at which convergence is considered to have occurred.
param: maxIterations Maximum number of iterations allowed.
- Note:
- This algorithm is limited in its number of features since it requires storing a covariance matrix which has size quadratic in the number of features. Even when the number of features does not exceed this limit, this algorithm may perform poorly on high-dimensional data. This is because high-dimensional data (a) is difficult to cluster at all (based on statistical/theoretical arguments) and (b) causes numerical issues with Gaussian distributions.
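To make the quadratic growth concrete, the following back-of-the-envelope sketch counts the values needed for k covariance matrices. It assumes dense d-by-d storage with 8-byte doubles and ignores any overhead; the names (CovarianceSizeSketch, covarianceEntries, covarianceBytes) are hypothetical.

```java
// Illustrative sketch only: storage estimate assuming dense matrices of 8-byte doubles.
final class CovarianceSizeSketch {
    // Number of double values needed for k dense d-by-d covariance matrices.
    static long covarianceEntries(int k, int d) {
        return (long) k * d * d;
    }

    // The same quantity expressed in bytes.
    static long covarianceBytes(int k, int d) {
        return covarianceEntries(k, d) * Double.BYTES;
    }

    public static void main(String[] args) {
        int k = 2;
        for (int d : new int[]{10, 100, 1000, 10000}) {
            System.out.println("d=" + d + " -> " + covarianceBytes(k, d) + " bytes for k=" + k);
        }
    }
}
```

Multiplying the number of features by 10 multiplies the storage by 100, which is why the feature count is the limiting factor rather than the number of samples.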
-
Constructor Summary
GaussianMixture(): Constructs a default instance.
-
Method Summary
Modifier and Type, Method: Description
- double getConvergenceTol(): Return the largest change in log-likelihood at which convergence is considered to have occurred.
- scala.Option<GaussianMixtureModel> getInitialModel(): Return the user supplied initial GMM, if supplied.
- int getK(): Return the number of Gaussians in the mixture model.
- int getMaxIterations(): Return the maximum number of iterations allowed.
- long getSeed(): Return the random seed.
- GaussianMixtureModel run(JavaRDD<Vector> data): Java-friendly version of run().
- GaussianMixtureModel run(RDD<Vector> data): Perform expectation maximization.
- GaussianMixture setConvergenceTol(double convergenceTol): Set the largest change in log-likelihood at which convergence is considered to have occurred.
- GaussianMixture setInitialModel(GaussianMixtureModel model): Set the initial GMM starting point, bypassing the random initialization.
- GaussianMixture setK(int k): Set the number of Gaussians in the mixture model.
- GaussianMixture setMaxIterations(int maxIterations): Set the maximum number of iterations allowed.
- GaussianMixture setSeed(long seed): Set the random seed.
- static boolean shouldDistributeGaussians(int k, int d): Heuristic to distribute the computation of the MultivariateGaussians, approximately when d is greater than 25 except for when k is very small.
-
Constructor Details
-
GaussianMixture
public GaussianMixture()
Constructs a default instance. The default parameters are {k: 2, convergenceTol: 0.01, maxIterations: 100, seed: random}.
-
-
Method Details
-
shouldDistributeGaussians
public static boolean shouldDistributeGaussians(int k, int d)
Heuristic to distribute the computation of the MultivariateGaussians, approximately when d is greater than 25 except for when k is very small.
- Parameters:
  k - Number of Gaussians in the mixture model
  d - Number of features
- Returns:
  (undocumented)
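One formula that matches this description is sketched below: it scales d by (k - 1) / k before comparing against 25, so the threshold is close to d > 25 for large k and is never reached when k is 1. Treat the exact expression as an assumption for illustration; this page does not state the formula Spark uses, and the names (DistributeHeuristicSketch, shouldDistribute) are hypothetical.

```java
// Illustrative sketch only: one possible form of the heuristic, assumed rather than confirmed.
final class DistributeHeuristicSketch {
    // True roughly when d > 25, with small k pushing the required d higher.
    static boolean shouldDistribute(int k, int d) {
        return ((k - 1.0) / k) * d > 25;
    }

    public static void main(String[] args) {
        int[][] cases = {{1, 100}, {2, 40}, {2, 60}, {10, 30}};
        for (int[] c : cases) {
            System.out.println("k=" + c[0] + ", d=" + c[1] + " -> " + shouldDistribute(c[0], c[1]));
        }
    }
}
```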
-
setInitialModel
public GaussianMixture setInitialModel(GaussianMixtureModel model)
Set the initial GMM starting point, bypassing the random initialization. You must call setK() prior to calling this method, and the condition (model.k == this.k) must be met; otherwise an IllegalArgumentException is thrown.
- Parameters:
  model - (undocumented)
- Returns:
  (undocumented)
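The call-order requirement can be illustrated with a small stand-in class that mimics only the documented check. Nothing here is Spark code: InitialModelCheckSketch and its nested Model are hypothetical, and the exception message is invented for the sketch. The default of k = 2 is taken from the constructor documentation.

```java
// Illustrative sketch only: mimics the documented (model.k == this.k) check, not Spark's code.
final class InitialModelCheckSketch {
    // Hypothetical stand-in for a fitted model; only the component count matters here.
    static final class Model {
        final int k;
        Model(int k) { this.k = k; }
    }

    private int k = 2;           // documented default
    private Model initialModel;  // null means random initialization would be used

    InitialModelCheckSketch setK(int k) {
        this.k = k;
        return this;
    }

    InitialModelCheckSketch setInitialModel(Model model) {
        if (model.k != this.k) {
            throw new IllegalArgumentException("model.k (" + model.k + ") != k (" + this.k + ")");
        }
        this.initialModel = model;
        return this;
    }

    boolean hasInitialModel() {
        return initialModel != null;
    }

    public static void main(String[] args) {
        // Works: k is set to 3 before a 3-component model is supplied.
        new InitialModelCheckSketch().setK(3).setInitialModel(new Model(3));
        try {
            // Fails: k is still the default of 2 when the 3-component model arrives.
            new InitialModelCheckSketch().setInitialModel(new Model(3));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```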
-
getInitialModel
public scala.Option<GaussianMixtureModel> getInitialModel()
Return the user supplied initial GMM, if supplied.
- Returns:
  (undocumented)
-
setK
public GaussianMixture setK(int k)
Set the number of Gaussians in the mixture model. Default: 2
- Parameters:
  k - (undocumented)
- Returns:
  (undocumented)
-
getK
public int getK()
Return the number of Gaussians in the mixture model.
- Returns:
  (undocumented)
-
setMaxIterations
public GaussianMixture setMaxIterations(int maxIterations)
Set the maximum number of iterations allowed. Default: 100
- Parameters:
  maxIterations - (undocumented)
- Returns:
  (undocumented)
-
getMaxIterations
public int getMaxIterations()
Return the maximum number of iterations allowed.
- Returns:
  (undocumented)
-
setConvergenceTol
public GaussianMixture setConvergenceTol(double convergenceTol)
Set the largest change in log-likelihood at which convergence is considered to have occurred.
- Parameters:
  convergenceTol - (undocumented)
- Returns:
  (undocumented)
-
getConvergenceTol
public double getConvergenceTol()
Return the largest change in log-likelihood at which convergence is considered to have occurred.
- Returns:
  (undocumented)
-
setSeed
public GaussianMixture setSeed(long seed)
Set the random seed.
- Parameters:
  seed - (undocumented)
- Returns:
  (undocumented)
-
getSeed
public long getSeed()
Return the random seed.
- Returns:
  (undocumented)
-
run
public GaussianMixtureModel run(RDD<Vector> data)
Perform expectation maximization.
- Parameters:
  data - (undocumented)
- Returns:
  (undocumented)
-
run
public GaussianMixtureModel run(JavaRDD<Vector> data)
Java-friendly version of run().
- Parameters:
  data - (undocumented)
- Returns:
  (undocumented)
-