public class GaussianMixture
extends Object
implements scala.Serializable
This class performs expectation maximization for multivariate Gaussian Mixture Models (GMMs). A GMM represents a composite distribution of independent Gaussian distributions, each with an associated "mixing" weight that specifies its contribution to the composite.
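For reference, the composite distribution described above is conventionally written as follows. The symbols are assumptions of this note and do not appear in the class documentation: $w_j$ is the mixing weight of component $j$, $\mu_j$ and $\Sigma_j$ are its mean vector and covariance matrix, and $x_1, \dots, x_n$ are the sample points.

```latex
p(x) = \sum_{j=1}^{k} w_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j),
\qquad w_j \ge 0, \qquad \sum_{j=1}^{k} w_j = 1

\ell(w, \mu, \Sigma) = \sum_{i=1}^{n} \log p(x_i)
```

Here $\ell$ is the log-likelihood of the sample under the mixture, the quantity that expectation maximization tries to increase.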
Given a set of sample points, this class maximizes the log-likelihood for a mixture of k Gaussians, iterating until the log-likelihood changes by less than convergenceTol or until the maximum number of iterations is reached. While this process is generally guaranteed to converge, it is not guaranteed to find a global optimum.
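The stopping rule described above can be sketched with a small one-dimensional EM loop. This is an illustrative sketch only, not this class's implementation: the names `GmmEmSketch`, `fit`, and `Result` are hypothetical, the data is univariate rather than multivariate, and the initialization (equal weights, means drawn from the data, shared sample variance) is an assumption made for brevity.

```java
import java.util.Random;

/** Minimal 1-D EM sketch (hypothetical helper, not the library's code). */
final class GmmEmSketch {

    /** Fitted parameters plus bookkeeping about the run. */
    static final class Result {
        final double[] weights, means, variances;
        final double logLikelihood;
        final int iterations;
        Result(double[] w, double[] m, double[] v, double ll, int it) {
            weights = w; means = m; variances = v; logLikelihood = ll; iterations = it;
        }
    }

    /** Density of a univariate Gaussian N(mean, variance) at x. */
    static double pdf(double x, double mean, double variance) {
        double d = x - mean;
        return Math.exp(-0.5 * d * d / variance) / Math.sqrt(2.0 * Math.PI * variance);
    }

    static Result fit(double[] x, int k, int maxIterations, double convergenceTol, long seed) {
        int n = x.length;
        Random rng = new Random(seed);
        double[] weights = new double[k], means = new double[k], variances = new double[k];

        // Assumed initialization: equal weights, means picked from the data,
        // and the overall sample variance for every component.
        double mean = 0.0, total = 0.0;
        for (double v : x) mean += v / n;
        for (double v : x) total += (v - mean) * (v - mean) / n;
        for (int j = 0; j < k; j++) {
            weights[j] = 1.0 / k;
            means[j] = x[rng.nextInt(n)];
            variances[j] = Math.max(total, 1e-6);
        }

        double[][] resp = new double[n][k];
        double prevLl = Double.NEGATIVE_INFINITY;
        double ll = prevLl;
        int iter = 0;
        while (iter < maxIterations) {
            // E-step: responsibilities and log-likelihood under the current parameters.
            ll = 0.0;
            for (int i = 0; i < n; i++) {
                double p = 0.0;
                for (int j = 0; j < k; j++) {
                    resp[i][j] = weights[j] * pdf(x[i], means[j], variances[j]);
                    p += resp[i][j];
                }
                if (p < Double.MIN_NORMAL) p = Double.MIN_NORMAL; // guard against underflow
                for (int j = 0; j < k; j++) resp[i][j] /= p;
                ll += Math.log(p);
            }
            // M-step: re-estimate weight, mean, and variance of each component.
            for (int j = 0; j < k; j++) {
                double nj = 0.0, sum = 0.0, ss = 0.0;
                for (int i = 0; i < n; i++) { nj += resp[i][j]; sum += resp[i][j] * x[i]; }
                double m = sum / nj;
                for (int i = 0; i < n; i++) ss += resp[i][j] * (x[i] - m) * (x[i] - m);
                weights[j] = nj / n;
                means[j] = m;
                variances[j] = Math.max(ss / nj, 1e-6); // floor keeps a component from collapsing
            }
            iter++;
            // Stop once the log-likelihood has changed by less than convergenceTol.
            if (Math.abs(ll - prevLl) < convergenceTol) break;
            prevLl = ll;
        }
        return new Result(weights, means, variances, ll, iter);
    }
}
```

Calling `GmmEmSketch.fit(data, 2, 100, 0.01, 42L)` runs at most 100 iterations and stops earlier if the log-likelihood moves by less than 0.01 between iterations. Because the starting means depend on the seed, different seeds can end at different local optima, which is the caveat noted above.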
Note: For high-dimensional data (with many features), this algorithm may perform poorly. High-dimensional data (a) is difficult to cluster at all (based on statistical/theoretical arguments) and (b) causes numerical issues with Gaussian distributions.
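One numerical issue of the kind mentioned in (b) is floating-point underflow of the density itself. The documentation does not say which issues it has in mind, so the following is only one plausible illustration. The class `HighDimDensitySketch` and its method are hypothetical, and a standard Gaussian N(0, I) is assumed for simplicity.

```java
import java.util.Arrays;

/** Shows a Gaussian density underflowing to zero as the dimension grows. */
final class HighDimDensitySketch {

    /** Log-density of a standard Gaussian N(0, I) in x.length dimensions. */
    static double logPdfStandard(double[] x) {
        double squaredNorm = 0.0;
        for (double v : x) squaredNorm += v * v;
        return -0.5 * (x.length * Math.log(2.0 * Math.PI) + squaredNorm);
    }

    public static void main(String[] args) {
        for (int d : new int[] {2, 100, 1000}) {
            double[] x = new double[d];
            Arrays.fill(x, 1.0); // squared distance from the mean equals d
            double logP = logPdfStandard(x);
            // The log-density stays finite, but exponentiating it can underflow.
            System.out.println("d=" + d + " logPdf=" + logP + " pdf=" + Math.exp(logP));
        }
    }
}
```

At 1000 dimensions the log-density is roughly -1419, which is far below the smallest value `Math.exp` can represent as a positive double, so the density evaluates to exactly 0.0 even though the point is at a typical distance from the mean. Working with log-densities is a common way to sidestep this.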
Constructor and Description |
---|
`GaussianMixture()` Constructs a default instance. |
Modifier and Type | Method and Description |
---|---|
`double` | `getConvergenceTol()` Return the largest change in log-likelihood at which convergence is considered to have occurred. |
`scala.Option<GaussianMixtureModel>` | `getInitialModel()` Return the user-supplied initial GMM, if one was supplied. |
`int` | `getK()` Return the number of Gaussians in the mixture model. |
`int` | `getMaxIterations()` Return the maximum number of iterations to run. |
`long` | `getSeed()` Return the random seed. |
`GaussianMixtureModel` | `run(RDD<Vector> data)` Perform expectation maximization on the given data and return the fitted model. |
`GaussianMixture` | `setConvergenceTol(double convergenceTol)` Set the largest change in log-likelihood at which convergence is considered to have occurred. |
`GaussianMixture` | `setInitialModel(GaussianMixtureModel model)` Set the initial GMM starting point, bypassing the random initialization. |
`GaussianMixture` | `setK(int k)` Set the number of Gaussians in the mixture model. |
`GaussianMixture` | `setMaxIterations(int maxIterations)` Set the maximum number of iterations to run. |
`GaussianMixture` | `setSeed(long seed)` Set the random seed. |
public GaussianMixture()
public GaussianMixture setInitialModel(GaussianMixtureModel model)
public scala.Option<GaussianMixtureModel> getInitialModel()
public GaussianMixture setK(int k)
public int getK()
public GaussianMixture setMaxIterations(int maxIterations)
public int getMaxIterations()
public GaussianMixture setConvergenceTol(double convergenceTol)
public double getConvergenceTol()
public GaussianMixture setSeed(long seed)
public long getSeed()
public GaussianMixtureModel run(RDD<Vector> data)