public class GaussianMixture
extends java.lang.Object
implements scala.Serializable
This class performs expectation maximization for multivariate Gaussian Mixture Models (GMMs). A GMM represents a composite distribution of independent Gaussian distributions with associated "mixing" weights specifying each component's contribution to the composite.
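The composite distribution and the quantity EM maximizes can be written out in the standard textbook form. The symbols w_i (mixing weights), \mu_i and \Sigma_i (component means and covariances) and x_1, ..., x_n (sample points) are notation introduced here for illustration, not names from this API.

```latex
p(x) = \sum_{i=1}^{k} w_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i),
\qquad w_i \ge 0, \qquad \sum_{i=1}^{k} w_i = 1

\ell(w, \mu, \Sigma) = \sum_{j=1}^{n} \log \sum_{i=1}^{k} w_i \, \mathcal{N}(x_j \mid \mu_i, \Sigma_i)
```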
Given a set of sample points, this class will maximize the log-likelihood for a mixture of k Gaussians, iterating until the log-likelihood changes by less than convergenceTol or until it has reached the maximum number of iterations. While this process is generally guaranteed to converge, it is not guaranteed to find a global optimum.
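The iteration can be sketched on a single machine for one-dimensional data. This is a minimal illustration under stated assumptions (scalar data, caller-supplied starting parameters, a small variance floor), not Spark's distributed implementation; the class Em1D and its members are hypothetical. Each pass runs an E-step (each point's responsibility under every component, plus the log-likelihood) and an M-step (re-estimating weights, means and variances), and stops once the log-likelihood moves by less than tol or after maxIterations passes.

```java
/**
 * Hypothetical single-machine sketch of EM for a 1-D Gaussian mixture.
 * Not Spark's implementation: no distribution, no multivariate covariances.
 */
final class Em1D {
    final double[] weights;   // mixing weights w_i
    final double[] means;     // component means
    final double[] variances; // component variances
    int iterations;           // EM passes actually run by fit()
    double logLikelihood;     // log-likelihood from the last E-step

    Em1D(double[] weights, double[] means, double[] variances) {
        this.weights = weights.clone();
        this.means = means.clone();
        this.variances = variances.clone();
    }

    /** Density of N(mean, variance) at x. */
    static double density(double x, double mean, double variance) {
        double d = x - mean;
        return Math.exp(-d * d / (2 * variance)) / Math.sqrt(2 * Math.PI * variance);
    }

    /** Iterates EM until |change in log-likelihood| < tol or maxIterations passes. */
    void fit(double[] xs, double tol, int maxIterations) {
        int n = xs.length, k = means.length;
        double[][] resp = new double[n][k]; // responsibilities r_ji
        double prevLl = Double.NEGATIVE_INFINITY;
        iterations = 0;
        while (iterations < maxIterations) {
            // E-step: responsibilities and log-likelihood under the current parameters.
            double ll = 0;
            for (int j = 0; j < n; j++) {
                double total = 0; // assumed > 0 (no underflow) in this sketch
                for (int i = 0; i < k; i++) {
                    resp[j][i] = weights[i] * density(xs[j], means[i], variances[i]);
                    total += resp[j][i];
                }
                for (int i = 0; i < k; i++) resp[j][i] /= total;
                ll += Math.log(total);
            }
            // M-step: re-estimate each component from its weighted points.
            for (int i = 0; i < k; i++) {
                double nk = 0, sum = 0;
                for (int j = 0; j < n; j++) {
                    nk += resp[j][i];
                    sum += resp[j][i] * xs[j];
                }
                if (nk == 0) { weights[i] = 0; continue; } // empty component: drop its weight
                double mean = sum / nk, sq = 0;
                for (int j = 0; j < n; j++) {
                    double d = xs[j] - mean;
                    sq += resp[j][i] * d * d;
                }
                weights[i] = nk / n;
                means[i] = mean;
                variances[i] = Math.max(sq / nk, 1e-9); // floor keeps the variance positive
            }
            iterations++;
            boolean converged = Math.abs(ll - prevLl) < tol;
            prevLl = ll;
            if (converged) break;
        }
        logLikelihood = prevLl;
    }
}
```

For example, fitting {-5.1, -5.0, -4.9, 4.9, 5.0, 5.1} from means {-1, 1} with unit variances and equal weights should move the means to about -5 and 5 with weights near 0.5. Starting from different means can end in a different local optimum, which is the caveat stated above.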
Note: For high-dimensional data (with many features), this algorithm may perform poorly, for two reasons: (a) high-dimensional data is difficult to cluster at all (based on statistical/theoretical arguments), and (b) Gaussian distributions suffer from numerical issues in high dimensions.
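Point (b) can be made concrete. A Gaussian density in d dimensions is a product of d factors, so for large d it can underflow to 0 in double precision even when its logarithm is an ordinary number. The sketch below assumes a standard normal with identity covariance purely for illustration and compares the direct product with the log-space formula; working in log space (for instance with log-sum-exp over mixture components) is one common remedy, not necessarily what this class does. The class name is hypothetical.

```java
/** Hypothetical illustration of density underflow in high dimensions. */
final class HighDimUnderflow {
    /** Log-density of a standard normal N(0, I) in d = x.length dimensions. */
    static double logDensity(double[] x) {
        double sq = 0;
        for (double v : x) sq += v * v;
        return -0.5 * (x.length * Math.log(2 * Math.PI) + sq);
    }

    /** The same density evaluated directly as a product of d one-dimensional factors. */
    static double directDensity(double[] x) {
        double p = 1.0;
        for (double v : x) p *= Math.exp(-0.5 * v * v) / Math.sqrt(2 * Math.PI);
        return p;
    }

    /** log(sum_i exp(a_i)), computed stably by factoring out the maximum term. */
    static double logSumExp(double[] a) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : a) max = Math.max(max, v);
        if (max == Double.NEGATIVE_INFINITY) return max; // all terms are zero
        double s = 0;
        for (double v : a) s += Math.exp(v - max);
        return max + Math.log(s);
    }
}
```

With d = 1000 and every coordinate equal to 1.0, the direct product underflows to 0.0 while the log-density is roughly -1418.9, and log-sum-exp can still combine such values across components.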
param: k The number of independent Gaussians in the mixture model.
param: convergenceTol The maximum change in log-likelihood at which convergence is considered to have occurred.
param: maxIterations The maximum number of iterations to perform.
Constructor and Description 

GaussianMixture()
Constructs a default instance.

Modifier and Type  Method and Description 

double 
getConvergenceTol()
Return the largest change in log-likelihood at which convergence is
considered to have occurred.

scala.Option<GaussianMixtureModel> 
getInitialModel()
Return the user-supplied initial GMM, if any.

int 
getK()
Return the number of Gaussians in the mixture model

int 
getMaxIterations()
Return the maximum number of iterations to run

long 
getSeed()
Return the random seed

GaussianMixtureModel 
run(JavaRDD<Vector> data)
Java-friendly version of run().

GaussianMixtureModel 
run(RDD<Vector> data)
Perform expectation maximization

GaussianMixture 
setConvergenceTol(double convergenceTol)
Set the largest change in log-likelihood at which convergence is
considered to have occurred.

GaussianMixture 
setInitialModel(GaussianMixtureModel model)
Set the initial GMM starting point, bypassing the random initialization.

GaussianMixture 
setK(int k)
Set the number of Gaussians in the mixture model.

GaussianMixture 
setMaxIterations(int maxIterations)
Set the maximum number of iterations to run.

GaussianMixture 
setSeed(long seed)
Set the random seed

static boolean 
shouldDistributeGaussians(int k,
int d)
Heuristic to distribute the computation of the
MultivariateGaussians, approximately when
d > 25, except when k is very small.
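The description above only says "approximately". One formula with exactly that behaviour is ((k - 1) / k) * d > 25: for large k the factor (k - 1) / k is close to 1, so the test is roughly d > 25, while for small k the threshold on d rises (k = 2 needs d > 50, and k = 1 never distributes). The sketch below assumes this formula; the class name is hypothetical, and the library's exact rule should be checked against its source.

```java
/** Hypothetical restatement of the distribution heuristic (assumed formula). */
final class DistributeHeuristic {
    /**
     * Returns true when the scaled dimension ((k - 1) / k) * d exceeds 25,
     * i.e. roughly d > 25 for large k, and a higher bar on d for small k.
     */
    static boolean shouldDistributeGaussians(int k, int d) {
        return ((k - 1.0) / k) * d > 25;
    }
}
```

For example, with d = 30 features this rule distributes for k = 10 (0.9 * 30 = 27) but not for k = 2 (0.5 * 30 = 15).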
public GaussianMixture()

public static boolean shouldDistributeGaussians(int k, int d)
Heuristic to distribute the computation of the MultivariateGaussians, approximately when d > 25, except when k is very small.
Parameters:
k - Number of Gaussians in the mixture
d - Number of features

public GaussianMixture setInitialModel(GaussianMixtureModel model)
Parameters:
model - (undocumented)

public scala.Option<GaussianMixtureModel> getInitialModel()

public GaussianMixture setK(int k)
Parameters:
k - (undocumented)

public int getK()

public GaussianMixture setMaxIterations(int maxIterations)
Parameters:
maxIterations - (undocumented)

public int getMaxIterations()

public GaussianMixture setConvergenceTol(double convergenceTol)
Parameters:
convergenceTol - (undocumented)

public double getConvergenceTol()

public GaussianMixture setSeed(long seed)
Parameters:
seed - (undocumented)

public long getSeed()
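Unless setInitialModel is used, the starting point comes from a random initialization, so fixing the seed is what makes runs repeatable. A minimal illustration, assuming a hypothetical initializer that picks k distinct sample indices with java.util.Random; this class is not part of the API, and the library's actual initialization scheme is not described here and may differ.

```java
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;

/** Hypothetical seeded initializer: picks k distinct sample indices. */
final class SeededInit {
    /** Picks k distinct indices in [0, n) using a generator seeded with seed. */
    static int[] pickInitialIndices(int n, int k, long seed) {
        if (k > n) throw new IllegalArgumentException("k must not exceed n");
        Random rng = new Random(seed); // same seed -> same sequence of draws
        Set<Integer> chosen = new LinkedHashSet<>(); // keeps draw order, drops repeats
        while (chosen.size() < k) chosen.add(rng.nextInt(n));
        int[] out = new int[k];
        int i = 0;
        for (int idx : chosen) out[i++] = idx;
        return out;
    }
}
```

Calling pickInitialIndices(100, 3, 42L) twice returns the same three indices, whereas a different seed will generally pick different ones.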

public GaussianMixtureModel run(RDD<Vector> data)
Parameters:
data - (undocumented)

public GaussianMixtureModel run(JavaRDD<Vector> data)
Java-friendly version of run().
Parameters:
data - (undocumented)