org.apache.spark.mllib.clustering

GaussianMixture

class GaussianMixture extends Serializable

This class performs expectation maximization (EM) for multivariate Gaussian Mixture Models (GMMs). A GMM represents a composite distribution of independent Gaussian distributions, with associated "mixing" weights specifying each component's contribution to the composite.
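To make "composite distribution with mixing weights" concrete, the sketch below evaluates a one-dimensional mixture density as the weighted sum of its component densities. It is a plain-Scala illustration only. The object and method names (MixtureDensity, gaussianPdf, mixturePdf) are hypothetical and are not part of Spark's API, which works with multivariate Gaussians.

```scala
import scala.math.{Pi, exp, sqrt}

// Hypothetical 1-D illustration (not Spark's API): a Gaussian mixture density
// is the weighted sum of its component densities, with non-negative mixing
// weights that sum to 1.
object MixtureDensity {
  // Density of a univariate normal distribution N(mean, variance) at x.
  def gaussianPdf(x: Double, mean: Double, variance: Double): Double =
    exp(-(x - mean) * (x - mean) / (2.0 * variance)) / sqrt(2.0 * Pi * variance)

  // p(x) = sum_i w_i * N(x | mean_i, variance_i)
  def mixturePdf(x: Double, weights: Seq[Double], means: Seq[Double], variances: Seq[Double]): Double = {
    require(weights.size == means.size && means.size == variances.size,
      "weights, means and variances must have the same length")
    require(math.abs(weights.sum - 1.0) < 1e-9, "mixing weights must sum to 1")
    weights.indices.map(i => weights(i) * gaussianPdf(x, means(i), variances(i))).sum
  }
}

// Two equally weighted unit-variance components centred at -2 and 2,
// evaluated at x = 0.
val p = MixtureDensity.mixturePdf(0.0, Seq(0.5, 0.5), Seq(-2.0, 2.0), Seq(1.0, 1.0))
```

Because both components sit the same distance from 0, p equals the density of a single standard normal evaluated at 2.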

Given a set of sample points, this class maximizes the log-likelihood for a mixture of k Gaussians. It iterates until the log-likelihood changes by less than convergenceTol, or until it reaches the maximum number of iterations. While this process is generally guaranteed to converge, it is not guaranteed to find a global optimum.
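The loop described above (alternate E- and M-steps, stop when the log-likelihood changes by less than a tolerance or after a maximum number of iterations) can be sketched in one dimension as follows. This is a simplified illustration under stated assumptions, not Spark's distributed implementation. Component, Em1D, the variance floor of 1e-6, and the toy data are all hypothetical choices made for this example.

```scala
import scala.math.{Pi, abs, exp, log, sqrt}

// Hypothetical 1-D EM sketch (not Spark's implementation) using the stopping
// rule described above: stop when |llh - previous llh| < tol or after maxIter.
final case class Component(weight: Double, mean: Double, variance: Double)

object Em1D {
  def pdf(x: Double, mean: Double, variance: Double): Double =
    exp(-(x - mean) * (x - mean) / (2.0 * variance)) / sqrt(2.0 * Pi * variance)

  def logLikelihood(data: Seq[Double], comps: Seq[Component]): Double =
    data.map(x => log(comps.map(c => c.weight * pdf(x, c.mean, c.variance)).sum)).sum

  /** Returns the fitted components and the number of iterations run. */
  def fit(data: Seq[Double], init: Seq[Component], tol: Double, maxIter: Int): (Seq[Component], Int) = {
    var comps = init
    var prevLL = Double.NegativeInfinity
    var iter = 0
    var converged = false
    while (iter < maxIter && !converged) {
      // E-step: posterior responsibility of each component for each point.
      val resp = data.map { x =>
        val p = comps.map(c => c.weight * pdf(x, c.mean, c.variance))
        val total = p.sum
        p.map(_ / total)
      }
      // M-step: re-estimate weight, mean and variance from the responsibilities.
      comps = comps.indices.map { j =>
        val r = resp.map(_(j))
        val nj = r.sum
        val mean = data.zip(r).map { case (x, rj) => rj * x }.sum / nj
        val variance = data.zip(r).map { case (x, rj) => rj * (x - mean) * (x - mean) }.sum / nj
        // Floor the variance so a component cannot collapse to zero width.
        Component(nj / data.size, mean, math.max(variance, 1e-6))
      }
      val ll = logLikelihood(data, comps)
      converged = abs(ll - prevLL) < tol
      prevLL = ll
      iter += 1
    }
    (comps, iter)
  }
}

val data = Seq(-5.1, -5.0, -4.9, 4.9, 5.0, 5.1)
val (fitted, iterations) =
  Em1D.fit(data, Seq(Component(0.5, -1.0, 1.0), Component(0.5, 1.0, 1.0)), tol = 0.01, maxIter = 100)
```

With this well-separated toy data the two components settle near -5 and 5 within a few iterations. A poorer starting point can still converge, but to a local rather than global optimum.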

Note: For high-dimensional data (with many features), this algorithm may perform poorly, for two reasons: (a) high-dimensional data is difficult to cluster at all, based on statistical and theoretical arguments, and (b) Gaussian distributions are prone to numerical issues in high dimensions.
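One numerical issue behind point (b) is floating-point underflow. In many dimensions a Gaussian density is a product of many small factors and quickly rounds to 0.0, while its logarithm remains finite. The sketch below demonstrates this for a standard normal in 2000 dimensions. It is a general illustration only. HighDimDensity and its methods are hypothetical, and the sketch does not describe how Spark evaluates densities internally.

```scala
import scala.math.{Pi, exp, log, sqrt}

// Hypothetical illustration (not Spark code): the density of a standard
// d-dimensional Gaussian, computed as a product of per-coordinate densities,
// underflows to 0.0 for large d, whereas the log-density stays finite.
object HighDimDensity {
  // Standard normal density of a single coordinate.
  def pdf1(x: Double): Double = exp(-0.5 * x * x) / sqrt(2.0 * Pi)

  // Direct product over all coordinates: prone to underflow.
  def densityDirect(x: Array[Double]): Double = x.map(pdf1).product

  // Sum of per-coordinate log-densities: numerically stable.
  def logDensity(x: Array[Double]): Double =
    x.map(xi => -0.5 * xi * xi - 0.5 * log(2.0 * Pi)).sum
}

val point = Array.fill(2000)(1.0)
val direct = HighDimDensity.densityDirect(point)   // underflows to 0.0
val logDens = HighDimDensity.logDensity(point)     // finite, roughly -2838
```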

Annotations
@Since( "1.3.0" )
Source
GaussianMixture.scala
Linear Supertypes
Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new GaussianMixture()

    Constructs a default instance. The default parameters are {k: 2, convergenceTol: 0.01, maxIterations: 100, seed: random}.

    Annotations
    @Since( "1.3.0" )
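A quick way to confirm the documented defaults is to construct an instance and read them back through the getters listed under Value Members. This sketch assumes spark-mllib is on the classpath. Constructing a GaussianMixture does not require a SparkContext.

```scala
import org.apache.spark.mllib.clustering.GaussianMixture

// Assumes spark-mllib is on the classpath; no SparkContext is needed just to
// construct the estimator and inspect its parameters.
val gmm = new GaussianMixture()

val k = gmm.getK                                     // 2
val tol = gmm.getConvergenceTol                      // 0.01
val maxIter = gmm.getMaxIterations                   // 100
val hasInitialModel = gmm.getInitialModel.isDefined  // false: no starting model set
```

The seed is not checked here because its default is random.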

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  12. def getConvergenceTol: Double

    Return the largest change in log-likelihood at which convergence is considered to have occurred.

    Annotations
    @Since( "1.3.0" )
  13. def getInitialModel: Option[GaussianMixtureModel]

    Return the user-supplied initial GMM, if one was supplied.

    Annotations
    @Since( "1.3.0" )
  14. def getK: Int

    Return the number of Gaussians in the mixture model

    Annotations
    @Since( "1.3.0" )
  15. def getMaxIterations: Int

    Return the maximum number of iterations to run

    Annotations
    @Since( "1.3.0" )
  16. def getSeed: Long

    Return the random seed

    Annotations
    @Since( "1.3.0" )
  17. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  18. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  19. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  20. final def notify(): Unit

    Definition Classes
    AnyRef
  21. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  22. def run(data: JavaRDD[Vector]): GaussianMixtureModel

    Java-friendly version of run(), taking a JavaRDD[Vector] instead of an RDD[Vector].

    Annotations
    @Since( "1.3.0" )
  23. def run(data: RDD[Vector]): GaussianMixtureModel

    Perform expectation maximization on the given data and return the fitted GaussianMixtureModel.

    Annotations
    @Since( "1.3.0" )
  24. def setConvergenceTol(convergenceTol: Double): GaussianMixture.this.type

    Set the largest change in log-likelihood at which convergence is considered to have occurred. Default: 0.01

    Annotations
    @Since( "1.3.0" )
  25. def setInitialModel(model: GaussianMixtureModel): GaussianMixture.this.type

    Set the initial GMM starting point, bypassing the random initialization. You must call setK() before calling this method, and the condition (model.k == this.k) must hold; otherwise an IllegalArgumentException is thrown.

    Annotations
    @Since( "1.3.0" )
  26. def setK(k: Int): GaussianMixture.this.type

    Set the number of Gaussians in the mixture model. Default: 2

    Annotations
    @Since( "1.3.0" )
  27. def setMaxIterations(maxIterations: Int): GaussianMixture.this.type

    Set the maximum number of iterations to run. Default: 100

    Annotations
    @Since( "1.3.0" )
  28. def setSeed(seed: Long): GaussianMixture.this.type

    Set the random seed

    Annotations
    @Since( "1.3.0" )
  29. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  30. def toString(): String

    Definition Classes
    AnyRef → Any
  31. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  32. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  33. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
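The setters, setInitialModel and run are typically used together: configure the estimator with chained setters (each returns this.type), optionally supply a starting model whose k matches, then call run. The sketch below is written under several assumptions: spark-mllib is on the classpath, a local master ("local[2]") is acceptable, and the toy data and parameter values (tolerance 0.001, 50 iterations, seed 42) are arbitrary choices for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.{GaussianMixture, GaussianMixtureModel}
import org.apache.spark.mllib.linalg.{Matrices, Vectors}
import org.apache.spark.mllib.stat.distribution.MultivariateGaussian

// Assumes spark-mllib is on the classpath and a local master is acceptable.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("gmm-sketch"))

// Two well-separated 2-D clusters (made-up toy data).
val data = sc.parallelize(Seq(
  Vectors.dense(-5.1, -5.0), Vectors.dense(-4.9, -5.2), Vectors.dense(-5.0, -4.8),
  Vectors.dense(5.1, 5.0), Vectors.dense(4.9, 5.2), Vectors.dense(5.0, 4.8)
))

// Optional starting point with k = 2 and identity covariances.
val identity = Matrices.dense(2, 2, Array(1.0, 0.0, 0.0, 1.0))
val start = new GaussianMixtureModel(
  Array(0.5, 0.5),
  Array(
    new MultivariateGaussian(Vectors.dense(-1.0, -1.0), identity),
    new MultivariateGaussian(Vectors.dense(1.0, 1.0), identity)))

// Chained configuration; setK must be called before setInitialModel.
val gmm = new GaussianMixture()
  .setK(2)
  .setConvergenceTol(0.001)
  .setMaxIterations(50)
  .setSeed(42L)
  .setInitialModel(start)

val model = gmm.run(data)

// A starting model whose k differs from the estimator's k is rejected.
val rejected =
  try { new GaussianMixture().setK(3).setInitialModel(start); false }
  catch { case _: IllegalArgumentException => true }

val fittedK = model.k
val weightSum = model.weights.sum
sc.stop()
```

Supplying an initial model bypasses the random initialization, so the seed matters only when no starting model is given.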
