Package org.apache.spark.mllib.clustering

Class: Description
BisectingKMeans: A bisecting k-means algorithm based on the paper "A comparison of document clustering techniques" by Steinbach, Karypis, and Kumar, with modification to fit Spark.
BisectingKMeansModel: Clustering model produced by BisectingKMeans.
DistributedLDAModel: Distributed LDA model.
EMLDAOptimizer: Optimizer for EM algorithm which stores data + parameter graph, plus algorithm parameters.
GaussianMixture: This class performs expectation maximization for multivariate Gaussian Mixture Models (GMMs).
GaussianMixtureModel: Multivariate Gaussian Mixture Model (GMM) consisting of k Gaussians, where points are drawn from each Gaussian i=1..k with probability w(i); mu(i) and sigma(i) are the respective mean and covariance for each Gaussian distribution i=1..k.
KMeans: K-means clustering with a k-means++ like initialization mode (the k-means|| algorithm by Bahmani et al).
KMeansModel: A clustering model for k-means.
LDA: Latent Dirichlet Allocation (LDA), a topic model designed for text documents.
LDAModel: Latent Dirichlet Allocation (LDA) model.
LDAOptimizer: An LDAOptimizer specifies which optimization/learning/inference algorithm to use, and it can hold optimizer-specific parameters for users to set.
LDAUtils: Utility methods for LDA.
LocalKMeans: A utility object to run k-means locally.
LocalLDAModel: Local LDA model.
OnlineLDAOptimizer: An online optimizer for LDA.
PowerIterationClustering: Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by Lin and Cohen.
PowerIterationClustering.Assignment: Cluster assignment.
PowerIterationClusteringModel: Model produced by PowerIterationClustering.
StreamingKMeans: StreamingKMeans provides methods for configuring a streaming k-means analysis, training the model on streaming data, and using the model to make predictions on streaming data.
StreamingKMeansModel: StreamingKMeansModel extends MLlib's KMeansModel for streaming algorithms, so it can keep track of a continuously updated weight associated with each cluster, and also update the model by doing a single iteration of the standard k-means algorithm.
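
The "single iteration of the standard k-means algorithm" mentioned for StreamingKMeansModel can be illustrated with a minimal, self-contained sketch in plain Java. This is not Spark's implementation: the class name `KMeansStep` and its methods are hypothetical, the code runs on in-memory arrays rather than distributed data, and it ignores the per-cluster weights that the streaming model tracks. It shows only the two steps of an unweighted iteration: assign each point to its nearest center, then move each center to the mean of its assigned points.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of org.apache.spark.mllib.clustering.
public class KMeansStep {

    /** Returns the index of the center closest to the point (squared Euclidean distance). */
    public static int closest(double[] point, double[][] centers) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = point[d] - centers[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /** Performs one k-means iteration and returns the updated centers. */
    public static double[][] step(double[][] points, double[][] centers) {
        int k = centers.length;
        int dim = centers[0].length;
        double[][] sums = new double[k][dim];
        int[] counts = new int[k];

        // Assignment step: accumulate each point into its nearest cluster.
        for (double[] p : points) {
            int c = closest(p, centers);
            counts[c]++;
            for (int d = 0; d < dim; d++) {
                sums[c][d] += p[d];
            }
        }

        // Update step: each center becomes the mean of its assigned points.
        double[][] updated = new double[k][];
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) {
                // Assumption of this sketch: an empty cluster keeps its old center.
                updated[c] = centers[c].clone();
                continue;
            }
            updated[c] = new double[dim];
            for (int d = 0; d < dim; d++) {
                updated[c][d] = sums[c][d] / counts[c];
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        double[][] points = {{0.0, 0.0}, {1.0, 0.0}, {9.0, 9.0}, {10.0, 9.0}};
        double[][] centers = {{0.0, 1.0}, {8.0, 8.0}};
        // Prints [[0.5, 0.0], [9.5, 9.0]]
        System.out.println(Arrays.deepToString(step(points, centers)));
    }
}
```

Repeating `step` until the centers stop moving gives the batch form of the algorithm. A streaming variant would instead apply one such step per incoming batch of data.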