Package org.apache.spark.mllib.clustering
Class Summary

- BisectingKMeans: A bisecting k-means algorithm based on the paper "A comparison of document clustering techniques" by Steinbach, Karypis, and Kumar, with modifications to fit Spark.
- BisectingKMeansModel: Clustering model produced by BisectingKMeans.
- DistributedLDAModel: Distributed LDA model.
- EMLDAOptimizer: Optimizer for the EM algorithm, which stores the data + parameter graph plus algorithm parameters.
- GaussianMixture: This class performs expectation maximization for multivariate Gaussian Mixture Models (GMMs).
- GaussianMixtureModel: Multivariate Gaussian Mixture Model (GMM) consisting of k Gaussians, where points are drawn from each Gaussian i=1..k with probability w(i); mu(i) and sigma(i) are the respective mean and covariance of each Gaussian distribution i=1..k.
- KMeans: K-means clustering with a k-means++ like initialization mode (the k-means|| algorithm by Bahmani et al.).
- KMeansModel: A clustering model for K-means.
- LDA: Latent Dirichlet Allocation (LDA), a topic model designed for text documents.
- LDAModel: Latent Dirichlet Allocation (LDA) model.
- LDAOptimizer: An LDAOptimizer specifies which optimization/learning/inference algorithm to use, and it can hold optimizer-specific parameters for users to set.
- LDAUtils: Utility methods for LDA.
- LocalKMeans: A utility object to run K-means locally.
- LocalLDAModel: Local LDA model.
- OnlineLDAOptimizer: An online optimizer for LDA.
- PowerIterationClustering: Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by Lin and Cohen.
- PowerIterationClustering.Assignment: Cluster assignment.
- PowerIterationClusteringModel: Model produced by PowerIterationClustering.
- StreamingKMeans: StreamingKMeans provides methods for configuring a streaming k-means analysis, training the model on streaming data, and using the model to make predictions on streaming data.
- StreamingKMeansModel: StreamingKMeansModel extends MLlib's KMeansModel for streaming algorithms, so it can keep track of a continuously updated weight associated with each cluster and also update the model by doing a single iteration of the standard k-means algorithm.
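Written out as formulas, the Gaussian mixture described above has the density below, with d the dimension of x and Σ(i) standing for the covariance sigma(i). The last line is the standard E-step quantity (the responsibility of component i for a point x) that expectation maximization for GMMs alternates with re-estimating w(i), mu(i), and Σ(i). This is the textbook formulation, not a transcription of Spark's code.

```latex
p(x) = \sum_{i=1}^{k} w(i)\,\mathcal{N}\bigl(x \mid \mu(i), \Sigma(i)\bigr),
\qquad w(i) \ge 0, \quad \sum_{i=1}^{k} w(i) = 1

\mathcal{N}(x \mid \mu, \Sigma) = (2\pi)^{-d/2}\,\lvert\Sigma\rvert^{-1/2}
  \exp\!\Bigl(-\tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\Bigr)

\gamma_i(x) = \frac{w(i)\,\mathcal{N}\bigl(x \mid \mu(i), \Sigma(i)\bigr)}{p(x)}
```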
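The k-means description refers to a k-means++ like initialization; k-means|| (Bahmani et al.) is a parallel variant of k-means++ seeding. The Java sketch below shows only the sequential k-means++ seeding it is modeled on: the first center is drawn uniformly, and each further center is drawn with probability proportional to its squared distance from the nearest center chosen so far. The class KMeansPlusPlusSketch and its methods are hypothetical illustrations, not Spark's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper: sequential k-means++ seeding (Arthur and Vassilvitskii).
// Spark's KMeans uses the parallel k-means|| variant; this is not Spark code.
final class KMeansPlusPlusSketch {
    private KMeansPlusPlusSketch() {}

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Picks k initial centers from points; requires 1 <= k <= points.length.
    static List<double[]> seed(double[][] points, int k, Random rng) {
        if (k < 1 || k > points.length) {
            throw new IllegalArgumentException("need 1 <= k <= points.length");
        }
        List<double[]> centers = new ArrayList<>();
        centers.add(points[rng.nextInt(points.length)]); // first center: uniform choice
        // minDist[j]: squared distance from point j to its nearest chosen center.
        double[] minDist = new double[points.length];
        for (int j = 0; j < points.length; j++) {
            minDist[j] = squaredDistance(points[j], centers.get(0));
        }
        while (centers.size() < k) {
            double total = 0.0;
            for (double d : minDist) total += d;
            int chosen = -1;
            if (total > 0.0) {
                // Draw index j with probability minDist[j] / total.
                double r = rng.nextDouble() * total;
                double cumulative = 0.0;
                for (int j = 0; j < points.length && chosen < 0; j++) {
                    cumulative += minDist[j];
                    if (r < cumulative) chosen = j;
                }
                // Guard against floating-point round-off at the upper end.
                for (int j = points.length - 1; chosen < 0; j--) {
                    if (minDist[j] > 0.0) chosen = j;
                }
            } else {
                // Every point coincides with a chosen center: fall back to uniform.
                chosen = rng.nextInt(points.length);
            }
            double[] c = points[chosen];
            centers.add(c);
            for (int j = 0; j < points.length; j++) {
                minDist[j] = Math.min(minDist[j], squaredDistance(points[j], c));
            }
        }
        return centers;
    }
}
```

Because distant points are far more likely to be drawn, two well-separated groups of points almost always end up with one seed each.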
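Power Iteration Clustering embeds a graph's vertices by power iteration: a vector is repeatedly multiplied by the row-normalized affinity matrix W = D^-1 A (D being the diagonal matrix of row sums, or degrees) and rescaled, and the resulting per-vertex values are then clustered with k-means. The Java sketch below covers only the iteration, for a small dense affinity matrix, starting from the normalized degree vector (similar in spirit to the "degree" initialization mode that Spark's PowerIterationClustering offers alongside "random"). Lin and Cohen stop iterating early rather than running to convergence; this sketch simply runs a fixed number of steps, and PowerIterationSketch is a hypothetical name.

```java
// Hypothetical helper: the power-iteration step of Power Iteration Clustering.
// Not Spark code; PIC's final k-means step and its early-stopping rule are omitted.
final class PowerIterationSketch {
    private PowerIterationSketch() {}

    // affinity: symmetric, non-negative n x n matrix in which every row sum (degree) is positive.
    // Returns one value per vertex after a fixed number of iterations.
    static double[] embed(double[][] affinity, int iterations) {
        int n = affinity.length;
        double[] degree = new double[n];
        double totalDegree = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) degree[i] += affinity[i][j];
            totalDegree += degree[i];
        }
        // Start from the degree vector scaled to sum to 1.
        double[] v = new double[n];
        for (int i = 0; i < n; i++) v[i] = degree[i] / totalDegree;
        for (int t = 0; t < iterations; t++) {
            double[] next = new double[n];
            double l1 = 0.0;
            for (int i = 0; i < n; i++) {
                double s = 0.0;
                for (int j = 0; j < n; j++) s += affinity[i][j] * v[j];
                next[i] = s / degree[i]; // (D^-1 A v)_i: affinity-weighted mean of neighbors' values
                l1 += Math.abs(next[i]);
            }
            for (int i = 0; i < n; i++) next[i] /= l1; // rescale to unit L1 norm
            v = next;
        }
        return v;
    }
}
```

For example, on a graph made of two disconnected triangles with different edge weights, the values become (nearly) constant within each triangle and typically differ between them, which is what lets a subsequent k-means step separate the two groups.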
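The streaming k-means description combines two ideas: a running weight per cluster and a single k-means iteration per batch. One common way to realize this, sketched below under stated assumptions, is a decayed weighted average. Each point in a batch is assigned to its nearest center; then each center c with running weight n moves to c' = (c·n·α + x̄·m) / (n·α + m), and its weight becomes n' = n·α + m, where x̄ is the centroid of the m batch points assigned to it and α is a decay factor in [0, 1]. StreamingKMeansSketch, its fields, and the decayFactor parameter are illustrative assumptions, not Spark's implementation, and refinements such as handling clusters whose weight fades away are omitted.

```java
// Hypothetical sketch of a mini-batch k-means update with decayed per-cluster weights.
final class StreamingKMeansSketch {
    final double[][] centers; // k x d cluster centers
    final double[] weights;   // running (decayed) weight of each cluster

    StreamingKMeansSketch(double[][] initialCenters) {
        centers = new double[initialCenters.length][];
        for (int i = 0; i < initialCenters.length; i++) centers[i] = initialCenters[i].clone();
        weights = new double[initialCenters.length]; // all clusters start with weight 0
    }

    // Index of the center with the smallest squared Euclidean distance to p.
    int closest(double[] p) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centers.length; i++) {
            double d = 0.0;
            for (int j = 0; j < p.length; j++) {
                double diff = p[j] - centers[i][j];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = i; }
        }
        return best;
    }

    // decayFactor in [0, 1]: 1 keeps all history, 0 keeps only the latest batch.
    void update(double[][] batch, double decayFactor) {
        int k = centers.length, dim = centers[0].length;
        double[][] sums = new double[k][dim];
        long[] counts = new long[k];
        // Assignment step: one k-means iteration over the batch.
        for (double[] p : batch) {
            int c = closest(p);
            counts[c]++;
            for (int j = 0; j < dim; j++) sums[c][j] += p[j];
        }
        // Update step: blend the discounted old center with the batch centroid.
        for (int i = 0; i < k; i++) {
            double discounted = weights[i] * decayFactor;
            double updated = discounted + counts[i];
            if (counts[i] > 0) {
                for (int j = 0; j < dim; j++) {
                    double batchMean = sums[i][j] / counts[i];
                    centers[i][j] = (centers[i][j] * discounted + batchMean * counts[i]) / updated;
                }
            }
            weights[i] = updated;
        }
    }
}
```

With decayFactor = 1, all past batches count equally; with decayFactor = 0, an update forgets history and each center that receives points jumps to its batch centroid.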