org.apache.spark.mllib.clustering (Spark 4.0.0-preview2 JavaDoc)

package org.apache.spark.mllib.clustering

Related Packages

Package

Description

org.apache.spark.mllib

RDD-based machine learning APIs (in maintenance mode).

Class

Description

BisectingKMeans

A bisecting k-means algorithm based on the paper "A comparison of document clustering techniques" by Steinbach, Karypis, and Kumar, with modifications to fit Spark.

BisectingKMeansModel

Clustering model produced by BisectingKMeans.

BisectingKMeansModel.SaveLoadV1_0$

BisectingKMeansModel.SaveLoadV2_0$

BisectingKMeansModel.SaveLoadV3_0$

DistributedLDAModel

Distributed LDA model.

EMLDAOptimizer

Optimizer for the EM algorithm, which stores a graph of data and parameters together with the algorithm parameters.

ExpectationSum

GaussianMixture

This class performs expectation maximization for multivariate Gaussian Mixture Models (GMMs).

GaussianMixtureModel

Multivariate Gaussian Mixture Model (GMM) consisting of k Gaussians, where points are drawn from each Gaussian i=1..k with probability w(i); mu(i) and sigma(i) are the respective mean and covariance for each Gaussian distribution i=1..k.

KMeans

K-means clustering with a k-means++-like initialization mode (the k-means|| algorithm by Bahmani et al.).

KMeansModel

A clustering model for K-means.

KMeansModel.Cluster$

KMeansModel.SaveLoadV1_0$

KMeansModel.SaveLoadV2_0$

LDA

Latent Dirichlet Allocation (LDA), a topic model designed for text documents.

LDAModel

Latent Dirichlet Allocation (LDA) model.

LDAOptimizer

An LDAOptimizer specifies which optimization/learning/inference algorithm to use, and it can hold optimizer-specific parameters for users to set.

LDAUtils

Utility methods for LDA.

LocalKMeans

A utility object to run K-means locally.

LocalLDAModel

Local LDA model.

OnlineLDAOptimizer

An online optimizer for LDA.

PowerIterationClustering

Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by Lin and Cohen.

PowerIterationClustering.Assignment

Cluster assignment.

PowerIterationClustering.Assignment$

PowerIterationClusteringModel

Model produced by PowerIterationClustering.

PowerIterationClusteringModel.SaveLoadV1_0$

StreamingKMeans

StreamingKMeans provides methods for configuring a streaming k-means analysis, training the model on streaming data, and using the model to make predictions on streaming data.

StreamingKMeansModel

StreamingKMeansModel extends MLlib's KMeansModel for streaming algorithms, so it can keep track of a continuously updated weight associated with each cluster, and also update the model by doing a single iteration of the standard k-means algorithm.
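
The StreamingKMeansModel description above (a per-cluster weight that is continuously updated, plus a single k-means iteration per batch) can be illustrated with a minimal plain-Java sketch. It does not use the Spark API: the class name `StreamingKMeansSketch`, its fields, and the `decayFactor` parameter are hypothetical names chosen for illustration, and the update rule shown (discount the old weight, then take a weighted average of the old center and the batch points) is an assumption about the general technique. Refinements a real implementation may apply, such as special handling of clusters that stop receiving points, are omitted.

```java
public class StreamingKMeansSketch {
    // Hypothetical state: one center and one accumulated weight per cluster.
    public final double[][] centers;
    public final double[] weights;

    public StreamingKMeansSketch(double[][] centers, double[] weights) {
        this.centers = centers;
        this.weights = weights;
    }

    // Returns the index of the center closest to the point
    // (squared Euclidean distance).
    public int predict(double[] point) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centers.length; i++) {
            double d = 0.0;
            for (int j = 0; j < point.length; j++) {
                double diff = point[j] - centers[i][j];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }

    // Performs one k-means iteration over a batch. Old cluster weights are
    // multiplied by decayFactor (assumed to lie in [0, 1]) before the batch
    // is merged in, so older data counts for less over time.
    public void update(double[][] batch, double decayFactor) {
        int k = centers.length;
        int dim = centers[0].length;
        double[][] sums = new double[k][dim];
        int[] counts = new int[k];

        // Assignment step: every point is assigned against the old centers.
        for (double[] p : batch) {
            int c = predict(p);
            counts[c]++;
            for (int j = 0; j < dim; j++) {
                sums[c][j] += p[j];
            }
        }

        // Update step: weighted average of the old center and the batch points.
        for (int i = 0; i < k; i++) {
            double discounted = weights[i] * decayFactor;
            double newWeight = discounted + counts[i];
            if (counts[i] > 0) {
                for (int j = 0; j < dim; j++) {
                    centers[i][j] = (centers[i][j] * discounted + sums[i][j]) / newWeight;
                }
            }
            weights[i] = newWeight;
        }
    }
}
```

With a `decayFactor` of 1.0 every past point counts equally, so each center is the running mean of all points assigned to it; with 0.0 only the most recent batch determines the centers.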

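
For reference, the GaussianMixtureModel entry above describes a standard mixture density. Using that entry's symbols, and assuming the usual convention that the weights w(i) are non-negative and sum to one, it can be written as:

```latex
p(x) = \sum_{i=1}^{k} w(i)\, \mathcal{N}\bigl(x \mid \mu(i), \Sigma(i)\bigr),
\qquad \sum_{i=1}^{k} w(i) = 1, \quad w(i) \ge 0
```

Here \Sigma(i) stands for the covariance that the entry calls sigma(i), and \mathcal{N} denotes the multivariate Gaussian density.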