org.apache.spark.ml.clustering (Spark 4.0.0-preview1 JavaDoc)

package org.apache.spark.ml.clustering

Related Packages

Package

Description

org.apache.spark.ml

DataFrame-based machine learning APIs that let users quickly assemble and configure practical machine learning pipelines.

Class

Description

BisectingKMeans

A bisecting k-means algorithm based on the paper "A comparison of document clustering techniques" by Steinbach, Karypis, and Kumar, with modifications to fit Spark.

BisectingKMeansModel

Model fitted by BisectingKMeans.

BisectingKMeansParams

Common params for BisectingKMeans and BisectingKMeansModel.

BisectingKMeansSummary

Summary of BisectingKMeans.

ClusterData

Helper class for storing model data.

ClusteringSummary

Summary of clustering algorithms.

DistributedLDAModel

Distributed model fitted by LDA.

ExpectationAggregator

ExpectationAggregator computes the partial expectation results.

GaussianMixture

Gaussian Mixture clustering.

GaussianMixtureModel

Multivariate Gaussian Mixture Model (GMM) consisting of k Gaussians, where points are drawn from each Gaussian i with probability weights(i).

GaussianMixtureParams

Common params for GaussianMixture and GaussianMixtureModel.

GaussianMixtureSummary

Summary of GaussianMixture.

InternalKMeansModelWriter

A writer for KMeans that handles the "internal" (or default) format.

KMeans

K-means clustering with support for k-means|| initialization proposed by Bahmani et al.

KMeansAggregator

KMeansAggregator computes the distances and updates the centers in an online fashion, for blocks stored as either sparse or dense matrices.

KMeansModel

Model fitted by KMeans.

KMeansParams

Common params for KMeans and KMeansModel.

KMeansSummary

Summary of KMeans.

LDA

Latent Dirichlet Allocation (LDA), a topic model designed for text documents.

LDAModel

Model fitted by LDA.

LDAParams

LocalLDAModel

Local (non-distributed) model fitted by LDA.

PMMLKMeansModelWriter

A writer for KMeans that handles the "pmml" format.

PowerIterationClustering

Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by Lin and Cohen.

PowerIterationClusteringParams

Common params for PowerIterationClustering.
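
The KMeans and KMeansAggregator entries above refer to computing distances and updating centers. As a rough illustration of that idea only, the following plain-Java sketch performs one assign-and-update pass of standard k-means (Lloyd's algorithm) on in-memory arrays. It does not use Spark and is not Spark's implementation: the class name LloydSketch and its methods are hypothetical, squared Euclidean distance is assumed, and k-means|| initialization, blocking, and distribution across a cluster are all left out.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of org.apache.spark.ml.clustering.
class LloydSketch {

    // Index of the center closest to point p, using squared Euclidean distance.
    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double d = 0.0;
            for (int j = 0; j < p.length; j++) {
                double diff = p[j] - centers[c][j];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    // One iteration: assign every point to its nearest center,
    // then move each center to the mean of the points assigned to it.
    static double[][] step(double[][] points, double[][] centers) {
        int k = centers.length;
        int dim = centers[0].length;
        double[][] sums = new double[k][dim];
        int[] counts = new int[k];
        for (double[] p : points) {
            int c = nearest(p, centers);
            counts[c]++;
            for (int j = 0; j < dim; j++) {
                sums[c][j] += p[j];
            }
        }
        double[][] next = new double[k][dim];
        for (int c = 0; c < k; c++) {
            for (int j = 0; j < dim; j++) {
                // An empty cluster keeps its previous center.
                next[c][j] = counts[c] == 0 ? centers[c][j] : sums[c][j] / counts[c];
            }
        }
        return next;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 2}, {10, 10}, {10, 12}};
        double[][] centers = {{1, 1}, {9, 9}};
        // prints [[0.0, 1.0], [10.0, 11.0]]
        System.out.println(Arrays.deepToString(step(points, centers)));
    }
}
```

Repeating step until the centers stop moving gives the basic algorithm. The accumulate-then-divide structure (per-cluster sums and counts) is the part that lends itself to partial aggregation over chunks of data, which is the general role an aggregator plays.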
