KMeansSummary

class pyspark.ml.clustering.KMeansSummary(java_obj=None)

Summary of KMeans.

New in version 2.1.0.

Attributes

`cluster`	DataFrame of predicted cluster indices (the prediction column) for each training data point.
`clusterSizes`	Number of data points in each cluster.
`featuresCol`	Name for column of features in predictions.
`k`	The number of clusters the model was trained with.
`numIter`	Number of iterations performed during training.
`predictionCol`	Name for column of predicted clusters in predictions.
`predictions`	DataFrame produced by the model’s transform method.
`trainingCost`	K-means cost (sum of squared distances to the nearest centroid for all points in the training dataset).

Attributes Documentation

cluster: DataFrame of predicted cluster indices (the prediction column) for each training data point.

New in version 2.1.0.
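The following plain-Python sketch (not Spark code) shows what each row of `cluster` records: the index of the center nearest to a training point. It assumes the default Euclidean distance measure. This sketch breaks ties toward the lower index. The helpers `squared_distance` and `assign_clusters` and the sample data are hypothetical.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_clusters(points: List[List[float]], centers: List[List[float]]) -> List[int]:
    """For each point, return the index of its nearest center (ties go to the lower index)."""
    return [
        min(range(len(centers)), key=lambda i: squared_distance(p, centers[i]))
        for p in points
    ]


# Hypothetical data: two well-separated groups and their centers.
points = [[0.0, 0.0], [1.0, 1.0], [9.0, 8.0], [8.0, 9.0]]
centers = [[0.5, 0.5], [8.5, 8.5]]
print(assign_clusters(points, centers))  # [0, 0, 1, 1]
```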

clusterSizes: Number of data points in each cluster.

New in version 2.1.0.
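The following plain-Python sketch shows how `clusterSizes` relates to the per-point assignments: it counts how many points carry each cluster index. The sketch assumes indices run from 0 to k - 1 and reports 0 for any index that received no points. The function `cluster_sizes` and the sample assignments are hypothetical.

```python
from collections import Counter
from typing import List


def cluster_sizes(assignments: List[int], k: int) -> List[int]:
    """Count points per cluster index 0..k-1; an index with no points reports 0."""
    counts = Counter(assignments)
    return [counts.get(i, 0) for i in range(k)]


# Hypothetical assignments of five points to three clusters.
print(cluster_sizes([0, 0, 1, 1, 1], k=3))  # [2, 3, 0]
```

In this sketch the sizes always sum to the number of assigned points.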

featuresCol: Name for column of features in predictions.

New in version 2.1.0.

k: The number of clusters the model was trained with.

New in version 2.1.0.

numIter: Number of iterations performed during training.

New in version 2.4.0.

predictionCol: Name for column of predicted clusters in predictions.

New in version 2.1.0.

predictions: DataFrame produced by the model’s transform method.

New in version 2.1.0.

trainingCost: K-means cost (sum of squared distances to the nearest centroid for all points in the training dataset). This is equivalent to scikit-learn’s `inertia_`.

New in version 2.4.0.
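The following plain-Python sketch computes the cost by hand, assuming Euclidean distance and unweighted points. For each point it takes the smallest squared distance to any center, then sums over all points: cost = Σ_i min_j ‖x_i − c_j‖². The function `training_cost` and the sample data are hypothetical.

```python
from typing import List


def training_cost(points: List[List[float]], centers: List[List[float]]) -> float:
    """Sum over all points of the squared Euclidean distance to the nearest center."""
    return sum(
        min(sum((x - c) ** 2 for x, c in zip(p, center)) for center in centers)
        for p in points
    )


# Hypothetical data: each point lies at squared distance 0.5 from its nearest center.
points = [[0.0, 0.0], [1.0, 1.0], [9.0, 8.0], [8.0, 9.0]]
centers = [[0.5, 0.5], [8.5, 8.5]]
print(training_cost(points, centers))  # 2.0
```

Each of the four points contributes 0.25 + 0.25 = 0.5, so the total is 2.0. Lower values mean points sit closer to their centers. Note that the cost generally shrinks as k grows, so it is not by itself a criterion for choosing k.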
