KMeansSummary

class pyspark.ml.clustering.KMeansSummary(java_obj=None)

Summary of KMeans.

New in version 2.1.0.

Attributes

`cluster`	DataFrame of the predicted cluster for each training data point.
`clusterSizes`	Size of (number of data points in) each cluster.
`featuresCol`	Name for column of features in predictions.
`k`	The number of clusters the model was trained with.
`numIter`	Number of iterations run during training.
`predictionCol`	Name for column of predicted clusters in predictions.
`predictions`	DataFrame produced by the model's transform method.
`trainingCost`	K-means cost (sum of squared distances to the nearest centroid for all points in the training dataset).
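
The attributes above are read from the `summary` property of a fitted model. The following is a minimal sketch, assuming `model` is a `KMeansModel` returned by `KMeans.fit()`. The helper name `describe_kmeans_summary` is hypothetical and not part of PySpark.

```python
def describe_kmeans_summary(model):
    """Collect the scalar fields of a KMeans training summary into a dict.

    Assumption: `model` is a pyspark.ml.clustering.KMeansModel (or any object
    exposing the same `hasSummary` and `summary` attributes).
    """
    # A model may have no summary attached (for example, one loaded from disk),
    # so check before touching `model.summary`.
    if not model.hasSummary:
        raise ValueError("model has no training summary")

    s = model.summary
    return {
        "k": s.k,
        "numIter": s.numIter,
        "trainingCost": s.trainingCost,
        "clusterSizes": list(s.clusterSizes),
        "featuresCol": s.featuresCol,
        "predictionCol": s.predictionCol,
    }
```

With a live Spark session this could be called as `describe_kmeans_summary(KMeans(k=2).fit(df))`, where `df` stands for a hypothetical DataFrame with a `features` vector column. The DataFrame-valued attributes (`cluster`, `predictions`) are left out of the dict because they are distributed datasets rather than scalars.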

Attributes Documentation

cluster: DataFrame of the predicted cluster for each training data point.

New in version 2.1.0.

clusterSizes: Size of (number of data points in) each cluster.

New in version 2.1.0.

featuresCol: Name for column of features in predictions.

New in version 2.1.0.

k: The number of clusters the model was trained with.

New in version 2.1.0.

numIter: Number of iterations run during training.

New in version 2.4.0.

predictionCol: Name for column of predicted clusters in predictions.

New in version 2.1.0.

predictions: DataFrame produced by the model's transform method.

New in version 2.1.0.

trainingCost: K-means cost (sum of squared distances to the nearest centroid for all points in the training dataset). This is equivalent to scikit-learn's `inertia_` attribute.

New in version 2.4.0.
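
The definitions of `trainingCost` and `clusterSizes` can be illustrated without Spark. The following is a plain-Python sketch, not Spark's implementation. It assumes squared Euclidean distance and unweighted points, and the function name `kmeans_cost_and_sizes` and the sample data are made up for illustration.

```python
def kmeans_cost_and_sizes(points, centers):
    """Return (cost, sizes) for fixed centers.

    cost  : sum over all points of the squared Euclidean distance
            to the nearest center
    sizes : number of points assigned to each center
    """
    sizes = [0] * len(centers)
    cost = 0.0
    for p in points:
        # Squared distance from this point to every center.
        d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        # Index of the nearest center (first one wins on ties).
        nearest = min(range(len(centers)), key=d2.__getitem__)
        sizes[nearest] += 1
        cost += d2[nearest]
    return cost, sizes


# Illustrative data: two tight groups of two points each.
points = [(0.0, 0.0), (1.0, 1.0), (9.0, 8.0), (8.0, 9.0)]
centers = [(0.5, 0.5), (8.5, 8.5)]

cost, sizes = kmeans_cost_and_sizes(points, centers)
print(cost)   # 2.0
print(sizes)  # [2, 2]
```

Each point lies at squared distance 0.5 from its nearest center, so the four contributions add up to 2.0, and each center receives two points.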