KMeansSummary

class pyspark.ml.clustering.KMeansSummary(java_obj=None)

Summary of KMeans.

New in version 2.1.0.

Attributes

cluster

DataFrame of the predicted cluster index for each training data point.

clusterSizes

Size of (number of data points in) each cluster.

featuresCol

Name for column of features in predictions.

k

The number of clusters the model was trained with.

numIter

Number of iterations run during training.

predictionCol

Name for column of predicted clusters in predictions.

predictions

DataFrame produced by the model's transform method.

trainingCost

K-means cost (sum of squared distances to the nearest centroid for all points in the training dataset).

Attributes Documentation

cluster

DataFrame of the predicted cluster index for each training data point.

New in version 2.1.0.

clusterSizes

Size of (number of data points in) each cluster.

New in version 2.1.0.

featuresCol

Name for column of features in predictions.

New in version 2.1.0.

k

The number of clusters the model was trained with.

New in version 2.1.0.

numIter

Number of iterations run during training.

New in version 2.4.0.

predictionCol

Name for column of predicted clusters in predictions.

New in version 2.1.0.

predictions

DataFrame produced by the model’s transform method.

New in version 2.1.0.

trainingCost

K-means cost (sum of squared distances to the nearest centroid for all points in the training dataset). This is equivalent to the inertia_ attribute of scikit-learn’s KMeans.

New in version 2.4.0.
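
To make the definitions of clusterSizes and trainingCost concrete, the following pure-Python sketch recomputes both quantities for a fixed set of centroids. It assumes unweighted points and Euclidean distance; the helper names `squared_distance` and `kmeans_cost` and the sample data are hypothetical and are not part of PySpark.

```python
def squared_distance(point, centroid):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((p - c) ** 2 for p, c in zip(point, centroid))


def kmeans_cost(points, centroids):
    """Assign each point to its nearest centroid.

    Returns the per-point assignments, the size of each cluster, and
    the sum of squared distances to the nearest centroid.
    """
    assignments = []
    cost = 0.0
    for point in points:
        dists = [squared_distance(point, c) for c in centroids]
        nearest = min(range(len(centroids)), key=dists.__getitem__)
        assignments.append(nearest)
        cost += dists[nearest]
    sizes = [assignments.count(i) for i in range(len(centroids))]
    return assignments, sizes, cost


# Made-up data: two tight groups and the mean of each group.
points = [(0.0, 0.0), (1.0, 1.0), (9.0, 8.0), (8.0, 9.0)]
centroids = [(0.5, 0.5), (8.5, 8.5)]

assignments, sizes, cost = kmeans_cost(points, centroids)
print(assignments)  # [0, 0, 1, 1]
print(sizes)        # [2, 2]
print(cost)         # 2.0
```

Each point lies at squared distance 0.5 from its centroid, so the four contributions add up to a cost of 2.0, and each cluster contains two points.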