BisectingKMeansModel

class pyspark.mllib.clustering.BisectingKMeansModel(java_model)
A clustering model derived from the bisecting k-means method.
New in version 2.0.0.
Examples
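>>> from numpy import array
>>> from pyspark.mllib.clustering import BisectingKMeans
>>> # assumes `sc` is an active SparkContext
>>> data = array([0.0,0.0, 1.0,1.0, 9.0,8.0, 8.0,9.0]).reshape(4, 2)
>>> bskm = BisectingKMeans()
>>> model = bskm.train(sc.parallelize(data, 2), k=4)
>>> p = array([0.0, 0.0])
>>> model.predict(p)
0
>>> model.k
4
>>> model.computeCost(p)
0.0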
Methods
call(name, *a)
    Call method of java_model.
computeCost(x)
    Return the bisecting k-means cost (sum of squared distances of points to their nearest center) for this model on the given data.
predict(x)
    Find the cluster that each of the points belongs to in this model.
Attributes
clusterCenters
    Get the cluster centers, represented as a list of NumPy arrays.
k
    Get the number of clusters.
Methods Documentation

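call(name, *a)
Call method of java_model: invoke the method named name on the wrapped Java model with the positional arguments a, and return its result.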
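The call method forwards a method name and positional arguments to the underlying Java model. The sketch below illustrates that delegation with getattr; FakeJavaModel and MiniWrapper are hypothetical stand-ins (no Spark or Py4J involved), and the real PySpark wrapper additionally converts arguments and return values between Python and JVM objects.

```python
class FakeJavaModel:
    """Hypothetical stand-in for the Py4J proxy of the JVM-side model."""

    def k(self):
        return 4

    def predict(self, point):
        # Toy rule for illustration only: cluster 0 if the first coordinate is small.
        return 0 if point[0] < 5.0 else 1


class MiniWrapper:
    """Hypothetical wrapper mimicking the call(name, *a) delegation pattern."""

    def __init__(self, java_model):
        self._java_model = java_model

    def call(self, name, *a):
        # Look up the method by name on the wrapped object and invoke it with *a.
        return getattr(self._java_model, name)(*a)


wrapper = MiniWrapper(FakeJavaModel())
print(wrapper.call("k"))                    # 4
print(wrapper.call("predict", [9.0, 8.0]))  # 1
```

Keeping dispatch in one generic method lets each public wrapper method stay a one-liner around call.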

computeCost(x)
Return the bisecting k-means cost (sum of squared distances of points to their nearest center) for this model on the given data. If given an RDD of points, returns the sum of the costs over all points.
New in version 2.0.0.
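Parameters:
x : pyspark.mllib.linalg.Vector or pyspark.RDD
    A data point (or RDD of points) for which to compute the cost(s). A pyspark.mllib.linalg.Vector can be replaced with an equivalent object (list, tuple, numpy.ndarray).
Returns:
float
    The cost of the single point, or the total cost over all points if the input is an RDD.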
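As a local illustration of the cost described above, the NumPy sketch below sums the squared Euclidean distance from each point to its nearest center. The function name local_cost and the center values are hypothetical; this is a brute-force reading of the docstring's definition, not Spark's implementation, which runs on the JVM.

```python
import numpy as np


def local_cost(points, centers):
    """Sum of squared distances from each point to its nearest center.

    points: a single point of shape (d,) or a collection of shape (n, d).
    centers: list of 1-D NumPy arrays, like BisectingKMeansModel.clusterCenters.
    """
    pts = np.atleast_2d(np.asarray(points, dtype=float))  # (n, d)
    ctr = np.vstack(centers)                                # (k, d)
    # Squared distance between every point and every center: shape (n, k).
    sq_dists = ((pts[:, None, :] - ctr[None, :, :]) ** 2).sum(axis=-1)
    # Keep each point's smallest squared distance, then add them up.
    return float(sq_dists.min(axis=1).sum())


# Hypothetical centers for the four points used in the Examples section.
centers = [np.array([0.5, 0.5]), np.array([8.5, 8.5])]
data = np.array([[0.0, 0.0], [1.0, 1.0], [9.0, 8.0], [8.0, 9.0]])

print(local_cost([0.0, 0.0], centers))  # single point: 0.5
print(local_cost(data, centers))        # sum over all points: 2.0
```

When every point is itself a center, as with k=4 on the four points in the Examples section, each nearest distance is zero, which is consistent with the 0.0 cost shown there.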

predict(x)
Find the cluster that the given point (or each point in an RDD) belongs to in this model.
New in version 2.0.0.
Parameters:
x : pyspark.mllib.linalg.Vector or pyspark.RDD
    A data point (or RDD of points) for which to determine the cluster index. A pyspark.mllib.linalg.Vector can be replaced with an equivalent object (list, tuple, numpy.ndarray).
Returns:
int or pyspark.RDD of int
    The predicted cluster index, or an RDD of predicted cluster indices if the input is an RDD.
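The predict contract (one index for one point, one index per point for a collection) can be mimicked locally with NumPy. local_predict and the centers below are hypothetical, and the sketch picks the nearest center by brute force; treat it as a simplification of the idea, not a description of how Spark assigns points internally.

```python
import numpy as np


def local_predict(x, centers):
    """Return the index of the nearest center for one point,
    or a list of indices when given a 2-D collection of points."""
    ctr = np.vstack(centers)             # (k, d)
    arr = np.asarray(x, dtype=float)
    pts = np.atleast_2d(arr)             # (n, d)
    # Squared distance from every point to every center: shape (n, k).
    sq_dists = ((pts[:, None, :] - ctr[None, :, :]) ** 2).sum(axis=-1)
    labels = sq_dists.argmin(axis=1)     # nearest center per point
    # Mirror the int-versus-collection return shape of predict.
    return int(labels[0]) if arr.ndim == 1 else [int(i) for i in labels]


centers = [np.array([0.5, 0.5]), np.array([8.5, 8.5])]
print(local_predict([0.0, 0.0], centers))                # 0
print(local_predict([[1.0, 1.0], [9.0, 8.0]], centers))  # [0, 1]
```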
Attributes Documentation

clusterCenters
Get the cluster centers, represented as a list of NumPy arrays.
New in version 2.0.0.
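Because clusterCenters returns a plain Python list of NumPy arrays, it can be stacked into a single (k, d) matrix for vectorized work. In this sketch the hard-coded list is a hypothetical stand-in for model.clusterCenters.

```python
import numpy as np

# Hypothetical stand-in for model.clusterCenters: a list of 1-D NumPy arrays.
centers = [np.array([0.5, 0.5]), np.array([8.5, 8.5])]

# One row per cluster center.
center_matrix = np.vstack(centers)
num_centers, dim = center_matrix.shape
print(num_centers, dim)  # 2 2

# Euclidean distance between every pair of centers, shape (num_centers, num_centers).
diffs = center_matrix[:, None, :] - center_matrix[None, :, :]
pairwise = np.sqrt((diffs ** 2).sum(axis=-1))
print(pairwise)
```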

k
Get the number of clusters.
New in version 2.0.0.
