org.apache.spark.mllib.clustering
Class KMeansModel

Object
  extended by org.apache.spark.mllib.clustering.KMeansModel
All Implemented Interfaces:
java.io.Serializable, PMMLExportable, Saveable
Direct Known Subclasses:
StreamingKMeansModel

public class KMeansModel
extends Object
implements Saveable, scala.Serializable, PMMLExportable

A clustering model for K-means. Each point belongs to the cluster with the closest center.

See Also:
Serialized Form
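The assignment rule in the class description and the cost reported by computeCost can be written as formulas. This assumes squared Euclidean distance, cluster centers written as μ_0, …, μ_{k−1} (0-based, matching the indices predict returns), and a data set written as D. These symbols are notation introduced here, not names from the API.

```latex
\operatorname{predict}(x) = \arg\min_{j \in \{0,\dots,k-1\}} \lVert x - \mu_j \rVert^2,
\qquad
\operatorname{cost}(D) = \sum_{x \in D} \min_{j \in \{0,\dots,k-1\}} \lVert x - \mu_j \rVert^2
```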

Constructor Summary
KMeansModel(Iterable<Vector> centers)
          A Java-friendly constructor that takes an Iterable of Vectors.
KMeansModel(Vector[] clusterCenters)
          Creates a model from the given array of cluster centers.
 
Method Summary
 Vector[] clusterCenters()
          Returns the cluster centers of this model.
 double computeCost(RDD<Vector> data)
          Return the K-means cost (sum of squared distances of points to their nearest center) for this model on the given data.
 int k()
          Total number of clusters.
static KMeansModel load(SparkContext sc, String path)
          Loads a model previously saved with save(SparkContext, String) from the given path.
 JavaRDD<Integer> predict(JavaRDD<Vector> points)
          Maps given points to their cluster indices.
 RDD<Object> predict(RDD<Vector> points)
          Maps given points to their cluster indices.
 int predict(Vector point)
          Returns the cluster index that a given point belongs to.
 void save(SparkContext sc, String path)
          Save this model to the given path.
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.spark.mllib.pmml.PMMLExportable
toPMML, toPMML, toPMML, toPMML, toPMML
 

Constructor Detail

KMeansModel

public KMeansModel(Vector[] clusterCenters)
Creates a model from the given array of cluster centers.

KMeansModel

public KMeansModel(Iterable<Vector> centers)
A Java-friendly constructor that takes an Iterable of Vectors.

Method Detail

load

public static KMeansModel load(SparkContext sc,
                               String path)
Loads a model previously saved with save(SparkContext, String) from the given path.

clusterCenters

public Vector[] clusterCenters()
Returns the cluster centers of this model.

k

public int k()
Total number of clusters.


predict

public int predict(Vector point)
Returns the cluster index that a given point belongs to.
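The rule behind predict(Vector) can be sketched without Spark. The sketch below is a minimal, hypothetical illustration, not MLlib's implementation. It assumes squared Euclidean distance and uses plain double[] arrays in place of mllib Vector objects. It also assumes that ties go to the lower index, which is a choice made for this sketch and not documented behavior.

```java
/**
 * Hypothetical, Spark-free sketch of nearest-center assignment.
 * Assumes squared Euclidean distance; double[] stands in for mllib Vector.
 */
public class NearestCenter {

    /** Squared Euclidean distance between two equal-length points. */
    static double squaredDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the center closest to point; ties keep the lower index. Returns -1 if there are no centers. */
    static int predict(double[][] centers, double[] point) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int j = 0; j < centers.length; j++) {
            double d = squaredDistance(centers[j], point);
            if (d < bestDist) { // strict '<' keeps the first minimum on ties
                bestDist = d;
                best = j;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centers = {{0.0, 0.0}, {10.0, 10.0}};
        System.out.println(predict(centers, new double[]{1.0, 2.0})); // 0
        System.out.println(predict(centers, new double[]{9.0, 8.0})); // 1
    }
}
```

Squared distance is enough for the comparison. Because the square root is monotonic, it does not change which center is closest.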


predict

public RDD<Object> predict(RDD<Vector> points)
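Maps given points to their cluster indices. The Scala element type is Int. It appears as Object in this Java signature because Java generics cannot take a primitive type argument. From Java, prefer the JavaRDD overload, which returns JavaRDD<Integer>.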
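The batch predict overloads assign each point independently, so the work amounts to a per-element map over the input. The sketch below is a local, hypothetical analogue of that idea. It uses a java.util.List in place of an RDD and assumes squared Euclidean distance. It does not show how Spark distributes the work.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical, Spark-free analogue of batch prediction: a List stands in
 * for the RDD, and each point is mapped to its nearest-center index.
 */
public class BatchPredict {

    /** Nearest-center index under squared Euclidean distance (ties keep the lower index). */
    static int nearest(double[][] centers, double[] p) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int j = 0; j < centers.length; j++) {
            double d = 0.0;
            for (int i = 0; i < p.length; i++) {
                double diff = p[i] - centers[j][i];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = j;
            }
        }
        return best;
    }

    /** Maps every point to its cluster index, preserving input order. */
    static List<Integer> predictAll(double[][] centers, List<double[]> points) {
        return points.stream()
                .map(p -> nearest(centers, p))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        double[][] centers = {{0.0}, {5.0}, {10.0}};
        List<double[]> points = Arrays.asList(
                new double[]{1.0}, new double[]{6.0}, new double[]{12.0});
        System.out.println(predictAll(centers, points)); // [0, 1, 2]
    }
}
```

Because no point's result depends on any other point, the same map can run on each partition of an RDD without coordination.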


predict

public JavaRDD<Integer> predict(JavaRDD<Vector> points)
Maps given points to their cluster indices.


computeCost

public double computeCost(RDD<Vector> data)
Return the K-means cost (sum of squared distances of points to their nearest center) for this model on the given data.

Parameters:
data - RDD of points on which to evaluate the cost.
Returns:
the sum of squared distances from each point in data to its nearest cluster center.
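The quantity computeCost reports can be computed directly on a small local data set. The sketch below is a minimal, hypothetical version that does not use Spark. It assumes squared Euclidean distance and uses double[][] in place of RDD<Vector>. It is useful for sanity-checking small cases by hand.

```java
/**
 * Hypothetical, Spark-free sketch of the K-means cost:
 * the sum, over all points, of the squared distance to the nearest center.
 */
public class KMeansCost {

    /** Squared Euclidean distance between two equal-length points. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Sums each point's squared distance to its closest center. */
    static double computeCost(double[][] centers, double[][] data) {
        double cost = 0.0;
        for (double[] x : data) {
            double best = Double.POSITIVE_INFINITY;
            for (double[] c : centers) {
                best = Math.min(best, squaredDistance(x, c));
            }
            cost += best;
        }
        return cost;
    }

    public static void main(String[] args) {
        double[][] centers = {{0.0, 0.0}, {10.0, 0.0}};
        double[][] data = {{1.0, 0.0}, {0.0, 2.0}, {10.0, 3.0}};
        // Nearest squared distances are 1, 4 and 9.
        System.out.println(computeCost(centers, data)); // 14.0
    }
}
```

A lower cost on the same data means the centers fit that data more tightly. Comparing costs across different k is less informative, because the cost never increases as centers are added.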

save

public void save(SparkContext sc,
                 String path)
Description copied from interface: Saveable
Save this model to the given path.

This saves:
- human-readable (JSON) model metadata to path/metadata/
- Parquet-formatted data to path/data/

The model may be loaded using Loader.load.

Specified by:
save in interface Saveable
Parameters:
sc - Spark context used to save model data.
path - Path specifying the directory in which to save this model. If the directory already exists, this method throws an exception.
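A caller may want to check the target before calling save, because save throws if the directory already exists. The helper below is hypothetical and not part of Spark. It handles only local paths through java.nio. It is not a general solution, because Spark may write to HDFS or other Hadoop file systems that this sketch does not cover. It rejects an existing directory and computes the two subdirectories save is documented to write.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical pre-flight helper for save on a local filesystem.
 * It does not write anything; it only checks the target and lists the
 * subdirectories that save is documented to create.
 */
public class SaveLayout {

    /** Returns [root/metadata, root/data]; throws if root already exists. */
    static Path[] plannedLayout(String path) {
        Path root = Paths.get(path);
        if (Files.exists(root)) {
            // Mirrors the documented contract: save refuses an existing directory.
            throw new IllegalStateException("target already exists: " + root);
        }
        return new Path[]{root.resolve("metadata"), root.resolve("data")};
    }

    public static void main(String[] args) {
        String tmp = System.getProperty("java.io.tmpdir");
        Path[] layout = plannedLayout(Paths.get(tmp, "kmeans-model-" + System.nanoTime()).toString());
        System.out.println(layout[0].getFileName()); // metadata
        System.out.println(layout[1].getFileName()); // data
        try {
            plannedLayout(tmp); // the temp directory itself always exists
        } catch (IllegalStateException e) {
            System.out.println("refused: " + e.getMessage());
        }
    }
}
```

The check-then-save sequence is not atomic, because another process could create the directory between the check and the call. The exception thrown by save itself therefore remains the authoritative signal.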