Class BisectingKMeansModel

Object
org.apache.spark.mllib.clustering.BisectingKMeansModel
All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, Saveable, scala.Serializable

public class BisectingKMeansModel extends Object implements scala.Serializable, Saveable, org.apache.spark.internal.Logging
Clustering model produced by BisectingKMeans. Prediction proceeds level by level from the root node to a leaf node: at each node, the child closest to the input point is selected.

Parameters:
root - the root node of the clustering tree
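
The level-by-level prediction described above can be sketched in plain Java as follows. This is a minimal illustration, not Spark's implementation: `SketchNode`, `LevelByLevelPredict`, and `sampleTree` are hypothetical names, the tree is made up, and squared Euclidean distance is assumed even though the model reports its own measure through distanceMeasure().

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a clustering tree node; not Spark's ClusteringTreeNode.
final class SketchNode {
    final int index;                 // cluster index; only meaningful for leaves in this sketch
    final double[] center;           // center of the points under this node
    final List<SketchNode> children; // empty for a leaf

    SketchNode(int index, double[] center, List<SketchNode> children) {
        this.index = index;
        this.center = center;
        this.children = children;
    }

    boolean isLeaf() {
        return children.isEmpty();
    }
}

final class LevelByLevelPredict {
    // Squared Euclidean distance (an assumption; the real model's measure may differ).
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Walk from the root to a leaf, moving to the closest child at every level.
    static int predict(SketchNode root, double[] point) {
        SketchNode node = root;
        while (!node.isLeaf()) {
            SketchNode best = node.children.get(0);
            double bestDist = squaredDistance(best.center, point);
            for (SketchNode child : node.children) {
                double d = squaredDistance(child.center, point);
                if (d < bestDist) {
                    bestDist = d;
                    best = child;
                }
            }
            node = best;
        }
        return node.index;
    }

    // Made-up tree: the root splits into leaf 0 and an internal node holding leaves 1 and 2.
    static SketchNode sampleTree() {
        List<SketchNode> none = Collections.emptyList();
        SketchNode leaf0 = new SketchNode(0, new double[] {0.0, 0.0}, none);
        SketchNode leaf1 = new SketchNode(1, new double[] {9.0, 9.0}, none);
        SketchNode leaf2 = new SketchNode(2, new double[] {12.0, 12.0}, none);
        SketchNode inner = new SketchNode(-1, new double[] {10.5, 10.5}, Arrays.asList(leaf1, leaf2));
        return new SketchNode(-1, new double[] {5.0, 5.0}, Arrays.asList(leaf0, inner));
    }

    public static void main(String[] args) {
        SketchNode root = sampleTree();
        System.out.println(predict(root, new double[] {1.0, 1.0}));   // prints 0
        System.out.println(predict(root, new double[] {11.5, 11.0})); // prints 2
    }
}
```

Because each step only compares the children of the current node, the leaf reached this way is not guaranteed to be the globally nearest leaf center.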

  • Constructor Details

    • BisectingKMeansModel

      public BisectingKMeansModel(org.apache.spark.mllib.clustering.ClusteringTreeNode root)
  • Method Details

    • load

      public static BisectingKMeansModel load(SparkContext sc, String path)
    • distanceMeasure

      public String distanceMeasure()
    • trainingCost

      public double trainingCost()
    • clusterCenters

      public Vector[] clusterCenters()
      Returns the centers of the leaf clusters.
      Returns:
      the leaf cluster centers
    • k

      public int k()
    • predict

      public int predict(Vector point)
      Predicts the index of the cluster that the input point belongs to.
      Parameters:
      point - the input point
      Returns:
      the index of the cluster that the point belongs to
    • predict

      public RDD<Object> predict(RDD<Vector> points)
      Predicts the indices of the clusters that the input points belong to.
      Parameters:
      points - the input points
      Returns:
      the indices of the clusters that the input points belong to
    • predict

      public JavaRDD<Integer> predict(JavaRDD<Vector> points)
      Java-friendly version of predict().
      Parameters:
      points - the input points
      Returns:
      the indices of the clusters that the input points belong to
    • computeCost

      public double computeCost(Vector point)
      Computes the squared distance between the input point and the cluster center it belongs to.
      Parameters:
      point - the input point
      Returns:
      the squared distance between the point and the center of the cluster it belongs to
    • computeCost

      public double computeCost(RDD<Vector> data)
      Computes the sum of squared distances between the input points and their corresponding cluster centers.
      Parameters:
      data - the input points
      Returns:
      the sum of squared distances between the points and their corresponding cluster centers
    • computeCost

      public double computeCost(JavaRDD<Vector> data)
      Java-friendly version of computeCost().
      Parameters:
      data - the input points
      Returns:
      the sum of squared distances between the points and their corresponding cluster centers
    • save

      public void save(SparkContext sc, String path)
      Description copied from interface: Saveable
      Save this model to the given path.

      This saves:
        - human-readable (JSON) model metadata to path/metadata/
        - Parquet-formatted data to path/data/

      The model may be loaded using Loader.load.

      Specified by:
      save in interface Saveable
      Parameters:
      sc - Spark context used to save model data.
      path - Path specifying the directory in which to save this model. If the directory already exists, this method throws an exception.
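
To make the two quantities documented under computeCost concrete, here is a minimal plain-Java sketch. It is an illustration under stated assumptions rather than Spark's code: `CostSketch`, `pointCost`, and `totalCost` are hypothetical names, the data is made up, squared Euclidean distance is assumed, and the cluster assignments are passed in explicitly (for example, as values that predict would return) instead of being derived from the tree.

```java
final class CostSketch {
    // Squared distance between one point and the center of the cluster it belongs to.
    static double pointCost(double[] center, double[] point) {
        double sum = 0.0;
        for (int i = 0; i < center.length; i++) {
            double diff = center[i] - point[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Sum of squared distances between points and their assigned cluster centers.
    // assignments[i] is the index into centers for points[i] (supplied by the caller here).
    static double totalCost(double[][] centers, int[] assignments, double[][] points) {
        double total = 0.0;
        for (int i = 0; i < points.length; i++) {
            total += pointCost(centers[assignments[i]], points[i]);
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] centers = {{0.0, 0.0}, {10.0, 10.0}};
        double[][] points = {{1.0, 0.0}, {0.0, 2.0}, {10.0, 11.0}};
        int[] assignments = {0, 0, 1};
        // Per-point costs are 1.0, 4.0 and 1.0.
        System.out.println(totalCost(centers, assignments, points)); // prints 6.0
    }
}
```

The single-point overload corresponds to `pointCost`, and the RDD and JavaRDD overloads correspond to `totalCost` applied across a distributed collection.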