Class BisectingKMeansModel

java.lang.Object
  org.apache.spark.mllib.clustering.BisectingKMeansModel
All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, Saveable

public class BisectingKMeansModel extends Object implements Serializable, Saveable, org.apache.spark.internal.Logging
Clustering model produced by BisectingKMeans. The prediction is done level-by-level from the root node to a leaf node, and at each node among its children the closest to the input point is selected.

param: root the root node of the clustering tree
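A minimal Java sketch of obtaining and using a model. The toy data, parameter values, and local master are illustrative assumptions; in practice the model is produced by BisectingKMeans.run() rather than constructed from a ClusteringTreeNode directly.

  import java.util.Arrays;

  import org.apache.spark.SparkConf;
  import org.apache.spark.api.java.JavaRDD;
  import org.apache.spark.api.java.JavaSparkContext;
  import org.apache.spark.mllib.clustering.BisectingKMeans;
  import org.apache.spark.mllib.clustering.BisectingKMeansModel;
  import org.apache.spark.mllib.linalg.Vector;
  import org.apache.spark.mllib.linalg.Vectors;

  public class BisectingKMeansModelSketch {
    public static void main(String[] args) {
      SparkConf conf = new SparkConf()
          .setAppName("BisectingKMeansModelSketch")
          .setMaster("local[*]");
      JavaSparkContext jsc = new JavaSparkContext(conf);

      // Toy data set: two well-separated groups of 2-dimensional points.
      JavaRDD<Vector> points = jsc.parallelize(Arrays.asList(
          Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
          Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)));

      // Training returns the BisectingKMeansModel documented on this page.
      BisectingKMeansModel model = new BisectingKMeans()
          .setK(2)
          .setSeed(1L)
          .run(points);

      System.out.println("k = " + model.k());
      System.out.println("cluster of (0, 0): " + model.predict(Vectors.dense(0.0, 0.0)));

      jsc.stop();
    }
  }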

  • Nested Class Summary

    Nested Classes
    Modifier and Type    Class    Description
    static class
    static class
    static class

    Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging

    org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
  • Constructor Summary

    Constructors
    Constructor
    Description
    BisectingKMeansModel(org.apache.spark.mllib.clustering.ClusteringTreeNode root)
     
  • Method Summary

    Modifier and Type              Method                                Description
    Vector[]                       clusterCenters()                      Leaf cluster centers.
    double                         computeCost(JavaRDD<Vector> data)     Java-friendly version of computeCost().
    double                         computeCost(RDD<Vector> data)         Computes the sum of squared distances between the input points and their corresponding cluster centers.
    double                         computeCost(Vector point)             Computes the squared distance between the input point and the cluster center it belongs to.
    String                         distanceMeasure()
    int                            k()
    static BisectingKMeansModel    load(SparkContext sc, String path)
    JavaRDD<Integer>               predict(JavaRDD<Vector> points)       Java-friendly version of predict().
    int                            predict(Vector point)                 Predicts the index of the cluster that the input point belongs to.
    RDD<Object>                    predict(RDD<Vector> points)           Predicts the indices of the clusters that the input points belong to.
    void                           save(SparkContext sc, String path)    Save this model to the given path.
    double                         trainingCost()

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface org.apache.spark.internal.Logging

    initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
  • Constructor Details

    • BisectingKMeansModel

      public BisectingKMeansModel(org.apache.spark.mllib.clustering.ClusteringTreeNode root)
  • Method Details

    • load

      public static BisectingKMeansModel load(SparkContext sc, String path)
    • distanceMeasure

      public String distanceMeasure()
    • trainingCost

      public double trainingCost()
    • clusterCenters

      public Vector[] clusterCenters()
      Leaf cluster centers.
      Returns:
      (undocumented)
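      For example, reusing the hypothetical model variable from the class-level sketch above:

        // Print each leaf cluster center of the trained model.
        Vector[] centers = model.clusterCenters();
        for (int i = 0; i < centers.length; i++) {
          System.out.println("Cluster Center " + i + ": " + centers[i]);
        }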
    • k

      public int k()
    • predict

      public int predict(Vector point)
      Predicts the index of the cluster that the input point belongs to.
      Parameters:
      point - (undocumented)
      Returns:
      (undocumented)
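      A small usage sketch, reusing the hypothetical model and imports from the class-level example:

        // Assign a single point to its closest leaf cluster.
        Vector p = Vectors.dense(9.05, 9.02);
        int cluster = model.predict(p);
        System.out.println(p + " -> cluster " + cluster);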
    • predict

      public RDD<Object> predict(RDD<Vector> points)
      Predicts the indices of the clusters that the input points belong to.
      Parameters:
      points - (undocumented)
      Returns:
      (undocumented)
    • predict

      public JavaRDD<Integer> predict(JavaRDD<Vector> points)
      Java-friendly version of predict().
      Parameters:
      points - (undocumented)
      Returns:
      (undocumented)
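      For example, reusing the hypothetical model and points JavaRDD from the class-level sketch:

        // Bulk prediction from Java: one Integer cluster index per input vector.
        JavaRDD<Integer> assignments = model.predict(points);
        assignments.collect().forEach(System.out::println);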
    • computeCost

      public double computeCost(Vector point)
      Computes the squared distance between the input point and the cluster center it belongs to.
      Parameters:
      point - (undocumented)
      Returns:
      (undocumented)
    • computeCost

      public double computeCost(RDD<Vector> data)
      Computes the sum of squared distances between the input points and their corresponding cluster centers.
      Parameters:
      data - (undocumented)
      Returns:
      (undocumented)
    • computeCost

      public double computeCost(JavaRDD<Vector> data)
      Java-friendly version of computeCost().
      Parameters:
      data - (undocumented)
      Returns:
      (undocumented)
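      A short sketch of both cost overloads, again assuming the model and points from the class-level example:

        // Sum of squared distances over the whole data set (a JavaRDD<Vector>).
        double cost = model.computeCost(points);
        System.out.println("Compute Cost: " + cost);

        // Squared distance of a single point to its assigned cluster center.
        double pointCost = model.computeCost(Vectors.dense(0.05, 0.05));
        System.out.println("Single-point cost: " + pointCost);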
    • save

      public void save(SparkContext sc, String path)
      Description copied from interface: Saveable
      Save this model to the given path.

      This saves:
      - human-readable (JSON) model metadata to path/metadata/
      - Parquet formatted data to path/data/

      The model may be loaded using Loader.load.

      Specified by:
      save in interface Saveable
      Parameters:
      sc - Spark context used to save model data.
      path - Path specifying the directory in which to save this model. If the directory already exists, this method throws an exception.
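      A save/load round trip from Java, reusing the hypothetical jsc and model from the class-level sketch; the path is illustrative and must not already exist:

        // Persist the model, then read it back with the static load() method.
        String path = "target/tmp/bisectingKMeansModel";
        model.save(jsc.sc(), path);
        BisectingKMeansModel sameModel = BisectingKMeansModel.load(jsc.sc(), path);
        System.out.println("Reloaded k = " + sameModel.k());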