org.apache.spark.mllib.clustering.BisectingKMeansModel

All Implemented Interfaces: Serializable, org.apache.spark.internal.Logging, Saveable

public class BisectingKMeansModel extends Object implements Serializable, Saveable, org.apache.spark.internal.Logging

Clustering model produced by BisectingKMeans. Prediction proceeds level by level from the root node to a leaf node: at each node, the child closest to the input point is selected.

Parameters: root - the root node of the clustering tree
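
The level-by-level prediction described above can be illustrated with a minimal, Spark-free sketch. Everything here is an assumption for illustration: `TreePredictSketch`, its nested `Node` class, and `exampleTree` are hypothetical names and are not part of Spark, and the sketch assumes squared Euclidean distance even though the model exposes a configurable `distanceMeasure()`. It is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class TreePredictSketch {

    // Hypothetical stand-in for a clustering tree node (not Spark's ClusteringTreeNode).
    static final class Node {
        final int index;        // cluster index; only meaningful for leaf nodes in this sketch
        final double[] center;  // center of the points assigned to this node
        final List<Node> children = new ArrayList<>();

        Node(int index, double[] center) {
            this.index = index;
            this.center = center;
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    // Squared Euclidean distance; assumes both arrays have the same length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Descend from the root, choosing the closest child at each level, until a leaf is reached.
    static int predict(Node root, double[] point) {
        Node node = root;
        while (!node.isLeaf()) {
            Node best = null;
            double bestDist = Double.POSITIVE_INFINITY;
            for (Node child : node.children) {
                double d = squaredDistance(child.center, point);
                if (d < bestDist) {
                    bestDist = d;
                    best = child;
                }
            }
            node = best;
        }
        return node.index;
    }

    // Small hand-built tree: root -> {leaf 0, inner node -> {leaf 1, leaf 2}}.
    static Node exampleTree() {
        Node root = new Node(-1, new double[] {5.0, 5.0});
        Node left = new Node(0, new double[] {1.0, 1.0});
        Node right = new Node(-1, new double[] {9.0, 9.0});
        right.children.add(new Node(1, new double[] {8.0, 8.0}));
        right.children.add(new Node(2, new double[] {10.0, 10.0}));
        root.children.add(left);
        root.children.add(right);
        return root;
    }

    public static void main(String[] args) {
        Node root = exampleTree();
        System.out.println(predict(root, new double[] {0.0, 0.0})); // prints 0
        System.out.println(predict(root, new double[] {7.0, 8.0})); // prints 1
    }
}
```

Note that the descent only compares the children of the current node, so a point is assigned by following the tree rather than by scanning every leaf center.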

See Also:

Serialized Form

Nested Class Summary

Nested Classes (Modifier and Type, Class):

- static class BisectingKMeansModel.SaveLoadV1_0$
- static class BisectingKMeansModel.SaveLoadV2_0$
- static class BisectingKMeansModel.SaveLoadV3_0$

Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter

Constructor Summary

- BisectingKMeansModel(org.apache.spark.mllib.clustering.ClusteringTreeNode root)

Method Summary

Each entry lists the modifier and return type, the method, and its description (where one exists).

- Vector[] clusterCenters(): Leaf cluster centers.
- double computeCost(JavaRDD<Vector> data): Java-friendly version of computeCost().
- double computeCost(Vector point): Computes the squared distance between the input point and the cluster center it belongs to.
- double computeCost(RDD<Vector> data): Computes the sum of squared distances between the input points and their corresponding cluster centers.
- String distanceMeasure()
- int k()
- static BisectingKMeansModel load(SparkContext sc, String path)
- JavaRDD<Integer> predict(JavaRDD<Vector> points): Java-friendly version of predict().
- int predict(Vector point): Predicts the index of the cluster that the input point belongs to.
- RDD<Object> predict(RDD<Vector> points): Predicts the indices of the clusters that the input points belong to.
- void save(SparkContext sc, String path): Save this model to the given path.
- double trainingCost()

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logBasedOnLevel, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, MDC, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext

Constructor Details
- BisectingKMeansModel
  
  public BisectingKMeansModel(org.apache.spark.mllib.clustering.ClusteringTreeNode root)
Method Details
- load
  
  public static BisectingKMeansModel load(SparkContext sc, String path)
- distanceMeasure
  
  public String distanceMeasure()
- trainingCost
  
  public double trainingCost()
- clusterCenters
  
  public Vector[] clusterCenters()
  
  Leaf cluster centers.
  
  Returns:
  
  the centers of the leaf clusters
- k
  
  public int k()
- predict
  
  public int predict(Vector point)
  
  Predicts the index of the cluster that the input point belongs to.
  
  Parameters:
  
  point - the input point
  
  Returns:
  
  the index of the cluster that the point belongs to
- predict
  
  public RDD<Object> predict(RDD<Vector> points)
  
  Predicts the indices of the clusters that the input points belong to.
  
  Parameters:
  
  points - the input points
  
  Returns:
  
  the indices of the clusters that the input points belong to
- predict
  
  public JavaRDD<Integer> predict(JavaRDD<Vector> points)
  
  Java-friendly version of predict().
  
  Parameters:
  
  points - the input points
  
  Returns:
  
  the indices of the clusters that the input points belong to
- computeCost
  
  public double computeCost(Vector point)
  
  Computes the squared distance between the input point and the cluster center it belongs to.
  
  Parameters:
  
  point - the input point
  
  Returns:
  
  the squared distance between the point and the center of the cluster it belongs to
- computeCost
  
  public double computeCost(RDD<Vector> data)
  
  Computes the sum of squared distances between the input points and their corresponding cluster centers.
  
  Parameters:
  
  data - the input points
  
  Returns:
  
  the sum of squared distances between the points and their corresponding cluster centers
- computeCost
  
  public double computeCost(JavaRDD<Vector> data)
  
  Java-friendly version of computeCost().
  
  Parameters:
  
  data - the input points
  
  Returns:
  
  the sum of squared distances between the points and their corresponding cluster centers
- save
  
  public void save(SparkContext sc, String path)
  
  Description copied from interface: Saveable
  
  Save this model to the given path.
  
  This saves:
  - human-readable (JSON) model metadata to path/metadata/
  - Parquet formatted data to path/data/
  
  The model may be loaded using Loader.load.
  
  Specified by:
  
  save in interface Saveable
  
  Parameters:
  
  sc - Spark context used to save model data.
  
  path - Path specifying the directory in which to save this model. If the directory already exists, this method throws an exception.
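
The cost definitions given for the computeCost overloads above can be checked by hand with a small, Spark-free sketch. The class `CostSketch` and its methods `pointCost` and `totalCost` are hypothetical names introduced only for illustration. The sketch assumes squared Euclidean distance and takes each point's cluster assignment as given (in the real model that assignment would come from predict), so it is an illustration of the arithmetic rather than the library's implementation.

```java
final class CostSketch {

    // Squared Euclidean distance between a point and the center of its cluster.
    // Assumes both arrays have the same length.
    static double pointCost(double[] point, double[] center) {
        double sum = 0.0;
        for (int i = 0; i < point.length; i++) {
            double d = point[i] - center[i];
            sum += d * d;
        }
        return sum;
    }

    // Sum of per-point costs; assignments[i] is the index into centers for points[i].
    static double totalCost(double[][] points, int[] assignments, double[][] centers) {
        double total = 0.0;
        for (int i = 0; i < points.length; i++) {
            total += pointCost(points[i], centers[assignments[i]]);
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] centers = {{0.0, 0.0}, {10.0, 10.0}};
        double[][] points = {{1.0, 0.0}, {0.0, 2.0}, {9.0, 10.0}};
        int[] assignments = {0, 0, 1};  // assumed to be known, e.g. from a predict step

        // Per-point costs are 1.0, 4.0 and 1.0, so the total is 6.0.
        System.out.println(totalCost(points, assignments, centers)); // prints 6.0
    }
}
```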
