public class BisectingKMeans
extends Object
implements org.apache.spark.internal.Logging
Bisecting continues until there are k leaf clusters in total or no leaf clusters are divisible.
The bisecting steps of clusters on the same level are grouped together to increase parallelism.
If bisecting all divisible clusters on the bottom level would result in more than k leaf clusters, larger clusters get higher priority.
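The priority rule above can be illustrated with a small, self-contained sketch. This is not Spark's implementation: the class `BisectSelectionSketch` and the method `chooseClustersToBisect` are hypothetical names, clusters are represented only by their point counts, and divisibility is simplified to a minimum-size check. The sketch assumes each bisection replaces one leaf with two, so a level may perform at most `k` minus the current leaf count bisections.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of the Spark API.
public class BisectSelectionSketch {

    /** Returns the sizes of the leaf clusters that would be bisected on this level. */
    static List<Long> chooseClustersToBisect(List<Long> leafSizes, int k, long minSize) {
        // Each bisection turns one leaf into two, adding one leaf in total.
        int splitsAllowed = Math.max(k - leafSizes.size(), 0);

        // Simplifying assumption: a leaf is divisible if it has at least minSize points.
        List<Long> divisible = new ArrayList<>();
        for (long size : leafSizes) {
            if (size >= minSize) {
                divisible.add(size);
            }
        }

        // Larger clusters get higher priority when not all divisible leaves can be split.
        divisible.sort(Comparator.reverseOrder());
        if (divisible.size() > splitsAllowed) {
            return new ArrayList<>(divisible.subList(0, splitsAllowed));
        }
        return divisible;
    }

    public static void main(String[] args) {
        // Four leaves, k = 6: only two more leaves are needed, so the two largest are split.
        List<Long> chosen = chooseClustersToBisect(List.of(40L, 10L, 30L, 1L), 6, 2L);
        System.out.println(chosen); // prints [40, 30]
    }
}
```

In this example three leaves are divisible, but splitting all three would give seven leaves, so only the two largest are selected.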
param: k the desired number of leaf clusters (default: 4). The actual number could be smaller if there are no divisible leaf clusters.
param: maxIterations the max number of k-means iterations to split clusters (default: 20)
param: minDivisibleClusterSize the minimum number of points (if greater than or equal to 1.0) or the minimum proportion of points (if less than 1.0) of a divisible cluster (default: 1)
param: seed a random seed (default: hash value of the class name)
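The two readings of minDivisibleClusterSize (absolute count versus proportion) can be made concrete with a short sketch. The class `MinDivisibleSizeSketch` and the method `minPoints` are hypothetical names, and rounding up to a whole number of points is an assumption of this sketch, not something the text above states.

```java
// Hypothetical illustration only; not part of the Spark API.
public class MinDivisibleSizeSketch {

    /**
     * Converts a minDivisibleClusterSize setting into a minimum point count.
     * Values >= 1.0 are read as a number of points; values < 1.0 are read as
     * a proportion of totalPoints. Rounding up is an assumption of this sketch.
     */
    static long minPoints(double minDivisibleClusterSize, long totalPoints) {
        if (minDivisibleClusterSize >= 1.0) {
            return (long) Math.ceil(minDivisibleClusterSize);
        }
        return (long) Math.ceil(minDivisibleClusterSize * totalPoints);
    }

    public static void main(String[] args) {
        System.out.println(minPoints(5.0, 1000L));  // prints 5
        System.out.println(minPoints(0.25, 1000L)); // prints 250
        System.out.println(minPoints(1.0, 1000L));  // prints 1 (the documented default)
    }
}
```

With the default of 1, every non-empty cluster meets the size requirement, so the setting only restricts splitting when it is raised.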
Constructor and Description |
---|
BisectingKMeans() - Constructs with the default configuration. |
Modifier and Type | Method and Description |
---|---|
String | getDistanceMeasure() - The distance suite used by the algorithm. |
int | getK() - Gets the desired number of leaf clusters. |
int | getMaxIterations() - Gets the max number of k-means iterations to split clusters. |
double | getMinDivisibleClusterSize() - Gets the minimum number of points (if greater than or equal to 1.0) or the minimum proportion of points (if less than 1.0) of a divisible cluster. |
long | getSeed() - Gets the random seed. |
BisectingKMeansModel | run(JavaRDD<Vector> data) - Java-friendly version of run(). |
BisectingKMeansModel | run(RDD<Vector> input) - Runs the bisecting k-means algorithm. |
BisectingKMeans | setDistanceMeasure(String distanceMeasure) - Set the distance suite used by the algorithm. |
BisectingKMeans | setK(int k) - Sets the desired number of leaf clusters (default: 4). |
BisectingKMeans | setMaxIterations(int maxIterations) - Sets the max number of k-means iterations to split clusters (default: 20). |
BisectingKMeans | setMinDivisibleClusterSize(double minDivisibleClusterSize) - Sets the minimum number of points (if greater than or equal to 1.0) or the minimum proportion of points (if less than 1.0) of a divisible cluster (default: 1). |
BisectingKMeans | setSeed(long seed) - Sets the random seed (default: hash value of the class name). |
Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.internal.Logging: $init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize
public BisectingKMeans()
public BisectingKMeans setK(int k)
k - (undocumented)
public int getK()
public BisectingKMeans setMaxIterations(int maxIterations)
maxIterations - (undocumented)
public int getMaxIterations()
public BisectingKMeans setMinDivisibleClusterSize(double minDivisibleClusterSize)
Sets the minimum number of points (if greater than or equal to 1.0) or the minimum proportion of points (if less than 1.0) of a divisible cluster (default: 1).
minDivisibleClusterSize - (undocumented)
public double getMinDivisibleClusterSize()
Gets the minimum number of points (if greater than or equal to 1.0) or the minimum proportion of points (if less than 1.0) of a divisible cluster.
public BisectingKMeans setSeed(long seed)
seed - (undocumented)
public long getSeed()
public String getDistanceMeasure()
public BisectingKMeans setDistanceMeasure(String distanceMeasure)
distanceMeasure - (undocumented)
public BisectingKMeansModel run(RDD<Vector> input)
input - RDD of vectors
public BisectingKMeansModel run(JavaRDD<Vector> data)
Java-friendly version of run().
data - (undocumented)