Package org.apache.spark.ml.clustering
Interface KMeansParams
- All Superinterfaces:
HasDistanceMeasure, HasFeaturesCol, HasMaxBlockSizeInMB, HasMaxIter, HasPredictionCol, HasSeed, HasSolver, HasTol, HasWeightCol, Identifiable, Params, Serializable
- All Known Implementing Classes:
KMeans, KMeansModel
public interface KMeansParams
extends Params, HasMaxIter, HasFeaturesCol, HasSeed, HasPredictionCol, HasTol, HasDistanceMeasure, HasWeightCol, HasSolver, HasMaxBlockSizeInMB
Common params for KMeans and KMeansModel.
-
Method Summary
Modifier and Type    Method                                          Description
String               getInitMode()
int                  getInitSteps()
int                  getK()
Param<String>        initMode()                                      Param for the initialization algorithm.
IntParam             initSteps()                                     Param for the number of steps for the k-means|| initialization mode.
IntParam             k()                                             The number of clusters to create (k).
Param<String>        solver()                                        Param for the name of optimization method used in KMeans.
StructType           validateAndTransformSchema(StructType schema)   Validates and transforms the input schema.
Methods inherited from interface org.apache.spark.ml.param.shared.HasDistanceMeasure
distanceMeasure, getDistanceMeasure
Methods inherited from interface org.apache.spark.ml.param.shared.HasFeaturesCol
featuresCol, getFeaturesCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasMaxBlockSizeInMB
getMaxBlockSizeInMB, maxBlockSizeInMB
Methods inherited from interface org.apache.spark.ml.param.shared.HasMaxIter
getMaxIter, maxIter
Methods inherited from interface org.apache.spark.ml.param.shared.HasPredictionCol
getPredictionCol, predictionCol
Methods inherited from interface org.apache.spark.ml.param.shared.HasWeightCol
getWeightCol, weightCol
Methods inherited from interface org.apache.spark.ml.util.Identifiable
toString, uid
Methods inherited from interface org.apache.spark.ml.param.Params
clear, copy, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, onParamChange, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
-
Method Details
-
getInitMode
String getInitMode()
-
getInitSteps
int getInitSteps()
-
getK
int getK()
-
initMode
Param<String> initMode()
Param for the initialization algorithm. This can be either "random" to choose random points as initial cluster centers, or "k-means||" to use a parallel variant of k-means++ (Bahmani et al., Scalable K-Means++, VLDB 2012). Default: k-means||.
- Returns:
- (undocumented)
-
initSteps
IntParam initSteps()
Param for the number of steps for the k-means|| initialization mode. This is an advanced setting -- the default of 2 is almost always enough. Must be > 0. Default: 2.
- Returns:
- (undocumented)
-
k
IntParam k()
The number of clusters to create (k). Must be > 1. Note that it is possible for fewer than k clusters to be returned, for example, if there are fewer than k distinct points to cluster. Default: 2.
- Returns:
- (undocumented)
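The documented constraints on k, initMode and initSteps can be summarized in a small standalone check. This is only an illustrative sketch: the class KMeansParamCheck and its check method are hypothetical names, not part of Spark, and Spark's own validation may differ in detail.

```java
import java.util.Set;

/** Hypothetical helper (not part of Spark) mirroring the documented constraints. */
public class KMeansParamCheck {
    // The two initialization modes named in the initMode description.
    static final Set<String> INIT_MODES = Set.of("random", "k-means||");

    /** Returns null when all values satisfy the documented constraints, else a message. */
    public static String check(int k, String initMode, int initSteps) {
        if (k <= 1) {
            return "k must be > 1";
        }
        if (initMode == null || !INIT_MODES.contains(initMode)) {
            return "initMode must be \"random\" or \"k-means||\"";
        }
        if (initSteps <= 0) {
            return "initSteps must be > 0";
        }
        return null;
    }

    public static void main(String[] args) {
        // The documented defaults: k = 2, initMode = "k-means||", initSteps = 2.
        System.out.println(check(2, "k-means||", 2));
        // k = 1 violates "Must be > 1".
        System.out.println(check(1, "random", 2));
    }
}
```

Passing such a check does not guarantee that k clusters are returned: as noted for k, fewer clusters are possible when the data contains fewer than k distinct points.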
-
solver
Param<String> solver()
Param for the name of optimization method used in KMeans. Supported options:
- "auto": Automatically select the solver based on the input schema and sparsity: if input instances are arrays or input vectors are dense, set to "block"; otherwise, set to "row".
- "row": input instances are processed row by row, and the triangle inequality is applied to accelerate training.
- "block": input instances are stacked into blocks, and GEMM is applied to compute the distances.
Default is "auto".
-
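The selection rule described for the "auto" solver can be written out as a short decision function. This is a minimal sketch of the rule as worded above, not Spark's implementation: SolverChoice, resolve, and the two boolean flags are hypothetical names, and how Spark actually inspects the schema and measures sparsity is not shown here.

```java
/** Hypothetical helper (not part of Spark) restating the documented solver rule. */
public class SolverChoice {
    /**
     * Resolves a solver name to "row" or "block".
     * inputIsArray and vectorsAreDense are assumed to be determined by the caller.
     */
    public static String resolve(String solver, boolean inputIsArray, boolean vectorsAreDense) {
        switch (solver) {
            case "row":
            case "block":
                // Explicit choices are used as given.
                return solver;
            case "auto":
                // Arrays or dense vectors favor the blocked, GEMM-based path.
                return (inputIsArray || vectorsAreDense) ? "block" : "row";
            default:
                throw new IllegalArgumentException("Unsupported solver: " + solver);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("auto", false, true));
        System.out.println(resolve("auto", false, false));
    }
}
```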
validateAndTransformSchema
StructType validateAndTransformSchema(StructType schema)
Validates and transforms the input schema.
- Parameters:
schema - input schema
- Returns:
- output schema
-