Class ClusteringEvaluator

Object
org.apache.spark.ml.evaluation.Evaluator
org.apache.spark.ml.evaluation.ClusteringEvaluator
All Implemented Interfaces:
Serializable, Params, HasFeaturesCol, HasPredictionCol, HasWeightCol, DefaultParamsWritable, Identifiable, MLWritable

public class ClusteringEvaluator extends Evaluator implements HasPredictionCol, HasFeaturesCol, HasWeightCol, DefaultParamsWritable
Evaluator for clustering results. The metric computes the Silhouette measure using the specified distance measure.

The Silhouette is a measure for validating the consistency within clusters. Its value ranges from -1 to 1, where a value close to 1 means that the points in a cluster are close to the other points in the same cluster and far from the points of the other clusters.
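
As a rough illustration of the measure described above, the following is a minimal brute-force sketch of the textbook silhouette definition in plain Java, with the distance measure passed in as a function. It is not Spark's implementation. The class and method names (SilhouetteSketch, meanSilhouette, squaredEuclidean, cosine) and the sample points are hypothetical, and scoring singleton clusters as 0 is an assumed convention.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

// Illustrative only: a brute-force O(n^2) silhouette, not Spark's implementation.
final class SilhouetteSketch {

    // Squared Euclidean distance: sum of squared coordinate differences.
    static double squaredEuclidean(double[] p, double[] q) {
        double sum = 0.0;
        for (int d = 0; d < p.length; d++) {
            double diff = p[d] - q[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Cosine distance: 1 - cosine similarity. Both vectors must be non-zero.
    static double cosine(double[] p, double[] q) {
        double dot = 0.0, pp = 0.0, qq = 0.0;
        for (int d = 0; d < p.length; d++) {
            dot += p[d] * q[d];
            pp += p[d] * p[d];
            qq += q[d] * q[d];
        }
        return 1.0 - dot / (Math.sqrt(pp) * Math.sqrt(qq));
    }

    // Mean silhouette over all points; labels[i] is the cluster id of points[i].
    static double meanSilhouette(double[][] points, int[] labels,
                                 ToDoubleBiFunction<double[], double[]> distance) {
        int n = points.length;
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            // Per cluster: {sum of distances from point i, number of other points}.
            Map<Integer, double[]> perCluster = new HashMap<>();
            for (int j = 0; j < n; j++) {
                if (j == i) continue;
                double[] acc = perCluster.computeIfAbsent(labels[j], k -> new double[2]);
                acc[0] += distance.applyAsDouble(points[i], points[j]);
                acc[1] += 1.0;
            }
            double[] own = perCluster.remove(Integer.valueOf(labels[i]));
            // Assumed convention: a singleton cluster, or a single cluster overall, scores 0.
            if (own == null || perCluster.isEmpty()) continue;
            double a = own[0] / own[1];           // mean distance within the point's own cluster
            double b = Double.POSITIVE_INFINITY;  // mean distance to the nearest other cluster
            for (double[] other : perCluster.values()) {
                b = Math.min(b, other[0] / other[1]);
            }
            double denom = Math.max(a, b);
            if (denom > 0.0) total += (b - a) / denom;
        }
        return total / n;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: two well-separated groups on a line.
        double[][] points = {{0, 0}, {1, 0}, {4, 0}, {5, 0}};
        int[] wellSeparated = {0, 0, 1, 1};
        int[] interleaved = {0, 1, 0, 1};
        System.out.println(meanSilhouette(points, wellSeparated, SilhouetteSketch::squaredEuclidean));
        System.out.println(meanSilhouette(points, interleaved, SilhouetteSketch::squaredEuclidean));
    }
}
```

With the sample data, the labelling that matches the two groups scores close to 1, while the interleaved labelling scores below 0. Passing `SilhouetteSketch::cosine` instead compares directions only, so vectors that differ just in magnitude count as identical.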

  • Constructor Details

    • ClusteringEvaluator

      public ClusteringEvaluator(String uid)
    • ClusteringEvaluator

      public ClusteringEvaluator()
  • Method Details

    • load

      public static ClusteringEvaluator load(String path)
    • read

      public static MLReader<ClusteringEvaluator> read()
    • weightCol

      public final Param<String> weightCol()
      Description copied from interface: HasWeightCol
      Param for weight column name. If this is not set or empty, we treat all instance weights as 1.0.
      Specified by:
      weightCol in interface HasWeightCol
      Returns:
      (undocumented)
    • featuresCol

      public final Param<String> featuresCol()
      Description copied from interface: HasFeaturesCol
      Param for features column name.
      Specified by:
      featuresCol in interface HasFeaturesCol
      Returns:
      (undocumented)
    • predictionCol

      public final Param<String> predictionCol()
      Description copied from interface: HasPredictionCol
      Param for prediction column name.
      Specified by:
      predictionCol in interface HasPredictionCol
      Returns:
      (undocumented)
    • uid

      public String uid()
      Description copied from interface: Identifiable
      An immutable unique ID for the object and its derivatives.
      Specified by:
      uid in interface Identifiable
      Returns:
      (undocumented)
    • copy

      public ClusteringEvaluator copy(ParamMap pMap)
      Description copied from interface: Params
      Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
      Specified by:
      copy in interface Params
      Specified by:
      copy in class Evaluator
      Parameters:
      pMap - the extra params to set on the copy
      Returns:
      (undocumented)
    • isLargerBetter

      public boolean isLargerBetter()
      Description copied from class: Evaluator
      Indicates whether the metric returned by evaluate should be maximized (true, default) or minimized (false). A given evaluator may support multiple metrics which may be maximized or minimized.
      Overrides:
      isLargerBetter in class Evaluator
      Returns:
      true if the metric should be maximized, false if it should be minimized
    • setPredictionCol

      public ClusteringEvaluator setPredictionCol(String value)
    • setFeaturesCol

      public ClusteringEvaluator setFeaturesCol(String value)
    • setWeightCol

      public ClusteringEvaluator setWeightCol(String value)
    • metricName

      public Param<String> metricName()
      Param for the metric name used in evaluation. Supported values: "silhouette" (default).
      Returns:
      (undocumented)
    • getMetricName

      public String getMetricName()
    • setMetricName

      public ClusteringEvaluator setMetricName(String value)
    • distanceMeasure

      public Param<String> distanceMeasure()
      Param for the distance measure used in evaluation. Supported values: "squaredEuclidean" (default) and "cosine".
      Returns:
      (undocumented)
    • getDistanceMeasure

      public String getDistanceMeasure()
    • setDistanceMeasure

      public ClusteringEvaluator setDistanceMeasure(String value)
    • evaluate

      public double evaluate(Dataset<?> dataset)
      Description copied from class: Evaluator
      Evaluates model output and returns a scalar metric. The value of Evaluator.isLargerBetter() specifies whether larger values are better.

      Specified by:
      evaluate in class Evaluator
      Parameters:
      dataset - a dataset that contains labels/observations and predictions.
      Returns:
      metric
    • getMetrics

      public ClusteringMetrics getMetrics(Dataset<?> dataset)
      Returns a ClusteringMetrics object, which can be used to obtain clustering metrics such as the silhouette score.

      Parameters:
      dataset - a dataset that contains the features and prediction columns (and the weight column, if one is set).
      Returns:
      ClusteringMetrics
    • toString

      public String toString()
      Specified by:
      toString in interface Identifiable
      Overrides:
      toString in class Object
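
The isLargerBetter() flag described above tells model-selection code which direction to optimize. The following is a minimal plain-Java sketch of that idea, assuming the metric values for several candidate models have already been obtained from evaluate. The class name MetricSelectionSketch, the method bestIndex, and the sample scores are hypothetical and are not part of the Spark API.

```java
// Illustrative only: picks the best candidate given a larger-is-better flag.
final class MetricSelectionSketch {

    // Returns the index of the best metric value, honouring the flag.
    static int bestIndex(double[] metrics, boolean largerIsBetter) {
        if (metrics.length == 0) {
            throw new IllegalArgumentException("at least one metric value is required");
        }
        int best = 0;
        for (int i = 1; i < metrics.length; i++) {
            boolean improves = largerIsBetter
                    ? metrics[i] > metrics[best]
                    : metrics[i] < metrics[best];
            if (improves) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical silhouette scores for three candidate clusterings.
        double[] scores = {0.41, 0.67, 0.52};
        // For a metric that should be maximized, the flag is true.
        System.out.println(bestIndex(scores, true));
    }
}
```

Reading the flag from the evaluator, rather than hard-coding the direction, keeps the selection logic correct for evaluators whose metrics should be minimized.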