Class ProbabilisticClassificationModel<FeaturesType,M extends ProbabilisticClassificationModel<FeaturesType,M>>

Object
org.apache.spark.ml.PipelineStage
org.apache.spark.ml.Transformer
org.apache.spark.ml.Model<M>
org.apache.spark.ml.PredictionModel<FeaturesType,M>
org.apache.spark.ml.classification.ClassificationModel<FeaturesType,M>
org.apache.spark.ml.classification.ProbabilisticClassificationModel<FeaturesType,M>
Type Parameters:
FeaturesType - Type of input features. E.g., Vector
M - Concrete Model type
All Implemented Interfaces:
Serializable, org.apache.spark.internal.Logging, ClassifierParams, ProbabilisticClassifierParams, Params, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasProbabilityCol, HasRawPredictionCol, HasThresholds, PredictorParams, Identifiable, scala.Serializable
Direct Known Subclasses:
DecisionTreeClassificationModel, FMClassificationModel, GBTClassificationModel, LogisticRegressionModel, MultilayerPerceptronClassificationModel, NaiveBayesModel, RandomForestClassificationModel

public abstract class ProbabilisticClassificationModel<FeaturesType,M extends ProbabilisticClassificationModel<FeaturesType,M>> extends ClassificationModel<FeaturesType,M> implements ProbabilisticClassifierParams
Model produced by a ProbabilisticClassifier. Classes are indexed {0, 1, ..., numClasses - 1}.

  • Constructor Details

    • ProbabilisticClassificationModel

      public ProbabilisticClassificationModel()
  • Method Details

    • normalizeToProbabilitiesInPlace

      public static void normalizeToProbabilitiesInPlace(DenseVector v)
      Normalize a vector of raw predictions to be a multinomial probability vector, in place.

      The input raw predictions should be nonnegative. The output vector sums to 1.

      NOTE: This is NOT applicable to all models, only ones which effectively use class instance counts for raw predictions.

      Parameters:
      v - vector of nonnegative raw predictions, overwritten with the normalized probabilities
      Throws:
      IllegalArgumentException - if the input vector is all zeros or contains negative values
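      The normalization described above can be sketched without Spark. The following is a minimal illustration on a plain double[] rather than a DenseVector; the class NormalizeSketch and the method normalizeInPlace are hypothetical names, not part of the Spark API, and the exception messages are assumptions.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; not Spark's implementation.
final class NormalizeSketch {
    /** Scales a nonnegative array in place so that its entries sum to 1. */
    static void normalizeInPlace(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            if (x < 0.0) {
                // Mirrors the documented contract: negative values are rejected.
                throw new IllegalArgumentException("Negative raw prediction: " + x);
            }
            sum += x;
        }
        if (sum == 0.0) {
            // Mirrors the documented contract: an all-zero input is rejected.
            throw new IllegalArgumentException("Raw predictions are all zero");
        }
        for (int i = 0; i < v.length; i++) {
            v[i] /= sum;
        }
    }

    public static void main(String[] args) {
        // Example input: class instance counts, e.g. at a decision tree leaf.
        double[] counts = {2.0, 6.0, 2.0};
        normalizeInPlace(counts);
        System.out.println(Arrays.toString(counts));
    }
}
```

      Dividing by the sum only yields a meaningful distribution when the raw predictions behave like counts, which is why the note above restricts the method to such models.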
    • thresholds

      public DoubleArrayParam thresholds()
      Description copied from interface: HasThresholds
      Param for thresholds in multi-class classification, used to adjust the probability of predicting each class. The array must have length equal to the number of classes, with values > 0, except that at most one value may be 0. The class with the largest value p/t is predicted, where p is the original probability of that class and t is the class's threshold.
      Specified by:
      thresholds in interface HasThresholds
      Returns:
      the thresholds param
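      The p/t rule described above can be illustrated with a short stand-alone sketch. It operates on plain arrays instead of Spark vectors; ThresholdSketch and predict are hypothetical names, and the handling of a zero threshold simply follows Java's floating-point division, which is an assumption about the intended behavior rather than a statement of how Spark implements it.

```java
// Hypothetical stand-alone sketch; not Spark's implementation.
final class ThresholdSketch {
    /** Returns the index i that maximizes probability[i] / thresholds[i]. */
    static int predict(double[] probability, double[] thresholds) {
        if (probability.length != thresholds.length) {
            throw new IllegalArgumentException("thresholds must have one entry per class");
        }
        int argMax = 0;
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < probability.length; i++) {
            // With a threshold of 0, p / t is +Infinity when p > 0,
            // and NaN when p == 0; NaN never compares greater than max.
            double scaled = probability[i] / thresholds[i];
            if (scaled > max) {
                max = scaled;
                argMax = i;
            }
        }
        return argMax;
    }

    public static void main(String[] args) {
        double[] p = {0.6, 0.4};
        System.out.println(predict(p, new double[] {0.5, 0.5})); // prints 0
        System.out.println(predict(p, new double[] {0.8, 0.2})); // prints 1
    }
}
```

      With equal thresholds the rule reduces to picking the most probable class. Lowering a class's threshold makes that class easier to predict, as the second call shows. In this sketch, ties go to the lowest class index.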
    • probabilityCol

      public final Param<String> probabilityCol()
      Description copied from interface: HasProbabilityCol
      Param for the name of the column holding predicted class conditional probabilities. Note: not all models output well-calibrated probability estimates. These probabilities should be treated as confidences, not precise probabilities.
      Specified by:
      probabilityCol in interface HasProbabilityCol
      Returns:
      the probabilityCol param
    • setProbabilityCol

      public M setProbabilityCol(String value)
    • setThresholds

      public M setThresholds(double[] value)
    • transformSchema

      public StructType transformSchema(StructType schema)
      Description copied from class: PipelineStage
      Check transform validity and derive the output schema from the input schema.

      We check validity for interactions between parameters during transformSchema and raise an exception if any parameter value is invalid. Parameter value checks which do not depend on other parameters are handled by Param.validate().

      A typical implementation should first verify the schema change and parameter validity, including complex parameter interaction checks.

      Overrides:
      transformSchema in class ClassificationModel<FeaturesType,M extends ProbabilisticClassificationModel<FeaturesType,M>>
      Parameters:
      schema - input schema
      Returns:
      output schema derived from the input schema
    • transform

      public Dataset<Row> transform(Dataset<?> dataset)
      Transforms dataset by reading from PredictionModel.featuresCol(), and appending new columns as specified by parameters:
      - predicted labels as PredictionModel.predictionCol() of type Double
      - raw predictions (confidences) as ClassificationModel.rawPredictionCol() of type Vector
      - probability of each class as probabilityCol() of type Vector

      Overrides:
      transform in class ClassificationModel<FeaturesType,M extends ProbabilisticClassificationModel<FeaturesType,M>>
      Parameters:
      dataset - input dataset
      Returns:
      transformed dataset
    • predictProbability

      public Vector predictProbability(FeaturesType features)
      Predict the probability of each class given the features. These predictions are also called class conditional probabilities.

      This internal method is used to implement transform() and output probabilityCol().

      Parameters:
      features - input features for which to estimate class probabilities
      Returns:
      Estimated class conditional probabilities