BinaryClassificationMetrics (Spark 3.4.0 JavaDoc)

Object
- org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

All Implemented Interfaces:

org.apache.spark.internal.Logging
```
public class BinaryClassificationMetrics
extends Object
implements org.apache.spark.internal.Logging
```
Evaluator for binary classification.
Parameters:

scoreAndLabels - an RDD of (score, label) or (score, label, weight) tuples.

numBins - if greater than 0, the curves (ROC curve, PR curve) computed internally are down-sampled to this many "bins". If 0, no down-sampling occurs. Down-sampling is useful because a curve contains one point for each distinct score in the input, which can be as large as the input itself: millions of points or more, when thousands may be entirely sufficient to summarize the curve. After down-sampling, the curves are made of approximately numBins points. Points are made from bins of equal numbers of consecutive points. The size of each bin is floor(scoreAndLabels.count() / numBins), so the resulting number of bins may not exactly equal numBins. The last bin in each partition may be smaller as a result, meaning there may be an extra sample at partition boundaries.
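The bin-size arithmetic can be illustrated with a small standalone sketch. It is not Spark's implementation: the class and method names are hypothetical, it models a single partition only, and it assumes 0 < numBins <= count.

```java
// Illustrative sketch of the bin-size arithmetic described above; this is
// not Spark's implementation, and the class and method names are hypothetical.
final class BinSizeSketch {

    /** Bin size per the documented formula: floor(count / numBins). */
    static long binSize(long count, int numBins) {
        if (numBins <= 0 || count < numBins) {
            throw new IllegalArgumentException("sketch requires 0 < numBins <= count");
        }
        return count / numBins; // integer division floors for non-negative operands
    }

    /** Bins produced when `count` consecutive points in one partition are grouped. */
    static long binsInOnePartition(long count, int numBins) {
        long size = binSize(count, numBins);
        return (count + size - 1) / size; // ceiling division: the last bin may be smaller
    }

    public static void main(String[] args) {
        System.out.println(binSize(1050, 100));            // 10
        System.out.println(binsInOnePartition(1050, 100)); // 105
    }
}
```

With 1050 points and numBins = 100, each bin holds 10 points, so the sketch yields 105 bins rather than 100, which is why the result is only approximately numBins.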

Nested Class Summary
- Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
  org.apache.spark.internal.Logging.SparkShellLoggingFilter

Constructor Summary

Constructors
Constructor and Description
`BinaryClassificationMetrics(RDD<? extends scala.Product> scoreAndLabels, int numBins)`
`BinaryClassificationMetrics(RDD<scala.Tuple2<Object,Object>> scoreAndLabels)` Defaults `numBins` to 0.

Method Summary

Modifier and Type	Method and Description
`double`	`areaUnderPR()` Computes the area under the precision-recall curve.
`double`	`areaUnderROC()` Computes the area under the receiver operating characteristic (ROC) curve.
`RDD<scala.Tuple2<Object,Object>>`	`fMeasureByThreshold()` Returns the (threshold, F-Measure) curve with beta = 1.0.
`RDD<scala.Tuple2<Object,Object>>`	`fMeasureByThreshold(double beta)` Returns the (threshold, F-Measure) curve.
`int`	`numBins()`
`RDD<scala.Tuple2<Object,Object>>`	`pr()` Returns the precision-recall curve, which is an RDD of (recall, precision), NOT (precision, recall), with (0.0, p) prepended to it, where p is the precision associated with the lowest recall on the curve.
`RDD<scala.Tuple2<Object,Object>>`	`precisionByThreshold()` Returns the (threshold, precision) curve.
`RDD<scala.Tuple2<Object,Object>>`	`recallByThreshold()` Returns the (threshold, recall) curve.
`RDD<scala.Tuple2<Object,Object>>`	`roc()` Returns the receiver operating characteristic (ROC) curve, which is an RDD of (false positive rate, true positive rate) with (0.0, 0.0) prepended and (1.0, 1.0) appended to it.
`RDD<? extends scala.Product>`	`scoreAndLabels()`
`RDD<scala.Tuple2<Object,scala.Tuple2<Object,Object>>>`	`scoreLabelsWeight()` Deprecated since 3.4.0. The variable `scoreLabelsWeight` should be private and will be removed in 4.0.0.
`RDD<Object>`	`thresholds()` Returns thresholds in descending order.
`void`	`unpersist()` Unpersist intermediate RDDs used in the computation.

Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.spark.internal.Logging
$init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize

- Constructor Detail
  - BinaryClassificationMetrics
```
public BinaryClassificationMetrics(RDD<? extends scala.Product> scoreAndLabels,
                                   int numBins)
```
  - BinaryClassificationMetrics
```
public BinaryClassificationMetrics(RDD<scala.Tuple2<Object,Object>> scoreAndLabels)
```
    Defaults numBins to 0.
    
    Parameters:
    
    scoreAndLabels - an RDD of (score, label) tuples.
- Method Detail
  - scoreAndLabels
```
public RDD<? extends scala.Product> scoreAndLabels()
```
  - numBins
```
public int numBins()
```
  - scoreLabelsWeight
```
public RDD<scala.Tuple2<Object,scala.Tuple2<Object,Object>>> scoreLabelsWeight()
```
    Deprecated since 3.4.0. The variable `scoreLabelsWeight` should be private and will be removed in 4.0.0.
  - unpersist
```
public void unpersist()
```
    Unpersist intermediate RDDs used in the computation.
  - thresholds
```
public RDD<Object> thresholds()
```
    Returns thresholds in descending order.
    
    Returns:
    
    an RDD of thresholds in descending order.
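    As a rough illustration, the class description says a curve contains one point per distinct score, so the thresholds can be pictured as the distinct scores sorted from highest to lowest. The sketch below works on a plain array rather than an RDD; `ThresholdsSketch` is a hypothetical name, and treating thresholds as exactly the distinct scores is an assumption that ignores down-sampling.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of Spark: distinct scores in descending order.
final class ThresholdsSketch {

    static List<Double> thresholds(double[] scores) {
        // A TreeSet with a reversed comparator de-duplicates and sorts descending.
        TreeSet<Double> distinct = new TreeSet<>(Comparator.<Double>reverseOrder());
        for (double score : scores) {
            distinct.add(score);
        }
        return new ArrayList<>(distinct);
    }

    public static void main(String[] args) {
        System.out.println(thresholds(new double[] {0.1, 0.9, 0.5, 0.9})); // [0.9, 0.5, 0.1]
    }
}
```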
  - roc
```
public RDD<scala.Tuple2<Object,Object>> roc()
```
    Returns the receiver operating characteristic (ROC) curve, which is an RDD of (false positive rate, true positive rate) with (0.0, 0.0) prepended and (1.0, 1.0) appended to it.
    
    Returns:
    
    an RDD of (false positive rate, true positive rate) pairs.
    
    See Also:
    
    Receiver operating characteristic (Wikipedia)
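    The shape of the returned curve can be sketched without Spark. The example below is a simplified, hypothetical illustration: it assumes unweighted examples with distinct scores, already sorted by descending score, and at least one positive and one negative label.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Spark's implementation.
final class RocCurveSketch {

    /** Each returned element is {falsePositiveRate, truePositiveRate}. */
    static List<double[]> roc(boolean[] labelsByDescendingScore) {
        int positives = 0;
        for (boolean isPositive : labelsByDescendingScore) {
            if (isPositive) positives++;
        }
        int negatives = labelsByDescendingScore.length - positives;

        List<double[]> curve = new ArrayList<>();
        curve.add(new double[] {0.0, 0.0}); // prepended point
        int tp = 0;
        int fp = 0;
        // Lowering the threshold past each example adds it to the predicted positives.
        for (boolean isPositive : labelsByDescendingScore) {
            if (isPositive) tp++; else fp++;
            curve.add(new double[] {(double) fp / negatives, (double) tp / positives});
        }
        curve.add(new double[] {1.0, 1.0}); // appended point
        return curve;
    }

    public static void main(String[] args) {
        List<double[]> curve = roc(new boolean[] {true, false, true, false});
        for (double[] point : curve) {
            System.out.println("(" + point[0] + ", " + point[1] + ")");
        }
    }
}
```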
  - areaUnderROC
```
public double areaUnderROC()
```
    Computes the area under the receiver operating characteristic (ROC) curve.
    
    Returns:
    
    (undocumented)
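    This page does not say how the area is computed. One common approach for a curve given as a list of points is the trapezoidal rule, sketched below as an assumption rather than a description of Spark's internals; `TrapezoidAreaSketch` is a hypothetical name.

```java
// Hypothetical sketch: trapezoidal-rule area under a piecewise-linear curve.
final class TrapezoidAreaSketch {

    /** points[i] is {x, y}; points must be ordered by non-decreasing x. */
    static double area(double[][] points) {
        double sum = 0.0;
        for (int i = 1; i < points.length; i++) {
            double width = points[i][0] - points[i - 1][0];
            double meanHeight = (points[i][1] + points[i - 1][1]) / 2.0;
            sum += width * meanHeight; // area of one trapezoid
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] roc = {{0.0, 0.0}, {0.5, 1.0}, {1.0, 1.0}};
        System.out.println(area(roc)); // 0.75
    }
}
```

    A diagonal curve from (0.0, 0.0) to (1.0, 1.0) gives 0.5 under this rule, matching the usual interpretation of a random classifier.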
  - pr
```
public RDD<scala.Tuple2<Object,Object>> pr()
```
    Returns the precision-recall curve, which is an RDD of (recall, precision), NOT (precision, recall), with (0.0, p) prepended to it, where p is the precision associated with the lowest recall on the curve.
    
    Returns:
    
    an RDD of (recall, precision) pairs.
    
    See Also:
    
    Precision and recall (Wikipedia)
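    The ordering of each pair and the prepended point are easy to get wrong, so a small standalone sketch may help. It is hypothetical and simplified: unweighted examples with distinct scores, sorted by descending score, with at least one positive label.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Spark's implementation.
final class PrCurveSketch {

    /** Each returned element is {recall, precision}, in that order. */
    static List<double[]> pr(boolean[] labelsByDescendingScore) {
        int positives = 0;
        for (boolean isPositive : labelsByDescendingScore) {
            if (isPositive) positives++;
        }

        List<double[]> curve = new ArrayList<>();
        int tp = 0;
        int fp = 0;
        for (boolean isPositive : labelsByDescendingScore) {
            if (isPositive) tp++; else fp++;
            double recall = (double) tp / positives;
            double precision = (double) tp / (tp + fp);
            curve.add(new double[] {recall, precision});
        }
        // Prepend (0.0, p), where p is the precision at the lowest recall,
        // which is the first point when thresholds run from high to low.
        curve.add(0, new double[] {0.0, curve.get(0)[1]});
        return curve;
    }

    public static void main(String[] args) {
        for (double[] point : pr(new boolean[] {true, false, true, false})) {
            System.out.println("(recall=" + point[0] + ", precision=" + point[1] + ")");
        }
    }
}
```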
  - areaUnderPR
```
public double areaUnderPR()
```
    Computes the area under the precision-recall curve.
    
    Returns:
    
    (undocumented)
  - fMeasureByThreshold
```
public RDD<scala.Tuple2<Object,Object>> fMeasureByThreshold(double beta)
```
    Returns the (threshold, F-Measure) curve.
    
    Parameters:
    
    beta - the beta factor in F-Measure computation.
    
    Returns:
    
    an RDD of (threshold, F-Measure) pairs.
    
    See Also:
    
    F1 score (Wikipedia)
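    For reference, the standard F-Measure definition is (1 + beta^2) * precision * recall / (beta^2 * precision + recall). The sketch below evaluates it for a single (precision, recall) pair; the class name is hypothetical, and returning 0.0 when both inputs are zero is a convention chosen for this sketch.

```java
// Hypothetical sketch of the standard F-beta formula for one threshold.
final class FMeasureSketch {

    static double fMeasure(double precision, double recall, double beta) {
        if (precision + recall == 0.0) {
            return 0.0; // sketch convention: avoid 0/0
        }
        double betaSquared = beta * beta;
        return (1.0 + betaSquared) * precision * recall
                / (betaSquared * precision + recall);
    }

    public static void main(String[] args) {
        System.out.println(fMeasure(0.5, 0.5, 1.0)); // 0.5
        System.out.println(fMeasure(1.0, 0.5, 2.0)); // beta > 1 weights recall more heavily
    }
}
```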
  - fMeasureByThreshold
```
public RDD<scala.Tuple2<Object,Object>> fMeasureByThreshold()
```
    Returns the (threshold, F-Measure) curve with beta = 1.0.
    
    Returns:
    
    an RDD of (threshold, F-Measure) pairs.
  - precisionByThreshold
```
public RDD<scala.Tuple2<Object,Object>> precisionByThreshold()
```
    Returns the (threshold, precision) curve.
    
    Returns:
    
    an RDD of (threshold, precision) pairs.
  - recallByThreshold
```
public RDD<scala.Tuple2<Object,Object>> recallByThreshold()
```
    Returns the (threshold, recall) curve.
    
    Returns:
    
    an RDD of (threshold, recall) pairs.
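    The per-threshold precision and recall values can be pictured with a standalone sketch. It assumes, without confirmation from this page, that an example is predicted positive when its score is greater than or equal to the threshold; the class name and the edge-case conventions (precision 1.0 with no predicted positives, recall 0.0 with no actual positives) are choices made for this illustration.

```java
// Hypothetical sketch, not Spark's implementation.
final class ThresholdMetricsSketch {

    /** Returns {precision, recall} for one threshold over unweighted examples. */
    static double[] precisionRecall(double[] scores, boolean[] labels, double threshold) {
        int tp = 0;
        int fp = 0;
        int fn = 0;
        for (int i = 0; i < scores.length; i++) {
            boolean predictedPositive = scores[i] >= threshold; // assumed rule
            if (predictedPositive && labels[i]) {
                tp++;
            } else if (predictedPositive) {
                fp++;
            } else if (labels[i]) {
                fn++;
            }
        }
        double precision = (tp + fp == 0) ? 1.0 : (double) tp / (tp + fp);
        double recall = (tp + fn == 0) ? 0.0 : (double) tp / (tp + fn);
        return new double[] {precision, recall};
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.8, 0.4, 0.2};
        boolean[] labels = {true, false, true, false};
        double[] result = precisionRecall(scores, labels, 0.5);
        System.out.println(result[0] + " " + result[1]); // 0.5 0.5
    }
}
```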
