class BinaryClassificationMetrics extends Logging
Evaluator for binary classification.
- Annotations
- @Since( "1.0.0" )
- Source
- BinaryClassificationMetrics.scala
Instance Constructors
-
new
BinaryClassificationMetrics(scoreAndLabels: RDD[(Double, Double)])
Defaults numBins to 0.
- Annotations
- @Since( "1.0.0" )
-
new
BinaryClassificationMetrics(scoreAndLabels: RDD[_ <: Product], numBins: Int = 1000)
- scoreAndLabels
an RDD of (score, label) or (score, label, weight) tuples.
- numBins
if greater than 0, the curves (ROC curve, PR curve) computed internally are down-sampled to this many "bins". If 0, no down-sampling occurs. This is useful because the curve contains a point for each distinct score in the input, which could be as large as the input itself (millions of points or more), when thousands may be entirely sufficient to summarize the curve. After down-sampling, the curves are made of approximately numBins points. Points are made from bins of equal numbers of consecutive points. The size of each bin is floor(scoreAndLabels.count() / numBins), which means the resulting number of bins may not exactly equal numBins. The last bin in each partition may be smaller as a result, meaning there may be an extra sample at partition boundaries.
- Annotations
- @Since( "3.0.0" )
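The bin-size arithmetic described for numBins can be illustrated in plain Scala. This is only a sketch of the stated formula applied to a single partition, not Spark's implementation; the names BinSizeSketch, binSize and binPartition are hypothetical.

```scala
object BinSizeSketch {
  // Bin size as described above: floor(count / numBins), via integer division.
  def binSize(count: Long, numBins: Int): Long = {
    require(numBins > 0, "numBins = 0 means no down-sampling, so no bin size applies")
    count / numBins
  }

  // Group the consecutive points of ONE partition into bins of `size` points.
  // The trailing bin may hold fewer points than the others.
  def binPartition[A](points: Seq[A], size: Int): List[Seq[A]] =
    points.grouped(size).toList

  def main(args: Array[String]): Unit = {
    val points = (1 to 10).map(_.toDouble)                      // stand-in for 10 curve points
    val size = binSize(points.size.toLong, numBins = 3).toInt   // floor(10 / 3)
    val bins = binPartition(points, size)
    println(s"requested bins = 3, bin size = $size, resulting bins = ${bins.size}")
  }
}
```

With 10 points and numBins = 3, the bin size is 3 and the points fall into bins of 3, 3, 3 and 1, which shows why the resulting number of bins may not exactly equal numBins.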
Value Members
-
def
areaUnderPR(): Double
Computes the area under the precision-recall curve.
- Annotations
- @Since( "1.0.0" )
-
def
areaUnderROC(): Double
Computes the area under the receiver operating characteristic (ROC) curve.
- Annotations
- @Since( "1.0.0" )
-
def
fMeasureByThreshold(): RDD[(Double, Double)]
Returns the (threshold, F-Measure) curve with beta = 1.0.
- Annotations
- @Since( "1.0.0" )
-
def
fMeasureByThreshold(beta: Double): RDD[(Double, Double)]
Returns the (threshold, F-Measure) curve.
- beta
the beta factor in F-Measure computation.
- returns
an RDD of (threshold, F-Measure) pairs.
- Annotations
- @Since( "1.0.0" )
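For reference, the role of the beta factor can be sketched with the standard F-Measure definition, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The helper below is hypothetical and works on a single (precision, recall) pair; returning 0.0 when the denominator is zero is an assumption of this sketch.

```scala
object FMeasureSketch {
  // Standard F-beta score for one (precision, recall) pair.
  // Assumption: the score is defined as 0.0 when the denominator is zero.
  def fMeasure(precision: Double, recall: Double, beta: Double = 1.0): Double = {
    val beta2 = beta * beta
    val denominator = beta2 * precision + recall
    if (denominator == 0.0) 0.0
    else (1.0 + beta2) * precision * recall / denominator
  }
}
```

A beta above 1.0 weights recall more heavily than precision, and a beta below 1.0 does the opposite.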
-
val
numBins: Int
- Annotations
- @Since( "1.3.0" )
-
def
pr(): RDD[(Double, Double)]
Returns the precision-recall curve, which is an RDD of (recall, precision), NOT (precision, recall), with (0.0, p) prepended to it, where p is the precision associated with the lowest recall on the curve.
- Annotations
- @Since( "1.0.0" )
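The shape of the returned curve can be illustrated on a plain Scala sequence. This is a minimal sketch, assuming the points are already (recall, precision) pairs sorted by ascending recall; PrCurveSketch and withLeadingPoint are hypothetical names, not part of the API.

```scala
object PrCurveSketch {
  // Input: (recall, precision) points sorted by ascending recall.
  // Output: the same points with (0.0, p) prepended, where p is the
  // precision of the lowest-recall point.
  def withLeadingPoint(points: Seq[(Double, Double)]): Seq[(Double, Double)] =
    points.headOption match {
      case Some((_, p)) => (0.0, p) +: points
      case None         => points
    }
}
```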
-
def
precisionByThreshold(): RDD[(Double, Double)]
Returns the (threshold, precision) curve.
- Annotations
- @Since( "1.0.0" )
-
def
recallByThreshold(): RDD[(Double, Double)]
Returns the (threshold, recall) curve.
- Annotations
- @Since( "1.0.0" )
-
def
roc(): RDD[(Double, Double)]
Returns the receiver operating characteristic (ROC) curve, which is an RDD of (false positive rate, true positive rate) with (0.0, 0.0) prepended and (1.0, 1.0) appended to it.
- Annotations
- @Since( "1.0.0" )
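To make the end points concrete, the sketch below adds (0.0, 0.0) and (1.0, 1.0) to a local list of (false positive rate, true positive rate) points and integrates the result. The trapezoidal rule is an assumption of this sketch, since this page does not state how the area is computed; RocAreaSketch and its methods are hypothetical names.

```scala
object RocAreaSketch {
  // Adds the (0.0, 0.0) and (1.0, 1.0) end points described above.
  def withEndPoints(points: Seq[(Double, Double)]): Seq[(Double, Double)] =
    ((0.0, 0.0) +: points) :+ ((1.0, 1.0))

  // Trapezoidal area under a curve given as (x, y) points in ascending x order.
  def trapezoidArea(curve: Seq[(Double, Double)]): Double =
    curve.sliding(2).collect {
      case Seq((x0, y0), (x1, y1)) => (x1 - x0) * (y0 + y1) / 2.0
    }.sum
}
```

Under this rule the diagonal from (0.0, 0.0) to (1.0, 1.0), which corresponds to random scoring, has an area of 0.5.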
-
val
scoreAndLabels: RDD[_ <: Product]
- Annotations
- @Since( "1.3.0" )
-
def
thresholds(): RDD[Double]
Returns thresholds in descending order.
- Annotations
- @Since( "1.0.0" )
-
def
unpersist(): Unit
Unpersist intermediate RDDs used in the computation.
- Annotations
- @Since( "1.0.0" )
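A minimal usage sketch of the members listed on this page follows. It assumes Spark 3.0.0 or later on the classpath (for the two-argument constructor) and a local session. The sample scores and labels, the object name BinaryMetricsExample, the evaluate helper, the app name and the master URL are all made up for illustration.

```scala
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.sql.SparkSession

object BinaryMetricsExample {
  // Builds metrics over a tiny hypothetical data set and returns
  // (area under ROC, area under PR).
  def evaluate(spark: SparkSession): (Double, Double) = {
    // Hypothetical (score, label) pairs; labels are 0.0 or 1.0.
    val scoreAndLabels = spark.sparkContext.parallelize(Seq(
      (0.9, 1.0), (0.8, 1.0), (0.3, 0.0), (0.1, 0.0)))

    // numBins = 0 disables down-sampling of the curves.
    val metrics = new BinaryClassificationMetrics(scoreAndLabels, numBins = 0)

    val areaUnderRoc = metrics.areaUnderROC()
    val areaUnderPr = metrics.areaUnderPR()

    // Curves are RDDs; collect() is only reasonable here because the data is tiny.
    metrics.thresholds().collect().foreach(t => println(s"threshold = $t"))
    metrics.pr().collect().foreach { case (recall, precision) =>
      println(s"recall = $recall, precision = $precision")
    }

    // Release the intermediate RDDs once the metrics are no longer needed.
    metrics.unpersist()
    (areaUnderRoc, areaUnderPr)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .appName("binary-metrics-sketch")
      .master("local[2]")
      .getOrCreate()
    val (roc, pr) = evaluate(spark)
    println(s"area under ROC = $roc, area under PR = $pr")
    spark.stop()
  }
}
```

On real data the curves can be large, so setting numBins to a positive value, or writing the curve RDDs out instead of collecting them, is likely the safer choice.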