org.apache.spark.mllib.evaluation
Class BinaryClassificationMetrics

Object
  extended by org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
All Implemented Interfaces:
Logging

public class BinaryClassificationMetrics
extends Object
implements Logging

:: Experimental :: Evaluator for binary classification.

param: scoreAndLabels an RDD of (score, label) pairs.
param: numBins if greater than 0, the curves (ROC curve, PR curve) computed internally are down-sampled to this many "bins". If 0, no down-sampling occurs. Down-sampling is useful because the curve contains a point for each distinct score in the input, which could be as many points as the input itself (millions or more), when thousands may be entirely sufficient to summarize the curve. After down-sampling, the curves consist of approximately numBins points instead. Points are made from bins of equal numbers of consecutive points. The size of each bin is floor(scoreAndLabels.count() / numBins), which means the resulting number of bins may not exactly equal numBins. The last bin in each partition may be smaller as a result, meaning there may be an extra sample at partition boundaries.
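The bin-size rule above can be illustrated with a small plain-Java sketch. It does not use Spark: the class BinningSketch and its methods are hypothetical, the input is assumed to be one array of scores already sorted in descending order (a single partition), and keeping the first score of each bin as its representative is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the documented bin-size rule; not Spark's implementation.
class BinningSketch {
    // floor(count / numBins): integer division floors non-negative operands.
    static long binSize(long count, int numBins) {
        if (numBins <= 0) {
            throw new IllegalArgumentException("numBins must be positive to down-sample");
        }
        return count / numBins;
    }

    // Groups scores (assumed sorted in descending order, single partition) into
    // consecutive bins of binSize points and keeps the first score of each bin.
    static List<Double> downSample(double[] sortedDesc, int numBins) {
        List<Double> kept = new ArrayList<>();
        if (numBins <= 0 || binSize(sortedDesc.length, numBins) <= 1) {
            // numBins == 0 means no down-sampling; bins of one point change nothing.
            for (double s : sortedDesc) {
                kept.add(s);
            }
            return kept;
        }
        int size = (int) binSize(sortedDesc.length, numBins);
        for (int start = 0; start < sortedDesc.length; start += size) {
            kept.add(sortedDesc[start]); // the last bin may contain fewer than size points
        }
        return kept;
    }
}
```

For example, 1,000 scores with numBins = 300 give a bin size of floor(1000 / 300) = 3 and therefore 334 bins rather than 300.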


Constructor Summary
BinaryClassificationMetrics(RDD<scala.Tuple2<Object,Object>> scoreAndLabels)
          Defaults numBins to 0.
BinaryClassificationMetrics(RDD<scala.Tuple2<Object,Object>> scoreAndLabels, int numBins)
          Down-samples the curves to approximately numBins points if numBins is greater than 0.
 
Method Summary
 double areaUnderPR()
          Computes the area under the precision-recall curve.
 double areaUnderROC()
          Computes the area under the receiver operating characteristic (ROC) curve.
 RDD<scala.Tuple2<Object,Object>> fMeasureByThreshold()
          Returns the (threshold, F-Measure) curve with beta = 1.0.
 RDD<scala.Tuple2<Object,Object>> fMeasureByThreshold(double beta)
          Returns the (threshold, F-Measure) curve.
 int numBins()
          Returns the number of bins passed to the constructor.
 RDD<scala.Tuple2<Object,Object>> pr()
          Returns the precision-recall curve, which is an RDD of (recall, precision), NOT (precision, recall), with (0.0, 1.0) prepended to it.
 RDD<scala.Tuple2<Object,Object>> precisionByThreshold()
          Returns the (threshold, precision) curve.
 RDD<scala.Tuple2<Object,Object>> recallByThreshold()
          Returns the (threshold, recall) curve.
 RDD<scala.Tuple2<Object,Object>> roc()
          Returns the receiver operating characteristic (ROC) curve, which is an RDD of (false positive rate, true positive rate) with (0.0, 0.0) prepended and (1.0, 1.0) appended to it.
 RDD<scala.Tuple2<Object,Object>> scoreAndLabels()
          Returns the RDD of (score, label) pairs passed to the constructor.
 RDD<Object> thresholds()
          Returns thresholds in descending order.
 void unpersist()
          Unpersist intermediate RDDs used in the computation.
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.spark.Logging
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
 

Constructor Detail

BinaryClassificationMetrics

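public BinaryClassificationMetrics(RDD<scala.Tuple2<Object,Object>> scoreAndLabels,
                                   int numBins)

Parameters:
scoreAndLabels - an RDD of (score, label) pairs.
numBins - if greater than 0, the approximate number of points to which the ROC and PR curves are down-sampled; if 0, no down-sampling occurs.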

BinaryClassificationMetrics

public BinaryClassificationMetrics(RDD<scala.Tuple2<Object,Object>> scoreAndLabels)
Defaults numBins to 0.

Parameters:
scoreAndLabels - an RDD of (score, label) pairs.

Method Detail

scoreAndLabels

public RDD<scala.Tuple2<Object,Object>> scoreAndLabels()
Returns the RDD of (score, label) pairs passed to the constructor.

numBins

public int numBins()
Returns the number of bins passed to the constructor (0 if the curves are not down-sampled).

unpersist

public void unpersist()
Unpersist intermediate RDDs used in the computation.


thresholds

public RDD<Object> thresholds()
Returns thresholds in descending order.
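A minimal plain-Java sketch of what the thresholds look like when no down-sampling occurs (numBins = 0): one threshold per distinct score, highest first. The class ThresholdsSketch is hypothetical and works on a local array rather than an RDD.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: thresholds as the distinct scores in descending order (no down-sampling).
class ThresholdsSketch {
    static double[] thresholds(double[] scores) {
        return Arrays.stream(scores)
                .distinct()                        // one threshold per distinct score
                .boxed()
                .sorted(Comparator.reverseOrder()) // descending order
                .mapToDouble(Double::doubleValue)
                .toArray();
    }
}
```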


roc

public RDD<scala.Tuple2<Object,Object>> roc()
Returns the receiver operating characteristic (ROC) curve, which is an RDD of (false positive rate, true positive rate) with (0.0, 0.0) prepended and (1.0, 1.0) appended to it.

Returns:
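an RDD of (false positive rate, true positive rate) pairs, including the prepended (0.0, 0.0) and appended (1.0, 1.0) points.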
See Also:
http://en.wikipedia.org/wiki/Receiver_operating_characteristic
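The shape of the ROC output can be sketched in plain Java, assuming labels are 0.0 or 1.0 with at least one of each, no down-sampling, and that a pair is predicted positive when its score is greater than or equal to the current threshold. RocSketch is a hypothetical class, not Spark's implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of (false positive rate, true positive rate) points.
class RocSketch {
    static List<double[]> roc(double[] scores, double[] labels) {
        // Count positives (index 0) and negatives (index 1) per distinct score, highest first.
        TreeMap<Double, long[]> counts = new TreeMap<>(Comparator.<Double>reverseOrder());
        long totalPos = 0, totalNeg = 0;
        for (int i = 0; i < scores.length; i++) {
            long[] c = counts.computeIfAbsent(scores[i], k -> new long[2]);
            if (labels[i] == 1.0) { c[0]++; totalPos++; } else { c[1]++; totalNeg++; }
        }
        List<double[]> points = new ArrayList<>();
        points.add(new double[] {0.0, 0.0}); // prepended point
        long tp = 0, fp = 0;
        for (long[] c : counts.values()) {
            // Lowering the threshold to this score turns all its pairs into predicted positives.
            tp += c[0];
            fp += c[1];
            points.add(new double[] {(double) fp / totalNeg, (double) tp / totalPos});
        }
        points.add(new double[] {1.0, 1.0}); // appended point
        return points;
    }
}
```

In this sketch the last computed point is already (1.0, 1.0), because every pair is predicted positive at the lowest threshold, so the appended point duplicates it.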

areaUnderROC

public double areaUnderROC()
Computes the area under the receiver operating characteristic (ROC) curve.

Returns:
the area under the ROC curve.
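A minimal sketch of the area computation, assuming the area is taken with the trapezoidal rule over the curve points in the order they appear (a common choice; this page does not state the integration method). AreaSketch is hypothetical.

```java
// Hypothetical sketch: trapezoidal area under a curve given as ordered (x, y) points.
class AreaSketch {
    static double trapezoid(double[][] points) {
        double area = 0.0;
        for (int i = 1; i < points.length; i++) {
            double dx = points[i][0] - points[i - 1][0];
            // Each segment contributes width * average height.
            area += dx * (points[i][1] + points[i - 1][1]) / 2.0;
        }
        return area;
    }
}
```

For the ROC points (0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0), (1.0, 1.0), produced by scores 0.9, 0.8, 0.7, 0.6 with labels 1, 0, 1, 0, the area is 0.75: three of the four (positive, negative) pairs are ranked correctly.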

pr

public RDD<scala.Tuple2<Object,Object>> pr()
Returns the precision-recall curve, which is an RDD of (recall, precision), NOT (precision, recall), with (0.0, 1.0) prepended to it.

Returns:
an RDD of (recall, precision) pairs, starting with the prepended (0.0, 1.0) point.
See Also:
http://en.wikipedia.org/wiki/Precision_and_recall
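The (recall, precision) ordering and the prepended (0.0, 1.0) point can be sketched in plain Java under the same assumptions as before: labels are 0.0 or 1.0 with at least one positive, no down-sampling, and a score at or above the threshold counts as a positive prediction. PrSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of (recall, precision) points, recall first.
class PrSketch {
    static List<double[]> pr(double[] scores, double[] labels) {
        // Count positives (index 0) and negatives (index 1) per distinct score, highest first.
        TreeMap<Double, long[]> counts = new TreeMap<>(Comparator.<Double>reverseOrder());
        long totalPos = 0;
        for (int i = 0; i < scores.length; i++) {
            long[] c = counts.computeIfAbsent(scores[i], k -> new long[2]);
            if (labels[i] == 1.0) { c[0]++; totalPos++; } else { c[1]++; }
        }
        List<double[]> points = new ArrayList<>();
        points.add(new double[] {0.0, 1.0}); // prepended point
        long tp = 0, fp = 0;
        for (long[] c : counts.values()) {
            tp += c[0];
            fp += c[1];
            double recall = (double) tp / totalPos;
            double precision = (double) tp / (tp + fp); // tp + fp > 0: every score has a pair
            points.add(new double[] {recall, precision});
        }
        return points;
    }
}
```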

areaUnderPR

public double areaUnderPR()
Computes the area under the precision-recall curve.

Returns:
the area under the precision-recall curve.
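Because the PR curve is (recall, precision), recall is the x-axis when integrating. A minimal sketch, again assuming the trapezoidal rule over the points in order; PrAreaSketch is hypothetical.

```java
import java.util.List;

// Hypothetical sketch: trapezoidal area under (recall, precision) points, recall on the x-axis.
class PrAreaSketch {
    static double areaUnderPR(List<double[]> recallPrecision) {
        double area = 0.0;
        for (int i = 1; i < recallPrecision.size(); i++) {
            double[] prev = recallPrecision.get(i - 1);
            double[] cur = recallPrecision.get(i);
            double dRecall = cur[0] - prev[0];               // index 0 is recall
            area += dRecall * (cur[1] + prev[1]) / 2.0;      // index 1 is precision
        }
        return area;
    }
}
```

For the points (0.0, 1.0), (0.5, 1.0), (0.5, 0.5), (1.0, 2/3), (1.0, 0.5), this gives 0.5 + 7/24 = 19/24.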

fMeasureByThreshold

public RDD<scala.Tuple2<Object,Object>> fMeasureByThreshold(double beta)
Returns the (threshold, F-Measure) curve.

Parameters:
beta - the beta factor in F-Measure computation.
Returns:
an RDD of (threshold, F-Measure) pairs.
See Also:
http://en.wikipedia.org/wiki/F1_score
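The F-Measure at a single threshold follows the standard F-beta formula, F = (1 + beta^2) * P * R / (beta^2 * P + R). A minimal sketch; FMeasureSketch is hypothetical, and returning 0 when precision and recall are both 0 is an assumption.

```java
// Hypothetical sketch of the F-beta score for one (precision, recall) pair.
class FMeasureSketch {
    static double fMeasure(double precision, double recall, double beta) {
        double beta2 = beta * beta;
        double denom = beta2 * precision + recall;
        // Assumption: define the score as 0 when the denominator is 0.
        return denom == 0.0 ? 0.0 : (1.0 + beta2) * precision * recall / denom;
    }
}
```

With beta = 1.0 this reduces to F1 = 2PR / (P + R), the value used by the no-argument fMeasureByThreshold(); larger beta weights recall more heavily.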

fMeasureByThreshold

public RDD<scala.Tuple2<Object,Object>> fMeasureByThreshold()
Returns the (threshold, F-Measure) curve with beta = 1.0.


precisionByThreshold

public RDD<scala.Tuple2<Object,Object>> precisionByThreshold()
Returns the (threshold, precision) curve.


recallByThreshold

public RDD<scala.Tuple2<Object,Object>> recallByThreshold()
Returns the (threshold, recall) curve.
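The two curves above can be sketched together in plain Java, assuming labels are 0.0 or 1.0 with at least one positive, no down-sampling, and that scores at or above a threshold count as positive predictions. Each row of the hypothetical ByThresholdSketch output is {threshold, precision, recall}, thresholds in descending order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of (threshold, precision) and (threshold, recall) in one pass.
class ByThresholdSketch {
    static List<double[]> byThreshold(double[] scores, double[] labels) {
        // Count positives (index 0) and negatives (index 1) per distinct score, highest first.
        TreeMap<Double, long[]> counts = new TreeMap<>(Comparator.<Double>reverseOrder());
        long totalPos = 0;
        for (int i = 0; i < scores.length; i++) {
            long[] c = counts.computeIfAbsent(scores[i], k -> new long[2]);
            if (labels[i] == 1.0) { c[0]++; totalPos++; } else { c[1]++; }
        }
        List<double[]> rows = new ArrayList<>();
        long tp = 0, fp = 0;
        for (Map.Entry<Double, long[]> e : counts.entrySet()) {
            tp += e.getValue()[0];
            fp += e.getValue()[1];
            double precision = (double) tp / (tp + fp);
            double recall = (double) tp / totalPos;
            rows.add(new double[] {e.getKey(), precision, recall});
        }
        return rows;
    }
}
```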