Class BinaryClassificationMetrics
- All Implemented Interfaces:
- org.apache.spark.internal.Logging
 param:  scoreAndLabels an RDD of (score, label) or (score, label, weight) tuples.
 param:  numBins if greater than 0, then the curves (ROC curve, PR curve) computed internally
                will be down-sampled to this many "bins". If 0, no down-sampling will occur.
                This is useful because the curve contains a point for each distinct score
                in the input, and this could be as large as the input itself -- millions of
                points or more, when thousands may be entirely sufficient to summarize
                the curve. After down-sampling, the curves will be made of approximately
                numBins points. Points are made from bins of equal numbers of
                consecutive points. The size of each bin is
                floor(scoreAndLabels.count() / numBins), which means the resulting number
                of bins may not exactly equal numBins. The last bin in each partition may
                be smaller as a result, meaning there may be an extra sample at
                partition boundaries.
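The bin-size arithmetic described above can be sketched in plain Java. This is only an illustration for a single partition, not Spark's internal code: the class `BinSizeSketch`, its method names, and the guard against a zero bin size are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark: reproduces the arithmetic from the
// numBins description for one partition of `count` consecutive points.
final class BinSizeSketch {

    // Size of each bin: floor(count / numBins).
    static long binSize(long count, int numBins) {
        if (numBins <= 0) {
            throw new IllegalArgumentException("numBins must be positive to down-sample");
        }
        return count / numBins; // integer division floors for non-negative values
    }

    // Sizes of the bins obtained by chunking `count` points into groups of binSize.
    // The last bin may be smaller, so the number of bins can exceed numBins.
    static List<Long> binSizes(long count, int numBins) {
        // Guard against a zero step; this guard is an assumption of the sketch.
        long size = Math.max(1L, binSize(count, numBins));
        List<Long> sizes = new ArrayList<>();
        for (long start = 0; start < count; start += size) {
            sizes.add(Math.min(size, count - start));
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(binSize(10, 3));  // 3
        System.out.println(binSizes(10, 3)); // [3, 3, 3, 1]
    }
}
```

With 10 points and numBins = 3, each bin holds 3 points and a fourth, smaller bin holds the remainder, which is why the resulting number of bins may not equal numBins.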
- 
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging: org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
- 
Constructor Summary
- BinaryClassificationMetrics(RDD<? extends scala.Product> scoreAndLabels, int numBins)
- BinaryClassificationMetrics(RDD<scala.Tuple2<Object, Object>> scoreAndLabels) - Defaults numBins to 0.
- 
Method Summary
- double areaUnderPR() - Computes the area under the precision-recall curve.
- double areaUnderROC() - Computes the area under the receiver operating characteristic (ROC) curve.
- fMeasureByThreshold() - Returns the (threshold, F-Measure) curve with beta = 1.0.
- fMeasureByThreshold(double beta) - Returns the (threshold, F-Measure) curve.
- int numBins()
- pr() - Returns the precision-recall curve, which is an RDD of (recall, precision), NOT (precision, recall), with (0.0, p) prepended to it, where p is the precision associated with the lowest recall on the curve.
- precisionByThreshold() - Returns the (threshold, precision) curve.
- recallByThreshold() - Returns the (threshold, recall) curve.
- roc() - Returns the receiver operating characteristic (ROC) curve, which is an RDD of (false positive rate, true positive rate) with (0.0, 0.0) prepended and (1.0, 1.0) appended to it.
- RDD<? extends scala.Product> scoreAndLabels()
- thresholds() - Returns thresholds in descending order.
- void unpersist() - Unpersist intermediate RDDs used in the computation.
Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.internal.Logging: initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
- 
Constructor Details
- 
BinaryClassificationMetrics
- 
BinaryClassificationMetrics
Defaults numBins to 0.
- Parameters:
- scoreAndLabels - (undocumented)
 
 
- 
- 
Method Details
- 
scoreAndLabels
- 
numBins
public int numBins()
- 
unpersist
public void unpersist()
Unpersist intermediate RDDs used in the computation.
- 
thresholds
Returns thresholds in descending order.
- Returns:
- (undocumented)
 
- 
roc
Returns the receiver operating characteristic (ROC) curve, which is an RDD of (false positive rate, true positive rate) with (0.0, 0.0) prepended and (1.0, 1.0) appended to it.
- Returns:
- (undocumented)
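To make the shape of the returned curve concrete, here is a minimal plain-Java sketch that builds (false positive rate, true positive rate) points from in-memory (score, label) rows and adds the (0.0, 0.0) and (1.0, 1.0) endpoints. It does not use Spark and is not the library's implementation; the class `RocSketch`, the `{score, label}` array layout, and the assumption that both classes are present in the input are all choices made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory illustration of the ROC curve layout; not Spark code.
final class RocSketch {

    // Each row is {score, label}; label 1.0 is positive, 0.0 is negative.
    // Returns {falsePositiveRate, truePositiveRate} points, one per distinct score,
    // with (0.0, 0.0) prepended and (1.0, 1.0) appended.
    // Assumes the input contains at least one positive and one negative row.
    static List<double[]> roc(List<double[]> scoreAndLabels) {
        List<double[]> sorted = new ArrayList<>(scoreAndLabels);
        // Highest scores first, so each step lowers the decision threshold.
        sorted.sort(Comparator.comparingDouble((double[] r) -> r[0]).reversed());

        long totalPos = sorted.stream().filter(r -> r[1] > 0.5).count();
        long totalNeg = sorted.size() - totalPos;

        List<double[]> curve = new ArrayList<>();
        curve.add(new double[] {0.0, 0.0});
        long tp = 0;
        long fp = 0;
        for (int i = 0; i < sorted.size(); i++) {
            if (sorted.get(i)[1] > 0.5) {
                tp++;
            } else {
                fp++;
            }
            // Emit one point per distinct score, after all rows with that score.
            boolean lastOfScore =
                i + 1 == sorted.size() || sorted.get(i + 1)[0] != sorted.get(i)[0];
            if (lastOfScore) {
                curve.add(new double[] {(double) fp / totalNeg, (double) tp / totalPos});
            }
        }
        curve.add(new double[] {1.0, 1.0});
        return curve;
    }

    public static void main(String[] args) {
        List<double[]> rows = Arrays.asList(
            new double[] {0.9, 1.0}, new double[] {0.8, 1.0}, new double[] {0.7, 0.0},
            new double[] {0.6, 1.0}, new double[] {0.4, 0.0});
        for (double[] p : roc(rows)) {
            System.out.println("(" + p[0] + ", " + p[1] + ")");
        }
    }
}
```

Note that the x coordinate comes first: each point is (false positive rate, true positive rate), matching the order stated in the description.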
 
- 
areaUnderROC
public double areaUnderROC()
Computes the area under the receiver operating characteristic (ROC) curve.
- Returns:
- (undocumented)
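One common way to compute the area under a curve given as a sequence of points is the trapezoidal rule. The sketch below assumes that approach for illustration; this page does not state which integration method the class uses, and `AreaUnderCurveSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the trapezoidal rule; not taken from Spark.
final class AreaUnderCurveSketch {

    // Sums the trapezoids between consecutive {x, y} points.
    // Assumes the points are ordered by non-decreasing x.
    static double area(List<double[]> curve) {
        double sum = 0.0;
        for (int i = 1; i < curve.size(); i++) {
            double[] a = curve.get(i - 1);
            double[] b = curve.get(i);
            sum += (b[0] - a[0]) * (a[1] + b[1]) / 2.0;
        }
        return sum;
    }

    public static void main(String[] args) {
        // A small ROC-style curve of (false positive rate, true positive rate) points.
        List<double[]> curve = Arrays.asList(
            new double[] {0.0, 0.0}, new double[] {0.0, 0.5}, new double[] {0.5, 0.5},
            new double[] {0.5, 1.0}, new double[] {1.0, 1.0});
        System.out.println(area(curve)); // 0.75
    }
}
```

Vertical segments, where x does not change, contribute zero area, so only the horizontal steps of the curve add to the total.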
 
- 
pr
Returns the precision-recall curve, which is an RDD of (recall, precision), NOT (precision, recall), with (0.0, p) prepended to it, where p is the precision associated with the lowest recall on the curve.
- Returns:
- (undocumented)
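The point order and the prepended (0.0, p) point are easy to get wrong, so the following plain-Java sketch builds a curve with the same layout from in-memory rows. It is an illustration only and not Spark's implementation; the class `PrCurveSketch` and the `{score, label}` array layout are assumptions of this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory illustration of the PR curve layout; not Spark code.
final class PrCurveSketch {

    // Each row is {score, label}; label 1.0 is positive, 0.0 is negative.
    // Returns {recall, precision} points (recall first), one per distinct score,
    // with (0.0, p) prepended, where p is the precision of the lowest-recall point.
    // Assumes the input contains at least one positive row.
    static List<double[]> pr(List<double[]> scoreAndLabels) {
        List<double[]> sorted = new ArrayList<>(scoreAndLabels);
        sorted.sort(Comparator.comparingDouble((double[] r) -> r[0]).reversed());
        long totalPos = sorted.stream().filter(r -> r[1] > 0.5).count();

        List<double[]> points = new ArrayList<>();
        long tp = 0;
        long fp = 0;
        for (int i = 0; i < sorted.size(); i++) {
            if (sorted.get(i)[1] > 0.5) {
                tp++;
            } else {
                fp++;
            }
            boolean lastOfScore =
                i + 1 == sorted.size() || sorted.get(i + 1)[0] != sorted.get(i)[0];
            if (lastOfScore) {
                points.add(new double[] {(double) tp / totalPos, (double) tp / (tp + fp)});
            }
        }
        if (!points.isEmpty()) {
            // The first point has the highest threshold and therefore the lowest recall.
            points.add(0, new double[] {0.0, points.get(0)[1]});
        }
        return points;
    }

    public static void main(String[] args) {
        List<double[]> rows = Arrays.asList(
            new double[] {0.9, 1.0}, new double[] {0.8, 1.0}, new double[] {0.7, 0.0},
            new double[] {0.6, 1.0}, new double[] {0.4, 0.0});
        for (double[] p : pr(rows)) {
            System.out.println("(recall=" + p[0] + ", precision=" + p[1] + ")");
        }
    }
}
```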
 
- 
areaUnderPR
public double areaUnderPR()
Computes the area under the precision-recall curve.
- Returns:
- (undocumented)
 
- 
fMeasureByThreshold
Returns the (threshold, F-Measure) curve.
- Parameters:
- beta - the beta factor in F-Measure computation.
- Returns:
- an RDD of (threshold, F-Measure) pairs.
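For reference, the standard F-Measure with a beta factor combines precision P and recall R as (1 + beta^2) * P * R / (beta^2 * P + R). The sketch below evaluates that textbook formula for a single (precision, recall) pair. The class `FMeasureSketch` is hypothetical, and returning 0.0 when the denominator is zero is an assumption of this example rather than documented behavior.

```java
// Hypothetical illustration of the standard F-beta formula; not Spark code.
final class FMeasureSketch {

    // F-beta = (1 + beta^2) * P * R / (beta^2 * P + R).
    // Returns 0.0 when the denominator is zero (an assumption of this sketch).
    static double fMeasure(double precision, double recall, double beta) {
        double beta2 = beta * beta;
        double denominator = beta2 * precision + recall;
        if (denominator == 0.0) {
            return 0.0;
        }
        return (1.0 + beta2) * precision * recall / denominator;
    }

    public static void main(String[] args) {
        // beta = 1.0 weighs precision and recall equally (the F1 score).
        System.out.println(fMeasure(0.5, 1.0, 1.0));
        // beta = 2.0 weighs recall more heavily than precision.
        System.out.println(fMeasure(0.5, 1.0, 2.0));
    }
}
```

A beta above 1.0 favors recall and a beta below 1.0 favors precision, so the same (threshold, precision, recall) data can yield a different best threshold depending on beta.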
 
- 
fMeasureByThreshold
Returns the (threshold, F-Measure) curve with beta = 1.0.
- Returns:
- (undocumented)
 
- 
precisionByThreshold
Returns the (threshold, precision) curve.
- Returns:
- (undocumented)
 
- 
recallByThreshold
Returns the (threshold, recall) curve.
- Returns:
- (undocumented)
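The per-threshold curves (thresholds, precisionByThreshold, recallByThreshold) can be pictured with a small in-memory sketch. It assumes that each distinct score is a candidate threshold and that a row is predicted positive when its score is greater than or equal to the threshold; both are assumptions of this example, as are the class `ByThresholdSketch` and its method names. It does not use Spark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical in-memory illustration of per-threshold metrics; not Spark code.
final class ByThresholdSketch {

    // Distinct scores in descending order, used here as candidate thresholds.
    static List<Double> thresholds(List<double[]> rows) {
        TreeSet<Double> distinct = new TreeSet<>(Comparator.reverseOrder());
        for (double[] r : rows) {
            distinct.add(r[0]);
        }
        return new ArrayList<>(distinct);
    }

    // Returns {threshold, precision, recall} for every threshold.
    // A row {score, label} counts as predicted positive when score >= threshold
    // (an assumption of this sketch). Assumes at least one positive row.
    static List<double[]> precisionRecallByThreshold(List<double[]> rows) {
        long totalPos = rows.stream().filter(r -> r[1] > 0.5).count();
        List<double[]> out = new ArrayList<>();
        for (double t : thresholds(rows)) {
            long predicted = rows.stream().filter(r -> r[0] >= t).count();
            long tp = rows.stream().filter(r -> r[0] >= t && r[1] > 0.5).count();
            out.add(new double[] {t, (double) tp / predicted, (double) tp / totalPos});
        }
        return out;
    }

    public static void main(String[] args) {
        List<double[]> rows = Arrays.asList(
            new double[] {0.9, 1.0}, new double[] {0.8, 1.0}, new double[] {0.7, 0.0},
            new double[] {0.6, 1.0}, new double[] {0.4, 0.0});
        for (double[] p : precisionRecallByThreshold(rows)) {
            System.out.println("t=" + p[0] + " precision=" + p[1] + " recall=" + p[2]);
        }
    }
}
```

This quadratic scan is only meant to show what each pair means; it is not how a distributed implementation would compute the curves.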
 
 
-