Package org.apache.spark.rdd
Class DoubleRDDFunctions
Object
org.apache.spark.rdd.DoubleRDDFunctions
- All Implemented Interfaces:
- Serializable, org.apache.spark.internal.Logging
public class DoubleRDDFunctions
extends Object
implements org.apache.spark.internal.Logging, Serializable
Extra functions available on RDDs of Doubles through an implicit conversion.
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging:
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
Constructor Summary
Constructors
- DoubleRDDFunctions
Method Summary
- long[] histogram(double[] buckets, boolean evenBuckets): Compute a histogram using the provided buckets.
- scala.Tuple2<double[],long[]> histogram(int bucketCount): Compute a histogram of the data using bucketCount buckets evenly spaced between the minimum and maximum of the RDD.
- double mean(): Compute the mean of this RDD's elements.
- PartialResult<BoundedDouble> meanApprox(long timeout, double confidence): Approximate operation to return the mean within a timeout.
- double popStdev(): Compute the population standard deviation of this RDD's elements.
- double popVariance(): Compute the population variance of this RDD's elements.
- double sampleStdev(): Compute the sample standard deviation of this RDD's elements (which corrects for bias in estimating the standard deviation by dividing by N-1 instead of N).
- double sampleVariance(): Compute the sample variance of this RDD's elements (which corrects for bias in estimating the variance by dividing by N-1 instead of N).
- StatCounter stats(): Return a StatCounter object that captures the mean, variance and count of the RDD's elements in one operation.
- double stdev(): Compute the population standard deviation of this RDD's elements.
- double sum(): Add up the elements in this RDD.
- PartialResult<BoundedDouble> sumApprox(long timeout, double confidence): Approximate operation to return the sum within a timeout.
- double variance(): Compute the population variance of this RDD's elements.

Methods inherited from class java.lang.Object:
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.spark.internal.Logging:
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logBasedOnLevel, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, MDC, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
- 
Constructor Details
- DoubleRDDFunctions
Method Details
histogram
public scala.Tuple2<double[],long[]> histogram(int bucketCount)
Compute a histogram of the data using bucketCount buckets evenly spaced between the minimum and maximum of the RDD. For example, if the min value is 0, the max is 100, and there are two buckets, the resulting buckets will be [0, 50) [50, 100]. bucketCount must be at least 1. If the RDD contains infinity or NaN, this method throws an exception. If the elements in the RDD do not vary (max == min), it always returns a single bucket.
- Parameters:
- bucketCount - (undocumented)
- Returns:
- (undocumented)
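The even-bucket semantics above can be sketched in plain Python, independent of Spark (histogram_even is an illustrative helper, not part of the Spark API):

```python
def histogram_even(values, bucket_count):
    """Histogram with bucket_count evenly spaced buckets between min and max.

    Mirrors the documented semantics: buckets are open to the right except
    for the last, which is closed, and max == min yields a single bucket.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        return [lo, hi], [len(values)]
    width = (hi - lo) / bucket_count
    edges = [lo + i * width for i in range(bucket_count)] + [hi]
    counts = [0] * bucket_count
    for v in values:
        # right-open bucket index; the maximum falls in the last (closed) bucket
        i = min(int((v - lo) / width), bucket_count - 1)
        counts[i] += 1
    return edges, counts

edges, counts = histogram_even([0, 25, 50, 75, 100], 2)
# matches the documented example: buckets [0, 50) and [50, 100]
```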
 
- 
histogram
public long[] histogram(double[] buckets, boolean evenBuckets)
Compute a histogram using the provided buckets. The buckets are all open to the right except for the last, which is closed. For example, for the array [1, 10, 20, 50] the buckets are [1, 10) [10, 20) [20, 50], i.e. 1<=x<10, 10<=x<20, 20<=x<=50. On the input of 1 and 50 we would have a histogram of 1, 0, 1.
- Parameters:
- buckets - (undocumented)
- evenBuckets - (undocumented)
- Returns:
- (undocumented)
- Note:
- If your buckets are evenly spaced (e.g. [0, 10, 20, 30]), insertion can be switched from O(log n) to O(1) per element (where n = number of buckets) by setting evenBuckets to true. buckets must be sorted and must not contain any duplicates, and the buckets array must have at least two elements. All NaN entries are treated the same: if you have a NaN bucket it must be the maximum value of the last position, and all NaN entries will be counted in that bucket.
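The worked example above (buckets [1, 10, 20, 50], inputs 1 and 50 giving counts 1, 0, 1) can be reproduced in plain Python; histogram_buckets is an illustrative helper, not the Spark API, and it simply ignores out-of-range values:

```python
import bisect

def histogram_buckets(values, buckets):
    """Count values into caller-supplied bucket boundaries.

    Buckets are open to the right except the last: for [1, 10, 20, 50]
    the ranges are 1<=x<10, 10<=x<20, 20<=x<=50. Values outside the
    overall range are not counted in this sketch.
    """
    counts = [0] * (len(buckets) - 1)
    for v in values:
        if v < buckets[0] or v > buckets[-1]:
            continue
        if v == buckets[-1]:     # the last bucket is closed on the right
            counts[-1] += 1
        else:                    # binary search, i.e. O(log n) per element
            counts[bisect.bisect_right(buckets, v) - 1] += 1
    return counts

counts = histogram_buckets([1, 50], [1, 10, 20, 50])
# reproduces the documented histogram of 1, 0, 1
```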
 
- 
mean
public double mean()
Compute the mean of this RDD's elements.
- 
meanApprox
public org.apache.spark.partial.PartialResult<org.apache.spark.partial.BoundedDouble> meanApprox(long timeout, double confidence)
Approximate operation to return the mean within a timeout.
- Parameters:
- timeout- (undocumented)
- confidence- (undocumented)
- Returns:
- (undocumented)
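The approximate operations return an estimate together with confidence bounds. As an illustration only (this is a plain normal-approximation interval, not the evaluator Spark actually uses), the shape of such a (mean, low, high) result can be sketched in Python:

```python
import math
import statistics
from statistics import NormalDist

def mean_with_bounds(sample, confidence=0.95):
    """Estimate a mean plus a normal-approximation confidence interval,
    roughly the kind of information a bounded approximate result carries.
    Illustrative sketch only."""
    n = len(sample)
    mean = sum(sample) / n
    stderr = statistics.stdev(sample) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # ~1.96 for 95%
    return mean, mean - z * stderr, mean + z * stderr

est, low, high = mean_with_bounds([1.0, 2.0, 3.0, 4.0])
# est is the sample mean; low/high bracket it at the given confidence
```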
 
- 
popStdev
public double popStdev()
Compute the population standard deviation of this RDD's elements.
- Returns:
- (undocumented)
 
- 
popVariance
public double popVariance()
Compute the population variance of this RDD's elements.
- Returns:
- (undocumented)
 
- 
sampleStdev
public double sampleStdev()
Compute the sample standard deviation of this RDD's elements (which corrects for bias in estimating the standard deviation by dividing by N-1 instead of N).
- Returns:
- (undocumented)
 
- 
sampleVariance
public double sampleVariance()
Compute the sample variance of this RDD's elements (which corrects for bias in estimating the variance by dividing by N-1 instead of N).
- Returns:
- (undocumented)
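The N versus N-1 distinction between the population and sample variants is easy to verify numerically; a plain-Python sketch (not the Spark API):

```python
import math

def pop_variance(xs):
    """Population variance: divide the squared deviations by N."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """Sample variance: divide by N-1 to correct for estimation bias."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

xs = [1.0, 2.0, 3.0, 4.0]
pop_var, samp_var = pop_variance(xs), sample_variance(xs)
# the corresponding standard deviations are just the square roots
pop_std, samp_std = math.sqrt(pop_var), math.sqrt(samp_var)
```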
 
- 
stats
public org.apache.spark.util.StatCounter stats()
Return a StatCounter object that captures the mean, variance and count of the RDD's elements in one operation.
- Returns:
- (undocumented)
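stats() gathers count, mean, and variance in a single pass over the data. A one-pass accumulator in that spirit can be sketched with Welford's algorithm (RunningStats is an illustrative stand-in, not Spark's StatCounter):

```python
class RunningStats:
    """One-pass count/mean/variance accumulator (Welford's algorithm),
    sketching the kind of state a StatCounter-style object tracks."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return self

    def variance(self):
        """Population variance, matching variance()/popVariance() semantics."""
        return self.m2 / self.n

stats = RunningStats()
for x in [1.0, 2.0, 3.0, 4.0]:
    stats.add(x)
# after one pass: stats.n, stats.mean, and stats.variance() are all available
```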
 
- 
stdev
public double stdev()
Compute the population standard deviation of this RDD's elements.
- 
sum
public double sum()
Add up the elements in this RDD.
- 
sumApprox
public org.apache.spark.partial.PartialResult<org.apache.spark.partial.BoundedDouble> sumApprox(long timeout, double confidence)
Approximate operation to return the sum within a timeout.
- Parameters:
- timeout- (undocumented)
- confidence- (undocumented)
- Returns:
- (undocumented)
 
- 
variance
public double variance()
Compute the population variance of this RDD's elements.
 