Package org.apache.spark.rdd
Class DoubleRDDFunctions
java.lang.Object
    org.apache.spark.rdd.DoubleRDDFunctions
- All Implemented Interfaces:
  Serializable, org.apache.spark.internal.Logging
public class DoubleRDDFunctions
extends Object
implements org.apache.spark.internal.Logging, Serializable
Extra functions available on RDDs of Doubles through an implicit conversion.
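These methods become available on any RDD of Doubles without referencing this class directly. A minimal Scala sketch, assuming an already-running SparkContext named sc and illustrative sample values:

    val data = sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0))  // RDD[Double]

    // Resolved through the implicit conversion to DoubleRDDFunctions.
    data.mean()   // 2.5
    data.sum()    // 10.0
    data.stdev()  // population standard deviation, ~1.118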
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
-
Constructor Summary
DoubleRDDFunctions(RDD<Object> self)
-
Method Summary
long[] histogram(double[] buckets, boolean evenBuckets)
  Compute a histogram using the provided buckets.
scala.Tuple2<double[],long[]> histogram(int bucketCount)
  Compute a histogram of the data using bucketCount buckets evenly spaced between the minimum and maximum of the RDD.
double mean()
  Compute the mean of this RDD's elements.
PartialResult<BoundedDouble> meanApprox(long timeout, double confidence)
  Approximate operation to return the mean within a timeout.
double popStdev()
  Compute the population standard deviation of this RDD's elements.
double popVariance()
  Compute the population variance of this RDD's elements.
double sampleStdev()
  Compute the sample standard deviation of this RDD's elements (which corrects for bias in estimating the standard deviation by dividing by N-1 instead of N).
double sampleVariance()
  Compute the sample variance of this RDD's elements (which corrects for bias in estimating the variance by dividing by N-1 instead of N).
StatCounter stats()
  Return a StatCounter object that captures the mean, variance and count of the RDD's elements in one operation.
double stdev()
  Compute the population standard deviation of this RDD's elements.
double sum()
  Add up the elements in this RDD.
PartialResult<BoundedDouble> sumApprox(long timeout, double confidence)
  Approximate operation to return the sum within a timeout.
double variance()
  Compute the population variance of this RDD's elements.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
-
Constructor Details
-
DoubleRDDFunctions
public DoubleRDDFunctions(RDD<Object> self)
-
-
Method Details
-
histogram
public scala.Tuple2<double[],long[]> histogram(int bucketCount)
Compute a histogram of the data using bucketCount buckets evenly spaced between the minimum and maximum of the RDD. For example, if the min value is 0, the max is 100, and there are two buckets, the resulting buckets will be [0, 50) and [50, 100]. bucketCount must be at least 1. If the RDD contains infinity or NaN, this throws an exception. If the elements in the RDD do not vary (max == min), this always returns a single bucket.
- Parameters:
  bucketCount - (undocumented)
- Returns:
  (undocumented)
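For example, a Scala sketch of the evenly spaced variant, assuming a live SparkContext named sc and illustrative sample values:

    val rdd = sc.parallelize(Seq(0.0, 25.0, 50.0, 75.0, 100.0))

    // Two buckets between the min (0.0) and max (100.0): [0, 50) and [50, 100].
    val (buckets, counts) = rdd.histogram(2)
    // buckets: Array(0.0, 50.0, 100.0)
    // counts:  Array(2, 3) -- 0.0 and 25.0 fall in [0, 50); 50.0, 75.0 and 100.0 in [50, 100]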
-
histogram
public long[] histogram(double[] buckets, boolean evenBuckets)
Compute a histogram using the provided buckets. The buckets are all open to the right except for the last, which is closed. For example, for the array [1, 10, 20, 50] the buckets are [1, 10), [10, 20), [20, 50], i.e. 1 <= x < 10, 10 <= x < 20, 20 <= x <= 50. On the input of 1 and 50 we would have a histogram of 1, 0, 1.
- Parameters:
  buckets - (undocumented)
  evenBuckets - (undocumented)
- Returns:
  (undocumented)
- Note:
  If your histogram is evenly spaced (e.g. [0, 10, 20, 30]), the per-element insertion can be switched from O(log n) to O(1) (where n = number of buckets) by setting evenBuckets to true. buckets must be sorted and must not contain any duplicates. The buckets array must have at least two elements. All NaN entries are treated the same: if you have a NaN bucket it must be the maximum value of the last position, and all NaN entries will be counted in that bucket.
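A Scala sketch with explicit bucket boundaries, matching the example above (assuming a live SparkContext named sc):

    val rdd = sc.parallelize(Seq(1.0, 50.0))

    // Buckets are [1, 10), [10, 20), [20, 50]; only the last is closed on the right.
    val counts = rdd.histogram(Array(1.0, 10.0, 20.0, 50.0))
    // counts: Array(1, 0, 1)

    // With evenly spaced boundaries, set evenBuckets = true for O(1) bucketing per element.
    val evenCounts = rdd.histogram(Array(0.0, 25.0, 50.0), evenBuckets = true)
    // evenCounts: Array(1, 1) -- 1.0 falls in [0, 25), 50.0 in [25, 50]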
-
mean
public double mean()
Compute the mean of this RDD's elements.
-
meanApprox
public PartialResult<BoundedDouble> meanApprox(long timeout, double confidence)
Approximate operation to return the mean within a timeout.
- Parameters:
  timeout - (undocumented)
  confidence - (undocumented)
- Returns:
  (undocumented)
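A Scala sketch, assuming a live SparkContext named sc (sumApprox follows the same pattern):

    val rdd = sc.parallelize(1 to 1000000).map(_.toDouble)

    // Wait at most 1000 ms for the job and request a 95% confidence interval.
    val approx = rdd.meanApprox(timeout = 1000L, confidence = 0.95)

    // getFinalValue() blocks until the result is ready (within the timeout) and
    // returns a BoundedDouble: the estimate plus its confidence bounds.
    val bounded = approx.getFinalValue()
    println(s"mean ~ ${bounded.mean} in [${bounded.low}, ${bounded.high}]")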
-
popStdev
public double popStdev()
Compute the population standard deviation of this RDD's elements.
- Returns:
  (undocumented)
-
popVariance
public double popVariance()
Compute the population variance of this RDD's elements.
- Returns:
  (undocumented)
-
sampleStdev
public double sampleStdev()
Compute the sample standard deviation of this RDD's elements (which corrects for bias in estimating the standard deviation by dividing by N-1 instead of N).
- Returns:
  (undocumented)
-
sampleVariance
public double sampleVariance()
Compute the sample variance of this RDD's elements (which corrects for bias in estimating the variance by dividing by N-1 instead of N).
- Returns:
  (undocumented)
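To make the N versus N-1 distinction concrete, a Scala sketch (assuming a live SparkContext named sc; the numbers follow from the four sample values):

    val rdd = sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0))  // mean 2.5, squared deviations sum to 5.0

    rdd.popVariance()     // 1.25   = 5.0 / 4    (divides by N; same value as variance())
    rdd.sampleVariance()  // ~1.667 = 5.0 / 3    (divides by N-1)
    rdd.popStdev()        // ~1.118 = sqrt(1.25) (same value as stdev())
    rdd.sampleStdev()     // ~1.291 = sqrt(5.0 / 3)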
-
stats
public StatCounter stats()
Return a StatCounter object that captures the mean, variance and count of the RDD's elements in one operation.
- Returns:
  (undocumented)
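A Scala sketch (assuming a live SparkContext named sc); stats() avoids recomputing the RDD once per statistic:

    val rdd = sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0))
    val s = rdd.stats()   // org.apache.spark.util.StatCounter

    println(s.count)   // 4
    println(s.mean)    // 2.5
    println(s.stdev)   // population standard deviation, ~1.118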
-
stdev
public double stdev()
Compute the population standard deviation of this RDD's elements.
-
sum
public double sum()
Add up the elements in this RDD.
-
sumApprox
public PartialResult<BoundedDouble> sumApprox(long timeout, double confidence)
Approximate operation to return the sum within a timeout.
- Parameters:
  timeout - (undocumented)
  confidence - (undocumented)
- Returns:
  (undocumented)
-
variance
public double variance()
Compute the population variance of this RDD's elements.
-