Package org.apache.spark.mllib.stat
Class MultivariateOnlineSummarizer
Object
org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
All Implemented Interfaces:
Serializable, MultivariateStatisticalSummary, scala.Serializable
public class MultivariateOnlineSummarizer
extends Object
implements MultivariateStatisticalSummary, scala.Serializable
MultivariateOnlineSummarizer implements MultivariateStatisticalSummary to compute the mean, variance, minimum, maximum, counts, and nonzero counts for instances in sparse or dense vector format in an online fashion.
Two MultivariateOnlineSummarizer instances can be merged to obtain a statistical summary of the corresponding joint dataset.
A numerically stable algorithm is implemented to compute the mean and variance of instances. Reference: variance-wiki.
Zero elements (including explicit zero values) are skipped when calling add(), so that the time complexity is O(nnz) instead of O(n) for each column.
For weighted instances, the unbiased estimate of variance is defined by the reliability weights; see Reliability weights (Wikipedia).
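The online update, the in-place merge, and the reliability-weights variance described above can be illustrated for a single column. The following is only a minimal sketch under stated assumptions: the class OnlineColumnStats and all of its members are hypothetical names, and it uses the textbook weighted incremental update and pairwise merge, which is not necessarily how MultivariateOnlineSummarizer is implemented internally.

```java
/** Hypothetical single-column summarizer; an illustration, not Spark's implementation. */
final class OnlineColumnStats {
    private double weightSum = 0.0;       // sum of w_i
    private double weightSquareSum = 0.0; // sum of w_i^2
    private double mean = 0.0;            // running weighted mean
    private double m2 = 0.0;              // running sum of w_i * (x_i - mean)^2

    /** Adds one weighted observation and returns this object. */
    OnlineColumnStats add(double x, double w) {
        if (w == 0.0) {
            return this; // a zero-weight sample contributes nothing
        }
        weightSum += w;
        weightSquareSum += w * w;
        double delta = x - mean;
        mean += (w / weightSum) * delta;
        m2 += w * delta * (x - mean); // uses the already-updated mean
        return this;
    }

    /** Merges other into this object in place and returns this object. */
    OnlineColumnStats merge(OnlineColumnStats other) {
        if (other.weightSum == 0.0) {
            return this; // nothing to merge
        }
        double total = weightSum + other.weightSum;
        double delta = other.mean - mean;
        // m2 must be combined before weightSum and mean are overwritten.
        m2 += other.m2 + delta * delta * weightSum * other.weightSum / total;
        mean += delta * other.weightSum / total;
        weightSum = total;
        weightSquareSum += other.weightSquareSum;
        return this;
    }

    double mean() {
        return mean;
    }

    /** Unbiased variance under reliability weights; 0.0 when it is undefined. */
    double variance() {
        double denominator = weightSum - weightSquareSum / weightSum;
        return denominator > 0.0 ? m2 / denominator : 0.0;
    }

    public static void main(String[] args) {
        OnlineColumnStats left = new OnlineColumnStats().add(1.0, 1.0).add(2.0, 1.0);
        OnlineColumnStats right = new OnlineColumnStats().add(3.0, 1.0).add(4.0, 1.0);
        left.merge(right); // left now summarizes all four values
        System.out.println("mean = " + left.mean() + ", variance = " + left.variance());
    }
}
```

With unit weights the denominator reduces to n - 1, so the sketch falls back to the usual unbiased sample variance.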
Constructor Summary
MultivariateOnlineSummarizer()
Method Summary
Modifier and Type              Method                                      Description
MultivariateOnlineSummarizer   add(Vector sample)                          Add a new sample to this summarizer, and update the statistical summary.
long                           count()                                     Sample size.
Vector                         max()                                       Maximum value of each dimension.
Vector                         mean()                                      Sample mean of each dimension.
MultivariateOnlineSummarizer   merge(MultivariateOnlineSummarizer other)   Merge another MultivariateOnlineSummarizer, and update the statistical summary.
Vector                         min()                                       Minimum value of each dimension.
Vector                         normL1()                                    L1 norm of each dimension.
Vector                         normL2()                                    L2 (Euclidean) norm of each dimension.
Vector                         numNonzeros()                               Number of nonzero elements in each dimension.
Vector                         variance()                                  Unbiased estimate of sample variance of each dimension.
double                         weightSum()                                 Sum of weights.
-
Constructor Details
-
MultivariateOnlineSummarizer
public MultivariateOnlineSummarizer()
-
-
Method Details
-
add
public MultivariateOnlineSummarizer add(Vector sample)
Add a new sample to this summarizer, and update the statistical summary.
Parameters:
sample - The sample in dense/sparse vector format to be added into this summarizer.
Returns:
This MultivariateOnlineSummarizer object.
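To see why skipping zero elements keeps the per-sample cost proportional to the number of stored nonzeros, consider the sketch below. It is an assumption-laden illustration only: SparseColumnSums and its methods are hypothetical names, the sample is passed as parallel index and value arrays rather than as a Spark Vector, and it keeps naive running sums instead of the numerically stable update the class description refers to.

```java
/** Hypothetical sparse accumulator; illustrates skipping zeros, not Spark's code. */
final class SparseColumnSums {
    private final double[] sum; // per-column sum of nonzero values
    private final long[] nnz;   // per-column count of nonzero values
    private long count = 0L;    // number of samples seen

    SparseColumnSums(int numCols) {
        sum = new double[numCols];
        nnz = new long[numCols];
    }

    /** Adds one sparse sample; only the stored entries are visited. */
    SparseColumnSums add(int[] indices, double[] values) {
        count++;
        for (int k = 0; k < indices.length; k++) {
            double v = values[k];
            if (v != 0.0) { // explicit zeros are skipped as well
                sum[indices[k]] += v;
                nnz[indices[k]]++;
            }
        }
        return this;
    }

    long count() {
        return count;
    }

    long numNonzeros(int col) {
        return nnz[col];
    }

    /** Zeros still count toward the mean because the divisor is the sample count. */
    double mean(int col) {
        return count == 0 ? 0.0 : sum[col] / count;
    }

    public static void main(String[] args) {
        SparseColumnSums s = new SparseColumnSums(3)
            .add(new int[] {0, 2}, new double[] {1.0, 0.0})  // column 2 holds an explicit zero
            .add(new int[] {0, 1}, new double[] {3.0, 4.0});
        System.out.println("count = " + s.count() + ", mean of column 0 = " + s.mean(0));
    }
}
```

The columns that a sample does not touch need no work at add time; their zeros are accounted for only when a statistic is read, through the shared sample count.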
-
count
public long count()
Sample size.
Specified by:
count in interface MultivariateStatisticalSummary
Returns:
(undocumented)
-
max
public Vector max()
Maximum value of each dimension.
Specified by:
max in interface MultivariateStatisticalSummary
Returns:
(undocumented)
-
mean
public Vector mean()
Sample mean of each dimension.
Specified by:
mean in interface MultivariateStatisticalSummary
Returns:
(undocumented)
-
merge
public MultivariateOnlineSummarizer merge(MultivariateOnlineSummarizer other)
Merge another MultivariateOnlineSummarizer, and update the statistical summary. (Note that the merge is performed in place; as a result, this object will be modified.)
Parameters:
other - The other MultivariateOnlineSummarizer to be merged.
Returns:
This MultivariateOnlineSummarizer object.
-
min
public Vector min()
Minimum value of each dimension.
Specified by:
min in interface MultivariateStatisticalSummary
Returns:
(undocumented)
-
normL1
public Vector normL1()
L1 norm of each dimension.
Specified by:
normL1 in interface MultivariateStatisticalSummary
Returns:
(undocumented)
-
normL2
public Vector normL2()
L2 (Euclidean) norm of each dimension.
Specified by:
normL2 in interface MultivariateStatisticalSummary
Returns:
(undocumented)
-
numNonzeros
public Vector numNonzeros()
Number of nonzero elements in each dimension.
Specified by:
numNonzeros in interface MultivariateStatisticalSummary
Returns:
(undocumented)
-
variance
public Vector variance()
Unbiased estimate of sample variance of each dimension.
Specified by:
variance in interface MultivariateStatisticalSummary
Returns:
(undocumented)
-
weightSum
public double weightSum()
Sum of weights.
Specified by:
weightSum in interface MultivariateStatisticalSummary
Returns:
(undocumented)
-