Package org.apache.spark.mllib.stat
Class MultivariateOnlineSummarizer
Object
org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
- All Implemented Interfaces:
- Serializable, MultivariateStatisticalSummary
public class MultivariateOnlineSummarizer
extends Object
implements MultivariateStatisticalSummary, Serializable
MultivariateOnlineSummarizer implements MultivariateStatisticalSummary to compute the mean, variance, minimum, maximum, counts, and nonzero counts for instances in sparse or dense vector format in an online fashion.
Two MultivariateOnlineSummarizer instances can be merged together to produce a statistical summary of the corresponding joint dataset.
A numerically stable algorithm is implemented to compute the mean and variance of instances. Reference: variance-wiki.
Zero elements (including explicit zero values) are skipped when calling add(), reducing the time complexity from O(n) to O(nnz) for each column.
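The idea of skipping zeros while keeping a numerically stable running mean and variance can be sketched in plain Java. This is an illustrative sketch only, not the Spark implementation: the class name SparseRunningStats, its fields, and the index/value array form of add() are all hypothetical, and the sketch handles unweighted samples only. It applies a Welford-style update to the nonzero entries of each column and folds the skipped zeros back in when the mean or variance is read.

```java
// Hypothetical sketch (not Spark code): per-column running statistics
// that only touch the nonzero entries of each sample.
class SparseRunningStats {
    private final double[] nzMean; // running mean over nonzero entries, per column
    private final double[] nzM2;   // sum of squared deviations from nzMean, nonzero entries only
    private final long[] nnz;      // number of nonzero entries seen, per column
    private long count = 0;        // total number of samples, zeros included

    SparseRunningStats(int numCols) {
        nzMean = new double[numCols];
        nzM2 = new double[numCols];
        nnz = new long[numCols];
    }

    // Add one sample given as parallel arrays of column indices and values.
    // Cost is proportional to the number of entries passed in, not to numCols.
    SparseRunningStats add(int[] indices, double[] values) {
        for (int k = 0; k < indices.length; k++) {
            double x = values[k];
            if (x == 0.0) {
                continue; // explicit zeros are skipped as well
            }
            int j = indices[k];
            nnz[j]++;
            double delta = x - nzMean[j];
            nzMean[j] += delta / nnz[j];       // Welford-style mean update
            nzM2[j] += delta * (x - nzMean[j]); // uses the updated mean
        }
        count++;
        return this;
    }

    // Mean over all samples: rescale the nonzero mean by the nonzero fraction.
    double mean(int j) {
        return count == 0 ? 0.0 : nzMean[j] * nnz[j] / count;
    }

    // Unbiased sample variance over all samples. The skipped zeros form a
    // second group with mean 0 and no spread, combined with the nonzero group.
    double variance(int j) {
        if (count < 2) {
            return 0.0;
        }
        long zeros = count - nnz[j];
        double m2 = nzM2[j] + nzMean[j] * nzMean[j] * nnz[j] * zeros / (double) count;
        return m2 / (count - 1);
    }
}
```

For example, adding the samples 2, 0, 4 to column 0 (the zero being an empty sample) yields a mean of 2 and a variance of 4, the same as a dense computation over all three values.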
For weighted instances, the unbiased estimate of the variance is defined using reliability weights; see Reliability weights (Wikipedia).
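For orientation, the standard reliability-weights estimator from that reference can be written as follows. The symbols are chosen here for illustration and do not come from this page: w_i is the weight of sample i, x_i its value in one dimension, V_1 the sum of weights, and V_2 the sum of squared weights.

```latex
\bar{x}_w = \frac{\sum_i w_i x_i}{V_1},
\qquad
\hat{\sigma}^2 = \frac{\sum_i w_i \left(x_i - \bar{x}_w\right)^2}{V_1 - V_2 / V_1},
\qquad
V_1 = \sum_i w_i,
\quad
V_2 = \sum_i w_i^2 .
```

When every weight equals 1, V_1 = V_2 = n and the denominator reduces to n - 1, which recovers the usual unbiased sample variance.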
Constructor Summary
- MultivariateOnlineSummarizer()
Method Summary
- MultivariateOnlineSummarizer add(Vector sample): Add a new sample to this summarizer, and update the statistical summary.
- long count(): Sample size.
- Vector max(): Maximum value of each dimension.
- Vector mean(): Sample mean of each dimension.
- MultivariateOnlineSummarizer merge(MultivariateOnlineSummarizer other): Merge another MultivariateOnlineSummarizer, and update the statistical summary.
- Vector min(): Minimum value of each dimension.
- Vector normL1(): L1 norm of each dimension.
- Vector normL2(): L2 (Euclidean) norm of each dimension.
- Vector numNonzeros(): Number of nonzero elements in each dimension.
- Vector variance(): Unbiased estimate of sample variance of each dimension.
- double weightSum(): Sum of weights.
Constructor Details
MultivariateOnlineSummarizer
public MultivariateOnlineSummarizer()
 
Method Details
add
public MultivariateOnlineSummarizer add(Vector sample)
Add a new sample to this summarizer, and update the statistical summary.
- Parameters:
- sample - The sample in dense/sparse vector format to be added into this summarizer.
- Returns:
- This MultivariateOnlineSummarizer object.
 
count
public long count()
Sample size.
- Specified by:
- count in interface MultivariateStatisticalSummary
- Returns:
- (undocumented)
 
max
public Vector max()
Maximum value of each dimension.
- Specified by:
- max in interface MultivariateStatisticalSummary
- Returns:
- (undocumented)
 
mean
public Vector mean()
Sample mean of each dimension.
- Specified by:
- mean in interface MultivariateStatisticalSummary
- Returns:
- (undocumented)
 
merge
public MultivariateOnlineSummarizer merge(MultivariateOnlineSummarizer other)
Merge another MultivariateOnlineSummarizer, and update the statistical summary. (Note that the merge is done in place; as a result, this object will be modified.)
- Parameters:
- other - The other MultivariateOnlineSummarizer to be merged.
- Returns:
- This MultivariateOnlineSummarizer object.
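An in-place merge of two running summaries can be pictured with the following plain-Java sketch for a single dimension. It is a minimal illustration under stated assumptions, not the Spark implementation: the class RunningMoments and its fields are hypothetical, samples are unweighted, and the combination step uses the well-known pairwise formula for merging means and sums of squared deviations.

```java
// Hypothetical sketch (not Spark code): one-dimensional running moments
// that can be merged in place, modifying the receiver and returning it.
final class RunningMoments {
    long n;      // number of samples seen
    double mean; // running mean
    double m2;   // sum of squared deviations from the mean

    // Welford-style single-sample update.
    RunningMoments add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
        return this;
    }

    // Merge another summary into this one; 'other' is left unchanged.
    RunningMoments merge(RunningMoments other) {
        if (other.n == 0) {
            return this; // nothing to merge
        }
        if (n == 0) {
            n = other.n;
            mean = other.mean;
            m2 = other.m2;
            return this;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        // Combine the sums of squared deviations before n and mean are updated.
        m2 += other.m2 + delta * delta * n * other.n / (double) total;
        mean += delta * other.n / total;
        n = total;
        return this;
    }

    // Unbiased sample variance; 0.0 when fewer than two samples were seen.
    double variance() {
        return n > 1 ? m2 / (n - 1) : 0.0;
    }
}
```

Merging a summary of {1, 2} with a summary of {3, 4, 5} gives the same count, mean, and variance as summarizing all five values in one pass, which is what makes this pattern suitable for combining per-partition results.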
 
min
public Vector min()
Minimum value of each dimension.
- Specified by:
- min in interface MultivariateStatisticalSummary
- Returns:
- (undocumented)
 
normL1
public Vector normL1()
L1 norm of each dimension.
- Specified by:
- normL1 in interface MultivariateStatisticalSummary
- Returns:
- (undocumented)
 
normL2
public Vector normL2()
L2 (Euclidean) norm of each dimension.
- Specified by:
- normL2 in interface MultivariateStatisticalSummary
- Returns:
- (undocumented)
 
numNonzeros
public Vector numNonzeros()
Number of nonzero elements in each dimension.
- Specified by:
- numNonzeros in interface MultivariateStatisticalSummary
- Returns:
- (undocumented)
 
variance
public Vector variance()
Unbiased estimate of sample variance of each dimension.
- Specified by:
- variance in interface MultivariateStatisticalSummary
- Returns:
- (undocumented)
 
weightSum
public double weightSum()
Sum of weights.
- Specified by:
- weightSum in interface MultivariateStatisticalSummary
- Returns:
- (undocumented)