org.apache.spark.mllib.stat
Class MultivariateOnlineSummarizer

Object
  extended by org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
All Implemented Interfaces:
java.io.Serializable, MultivariateStatisticalSummary

public class MultivariateOnlineSummarizer
extends Object
implements MultivariateStatisticalSummary, scala.Serializable

:: DeveloperApi :: MultivariateOnlineSummarizer implements MultivariateStatisticalSummary to compute the mean, variance, minimum, maximum, count, and nonzero count of each column for samples in sparse or dense vector format, in an online fashion.

Two MultivariateOnlineSummarizer instances can be merged to obtain the statistical summary of the corresponding joint dataset.

A numerically stable algorithm is implemented to compute the sample mean and variance. Reference: variance-wiki.

Zero elements (including explicit zero values) are skipped when calling add(), giving time complexity O(nnz) instead of O(n) for each column.
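The idea of skipping zeros while still reporting correct statistics can be sketched for a single column as follows. This is a minimal illustration and not Spark's source: the class name SparseColumnStats and its fields are hypothetical, and a Welford-style running update is assumed as the numerically stable algorithm. The accumulator tracks the mean and squared deviations of the nonzero values only, then accounts for the skipped zeros when mean() or variance() is read.

```java
/**
 * Illustrative single-column accumulator (hypothetical; not Spark's source).
 * Zeros are skipped in add(), and their effect is restored in mean()/variance().
 */
final class SparseColumnStats {
    private long count;     // total samples seen, zeros included
    private long nnz;       // samples whose value was nonzero
    private double nzMean;  // running mean of the nonzero values only
    private double nzM2;    // sum of squared deviations of nonzero values from nzMean

    /** Records one sample; a zero only increments the sample count. */
    public SparseColumnStats add(double value) {
        count++;
        if (value != 0.0) {
            nnz++;
            double delta = value - nzMean;
            nzMean += delta / nnz;              // Welford-style mean update
            nzM2 += delta * (value - nzMean);   // uses the old and the new mean
        }
        return this;
    }

    /** Mean over all samples: the nonzero mean scaled by the nonzero fraction. */
    public double mean() {
        return count == 0 ? 0.0 : nzMean * nnz / count;
    }

    /** Unbiased sample variance; 0.0 when fewer than two samples were seen. */
    public double variance() {
        if (count <= 1) {
            return 0.0;
        }
        double zeros = count - nnz;
        // Combine the nonzero group with a group of `zeros` values equal to 0.
        double m2 = nzM2 + nzMean * nzMean * nnz * zeros / count;
        return m2 / (count - 1);
    }
}
```

For the column values 1, 0, 3, 0 this sketch touches only the two nonzero entries during add(), yet the corrected mean and variance match those computed over all four values.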

See Also:
Serialized Form

Constructor Summary
MultivariateOnlineSummarizer()
           
 
Method Summary
 MultivariateOnlineSummarizer add(Vector sample)
          Add a new sample to this summarizer, and update the statistical summary.
 long count()
          Sample size.
 Vector max()
          Maximum value of each column.
 Vector mean()
          Sample mean vector.
 MultivariateOnlineSummarizer merge(MultivariateOnlineSummarizer other)
          Merge another MultivariateOnlineSummarizer, and update the statistical summary.
 Vector min()
          Minimum value of each column.
 Vector normL1()
          L1 norm of each column.
 Vector normL2()
          Euclidean magnitude of each column.
 Vector numNonzeros()
          Number of nonzero elements (including explicitly presented zero values) in each column.
 Vector variance()
          Sample variance vector.
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MultivariateOnlineSummarizer

public MultivariateOnlineSummarizer()
Method Detail

add

public MultivariateOnlineSummarizer add(Vector sample)
Add a new sample to this summarizer, and update the statistical summary.

Parameters:
sample - The sample in dense/sparse vector format to be added into this summarizer.
Returns:
This MultivariateOnlineSummarizer object.

merge

public MultivariateOnlineSummarizer merge(MultivariateOnlineSummarizer other)
Merge another MultivariateOnlineSummarizer, and update the statistical summary. (Note that the merge is performed in place, so this object is modified.)

Parameters:
other - The other MultivariateOnlineSummarizer to be merged.
Returns:
This MultivariateOnlineSummarizer object.
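
To illustrate how two partial summaries can be combined without revisiting the data, here is a minimal single-column sketch. It is not Spark's implementation: the class MergeableStats is hypothetical, and the pairwise combination formula for count, mean, and sum of squared deviations is an assumed choice. Like the method described above, merge modifies the receiver in place and returns it.

```java
/**
 * Illustrative mergeable accumulator for one column (hypothetical; not Spark's source).
 */
final class MergeableStats {
    private long n;       // number of samples
    private double mean;  // running mean
    private double m2;    // sum of squared deviations from the mean

    /** Adds one value using a Welford-style update. */
    public MergeableStats add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
        return this;
    }

    /** Folds `other` into this object in place and returns this object. */
    public MergeableStats merge(MergeableStats other) {
        if (other.n == 0) {
            return this;
        }
        if (n == 0) {
            n = other.n;
            mean = other.mean;
            m2 = other.m2;
            return this;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        // m2 must be updated before n changes, since it uses both old counts.
        m2 += other.m2 + delta * delta * n * other.n / total;
        mean += delta * other.n / total;
        n = total;
        return this;
    }

    public long count() { return n; }

    public double mean() { return mean; }

    /** Unbiased sample variance; 0.0 when fewer than two samples were seen. */
    public double variance() { return n <= 1 ? 0.0 : m2 / (n - 1); }
}
```

Merging a summary built from 1, 2 with one built from 3, 4, 5 gives the same count, mean, and variance as a single pass over 1 through 5, which is what makes this pattern suitable for combining per-partition results.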

mean

public Vector mean()
Description copied from interface: MultivariateStatisticalSummary
Sample mean vector.

Specified by:
mean in interface MultivariateStatisticalSummary
Returns:
(undocumented)

variance

public Vector variance()
Description copied from interface: MultivariateStatisticalSummary
Sample variance vector. Should return a zero vector if the sample size is 1.

Specified by:
variance in interface MultivariateStatisticalSummary
Returns:
(undocumented)

count

public long count()
Description copied from interface: MultivariateStatisticalSummary
Sample size.

Specified by:
count in interface MultivariateStatisticalSummary
Returns:
(undocumented)

numNonzeros

public Vector numNonzeros()
Description copied from interface: MultivariateStatisticalSummary
Number of nonzero elements (including explicitly presented zero values) in each column.

Specified by:
numNonzeros in interface MultivariateStatisticalSummary
Returns:
(undocumented)

max

public Vector max()
Description copied from interface: MultivariateStatisticalSummary
Maximum value of each column.

Specified by:
max in interface MultivariateStatisticalSummary
Returns:
(undocumented)

min

public Vector min()
Description copied from interface: MultivariateStatisticalSummary
Minimum value of each column.

Specified by:
min in interface MultivariateStatisticalSummary
Returns:
(undocumented)
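
Because zero elements are skipped when samples are added, a column's maximum and minimum have to take any skipped zeros into account. The following single-column sketch shows one way to do that. It is an assumption for illustration only: the class SparseColumnRange is hypothetical and is not taken from Spark's source.

```java
/**
 * Illustrative min/max tracker for one column that skips zeros on add
 * (hypothetical; not Spark's source).
 */
final class SparseColumnRange {
    private long count;  // total samples seen, zeros included
    private long nnz;    // samples whose value was nonzero
    private double nzMax = Double.NEGATIVE_INFINITY;  // max over nonzero values
    private double nzMin = Double.POSITIVE_INFINITY;  // min over nonzero values

    /** Records one sample; a zero only increments the sample count. */
    public SparseColumnRange add(double value) {
        count++;
        if (value != 0.0) {
            nnz++;
            nzMax = Math.max(nzMax, value);
            nzMin = Math.min(nzMin, value);
        }
        return this;
    }

    /** If any zero was skipped, 0.0 is a candidate for the maximum. */
    public double max() {
        return nnz < count ? Math.max(nzMax, 0.0) : nzMax;
    }

    /** If any zero was skipped, 0.0 is a candidate for the minimum. */
    public double min() {
        return nnz < count ? Math.min(nzMin, 0.0) : nzMin;
    }
}
```

In this sketch an empty tracker reports negative and positive infinity for max() and min(); how an empty summary should behave is left open here.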

normL2

public Vector normL2()
Description copied from interface: MultivariateStatisticalSummary
Euclidean magnitude of each column.

Specified by:
normL2 in interface MultivariateStatisticalSummary
Returns:
(undocumented)

normL1

public Vector normL1()
Description copied from interface: MultivariateStatisticalSummary
L1 norm of each column.

Specified by:
normL1 in interface MultivariateStatisticalSummary
Returns:
(undocumented)
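
As a reminder of what the two per-column norms measure, the sketch below computes them directly from a small dense matrix whose rows are samples. The helper class ColumnNorms and its method signatures are hypothetical and exist only for this illustration; they are not part of the Spark API.

```java
/**
 * Illustrative per-column norms over dense rows (hypothetical helper; not Spark API).
 */
final class ColumnNorms {
    /** L1 norm of each column: the sum of absolute values down the column. */
    public static double[] normL1(double[][] rows, int numCols) {
        double[] out = new double[numCols];
        for (double[] row : rows) {
            for (int j = 0; j < numCols; j++) {
                out[j] += Math.abs(row[j]);
            }
        }
        return out;
    }

    /** L2 norm of each column: the square root of the sum of squares down the column. */
    public static double[] normL2(double[][] rows, int numCols) {
        double[] out = new double[numCols];
        for (double[] row : rows) {
            for (int j = 0; j < numCols; j++) {
                out[j] += row[j] * row[j];
            }
        }
        for (int j = 0; j < numCols; j++) {
            out[j] = Math.sqrt(out[j]);
        }
        return out;
    }
}
```

For the rows (3, -1) and (4, 0), the first column has L1 norm 7 and L2 norm 5, and the second column has both norms equal to 1.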