org.apache.spark.mllib.stat.correlation

Class PearsonCorrelation

• Object
• org.apache.spark.mllib.stat.correlation.PearsonCorrelation

• ```public class PearsonCorrelation
extends Object```
Compute Pearson correlation for two RDDs of the type RDD[Double] or the correlation matrix for an RDD of the type RDD[Vector].

For the definition of Pearson correlation, see http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient

Constructor summary

`PearsonCorrelation()`

Method summary

| Modifier and Type | Method | Description |
|---|---|---|
| `static double` | `computeCorrelation(RDD<Object> x, RDD<Object> y)` | Compute the Pearson correlation for two datasets. |
| `static Matrix` | `computeCorrelationMatrix(RDD<Vector> X)` | Compute the Pearson correlation matrix S, for the input matrix, where S(i, j) is the correlation between column i and j. |
| `static Matrix` | `computeCorrelationMatrixFromCovariance(Matrix covarianceMatrix)` | Compute the Pearson correlation matrix from the covariance matrix. |
| `static double` | `computeCorrelationWithMatrixImpl(RDD<Object> x, RDD<Object> y)` | |

• PearsonCorrelation

`public PearsonCorrelation()`
• computeCorrelation

```public static double computeCorrelation(RDD<Object> x,
RDD<Object> y)```
Compute the Pearson correlation for two datasets. Returns NaN if either dataset has zero variance.
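Parameters:
`x` - the first dataset
`y` - the second dataset
Returns:
the Pearson correlation between `x` and `y`, or NaN if either has zero variance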
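
The behaviour described above can be illustrated without Spark. The following is a minimal sketch in plain Java, assuming the two datasets fit in memory as `double[]` arrays. The class `PearsonPairSketch` and its method `pearson` are hypothetical names introduced for illustration; they are not part of the MLlib API, and the library's own distributed implementation may differ.

```java
// Hypothetical illustration only: not part of org.apache.spark.mllib.
final class PearsonPairSketch {

    /**
     * Pearson correlation of two equal-length samples.
     * Returns Double.NaN if either sample has zero variance,
     * mirroring the documented behaviour of computeCorrelation.
     */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("inputs must be non-empty and of equal length");
        }
        int n = x.length;

        // First pass: sample means.
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Second pass: centered sums of squares and cross-products.
        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }

        // Zero variance in either input makes the ratio undefined.
        if (sxx == 0.0 || syy == 0.0) {
            return Double.NaN;
        }
        return sxy / Math.sqrt(sxx * syy);
    }
}
```

Calling `PearsonPairSketch.pearson` on two perfectly linearly related arrays gives a value of 1 or -1 up to rounding, while a constant array on either side gives NaN.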
• computeCorrelationMatrix

`public static Matrix computeCorrelationMatrix(RDD<Vector> X)`
Compute the Pearson correlation matrix S for the input matrix, where S(i, j) is the correlation between columns i and j. A column with zero variance results in a value of Double.NaN for its correlations with other columns.
Parameters:
`X` - the input matrix, given as an RDD of row vectors
Returns:
the Pearson correlation matrix S
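
To make the meaning of S(i, j) concrete, here is a small in-memory sketch in plain Java. It assumes the input is a `double[][]` in which each inner array is one observation (one row) and each index is one column. `PearsonMatrixSketch` and `correlationMatrix` are hypothetical names, and placing 1.0 on the diagonal is a choice made by this sketch rather than something stated on this page.

```java
// Hypothetical illustration only: not part of org.apache.spark.mllib.
final class PearsonMatrixSketch {

    /**
     * Builds the matrix S where S[i][j] is the Pearson correlation
     * between column i and column j of the given rows.
     * rows[k][i] is the value of column i in observation k.
     */
    static double[][] correlationMatrix(double[][] rows) {
        if (rows.length == 0) {
            throw new IllegalArgumentException("at least one row is required");
        }
        int m = rows.length;
        int n = rows[0].length;

        // Column means.
        double[] mean = new double[n];
        for (double[] row : rows) {
            if (row.length != n) {
                throw new IllegalArgumentException("all rows must have the same length");
            }
            for (int i = 0; i < n; i++) {
                mean[i] += row[i];
            }
        }
        for (int i = 0; i < n; i++) {
            mean[i] /= m;
        }

        // Centered cross-products: c[i][j] = sum over rows of (x_i - mean_i) * (x_j - mean_j).
        double[][] c = new double[n][n];
        for (double[] row : rows) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    c[i][j] += (row[i] - mean[i]) * (row[j] - mean[j]);
                }
            }
        }

        double[][] s = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i == j) {
                    s[i][j] = 1.0; // assumption of this sketch
                } else if (c[i][i] == 0.0 || c[j][j] == 0.0) {
                    s[i][j] = Double.NaN; // a zero-variance column has no defined correlation
                } else {
                    s[i][j] = c[i][j] / Math.sqrt(c[i][i] * c[j][j]);
                }
            }
        }
        return s;
    }
}
```

The result is symmetric, so S(i, j) and S(j, i) always agree, and any constant column produces NaN in its off-diagonal entries.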
• computeCorrelationMatrixFromCovariance

`public static Matrix computeCorrelationMatrixFromCovariance(Matrix covarianceMatrix)`
Compute the Pearson correlation matrix from the covariance matrix. 0 variance results in a correlation value of Double.NaN.
Parameters:
`covarianceMatrix` - the covariance matrix to convert
Returns:
the Pearson correlation matrix
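
The conversion from covariance to correlation divides each covariance by the product of the two standard deviations taken from the diagonal. The sketch below shows that arithmetic in plain Java on a square `double[][]`. `CovToCorrSketch` and `fromCovariance` are hypothetical names, and the handling of the diagonal (always 1.0) is an assumption of this sketch, not a statement about the library.

```java
// Hypothetical illustration only: not part of org.apache.spark.mllib.
final class CovToCorrSketch {

    /**
     * Converts a square covariance matrix to a correlation matrix:
     * corr[i][j] = cov[i][j] / (sd[i] * sd[j]), with sd[i] = sqrt(cov[i][i]).
     * Off-diagonal entries involving a zero variance are Double.NaN.
     */
    static double[][] fromCovariance(double[][] cov) {
        int n = cov.length;

        // Standard deviations come from the diagonal of the covariance matrix.
        double[] sd = new double[n];
        for (int i = 0; i < n; i++) {
            if (cov[i].length != n) {
                throw new IllegalArgumentException("covariance matrix must be square");
            }
            sd[i] = Math.sqrt(cov[i][i]);
        }

        double[][] corr = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i == j) {
                    corr[i][j] = 1.0; // assumption of this sketch
                } else if (sd[i] == 0.0 || sd[j] == 0.0) {
                    corr[i][j] = Double.NaN; // zero variance: correlation undefined
                } else {
                    corr[i][j] = cov[i][j] / (sd[i] * sd[j]);
                }
            }
        }
        return corr;
    }
}
```

For example, variances of 4 and 9 with a covariance of 2 give a correlation of 2 / (2 * 3), roughly 0.333.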
• computeCorrelationWithMatrixImpl

```public static double computeCorrelationWithMatrixImpl(RDD<Object> x,
RDD<Object> y)```