Computes the Pearson Correlation Coefficient for two Columns.

Calculates the correlation of two columns of a SparkDataFrame. Currently only supports the Pearson Correlation Coefficient. For Spearman Correlation, consider using RDD methods found in MLlib's Statistics.
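For reference, the quantity computed is assumed to follow the standard textbook definition of the Pearson correlation coefficient. The symbols below ($x_i$, $y_i$, $n$, $\bar{x}$, $\bar{y}$) are notation introduced here for illustration and do not appear in the SparkR documentation:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,\quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```

Here $x_i$ and $y_i$ are the paired values of the two columns over $n$ rows, and $r$ lies in $[-1, 1]$.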


## S4 method for signature 'Column'
corr(x, col2)

corr(x, ...)

## S4 method for signature 'SparkDataFrame'
corr(x, colName1, colName2, method = "pearson")



x: a Column or a SparkDataFrame.


col2: a (second) Column.


...: additional argument(s). If x is a Column, a Column should be provided. If x is a SparkDataFrame, two column names should be provided.


colName1: the name of the first column.


colName2: the name of the second column.


method: Optional. A character string specifying the method for calculating the correlation. Only "pearson" is currently supported.


The Pearson Correlation Coefficient as a Double.
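To make the returned value concrete, here is a minimal local sketch in base R (no Spark session required). It assumes the coefficient follows the textbook Pearson definition; the vectors `x` and `y` and the helper `pearson()` are hypothetical names introduced for illustration and are not part of SparkR.

```r
# Hypothetical local data; not taken from the SparkR documentation.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

# Pearson coefficient computed from the textbook definition:
# covariance-like numerator over the product of the two spreads.
pearson <- function(a, b) {
  da <- a - mean(a)
  db <- b - mean(b)
  sum(da * db) / sqrt(sum(da^2) * sum(db^2))
}

r_manual <- pearson(x, y)
r_base <- cor(x, y, method = "pearson")  # base R (stats) reference value

# The two values should agree up to floating-point error.
stopifnot(isTRUE(all.equal(r_manual, r_base)))
print(r_manual)
```

On a SparkDataFrame built from the same data, `corr(df, "x", "y")` would be expected to return a comparable value, under the assumption that both implement the same definition.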


corr since 1.6.0

## Not run: corr(df$c, df$d)
## Not run: 
##D df <- read.json("/path/to/file.json")
##D # Store the result under a name that does not shadow the corr() function
##D rho <- corr(df, "title", "gender")
##D rho <- corr(df, "title", "gender", method = "pearson")
## End(Not run)

