corr {SparkR}    R Documentation

corr

Description

Computes the Pearson Correlation Coefficient for two Columns.

Calculates the correlation of two columns of a SparkDataFrame. Only the Pearson Correlation Coefficient is currently supported. For Spearman Correlation, consider using the RDD methods found in MLlib's Statistics.

Usage

## S4 method for signature 'Column'
corr(x, col2)

corr(x, ...)

## S4 method for signature 'SparkDataFrame'
corr(x, colName1, colName2, method = "pearson")

Arguments

x

a Column or a SparkDataFrame.

col2

a (second) Column.

...

additional argument(s). If x is a Column, a Column should be provided. If x is a SparkDataFrame, two column names should be provided.

colName1

the name of the first column.

colName2

the name of the second column.

method

Optional. A character string specifying the method for calculating the correlation. Only "pearson" is currently supported.

Value

The Pearson Correlation Coefficient as a Double.
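For readers who want to see what the returned value represents, the following is a minimal local sketch of the Pearson coefficient computed from its textbook definition in base R. It does not use Spark and is not SparkR's implementation; the helper name pearson and the sample vectors are hypothetical, chosen only for illustration.

```r
# Pearson correlation from its definition (base R only, no Spark required).
# `pearson` is a hypothetical helper, not part of SparkR.
pearson <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) > 1)
  dx <- x - mean(x)   # deviations of x from its mean
  dy <- y - mean(y)   # deviations of y from its mean
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# Illustrative data only.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
r <- pearson(x, y)

# The hand-rolled value agrees with stats::cor.
stopifnot(isTRUE(all.equal(r, cor(x, y, method = "pearson"))))
```

The result always lies between -1 and 1, and negating one of the inputs flips its sign.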

Note

corr since 1.6.0

See Also

Other math_funcs: acos, acos,Column-method; asin, asin,Column-method; atan2, atan2,Column-method; atan, atan,Column-method; bin, bin, bin,Column-method; bround, bround, bround,Column-method; cbrt, cbrt, cbrt,Column-method; ceil, ceil, ceil,Column-method, ceiling, ceiling,Column-method; conv, conv, conv,Column,numeric,numeric-method; cosh, cosh,Column-method; cos, cos,Column-method; covar_pop, covar_pop, covar_pop,characterOrColumn,characterOrColumn-method; cov, cov, cov, cov,SparkDataFrame-method, cov,characterOrColumn-method, covar_samp, covar_samp, covar_samp,characterOrColumn,characterOrColumn-method; expm1, expm1,Column-method; exp, exp,Column-method; factorial, factorial,Column-method; floor, floor,Column-method; hex, hex, hex,Column-method; hypot, hypot, hypot,Column-method; log10, log10,Column-method; log1p, log1p,Column-method; log2, log2,Column-method; log, log,Column-method; pmod, pmod, pmod,Column-method; rint, rint, rint,Column-method; round, round,Column-method; shiftLeft, shiftLeft, shiftLeft,Column,numeric-method; shiftRightUnsigned, shiftRightUnsigned, shiftRightUnsigned,Column,numeric-method; shiftRight, shiftRight, shiftRight,Column,numeric-method; sign, sign,Column-method, signum, signum, signum,Column-method; sinh, sinh,Column-method; sin, sin,Column-method; sqrt, sqrt,Column-method; tanh, tanh,Column-method; tan, tan,Column-method; toDegrees, toDegrees, toDegrees,Column-method; toRadians, toRadians, toRadians,Column-method; unhex, unhex, unhex,Column-method

Other stat functions: approxQuantile, approxQuantile,SparkDataFrame,character,numeric,numeric-method; cov, cov, cov, cov,SparkDataFrame-method, cov,characterOrColumn-method, covar_samp, covar_samp, covar_samp,characterOrColumn,characterOrColumn-method; crosstab, crosstab,SparkDataFrame,character,character-method; freqItems, freqItems,SparkDataFrame,character-method; sampleBy, sampleBy, sampleBy,SparkDataFrame,character,list,numeric-method

Examples

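## Not run: 
##D df <- read.json("/path/to/file.json")
##D # Column method: returns a Column (an aggregate expression)
##D corr(df$c, df$d)
##D # SparkDataFrame method: returns a double; both columns must be numeric
##D r <- corr(df, "title", "gender")
##D r <- corr(df, "title", "gender", method = "pearson")
## End(Not run)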
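The examples above depend on a JSON file that is not provided. As a self-contained alternative, the sketch below uses the mtcars dataset that ships with R. It assumes a working local Spark installation with the SparkR package available; the column choices (mpg, wt) and the variable names r and r_col are illustrative only.

```r
library(SparkR)
sparkR.session()   # assumes Spark is installed and reachable locally

# mtcars ships with base R; all of its columns are numeric.
df <- createDataFrame(mtcars)

# SparkDataFrame method: returns the coefficient directly as an R double.
r <- corr(df, "mpg", "wt")

# Column method: builds an aggregate expression, evaluated here through agg().
r_col <- collect(agg(df, r = corr(df$mpg, df$wt)))$r

sparkR.session.stop()
```

Both calls should yield the same coefficient; the Column form is the one to reach for when the correlation is one of several aggregates computed in a single agg() or select() call.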

[Package SparkR version 2.1.0 Index]