pyspark.sql.DataFrame.corr

DataFrame.corr(col1, col2, method=None)

Calculates the correlation of two columns of a DataFrame and returns it as a double value. Currently, only the Pearson correlation coefficient is supported. DataFrame.corr() and DataFrameStatFunctions.corr() are aliases of each other.
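As a rough illustration of the quantity being computed, the Pearson coefficient can be sketched in plain Python as follows. This is not Spark's implementation; the helper name `pearson` and the sample values are assumptions made for the example only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Illustrative helper only (hypothetical name, not part of PySpark).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each value from its column mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sum of cross-products and sums of squared deviations.
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    # A constant column gives a zero denominator; this sketch does not handle it.
    return cov / math.sqrt(var_x * var_y)


# Made-up sample data for two columns.
r = pearson([1, 10, 19], [12, 1, 8])
print(round(r, 4))  # -0.3592
```

The result always lies between -1 and 1: values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.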

New in version 1.4.0.

Parameters:
col1 : str

The name of the first column

col2 : str

The name of the second column

method : str, optional

The correlation method. Currently, only “pearson” is supported.