pyspark.pandas.DataFrame.corrwith

DataFrame.corrwith(other: Union[DataFrame, Series], axis: Union[int, str] = 0, drop: bool = False, method: str = 'pearson') → Series

Compute pairwise correlation.

Pairwise correlation is computed between the rows or columns of this DataFrame and the rows or columns of a Series or DataFrame. Both objects are first aligned along both axes before the correlations are computed.
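The alignment rule can be illustrated with a minimal pure-Python sketch. This is not the pandas-on-Spark implementation. It assumes a frame is modelled as a dict mapping column labels to {row label: value} dicts (a hypothetical stand-in for a DataFrame), that rows are matched by an inner join on row labels as pandas does, and that a column present in only one frame yields NaN, as the Examples on this page show. `aligned_corrwith` and `pearson` are illustrative names, not library functions.

```python
import math

# Hypothetical model of a frame: {column label: {row label: value}}.
Frame = dict[str, dict[int, float]]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length sequences (NaN if undefined)."""
    n = len(xs)
    if n < 2:
        return math.nan
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan  # a constant input has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def aligned_corrwith(left: Frame, right: Frame) -> dict[str, float]:
    """Column-wise Pearson correlation after aligning both axes (sketch)."""
    result = {}
    # Column axis: the result covers the union of column labels.
    for col in sorted(left.keys() | right.keys()):
        if col not in left or col not in right:
            result[col] = math.nan  # label exists on one side only
            continue
        # Row axis: keep only row labels present on both sides (inner join).
        rows = sorted(left[col].keys() & right[col].keys())
        xs = [left[col][r] for r in rows]
        ys = [right[col][r] for r in rows]
        result[col] = pearson(xs, ys)
    return result


# The df1/df2 data from the Examples, with row labels 0..3.
df1 = {c: dict(enumerate(v)) for c, v in
       {"A": [1, 5, 7, 8], "X": [5, 8, 4, 3], "C": [10, 4, 9, 3]}.items()}
df2 = {c: dict(enumerate(v)) for c, v in
       {"A": [5, 3, 6, 4], "B": [11, 2, 4, 3], "C": [4, 3, 8, 5]}.items()}
result = aligned_corrwith(df1, df2)
```

Rounded to six decimals, `result` matches the `df1.corrwith(df2)` output in the Examples; a row label that exists in only one frame is simply ignored by the inner join.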

New in version 3.4.0.

Parameters
other : DataFrame, Series

Object with which to compute correlations.
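When `other` is a Series, each column of the DataFrame is correlated with that single Series, and the result is indexed by the DataFrame's columns (compare `df2.corrwith(df1.X)` in the Examples). A minimal sketch with plain lists, assuming the rows are already aligned by position; `corrwith_series` and `pearson` are hypothetical helpers, not pandas-on-Spark API.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length sequences (NaN if undefined)."""
    n = len(xs)
    if n < 2:
        return math.nan
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def corrwith_series(frame: dict[str, list[float]], other: list[float]) -> dict[str, float]:
    """Hypothetical helper: correlate every column of `frame` with one series."""
    # Rows are assumed to be aligned by position already.
    return {col: pearson(values, other) for col, values in sorted(frame.items())}


df2 = {"A": [5, 3, 6, 4], "B": [11, 2, 4, 3], "C": [4, 3, 8, 5]}
x = [5, 8, 4, 3]  # df1["X"] from the Examples
r = corrwith_series(df2, x)
```

Rounded to six decimals, `r` gives A -0.597614, B -0.151186 and C -0.642857, the same values as the last example on this page.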

axis : int, default 0 or ‘index’

Currently only 0 (or ‘index’) is supported.

drop : bool, default False

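Drop missing indices from the result: if True, labels present in only one of the two objects are omitted instead of being reported as NaN.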
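How `drop` shapes the result labels for a DataFrame `other` can be sketched as follows, assuming it follows the pandas semantics: by default every column label from either DataFrame appears, with NaN where a label is not shared (as in the `df1.corrwith(df2)` example), and `drop=True` keeps only the shared labels. `assemble_result` is a hypothetical helper, not part of pandas-on-Spark.

```python
import math


def assemble_result(shared: dict[str, float], left_columns, right_columns,
                    drop: bool = False) -> dict[str, float]:
    """Hypothetical helper: build the corrwith result labels.

    `shared` holds the correlations already computed for the columns the
    two DataFrames have in common.
    """
    if drop:
        labels = set(left_columns) & set(right_columns)  # shared labels only
    else:
        labels = set(left_columns) | set(right_columns)  # unmatched labels become NaN
    return {label: shared.get(label, math.nan) for label in sorted(labels)}


shared = {"A": -0.041703, "C": 0.395437}  # Pearson values from the Examples
kept = assemble_result(shared, ["A", "X", "C"], ["A", "B", "C"])
dropped = assemble_result(shared, ["A", "X", "C"], ["A", "B", "C"], drop=True)
```

`kept` has the labels A, B, C, X with NaN for B and X, while `dropped` keeps only A and C.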

method : {‘pearson’, ‘spearman’, ‘kendall’}, default ‘pearson’
  • pearson : standard correlation coefficient

  • spearman : Spearman rank correlation

  • kendall : Kendall Tau correlation coefficient
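The three methods can be sketched in plain Python for two already-aligned columns. This is an illustration only, not the Spark-based code path pandas-on-Spark uses. Spearman is computed here as the Pearson correlation of average ranks, and Kendall as tau-b, the tie-adjusted variant; whether pandas-on-Spark treats ties exactly this way is an assumption. All function names are illustrative.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Standard (Pearson) correlation coefficient; NaN if undefined."""
    n = len(xs)
    if n < 2:
        return math.nan
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman rank correlation: Pearson on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def kendall(xs: list[float], ys: list[float]) -> float:
    """Kendall tau-b: (concordant - discordant) adjusted for ties."""
    n = len(xs)
    concordant = discordant = tied_x = tied_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = (xs[i] > xs[j]) - (xs[i] < xs[j])  # sign of the x difference
            dy = (ys[i] > ys[j]) - (ys[i] < ys[j])  # sign of the y difference
            tied_x += dx == 0
            tied_y += dy == 0
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    pairs = n * (n - 1) // 2
    denom = math.sqrt((pairs - tied_x) * (pairs - tied_y))
    return (concordant - discordant) / denom if denom else math.nan


METHODS = {"pearson": pearson, "spearman": spearman, "kendall": kendall}
```

With the Examples' data, `METHODS["spearman"]([1, 5, 7, 8], [11, 2, 4, 3])` evaluates to -0.4 (up to floating-point rounding), matching `df1.corrwith(df2.B, method="spearman")` for column A.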

Returns
Series

Pairwise correlations.

See also

DataFrame.corr

Compute pairwise correlation of columns.

Examples

>>> import pyspark.pandas as ps
>>> df1 = ps.DataFrame({
...         "A": [1, 5, 7, 8],
...         "X": [5, 8, 4, 3],
...         "C": [10, 4, 9, 3]})
>>> df1.corrwith(df1[["X", "C"]]).sort_index()
A    NaN
C    1.0
X    1.0
dtype: float64
>>> df2 = ps.DataFrame({
...         "A": [5, 3, 6, 4],
...         "B": [11, 2, 4, 3],
...         "C": [4, 3, 8, 5]})
>>> with ps.option_context("compute.ops_on_diff_frames", True):
...     df1.corrwith(df2).sort_index()
A   -0.041703
B         NaN
C    0.395437
X         NaN
dtype: float64
>>> with ps.option_context("compute.ops_on_diff_frames", True):
...     df1.corrwith(df2, method="kendall").sort_index()
A    0.0
B    NaN
C    0.0
X    NaN
dtype: float64
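Both Kendall coefficients in the previous example are exactly 0.0 because, for each shared column, the number of concordant pairs equals the number of discordant pairs. The enumeration below checks this for column A (`df1["A"]` against `df2["A"]`); column C also splits 3 to 3. With no ties, tau is simply (concordant - discordant) divided by the number of pairs.

```python
from itertools import combinations

a1 = [1, 5, 7, 8]  # df1["A"]
a2 = [5, 3, 6, 4]  # df2["A"]

counts = {"concordant": 0, "discordant": 0, "tied": 0}
for i, j in combinations(range(len(a1)), 2):
    # Same sign of change in both columns -> concordant; opposite -> discordant.
    product = (a1[i] - a1[j]) * (a2[i] - a2[j])
    if product > 0:
        counts["concordant"] += 1
    elif product < 0:
        counts["discordant"] += 1
    else:
        counts["tied"] += 1

n_pairs = len(a1) * (len(a1) - 1) / 2
tau = (counts["concordant"] - counts["discordant"]) / n_pairs
```

Here `counts` is 3 concordant, 3 discordant and 0 tied out of 6 pairs, so `tau` is 0.0.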
>>> with ps.option_context("compute.ops_on_diff_frames", True):
...     df1.corrwith(df2.B, method="spearman").sort_index()
A   -0.4
C    0.8
X   -0.2
dtype: float64
>>> with ps.option_context("compute.ops_on_diff_frames", True):
...     df2.corrwith(df1.X).sort_index()
A   -0.597614
B   -0.151186
C   -0.642857
dtype: float64