pyspark.sql.functions.corr
- pyspark.sql.functions.corr(col1, col2)
- Returns a new Column for the Pearson Correlation Coefficient for col1 and col2.
- New in version 1.6.0.
- Changed in version 3.4.0: Supports Spark Connect.
- Parameters
  - col1: Column or str. First column to calculate the correlation.
  - col2: Column or str. Second column to calculate the correlation.
- Returns
  - Column. Pearson Correlation Coefficient of these two column values.
 
- Examples

>>> from pyspark.sql import functions as sf
>>> a = range(20)
>>> b = [2 * x for x in range(20)]
>>> df = spark.createDataFrame(zip(a, b), ["a", "b"])
>>> df.agg(sf.corr("a", df.b)).show()
+----------+
|corr(a, b)|
+----------+
|       1.0|
+----------+
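For intuition about the value reported above, here is a minimal pure-Python sketch of the Pearson correlation coefficient, the covariance of the two series divided by the product of their standard deviations. The helper name `pearson` is hypothetical and is not part of PySpark; this is not Spark's implementation, only an illustration of the quantity on the same `a` and `b` data. Returning NaN for a constant input is a choice made for this sketch, not a statement about Spark's behavior.

```python
import math


def pearson(xs, ys):
    """Hypothetical helper (not a PySpark API): Pearson correlation of two sequences."""
    xs, ys = list(xs), list(ys)
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Sums of centered products and squares; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    # Assumption of this sketch: a constant series has no defined correlation.
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Same data as the documented example: b is an exact linear function of a.
a = list(range(20))
b = [2 * x for x in a]
r = pearson(a, b)
```

Because `b` is a positive linear function of `a`, `r` comes out at (or within floating-point error of) 1.0, which is consistent with the `corr(a, b)` value shown in the example.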