pyspark.sql.functions.regr_slope

pyspark.sql.functions.regr_slope(y: ColumnOrName, x: ColumnOrName) → pyspark.sql.column.Column[source]

Aggregate function: returns the slope of the linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable.

New in version 3.5.0.

Parameters
yColumn or str

the dependent variable.

xColumn or str

the independent variable.

Returns
Column

the slope of the linear regression line for non-null pairs in a group.

Examples

>>> x = (col("id") % 3).alias("x")
>>> y = (randn(42) + x * 10).alias("y")
>>> df = spark.range(0, 1000, 1, 1).select(x, y)
>>> df.select(regr_slope("y", "x")).first()
Row(regr_slope(y, x)=10.040390844891048)