pyspark.sql.functions.cume_dist

pyspark.sql.functions.cume_dist() → pyspark.sql.column.Column

Window function: returns the cumulative distribution of values within a window partition, i.e. the fraction of rows that are at or below the current row (the number of rows with a value less than or equal to the current row's value, divided by the total number of rows in the partition).

New in version 1.6.0.

Changed in version 3.4.0: Supports Spark Connect.

Returns
Column

the column for calculating cumulative distribution.

Examples

>>> from pyspark.sql import Window, types
>>> from pyspark.sql.functions import cume_dist
>>> df = spark.createDataFrame([1, 2, 3, 3, 4], types.IntegerType())
>>> w = Window.orderBy("value")
>>> df.withColumn("cd", cume_dist().over(w)).show()
+-----+---+
|value| cd|
+-----+---+
|    1|0.2|
|    2|0.4|
|    3|0.8|
|    3|0.8|
|    4|1.0|
+-----+---+
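
A partitioned variant (a minimal sketch, not part of the original example): when the window also uses partitionBy, the distribution is computed independently within each partition. The dept/value dataset below is purely illustrative, and the trailing orderBy only makes the displayed row order deterministic.

>>> from pyspark.sql import Window
>>> from pyspark.sql.functions import cume_dist
>>> df2 = spark.createDataFrame(
...     [("a", 1), ("a", 2), ("a", 2), ("b", 5), ("b", 6)], ["dept", "value"])
>>> w2 = Window.partitionBy("dept").orderBy("value")
>>> df2.withColumn("cd", cume_dist().over(w2)).orderBy("dept", "value").show()
+----+-----+------------------+
|dept|value|                cd|
+----+-----+------------------+
|   a|    1|0.3333333333333333|
|   a|    2|               1.0|
|   a|    2|               1.0|
|   b|    5|               0.5|
|   b|    6|               1.0|
+----+-----+------------------+

Within partition "a" (3 rows), the value 1 has 1 of 3 rows at or below it, and each value 2 has 3 of 3; partition "b" is computed separately from its 2 rows.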