pyspark.sql.functions.nth_value

pyspark.sql.functions.nth_value(col: ColumnOrName, offset: int, ignoreNulls: Optional[bool] = False) → pyspark.sql.column.Column[source]

Window function: returns the value that is the offsetth row of the window frame (counting from 1), and null if the size of window frame is less than offset rows.

It will return the offsetth non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

This is equivalent to the nth_value function in SQL.

New in version 3.1.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
colColumn or str

name of column or expression

offsetint

number of row to use as the value

ignoreNullsbool, optional

indicates the Nth value should skip null in the determination of which row to use

Returns
Column

value of nth row.

Examples

>>> from pyspark.sql import Window
>>> df = spark.createDataFrame([("a", 1),
...                             ("a", 2),
...                             ("a", 3),
...                             ("b", 8),
...                             ("b", 2)], ["c1", "c2"])
>>> df.show()
+---+---+
| c1| c2|
+---+---+
|  a|  1|
|  a|  2|
|  a|  3|
|  b|  8|
|  b|  2|
+---+---+
>>> w = Window.partitionBy("c1").orderBy("c2")
>>> df.withColumn("nth_value", nth_value("c2", 1).over(w)).show()
+---+---+---------+
| c1| c2|nth_value|
+---+---+---------+
|  a|  1|        1|
|  a|  2|        1|
|  a|  3|        1|
|  b|  2|        2|
|  b|  8|        2|
+---+---+---------+
>>> df.withColumn("nth_value", nth_value("c2", 2).over(w)).show()
+---+---+---------+
| c1| c2|nth_value|
+---+---+---------+
|  a|  1|     NULL|
|  a|  2|        2|
|  a|  3|        2|
|  b|  2|     NULL|
|  b|  8|        8|
+---+---+---------+