pyspark.sql.functions.first_value

pyspark.sql.functions.first_value(col: ColumnOrName, ignoreNulls: Union[bool, pyspark.sql.column.Column, None] = None) → pyspark.sql.column.Column

Returns the first value of col for a group of rows. When ignoreNulls is set to true, it returns the first non-null value it encounters; if all values are null, null is returned.

New in version 3.5.0.

Parameters
col : Column or str

target column to work on.

ignoreNulls : Column or bool, optional

if the first value is null, look for the first non-null value instead.

Returns
Column

the first value of col for a group of rows.

Examples

>>> import pyspark.sql.functions as sf
>>> spark.createDataFrame(
...     [(None, 1), ("a", 2), ("a", 3), ("b", 8), ("b", 2)], ["a", "b"]
... ).select(sf.first_value('a'), sf.first_value('b')).show()
+--------------+--------------+
|first_value(a)|first_value(b)|
+--------------+--------------+
|          NULL|             1|
+--------------+--------------+
>>> import pyspark.sql.functions as sf
>>> spark.createDataFrame(
...     [(None, 1), ("a", 2), ("a", 3), ("b", 8), ("b", 2)], ["a", "b"]
... ).select(sf.first_value('a', True), sf.first_value('b', True)).show()
+--------------+--------------+
|first_value(a)|first_value(b)|
+--------------+--------------+
|             a|             1|
+--------------+--------------+
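
The ignoreNulls rule described above can also be sketched in plain Python. This is only an illustration of the documented semantics, not Spark's implementation: the helper name `first_value_sketch` and its `ignore_nulls` parameter are hypothetical, and a Python list has a fixed order, whereas the order of rows Spark sees within a group is not guaranteed unless the data is explicitly ordered.

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def first_value_sketch(
    values: Iterable[Optional[T]], ignore_nulls: bool = False
) -> Optional[T]:
    """Hypothetical helper mirroring the documented rule (not Spark code)."""
    for value in values:
        # Without ignore_nulls, the very first element wins, even if it is None.
        # With ignore_nulls, None elements are skipped.
        if value is not None or not ignore_nulls:
            return value
    # Empty input, or every value was None while ignoring nulls.
    return None


# Same values as column "a" in the examples above.
column_a = [None, "a", "a", "b", "b"]
print(first_value_sketch(column_a))        # None
print(first_value_sketch(column_a, True))  # a
```

The two printed results correspond to the NULL and a entries shown for column a in the two tables above.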