pyspark.sql.functions.last
pyspark.sql.functions.last(col: ColumnOrName, ignorenulls: bool = False) → pyspark.sql.column.Column
Aggregate function: returns the last value in a group.
By default, the function returns the last value it sees. When ignorenulls is set to True, it returns the last non-null value it sees. If all values are null, then null is returned.
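The null-handling rule above can be modelled in a few lines of plain Python. This is only an illustrative sketch, not Spark's implementation: `last_value` is a hypothetical helper, and `None` stands in for SQL null.

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def last_value(values: Iterable[Optional[T]], ignorenulls: bool = False) -> Optional[T]:
    """Hypothetical pure-Python model of last(); None stands in for SQL null."""
    result: Optional[T] = None
    for value in values:
        # With ignorenulls=True, a None never overwrites the running result.
        if value is not None or not ignorenulls:
            result = value
    return result


print(last_value([2, None]))                       # None
print(last_value([2, None], ignorenulls=True))     # 2
print(last_value([None, None], ignorenulls=True))  # None
```

The third call shows the all-null case: with nothing non-null to keep, the result stays null even when ignorenulls is True.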
New in version 1.3.0.
Changed in version 3.4.0: Supports Spark Connect.
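Parameters
col : Column or str
    column to fetch the last value for.
ignorenulls : bool
    if True, nulls are skipped and the last non-null value is returned. Defaults to False.
Returns
Column
    last value of the group.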
Notes
The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
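To see why row order matters, here is a small plain-Python model (not Spark code). The helper `last_age_by_name` and the `seq` ordering column are hypothetical, introduced only for illustration: the same rows arriving in a different order give a different "last" value, while sorting by an explicit key first makes the result reproducible.

```python
from typing import Dict, List, Optional, Tuple

# (name, age, seq); seq is a hypothetical explicit ordering key.
Row = Tuple[str, Optional[int], int]


def last_age_by_name(rows: List[Row]) -> Dict[str, Optional[int]]:
    """Return the age from the last row seen for each name, in iteration order."""
    result: Dict[str, Optional[int]] = {}
    for name, age, _seq in rows:
        result[name] = age  # later rows overwrite earlier ones
    return result


rows = [("Alice", 2, 1), ("Bob", 5, 2), ("Alice", None, 3)]
shuffled = [rows[2], rows[0], rows[1]]  # same rows, different arrival order

print(last_age_by_name(rows))      # {'Alice': None, 'Bob': 5}
print(last_age_by_name(shuffled))  # {'Alice': 2, 'Bob': 5}


def by_seq(row: Row) -> int:
    return row[2]


# Sorting by an explicit key first makes the answer independent of arrival order.
stable_a = last_age_by_name(sorted(rows, key=by_seq))
stable_b = last_age_by_name(sorted(shuffled, key=by_seq))
print(stable_a == stable_b)  # True
```

In Spark itself, the analogous remedy is to impose an explicit ordering rather than rely on arrival order, for example by evaluating the function over a window with an orderBy clause. Rows that tie on the ordering key can still come back in either order.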
Examples
>>> from pyspark.sql.functions import last
>>> df = spark.createDataFrame([("Alice", 2), ("Bob", 5), ("Alice", None)], ("name", "age"))
>>> df = df.orderBy(df.age.desc())
>>> df.groupby("name").agg(last("age")).orderBy("name").show()
+-----+---------+
| name|last(age)|
+-----+---------+
|Alice|     NULL|
|  Bob|        5|
+-----+---------+
Now, to ignore any nulls, we need to set ignorenulls to True:
>>> df.groupby("name").agg(last("age", ignorenulls=True)).orderBy("name").show()
+-----+---------+
| name|last(age)|
+-----+---------+
|Alice|        2|
|  Bob|        5|
+-----+---------+