pyspark.sql.functions.max_by

pyspark.sql.functions.max_by(col: ColumnOrName, ord: ColumnOrName) → pyspark.sql.column.Column

Returns the value of col associated with the maximum value of ord.

New in version 3.3.0.

Parameters
col : Column or str

target column whose value will be returned

ord : Column or str

column to be maximized

Returns
Column

value associated with the maximum value of ord.

Examples

>>> from pyspark.sql.functions import max_by
>>> df = spark.createDataFrame([
...     ("Java", 2012, 20000), ("dotNET", 2012, 5000),
...     ("dotNET", 2013, 48000), ("Java", 2013, 30000)],
...     schema=("course", "year", "earnings"))
>>> df.groupby("course").agg(max_by("year", "earnings")).show()
+------+----------------------+
|course|max_by(year, earnings)|
+------+----------------------+
|  Java|                  2013|
|dotNET|                  2013|
+------+----------------------+
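
To make the grouped semantics above concrete, here is a minimal pure-Python sketch of what the aggregation computes for each group. It is an illustration only, not Spark's implementation. The helper name max_by_per_group and its parameters are hypothetical. Skipping rows whose ordering value is None and keeping the first row on ties are assumptions made for this sketch, not documented behavior of max_by.

```python
def max_by_per_group(rows, group_key, col, ord_key):
    """Hypothetical helper: for each group, return the `col` value taken
    from the row that has the largest `ord_key` value."""
    best = {}  # group value -> (largest ordering value seen, matching col value)
    for row in rows:
        group, value, order = row[group_key], row[col], row[ord_key]
        if order is None:
            # Assumption for this sketch: rows with a missing ordering value are skipped.
            continue
        # Strict comparison keeps the first row seen when ordering values tie
        # (also an assumption of this sketch).
        if group not in best or order > best[group][0]:
            best[group] = (order, value)
    return {group: value for group, (_, value) in best.items()}


# Same data as the example above, as plain dictionaries.
rows = [
    {"course": "Java", "year": 2012, "earnings": 20000},
    {"course": "dotNET", "year": 2012, "earnings": 5000},
    {"course": "dotNET", "year": 2013, "earnings": 48000},
    {"course": "Java", "year": 2013, "earnings": 30000},
]

result = max_by_per_group(rows, "course", "year", "earnings")
print(result)  # {'Java': 2013, 'dotNET': 2013}
```

For both courses the highest earnings fall in 2013, so the sketch returns 2013 for each group, matching the table above.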