pyspark.sql.functions.array_sort

pyspark.sql.functions.array_sort(col: ColumnOrName, comparator: Optional[Callable[[pyspark.sql.column.Column, pyspark.sql.column.Column], pyspark.sql.column.Column]] = None) → pyspark.sql.column.Column[source]

Collection function: sorts the input array in ascending order. The elements of the input array must be orderable. Null elements will be placed at the end of the returned array.

New in version 2.4.0.

Changed in version 3.4.0: Can take a comparator function.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
colColumn or str

name of column or expression

comparatorcallable, optional

A binary (Column, Column) -> Column: .... The comparator will take two arguments representing two elements of the array. It returns a negative integer, 0, or a positive integer as the first element is less than, equal to, or greater than the second element. If the comparator function returns null, the function will fail and raise an error.

Returns
Column

sorted array.

Examples

>>> df = spark.createDataFrame([([2, 1, None, 3],),([1],),([],)], ['data'])
>>> df.select(array_sort(df.data).alias('r')).collect()
[Row(r=[1, 2, 3, None]), Row(r=[1]), Row(r=[])]
>>> df = spark.createDataFrame([(["foo", "foobar", None, "bar"],),(["foo"],),([],)], ['data'])
>>> df.select(array_sort(
...     "data",
...     lambda x, y: when(x.isNull() | y.isNull(), lit(0)).otherwise(length(y) - length(x))
... ).alias("r")).collect()
[Row(r=['foobar', 'foo', None, 'bar']), Row(r=['foo']), Row(r=[])]