pyspark.sql.functions.listagg

pyspark.sql.functions.listagg(col, delimiter=None)

Aggregate function: returns the concatenation of non-null input values, separated by the delimiter.

New in version 4.0.0.

Parameters

col (Column or column name): target column to compute on.
delimiter (Column, literal string or bytes, optional): the delimiter to separate the values. The default value is None, in which case the values are concatenated with no separator.

Returns

Column: the column for computed results.

Examples

Example 1: Using listagg function

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([('a',), ('b',), (None,), ('c',)], ['strings'])
>>> df.select(sf.listagg('strings')).show()
+----------------------+
|listagg(strings, NULL)|
+----------------------+
|                   abc|
+----------------------+

Example 2: Using listagg function with a delimiter

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([('a',), ('b',), (None,), ('c',)], ['strings'])
>>> df.select(sf.listagg('strings', ', ')).show()
+--------------------+
|listagg(strings, , )|
+--------------------+
|             a, b, c|
+--------------------+

Example 3: Using listagg function with a binary column and delimiter

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(b'\x01',), (b'\x02',), (None,), (b'\x03',)], ['bytes'])
>>> df.select(sf.listagg('bytes', b'B')).show()
+---------------------+
|listagg(bytes, X'42')|
+---------------------+
|     [01 42 02 42 03]|
+---------------------+

Example 4: Using listagg function on a column with all None values

>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.types import StructType, StructField, StringType
>>> schema = StructType([StructField("strings", StringType(), True)])
>>> df = spark.createDataFrame([(None,), (None,), (None,), (None,)], schema=schema)
>>> df.select(sf.listagg('strings')).show()
+----------------------+
|listagg(strings, NULL)|
+----------------------+
|                  NULL|
+----------------------+
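
The null handling shown in Examples 1 to 4 can be summarised with a small pure-Python model. This is only a sketch for reasoning about the expected results, not how Spark implements the aggregate: `listagg_model` is a hypothetical helper that is not part of PySpark, and it assumes the values are combined in input order.

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T", str, bytes)


def listagg_model(values: Iterable[Optional[T]], delimiter: Optional[T] = None) -> Optional[T]:
    """Hypothetical model of the documented listagg behaviour (not a PySpark API)."""
    # Only non-null inputs take part in the result.
    non_null = [v for v in values if v is not None]
    # With no non-null input the result is NULL (None), not an empty value.
    if not non_null:
        return None
    # A missing delimiter is modelled as an empty separator of the same type
    # as the values ('' for str, b'' for bytes).
    if delimiter is None:
        delimiter = type(non_null[0])()
    return delimiter.join(non_null)


print(listagg_model(['a', 'b', None, 'c']))        # mirrors Example 1
print(listagg_model(['a', 'b', None, 'c'], ', '))  # mirrors Example 2
print(listagg_model([None, None, None, None]))     # mirrors Example 4
```

The model returns `None` for an all-null column, matching the `NULL` shown in Example 4, and it works on `bytes` values with a `bytes` delimiter in the same way as Example 3.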