pyspark.sql.functions.listagg
- pyspark.sql.functions.listagg(col, delimiter=None)
Aggregate function: returns the concatenation of non-null input values, separated by the delimiter.
New in version 4.0.0.
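- Parameters
col : Column or column name
target column to compute on.
delimiter : Column, literal string or bytes, optional
the delimiter to separate the values. The default value is None.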
- Returns
Column
the column for computed results.
Examples
Example 1: Using listagg function
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([('a',), ('b',), (None,), ('c',)], ['strings'])
>>> df.select(sf.listagg('strings')).show()
+----------------------+
|listagg(strings, NULL)|
+----------------------+
|                   abc|
+----------------------+
Example 2: Using listagg function with a delimiter
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([('a',), ('b',), (None,), ('c',)], ['strings'])
>>> df.select(sf.listagg('strings', ', ')).show()
+--------------------+
|listagg(strings, , )|
+--------------------+
|             a, b, c|
+--------------------+
Example 3: Using listagg function with a binary column and delimiter
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(b'\x01',), (b'\x02',), (None,), (b'\x03',)], ['bytes'])
>>> df.select(sf.listagg('bytes', b'B')).show()
+---------------------+
|listagg(bytes, X'42')|
+---------------------+
|     [01 42 02 42 03]|
+---------------------+
Example 4: Using listagg function on a column with all None values
>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.types import StructType, StructField, StringType
>>> schema = StructType([StructField("strings", StringType(), True)])
>>> df = spark.createDataFrame([(None,), (None,), (None,), (None,)], schema=schema)
>>> df.select(sf.listagg('strings')).show()
+----------------------+
|listagg(strings, NULL)|
+----------------------+
|                  NULL|
+----------------------+
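The behaviour shown in the four examples above (nulls are skipped, a missing delimiter means plain concatenation, and an all-null input yields NULL) can be summarised in a small plain-Python model. This is only an illustrative sketch and not Spark's implementation: the name `listagg_model` is hypothetical, it runs on a local list rather than a distributed column, and it joins values in list order, whereas the row order seen by an aggregate on a real DataFrame may not be deterministic.

```python
from typing import Iterable, Optional, Union

Value = Union[str, bytes]


def listagg_model(
    values: Iterable[Optional[Value]],
    delimiter: Optional[Value] = None,
) -> Optional[Value]:
    """Hypothetical local model of the listagg semantics shown in the examples."""
    # Null (None) inputs are dropped before concatenation.
    non_null = [v for v in values if v is not None]

    # With no non-null inputs the result is NULL (None), as in Example 4.
    if not non_null:
        return None

    # No delimiter means plain concatenation; pick an empty separator
    # of the same type as the data (str or bytes).
    if delimiter is None:
        delimiter = b"" if isinstance(non_null[0], bytes) else ""

    return delimiter.join(non_null)


print(listagg_model(["a", "b", None, "c"]))        # mirrors Example 1
print(listagg_model(["a", "b", None, "c"], ", "))  # mirrors Example 2
```

The model assumes that the delimiter has the same type as the values (a string delimiter for string columns, a bytes delimiter for binary columns), which matches how the examples above call the function.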