pyspark.sql.functions.listagg_distinct
pyspark.sql.functions.listagg_distinct(col, delimiter=None)
Aggregate function: returns the concatenation of distinct non-null input values, separated by the delimiter.
New in version 4.0.0.
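Parameters
col : Column or column name
target column to compute on.
delimiter : Column, literal string or bytes, optional
the delimiter to separate the values. The default value is None.
Returns
Column
the column for computed results.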
Examples
Example 1: Using listagg_distinct function
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([('a',), ('b',), (None,), ('c',), ('b',)], ['strings'])
>>> df.select(sf.listagg_distinct('strings')).show()
+-------------------------------+
|listagg(DISTINCT strings, NULL)|
+-------------------------------+
|                            abc|
+-------------------------------+
Example 2: Using listagg_distinct function with a delimiter
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([('a',), ('b',), (None,), ('c',), ('b',)], ['strings'])
>>> df.select(sf.listagg_distinct('strings', ', ')).show()
+-----------------------------+
|listagg(DISTINCT strings, , )|
+-----------------------------+
|                      a, b, c|
+-----------------------------+
Example 3: Using listagg_distinct function with a binary column and delimiter
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(b'\x01',), (b'\x02',), (None,), (b'\x03',), (b'\x02',)],
...                            ['bytes'])
>>> df.select(sf.listagg_distinct('bytes', b'B')).show()
+------------------------------+
|listagg(DISTINCT bytes, X'42')|
+------------------------------+
|              [01 42 02 42 03]|
+------------------------------+
Example 4: Using listagg_distinct function on a column with all None values
>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.types import StructType, StructField, StringType
>>> schema = StructType([StructField("strings", StringType(), True)])
>>> df = spark.createDataFrame([(None,), (None,), (None,), (None,)], schema=schema)
>>> df.select(sf.listagg_distinct('strings')).show()
+-------------------------------+
|listagg(DISTINCT strings, NULL)|
+-------------------------------+
|                           NULL|
+-------------------------------+
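
The behavior shown in the examples above (nulls skipped, duplicates dropped, values joined by an optional delimiter, NULL when no non-null value exists) can be modeled in plain Python. This is only an illustrative sketch, not Spark's implementation: `listagg_distinct_model` is a hypothetical helper that is not part of PySpark, and it assumes first-seen ordering, which may not match the order Spark produces on a distributed DataFrame.

```python
from typing import Iterable, Optional, Union

StrOrBytes = Union[str, bytes]


def listagg_distinct_model(
    values: Iterable[Optional[StrOrBytes]],
    delimiter: Optional[StrOrBytes] = None,
) -> Optional[StrOrBytes]:
    """Hypothetical pure-Python model of the documented semantics."""
    seen = set()
    distinct = []
    for value in values:
        # Skip nulls and values that were already collected.
        if value is None or value in seen:
            continue
        seen.add(value)
        distinct.append(value)

    # With no non-null input, the result is None (NULL in Spark).
    if not distinct:
        return None

    # An omitted delimiter behaves like an empty separator of the
    # same type as the data: '' for str, b'' for bytes.
    separator = distinct[0][:0] if delimiter is None else delimiter
    return separator.join(distinct)


print(listagg_distinct_model(['a', 'b', None, 'c', 'b']))        # abc
print(listagg_distinct_model(['a', 'b', None, 'c', 'b'], ', '))  # a, b, c
print(listagg_distinct_model([None, None]))                      # None
```

The inputs mirror Examples 1, 2, and 4, so the printed values can be compared directly against the `show()` output above.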