pyspark.sql.functions.array_except

pyspark.sql.functions.array_except(col1, col2)
Array function: returns a new array containing the elements present in col1 but not in col2, without duplicates.
New in version 2.4.0.
Changed in version 3.4.0: Supports Spark Connect.
- Parameters
col1 : Column or str
    name of column containing array
col2 : Column or str
    name of column containing array
- Returns
Column
A new array containing the elements present in col1 but not in col2.
Notes
This function does not preserve the order of the elements in the input arrays. Apply sort_array to the result when a deterministic ordering is needed, as in Example 2 below.
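For instance, a minimal SQL sketch of the same semantics (assuming a SparkSession bound to the name spark, as in the examples below) shows that duplicates in the first array are collapsed:

>>> spark.sql("SELECT array_except(array(1, 1, 2), array(2)) AS diff").show()
+----+
|diff|
+----+
| [1]|
+----+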
Examples
Example 1: Basic usage
>>> from pyspark.sql import Row, functions as sf
>>> df = spark.createDataFrame([Row(c1=["b", "a", "c"], c2=["c", "d", "a", "f"])])
>>> df.select(sf.array_except(df.c1, df.c2)).show()
+--------------------+
|array_except(c1, c2)|
+--------------------+
|                 [b]|
+--------------------+
Example 2: Except with no common elements
>>> from pyspark.sql import Row, functions as sf
>>> df = spark.createDataFrame([Row(c1=["b", "a", "c"], c2=["d", "e", "f"])])
>>> df.select(sf.sort_array(sf.array_except(df.c1, df.c2))).show()
+--------------------------------------+
|sort_array(array_except(c1, c2), true)|
+--------------------------------------+
|                             [a, b, c]|
+--------------------------------------+
Example 3: Except with all common elements
>>> from pyspark.sql import Row, functions as sf
>>> df = spark.createDataFrame([Row(c1=["a", "b", "c"], c2=["a", "b", "c"])])
>>> df.select(sf.array_except(df.c1, df.c2)).show()
+--------------------+
|array_except(c1, c2)|
+--------------------+
|                  []|
+--------------------+
Example 4: Except with null values
>>> from pyspark.sql import Row, functions as sf
>>> df = spark.createDataFrame([Row(c1=["a", "b", None], c2=["a", None, "c"])])
>>> df.select(sf.array_except(df.c1, df.c2)).show()
+--------------------+
|array_except(c1, c2)|
+--------------------+
|                 [b]|
+--------------------+
Example 5: Except with empty arrays
>>> from pyspark.sql import Row, functions as sf
>>> from pyspark.sql.types import ArrayType, StringType, StructField, StructType
>>> data = [Row(c1=[], c2=["a", "b", "c"])]
>>> schema = StructType([
...     StructField("c1", ArrayType(StringType()), True),
...     StructField("c2", ArrayType(StringType()), True)
... ])
>>> df = spark.createDataFrame(data, schema)
>>> df.select(sf.array_except(df.c1, df.c2)).show()
+--------------------+
|array_except(c1, c2)|
+--------------------+
|                  []|
+--------------------+
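Example 6: Passing column names as strings

A supplementary sketch, assuming array_except accepts column names as strings (the ColumnOrName convention shared by most functions in pyspark.sql.functions):

>>> from pyspark.sql import Row, functions as sf
>>> df = spark.createDataFrame([Row(c1=["b", "a", "c"], c2=["c", "d", "a", "f"])])
>>> df.select(sf.array_except("c1", "c2")).show()
+--------------------+
|array_except(c1, c2)|
+--------------------+
|                 [b]|
+--------------------+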