pyspark.sql.functions.shuffle

pyspark.sql.functions.shuffle(col, seed=None)
Array function: Generates a random permutation of the given array.

New in version 2.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters

col : Column or str
    The name of the column or expression containing the array to be shuffled.
seed : Column or int, optional
    Seed value for the random generator.
Returns

Column
    A new column that contains an array of elements in random order.
 
Notes

The shuffle function is non-deterministic, meaning the order of the output array can be different for each execution.

Examples

Example 1: Shuffling a simple array

>>> import pyspark.sql.functions as sf
>>> df = spark.sql("SELECT ARRAY(1, 20, 3, 5) AS data")
>>> df.select("*", sf.shuffle(df.data, sf.lit(123))).show()
+-------------+-------------+
|         data|shuffle(data)|
+-------------+-------------+
|[1, 20, 3, 5]|[5, 1, 20, 3]|
+-------------+-------------+

Example 2: Shuffling an array with null values

>>> import pyspark.sql.functions as sf
>>> df = spark.sql("SELECT ARRAY(1, 20, NULL, 5) AS data")
>>> df.select("*", sf.shuffle(sf.col("data"), 234)).show()
+----------------+----------------+
|            data|   shuffle(data)|
+----------------+----------------+
|[1, 20, NULL, 5]|[NULL, 5, 20, 1]|
+----------------+----------------+

Example 3: Shuffling an array with duplicate values

>>> import pyspark.sql.functions as sf
>>> df = spark.sql("SELECT ARRAY(1, 2, 2, 3, 3, 3) AS data")
>>> df.select("*", sf.shuffle("data", 345)).show()
+------------------+------------------+
|              data|     shuffle(data)|
+------------------+------------------+
|[1, 2, 2, 3, 3, 3]|[2, 3, 3, 1, 2, 3]|
+------------------+------------------+

Example 4: Shuffling an array with random seed

>>> import pyspark.sql.functions as sf
>>> df = spark.sql("SELECT ARRAY(1, 2, 2, 3, 3, 3) AS data")
>>> df.select("*", sf.shuffle("data")).show()
+------------------+------------------+
|              data|     shuffle(data)|
+------------------+------------------+
|[1, 2, 2, 3, 3, 3]|[3, 3, 2, 3, 2, 1]|
+------------------+------------------+
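Because shuffle permutes each row's array independently, one pattern it enables is drawing a pseudo-random element per row by combining it with element_at. The snippet below is a minimal sketch, not part of the official examples: the DataFrame, the "id" and "tags" column names, and the element_at combination are illustrative assumptions, and the output is omitted because the ordering varies between runs.

import pyspark.sql.functions as sf
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical DataFrame with an array column of tags per id.
df = spark.createDataFrame(
    [(1, ["red", "green", "blue"]), (2, ["small", "large"])],
    ["id", "tags"],
)

# shuffle() permutes each row's array independently; element_at(..., 1)
# then reads the first element of the permuted array (1-based index),
# effectively picking one element per row at random.
df.select(
    "id",
    sf.element_at(sf.shuffle("tags"), 1).alias("random_tag"),
).show()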