pyspark.sql.functions.array
- pyspark.sql.functions.array(*cols)
Collection function: Creates a new array column from the input columns or column names.
New in version 1.4.0.
Changed in version 3.4.0: Supports Spark Connect.
- Parameters
cols : Column or str
Column names or Column objects to combine into the array.
- Returns
Column
A new Column of array type, where each value is an array containing the corresponding values from the input columns.
Examples
Example 1: Basic usage of array function with column names.
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([("Alice", "doctor"), ("Bob", "engineer")],
...     ("name", "occupation"))
>>> df.select(sf.array('name', 'occupation')).show()
+-----------------------+
|array(name, occupation)|
+-----------------------+
|        [Alice, doctor]|
|        [Bob, engineer]|
+-----------------------+
Example 2: Usage of array function with Column objects.
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([("Alice", "doctor"), ("Bob", "engineer")],
...     ("name", "occupation"))
>>> df.select(sf.array(df.name, df.occupation)).show()
+-----------------------+
|array(name, occupation)|
+-----------------------+
|        [Alice, doctor]|
|        [Bob, engineer]|
+-----------------------+
Example 3: Single argument as list of column names.
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([("Alice", "doctor"), ("Bob", "engineer")],
...     ("name", "occupation"))
>>> df.select(sf.array(['name', 'occupation'])).show()
+-----------------------+
|array(name, occupation)|
+-----------------------+
|        [Alice, doctor]|
|        [Bob, engineer]|
+-----------------------+
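Example 4: Usage of array function with columns of different types. The integer age values are cast to floating point so that every element of the array has the same type.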
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame(
...     [("Alice", 2, 22.2), ("Bob", 5, 36.1)],
...     ("name", "age", "weight"))
>>> df.select(sf.array(['age', 'weight'])).show()
+------------------+
|array(age, weight)|
+------------------+
|       [2.0, 22.2]|
|       [5.0, 36.1]|
+------------------+
Example 5: array function with a column containing null values.
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([("Alice", None), ("Bob", "engineer")],
...     ("name", "occupation"))
>>> df.select(sf.array('name', 'occupation')).show()
+-----------------------+
|array(name, occupation)|
+-----------------------+
|          [Alice, NULL]|
|        [Bob, engineer]|
+-----------------------+
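The row-wise behaviour shown in the examples above can be modelled in plain Python, without a Spark session. This is only an illustrative sketch: `array_rows` is a hypothetical helper, not part of PySpark, and rows are assumed to be plain dictionaries keyed by column name.

```python
from typing import Any, Dict, List


def array_rows(rows: List[Dict[str, Any]], *cols: Any) -> List[List[Any]]:
    """Hypothetical model of sf.array: one list per row, in argument order."""
    # Mirror the single-list call style, e.g. array(['name', 'occupation']).
    if len(cols) == 1 and isinstance(cols[0], (list, tuple)):
        cols = tuple(cols[0])
    # None stands in for SQL NULL and is kept as an ordinary element.
    return [[row[c] for c in cols] for row in rows]


rows = [
    {"name": "Alice", "occupation": None},
    {"name": "Bob", "occupation": "engineer"},
]
result = array_rows(rows, "name", "occupation")
print(result)
```

Unlike this model, Spark also casts the elements to a common type, as Example 4 shows; the sketch covers only element order and null handling.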