pyspark.sql.DataFrame.unionByName¶

DataFrame.unionByName(other: pyspark.sql.dataframe.DataFrame, allowMissingColumns: bool = False) → pyspark.sql.dataframe.DataFrame[source]¶

Returns a new DataFrame containing union of rows in this and another DataFrame.

This method performs a union operation on both input DataFrames, resolving columns by name (rather than position). When allowMissingColumns is True, missing columns will be filled with null.

New in version 2.3.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters

otherDataFrame: Another DataFrame that needs to be combined.
allowMissingColumnsbool, optional, default False: Specify whether to allow missing columns.

New in version 3.1.0.

Returns

DataFrame: A new DataFrame containing the combined rows with corresponding columns of the two given DataFrames.

Examples

Example 1: Union of two DataFrames with same columns in different order.

>>> df1 = spark.createDataFrame([[1, 2, 3]], ["col0", "col1", "col2"])
>>> df2 = spark.createDataFrame([[4, 5, 6]], ["col1", "col2", "col0"])
>>> df1.unionByName(df2).show()
+----+----+----+
|col0|col1|col2|
+----+----+----+
|   1|   2|   3|
|   6|   4|   5|
+----+----+----+

Example 2: Union with missing columns and setting allowMissingColumns=True.

>>> df1 = spark.createDataFrame([[1, 2, 3]], ["col0", "col1", "col2"])
>>> df2 = spark.createDataFrame([[4, 5, 6]], ["col1", "col2", "col3"])
>>> df1.unionByName(df2, allowMissingColumns=True).show()
+----+----+----+----+
|col0|col1|col2|col3|
+----+----+----+----+
|   1|   2|   3|NULL|
|NULL|   4|   5|   6|
+----+----+----+----+

Example 3: Union of two DataFrames with few common columns.

>>> df1 = spark.createDataFrame([[1, 2, 3]], ["col0", "col1", "col2"])
>>> df2 = spark.createDataFrame([[4, 5, 6, 7]], ["col1", "col2", "col3", "col4"])
>>> df1.unionByName(df2, allowMissingColumns=True).show()
+----+----+----+----+----+
|col0|col1|col2|col3|col4|
+----+----+----+----+----+
|   1|   2|   3|NULL|NULL|
|NULL|   4|   5|   6|   7|
+----+----+----+----+----+

Example 4: Union of two DataFrames with completely different columns.

>>> df1 = spark.createDataFrame([[0, 1, 2]], ["col0", "col1", "col2"])
>>> df2 = spark.createDataFrame([[3, 4, 5]], ["col3", "col4", "col5"])
>>> df1.unionByName(df2, allowMissingColumns=True).show()
+----+----+----+----+----+----+
|col0|col1|col2|col3|col4|col5|
+----+----+----+----+----+----+
|   0|   1|   2|NULL|NULL|NULL|
|NULL|NULL|NULL|   3|   4|   5|
+----+----+----+----+----+----+

pyspark.sql.DataFrame.unionAll

pyspark.sql.DataFrame.unpersist