pyspark.sql.DataFrame.unpivot

DataFrame.unpivot(ids: Union[ColumnOrName, List[ColumnOrName], Tuple[ColumnOrName, ...]], values: Union[ColumnOrName, List[ColumnOrName], Tuple[ColumnOrName, ...], None], variableColumnName: str, valueColumnName: str) → DataFrame

Unpivot a DataFrame from wide format to long format, optionally leaving identifier columns in place. This is the reverse of groupBy(...).pivot(...).agg(...), except for the aggregation, which cannot be reversed.

This function is useful to massage a DataFrame into a format where some columns are identifier columns (“ids”), while all other columns (“values”) are “unpivoted” to the rows, leaving just two non-id columns, named as given by variableColumnName and valueColumnName.

When no “id” columns are given, the unpivoted DataFrame consists of only the “variable” and “value” columns.
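For illustration, a sketch of this case (assuming the standard spark doctest session and the same data as in the Examples section below; row order is illustrative):

>>> df = spark.createDataFrame(
...     [(1, 11, 1.1), (2, 12, 1.2)],
...     ["id", "int", "double"],
... )
>>> df.unpivot([], ["int", "double"], "var", "val").show()
+------+----+
|   var| val|
+------+----+
|   int|11.0|
|double| 1.1|
|   int|12.0|
|double| 1.2|
+------+----+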

The values list, when given, must not be empty: at least one column must be specified to be unpivoted. When values is None, all non-id columns are unpivoted.
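As a sketch of the values=None case (same assumptions as above), all non-id columns are unpivoted:

>>> df = spark.createDataFrame(
...     [(1, 11, 1.1), (2, 12, 1.2)],
...     ["id", "int", "double"],
... )
>>> df.unpivot("id", None, "var", "val").show()
+---+------+----+
| id|   var| val|
+---+------+----+
|  1|   int|11.0|
|  1|double| 1.1|
|  2|   int|12.0|
|  2|double| 1.2|
+---+------+----+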

All “value” columns must share a least common data type. Unless they are the same data type, all “value” columns are cast to the nearest common data type. For instance, types IntegerType and LongType are cast to LongType, while IntegerType and StringType do not have a common data type and unpivot fails.
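For instance, a sketch of the failure mode when mixing an integer and a string value column (the exact error message is elided; depending on the execution mode the error may be raised at this call or once an action runs):

>>> df2 = spark.createDataFrame(
...     [(1, 11, "x"), (2, 12, "y")],
...     ["id", "int", "str"],
... )
>>> df2.unpivot("id", ["int", "str"], "var", "val")  # doctest: +SKIP
Traceback (most recent call last):
    ...
AnalysisException: ...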

New in version 3.4.0.

Parameters
ids : str, Column, tuple, list

Column(s) to use as identifiers. Can be a single column or column name, or a list or tuple for multiple columns.

values : str, Column, tuple, list, optional

Column(s) to unpivot. Can be a single column or column name, or a list or tuple for multiple columns. If specified, must not be empty. If not specified, uses all columns that are not set as ids.

variableColumnName : str

Name of the variable column.

valueColumnName : str

Name of the value column.

Returns
DataFrame

Unpivoted DataFrame.

See also

DataFrame.melt

Notes

Supports Spark Connect.

Examples

>>> df = spark.createDataFrame(
...     [(1, 11, 1.1), (2, 12, 1.2)],
...     ["id", "int", "double"],
... )
>>> df.show()
+---+---+------+
| id|int|double|
+---+---+------+
|  1| 11|   1.1|
|  2| 12|   1.2|
+---+---+------+
>>> df.unpivot("id", ["int", "double"], "var", "val").show()
+---+------+----+
| id|   var| val|
+---+------+----+
|  1|   int|11.0|
|  1|double| 1.1|
|  2|   int|12.0|
|  2|double| 1.2|
+---+------+----+
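
DataFrame.melt (listed under See also) is an alias for unpivot with the same parameters, so the call above can equivalently be written as:

>>> df.melt("id", ["int", "double"], "var", "val").show()
+---+------+----+
| id|   var| val|
+---+------+----+
|  1|   int|11.0|
|  1|double| 1.1|
|  2|   int|12.0|
|  2|double| 1.2|
+---+------+----+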