pyspark.sql.Column.dropFields

Column.dropFields(*fieldNames)[source]

An expression that drops fields in StructType by name. This is a no-op if schema doesn’t contain field name(s).

New in version 3.1.0.

Examples

>>> from pyspark.sql import Row
>>> from pyspark.sql.functions import col, lit
>>> df = spark.createDataFrame([
...     Row(a=Row(b=1, c=2, d=3, e=Row(f=4, g=5, h=6)))])
>>> df.withColumn('a', df['a'].dropFields('b')).show()
+-----------------+
|                a|
+-----------------+
|{2, 3, {4, 5, 6}}|
+-----------------+
>>> df.withColumn('a', df['a'].dropFields('b', 'c')).show()
+--------------+
|             a|
+--------------+
|{3, {4, 5, 6}}|
+--------------+

This method supports dropping multiple nested fields directly e.g.

>>> df.withColumn("a", col("a").dropFields("e.g", "e.h")).show()
+--------------+
|             a|
+--------------+
|{1, 2, 3, {4}}|
+--------------+

However, if you are going to add/replace multiple nested fields, it is preferred to extract out the nested struct before adding/replacing multiple fields e.g.

>>> df.select(col("a").withField(
...     "e", col("a.e").dropFields("g", "h")).alias("a")
... ).show()
+--------------+
|             a|
+--------------+
|{1, 2, 3, {4}}|
+--------------+