pyspark.sql.DataFrame.drop

DataFrame.drop(*cols: ColumnOrName) → DataFrame

Returns a new DataFrame without specified columns. This is a no-op if the schema doesn’t contain the given column name(s).

New in version 1.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
cols: str or Column

The name of a column, or the Column to drop.

Returns
DataFrame

DataFrame without given columns.

Notes

When an input is a column name, it is treated literally, without further interpretation. Otherwise, drop tries to match the equivalent expression. As a result, dropping a column by its name, drop(colName), has different semantics from dropping the column directly, drop(col(colName)).

Examples

>>> from pyspark.sql import Row
>>> from pyspark.sql.functions import col, lit
>>> df = spark.createDataFrame(
...     [(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
>>> df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, name="Bob")])
>>> df.drop('age').show()
+-----+
| name|
+-----+
|  Tom|
|Alice|
|  Bob|
+-----+
>>> df.drop(df.age).show()
+-----+
| name|
+-----+
|  Tom|
|Alice|
|  Bob|
+-----+

Drop the column that both DataFrames were joined on.

>>> df.join(df2, df.name == df2.name, 'inner').drop('name').sort('age').show()
+---+------+
|age|height|
+---+------+
| 14|    80|
| 16|    85|
+---+------+
>>> df3 = df.join(df2)
>>> df3.show()
+---+-----+------+----+
|age| name|height|name|
+---+-----+------+----+
| 14|  Tom|    80| Tom|
| 14|  Tom|    85| Bob|
| 23|Alice|    80| Tom|
| 23|Alice|    85| Bob|
| 16|  Bob|    80| Tom|
| 16|  Bob|    85| Bob|
+---+-----+------+----+

Drop two columns that share the same name.

>>> df3.drop("name").show()
+---+------+
|age|height|
+---+------+
| 14|    80|
| 14|    85|
| 23|    80|
| 23|    85|
| 16|    80|
| 16|    85|
+---+------+

col('name') cannot be dropped because the reference is ambiguous.

>>> df3.drop(col("name")).show()
Traceback (most recent call last):
...
pyspark.errors.exceptions.captured.AnalysisException: [AMBIGUOUS_REFERENCE] Reference...
>>> df4 = df.withColumn("a.b.c", lit(1))
>>> df4.show()
+---+-----+-----+
|age| name|a.b.c|
+---+-----+-----+
| 14|  Tom|    1|
| 23|Alice|    1|
| 16|  Bob|    1|
+---+-----+-----+
>>> df4.drop("a.b.c").show()
+---+-----+
|age| name|
+---+-----+
| 14|  Tom|
| 23|Alice|
| 16|  Bob|
+---+-----+

No column matches the expression “a.b.c”, so the DataFrame is returned unchanged.

>>> df4.drop(col("a.b.c")).show()
+---+-----+-----+
|age| name|a.b.c|
+---+-----+-----+
| 14|  Tom|    1|
| 23|Alice|    1|
| 16|  Bob|    1|
+---+-----+-----+