pyspark.sql.DataFrame.drop

DataFrame.drop(*cols: ColumnOrName) → DataFrame

Returns a new DataFrame without specified columns. This is a no-op if the schema doesn’t contain the given column name(s).

New in version 1.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
cols: str or Column

The name of a column to drop, or the Column itself.

Returns
DataFrame

A new DataFrame without the given columns.

Examples

>>> from pyspark.sql import Row
>>> df = spark.createDataFrame(
...     [(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
>>> df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, name="Bob")])
>>> df.drop('age').show()
+-----+
| name|
+-----+
|  Tom|
|Alice|
|  Bob|
+-----+
>>> df.drop(df.age).show()
+-----+
| name|
+-----+
|  Tom|
|Alice|
|  Bob|
+-----+

Drop the column that both DataFrames were joined on. Dropping by name removes the `name` column from both sides of the join.

>>> df.join(df2, df.name == df2.name, 'inner').drop('name').sort('age').show()
+---+------+
|age|height|
+---+------+
| 14|    80|
| 16|    85|
+---+------+