pyspark.sql.DataFrame.subtract

DataFrame.subtract(other: pyspark.sql.dataframe.DataFrame) → pyspark.sql.dataframe.DataFrame[source]

Return a new DataFrame containing rows in this DataFrame but not in another DataFrame.

New in version 1.3.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
otherDataFrame

Another DataFrame that needs to be subtracted.

Returns
DataFrame

Subtracted DataFrame.

Notes

This is equivalent to EXCEPT DISTINCT in SQL.

Examples

>>> df1 = spark.createDataFrame([("a", 1), ("a", 1), ("b", 3), ("c", 4)], ["C1", "C2"])
>>> df2 = spark.createDataFrame([("a", 1), ("a", 1), ("b", 3)], ["C1", "C2"])
>>> df1.subtract(df2).show()
+---+---+
| C1| C2|
+---+---+
|  c|  4|
+---+---+