pyspark.sql.DataFrame.intersect
DataFrame.intersect(other: pyspark.sql.dataframe.DataFrame) → pyspark.sql.dataframe.DataFrame
Return a new DataFrame containing rows only in both this DataFrame and another DataFrame. Note that any duplicates are removed. To preserve duplicates, use intersectAll().

New in version 1.3.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
    other : DataFrame
        The other DataFrame to intersect with.

Returns
    DataFrame
        Combined DataFrame.
Notes
    This is equivalent to INTERSECT in SQL.

Examples
    >>> df1 = spark.createDataFrame([("a", 1), ("a", 1), ("b", 3), ("c", 4)], ["C1", "C2"])
    >>> df2 = spark.createDataFrame([("a", 1), ("a", 1), ("b", 3)], ["C1", "C2"])
    >>> df1.intersect(df2).sort(df1.C1.desc()).show()
    +---+---+
    | C1| C2|
    +---+---+
    |  b|  3|
    |  a|  1|
    +---+---+
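To make the duplicate-removal behavior concrete, here is a small sketch of the difference between intersect() and intersectAll() semantics using plain Python multisets rather than Spark; the helper names intersect and intersect_all are illustrative, not part of the PySpark API. intersect() keeps distinct common rows, while intersectAll() keeps each common row up to its minimum multiplicity on either side.

```python
from collections import Counter

def intersect(rows1, rows2):
    # Mirrors DataFrame.intersect: distinct rows present in both inputs.
    return sorted(set(rows1) & set(rows2))

def intersect_all(rows1, rows2):
    # Mirrors DataFrame.intersectAll: keeps duplicates up to the
    # minimum count of each row across the two inputs.
    common = Counter(rows1) & Counter(rows2)
    return sorted(common.elements())

df1 = [("a", 1), ("a", 1), ("b", 3), ("c", 4)]
df2 = [("a", 1), ("a", 1), ("b", 3)]

print(intersect(df1, df2))      # [('a', 1), ('b', 3)]
print(intersect_all(df1, df2))  # [('a', 1), ('a', 1), ('b', 3)]
```

Note that ("a", 1) appears twice in both inputs, so intersectAll() retains both copies while intersect() collapses them to one.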