pyspark.sql.DataFrame.filter

DataFrame.filter(condition)

Filters rows using the given condition. where() is an alias for filter().

New in version 1.3.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
    condition : Column or str
        A Column of types.BooleanType or a string of SQL expressions.
 
Returns
    DataFrame
        A new DataFrame with rows that satisfy the condition.
 
Examples

    >>> df = spark.createDataFrame([
    ...     (2, "Alice", "Math"), (5, "Bob", "Physics"), (7, "Charlie", "Chemistry")],
    ...     schema=["age", "name", "subject"])

Filter by Column instances.

    >>> df.filter(df.age > 3).show()
    +---+-------+---------+
    |age|   name|  subject|
    +---+-------+---------+
    |  5|    Bob|  Physics|
    |  7|Charlie|Chemistry|
    +---+-------+---------+
    >>> df.where(df.age == 2).show()
    +---+-----+-------+
    |age| name|subject|
    +---+-----+-------+
    |  2|Alice|   Math|
    +---+-----+-------+

Filter by SQL expression in a string.

    >>> df.filter("age > 3").show()
    +---+-------+---------+
    |age|   name|  subject|
    +---+-------+---------+
    |  5|    Bob|  Physics|
    |  7|Charlie|Chemistry|
    +---+-------+---------+
    >>> df.where("age = 2").show()
    +---+-----+-------+
    |age| name|subject|
    +---+-----+-------+
    |  2|Alice|   Math|
    +---+-----+-------+

Filter by multiple conditions.

    >>> df.filter((df.age > 3) & (df.subject == "Physics")).show()
    +---+----+-------+
    |age|name|subject|
    +---+----+-------+
    |  5| Bob|Physics|
    +---+----+-------+
    >>> df.filter((df.age == 2) | (df.subject == "Chemistry")).show()
    +---+-------+---------+
    |age|   name|  subject|
    +---+-------+---------+
    |  2|  Alice|     Math|
    |  7|Charlie|Chemistry|
    +---+-------+---------+

Filter by multiple conditions using SQL expression.

    >>> df.filter("age > 3 AND name = 'Bob'").show()
    +---+----+-------+
    |age|name|subject|
    +---+----+-------+
    |  5| Bob|Physics|
    +---+----+-------+

Filter using the Column.isin() function.

    >>> df.filter(df.name.isin("Alice", "Bob")).show()
    +---+-----+-------+
    |age| name|subject|
    +---+-----+-------+
    |  2|Alice|   Math|
    |  5|  Bob|Physics|
    +---+-----+-------+

Filter by a list of values using the Column.isin() function.

    >>> df.filter(df.subject.isin(["Math", "Physics"])).show()
    +---+-----+-------+
    |age| name|subject|
    +---+-----+-------+
    |  2|Alice|   Math|
    |  5|  Bob|Physics|
    +---+-----+-------+

Filter using the ~ operator to exclude certain values.

    >>> df.filter(~df.name.isin(["Alice", "Charlie"])).show()
    +---+----+-------+
    |age|name|subject|
    +---+----+-------+
    |  5| Bob|Physics|
    +---+----+-------+

Filter using the Column.isNotNull() function.

    >>> df.filter(df.name.isNotNull()).show()
    +---+-------+---------+
    |age|   name|  subject|
    +---+-------+---------+
    |  2|  Alice|     Math|
    |  5|    Bob|  Physics|
    |  7|Charlie|Chemistry|
    +---+-------+---------+

Filter using the Column.like() function.

    >>> df.filter(df.name.like("Al%")).show()
    +---+-----+-------+
    |age| name|subject|
    +---+-----+-------+
    |  2|Alice|   Math|
    +---+-----+-------+

Filter using the Column.contains() function.

    >>> df.filter(df.name.contains("i")).show()
    +---+-------+---------+
    |age|   name|  subject|
    +---+-------+---------+
    |  2|  Alice|     Math|
    |  7|Charlie|Chemistry|
    +---+-------+---------+

Filter using the Column.between() function.

    >>> df.filter(df.age.between(2, 5)).show()
    +---+-----+-------+
    |age| name|subject|
    +---+-----+-------+
    |  2|Alice|   Math|
    |  5|  Bob|Physics|
    +---+-----+-------+
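The multi-condition examples combine Column predicates with & and |, wrapping each comparison in parentheses because Python's & and | bind more tightly than comparison operators such as > and ==. When the set of conditions is only known at run time, one possible approach is to fold a list of predicates into a single condition before passing it to filter(). The sketch below is illustrative only: all_of and any_of are hypothetical helper names, not part of PySpark, and they rely on nothing more than the operands supporting & and |.

```python
from functools import reduce
import operator


def all_of(*conditions):
    """Fold conditions together with `&` (logical AND for Column predicates).

    Hypothetical helper, not a PySpark API. It works for any objects that
    implement `&`, which includes pyspark.sql.Column.
    """
    if not conditions:
        raise ValueError("all_of() needs at least one condition")
    return reduce(operator.and_, conditions)


def any_of(*conditions):
    """Fold conditions together with `|` (logical OR for Column predicates).

    Hypothetical helper, not a PySpark API.
    """
    if not conditions:
        raise ValueError("any_of() needs at least one condition")
    return reduce(operator.or_, conditions)


# Plain booleans also support & and |, so the folding logic can be
# checked without a Spark session.
print(all_of(True, True, False))   # False
print(any_of(False, False, True))  # True

# Assumed usage with the df from the examples (needs a running SparkSession):
#   df.filter(all_of(df.age > 3, df.subject == "Physics")).show()
#   df.filter(any_of(df.age == 2, df.subject == "Chemistry")).show()
```

Because each predicate is passed as a separate argument, the comparisons are already evaluated before they are combined, so no extra parentheses are needed at the call site.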