pyspark.sql.DataFrame.filter
DataFrame.filter(condition: ColumnOrName) → DataFrame

Filters rows using the given condition.
where() is an alias for filter().

New in version 1.3.0.
Changed in version 3.4.0: Supports Spark Connect.
Parameters
  condition : Column or str
      A Column of types.BooleanType or a string of SQL expressions.
Returns
  DataFrame
      Filtered DataFrame.
Examples
>>> df = spark.createDataFrame([
...     (2, "Alice"), (5, "Bob")], schema=["age", "name"])
Filter by Column instances.

>>> df.filter(df.age > 3).show()
+---+----+
|age|name|
+---+----+
|  5| Bob|
+---+----+
>>> df.where(df.age == 2).show()
+---+-----+
|age| name|
+---+-----+
|  2|Alice|
+---+-----+
Filter by SQL expression in a string.
>>> df.filter("age > 3").show()
+---+----+
|age|name|
+---+----+
|  5| Bob|
+---+----+
>>> df.where("age = 2").show()
+---+-----+
|age| name|
+---+-----+
|  2|Alice|
+---+-----+