pyspark.sql.functions.filter
pyspark.sql.functions.filter(col: ColumnOrName, f: Union[Callable[[pyspark.sql.column.Column], pyspark.sql.column.Column], Callable[[pyspark.sql.column.Column, pyspark.sql.column.Column], pyspark.sql.column.Column]]) → pyspark.sql.column.Column

Returns an array of elements for which a predicate holds in a given array.
New in version 3.1.0.
Changed in version 3.4.0: Supports Spark Connect.
- Parameters
- col : Column or str
  Name of column or expression.
- f : function
  A function that returns the Boolean expression. Can take one of the following forms:
  - Unary: (x: Column) -> Column: ...
  - Binary: (x: Column, i: Column) -> Column: ..., where the second argument is the 0-based index of the element (see the binary-form sketch under Examples below).
  The function can use methods of Column, functions defined in pyspark.sql.functions, and Scala UserDefinedFunctions. Python UserDefinedFunctions are not supported (SPARK-27052).
- Returns
  Column
  Filtered array containing only the elements for which the given function evaluated to True.
Examples
>>> from pyspark.sql.functions import filter, month, to_date
>>> df = spark.createDataFrame(
...     [(1, ["2018-09-20", "2019-02-03", "2019-07-01", "2020-06-01"])],
...     ("key", "values")
... )
>>> def after_second_quarter(x):
...     return month(to_date(x)) > 6
...
>>> df.select(
...     filter("values", after_second_quarter).alias("after_second_quarter")
... ).show(truncate=False)
+------------------------+
|after_second_quarter    |
+------------------------+
|[2018-09-20, 2019-07-01]|
+------------------------+
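A minimal sketch of the binary form, reusing the df defined above: the lambda receives each element together with its 0-based index, and the predicate (keep the elements at even positions) is purely illustrative.

>>> df.select(
...     filter("values", lambda x, i: i % 2 == 0).alias("even_indices")
... ).show(truncate=False)
+------------------------+
|even_indices            |
+------------------------+
|[2018-09-20, 2019-07-01]|
+------------------------+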
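The predicate can also be an inline lambda built from Column methods. As a sketch, again reusing df, the illustrative year_2019 alias keeps only the dates whose string starts with "2019":

>>> df.select(
...     filter("values", lambda x: x.startswith("2019")).alias("year_2019")
... ).show(truncate=False)
+------------------------+
|year_2019               |
+------------------------+
|[2019-02-03, 2019-07-01]|
+------------------------+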