pyspark.sql.functions.count_if

pyspark.sql.functions.count_if(col)

Aggregate function: returns the number of rows for which col evaluates to TRUE. Rows where col is FALSE or NULL are not counted.

New in version 3.5.0.

Parameters
col : Column or str

target column to work on: a boolean column or a boolean-valued expression.

Returns
Column

the number of TRUE values for the col.

Examples

Example 1: Counting the number of even numbers in a numeric column

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([("a", 1), ("a", 2), ("a", 3), ("b", 8), ("b", 2)], ["c1", "c2"])
>>> df.select(sf.count_if(sf.col('c2') % 2 == 0)).show()
+------------------------+
|count_if(((c2 % 2) = 0))|
+------------------------+
|                       3|
+------------------------+

Example 2: Counting the number of rows where a string column starts with a certain letter

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame(
...   [("apple",), ("banana",), ("cherry",), ("apple",), ("banana",)], ["fruit"])
>>> df.select(sf.count_if(sf.col('fruit').startswith('a'))).show()
+------------------------------+
|count_if(startswith(fruit, a))|
+------------------------------+
|                             2|
+------------------------------+

Example 3: Counting the number of rows where a numeric column is greater than a certain value

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(1,), (2,), (3,), (4,), (5,)], ["num"])
>>> df.select(sf.count_if(sf.col('num') > 3)).show()
+-------------------+
|count_if((num > 3))|
+-------------------+
|                  2|
+-------------------+

Example 4: Counting the number of rows where a boolean column is True

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(True,), (False,), (True,), (False,), (True,)], ["bool"])
>>> df.select(sf.count_if(sf.col('bool'))).show()
+--------------+
|count_if(bool)|
+--------------+
|             3|
+--------------+