pyspark.sql.GroupedData.count

GroupedData.count() → pyspark.sql.dataframe.DataFrame

Counts the number of records for each group.

New in version 1.3.0.

Changed in version 3.4.0: Supports Spark Connect.

Examples

>>> df = spark.createDataFrame(
...      [(2, "Alice"), (3, "Alice"), (5, "Bob"), (10, "Bob")], ["age", "name"])
>>> df.show()
+---+-----+
|age| name|
+---+-----+
|  2|Alice|
|  3|Alice|
|  5|  Bob|
| 10|  Bob|
+---+-----+

Group by name, and count the records in each group.

>>> df.groupBy(df.name).count().sort("name").show()
+-----+-----+
| name|count|
+-----+-----+
|Alice|    2|
|  Bob|    2|
+-----+-----+