pyspark.sql.GroupedData.avg#
- GroupedData.avg(*cols)[source]#
Computes average values for each numeric columns for each group.
New in version 1.3.0.
Changed in version 3.4.0: Supports Spark Connect.
- Parameters
- colsstr
column names. Non-numeric columns are ignored.
Examples
>>> df = spark.createDataFrame([ ... (2, "Alice", 80), (3, "Alice", 100), ... (5, "Bob", 120), (10, "Bob", 140)], ["age", "name", "height"]) >>> df.show() +---+-----+------+ |age| name|height| +---+-----+------+ | 2|Alice| 80| | 3|Alice| 100| | 5| Bob| 120| | 10| Bob| 140| +---+-----+------+
Group-by name, and calculate the mean of the age in each group.
>>> df.groupBy("name").avg('age').sort("name").show() +-----+--------+ | name|avg(age)| +-----+--------+ |Alice| 2.5| | Bob| 7.5| +-----+--------+
Calculate the mean of the age and height in all data.
>>> df.groupBy().avg('age', 'height').show() +--------+-----------+ |avg(age)|avg(height)| +--------+-----------+ | 5.0| 110.0| +--------+-----------+