pyspark.sql.GroupedData

class pyspark.sql.GroupedData(jgd: py4j.java_gateway.JavaObject, df: pyspark.sql.dataframe.DataFrame)[source]

A set of methods for aggregations on a DataFrame, created by DataFrame.groupBy().

New in version 1.3.0.

Changed in version 3.4.0: Supports Spark Connect.

Methods

agg(*exprs)

Computes aggregates and returns the result as a DataFrame.

apply(udf)

An alias of pyspark.sql.GroupedData.applyInPandas(); the difference is that it takes a pyspark.sql.functions.pandas_udf(), whereas pyspark.sql.GroupedData.applyInPandas() takes a native Python function.

applyInPandas(func, schema)

Maps each group of the current DataFrame using a pandas UDF and returns the result as a DataFrame.

applyInPandasWithState(func, …)

Applies the given function to each group of data, while maintaining a user-defined per-group state.

avg(*cols)

Computes the average value for each numeric column of each group.

cogroup(other)

Cogroups this group with another group so that we can run cogrouped operations.

count()

Counts the number of records for each group.

max(*cols)

Computes the max value for each numeric column of each group.

mean(*cols)

Computes the average value for each numeric column of each group; mean() is an alias of avg().

min(*cols)

Computes the min value for each numeric column for each group.

pivot(pivot_col[, values])

Pivots a column of the current DataFrame and performs the specified aggregation.

sum(*cols)

Computes the sum for each numeric column of each group.