pyspark.pandas.groupby.GroupBy.sum

GroupBy.sum(numeric_only=False, min_count=0)
Compute sum of group values.

New in version 3.3.0.

Parameters
----------
numeric_only : bool, default False
    Include only float, int, boolean columns.

    New in version 3.4.0.

    Changed in version 4.0.0.
min_count : int, default 0
    The required number of valid values to perform the operation. If fewer
    than min_count non-NA values are present, the result will be NA.

    New in version 3.4.0.
 
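As an illustrative sketch (not part of the original examples), passing numeric_only=True should exclude the non-numeric column D from the aggregation, leaving only the boolean and integer columns. The output shown is an assumption based on the parameter description above:

>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({"A": [1, 2, 1, 2], "B": [True, False, False, True],
...                    "C": [3, 4, 3, 4], "D": ["a", "a", "b", "a"]})
>>> # Sketch: with numeric_only=True, the string column D is expected to be dropped
>>> df.groupby("A").sum(numeric_only=True).sort_index()
   B  C
A
1  1  6
2  1  8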
Notes
-----
There is a behavior difference between pandas-on-Spark and pandas: in
pandas-on-Spark, a non-numeric aggregation column is ignored even when
numeric_only is False.
 
Examples
--------
>>> df = ps.DataFrame({"A": [1, 2, 1, 2], "B": [True, False, False, True],
...                    "C": [3, 4, 3, 4], "D": ["a", "a", "b", "a"]})

>>> df.groupby("A").sum().sort_index()
   B  C   D
A
1  1  6  ab
2  1  8  aa

>>> df.groupby("D").sum().sort_index()
   A  B   C
D
a  5  2  11
b  1  0   3

>>> df.groupby("D").sum(min_count=3).sort_index()
     A    B     C
D
a  5.0  2.0  11.0
b  NaN  NaN   NaN
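A further usage sketch (assuming the same df as above; not from the original page): selecting a single column before aggregating returns a Series rather than a DataFrame, mirroring pandas behavior:

>>> df.groupby("D")["C"].sum().sort_index()
D
a    11
b     3
Name: C, dtype: int64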