pyspark.pandas.groupby.GroupBy.median
- GroupBy.median(numeric_only=False, accuracy=10000)
- Compute median of groups, excluding missing values.
- For multiple groupings, the result index will be a MultiIndex.
- Note
- Unlike pandas, the median in pandas-on-Spark is an approximate median based on approximate percentile computation, because computing an exact median across a large dataset is extremely expensive.
- Parameters
- numeric_only : bool, default False
- Include only float, int, boolean columns.
- New in version 3.4.0.
- Changed in version 4.0.0.
- accuracy : int, default 10000
- Accuracy of the underlying approximate percentile computation. Larger values give a more accurate result at a higher memory cost; the relative error is about 1.0 / accuracy.
 
- Returns
- Series or DataFrame
- Median of values within each group. 
 
- Examples

>>> import pyspark.pandas as ps
>>> psdf = ps.DataFrame({'a': [1., 1., 1., 1., 2., 2., 2., 3., 3., 3.],
...                      'b': [2., 3., 1., 4., 6., 9., 8., 10., 7., 5.],
...                      'c': [3., 5., 2., 5., 1., 2., 6., 4., 3., 6.]},
...                     columns=['a', 'b', 'c'],
...                     index=[7, 2, 4, 1, 3, 4, 9, 10, 5, 6])
>>> psdf
      a     b    c
7   1.0   2.0  3.0
2   1.0   3.0  5.0
4   1.0   1.0  2.0
1   1.0   4.0  5.0
3   2.0   6.0  1.0
4   2.0   9.0  2.0
9   2.0   8.0  6.0
10  3.0  10.0  4.0
5   3.0   7.0  3.0
6   3.0   5.0  6.0

- DataFrameGroupBy

>>> psdf.groupby('a').median().sort_index()
       b    c
a
1.0  2.0  3.0
2.0  8.0  2.0
3.0  7.0  4.0

- SeriesGroupBy

>>> psdf.groupby('a')['b'].median().sort_index()
a
1.0    2.0
2.0    8.0
3.0    7.0
Name: b, dtype: float64
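
The example output reports 2.0 for column b in group 1.0, whereas an exact median of [2., 3., 1., 4.] would be 2.5. The sketch below reproduces that difference in plain Python, without Spark. It assumes that on such small groups the approximate percentile resolves to a value actually present in the group, which is consistent with the outputs shown; `statistics.median_low` is used only as a stand-in for that behavior and is not the algorithm Spark runs.

```python
from statistics import median, median_low

# Column 'b' values per group, copied from the example DataFrame.
groups_b = {
    1.0: [2.0, 3.0, 1.0, 4.0],
    2.0: [6.0, 9.0, 8.0],
    3.0: [10.0, 7.0, 5.0],
}

# Exact median: averages the two middle values when the group size is even.
exact = {key: median(values) for key, values in groups_b.items()}

# Assumption: the approximate result is always an element of the group.
# median_low imitates that for illustration only; it is not Spark's algorithm.
element_valued = {key: median_low(values) for key, values in groups_b.items()}

for key in sorted(groups_b):
    print(key, exact[key], element_valued[key])
```

Only the even-sized group 1.0 differs (2.5 versus 2.0). Odd-sized groups have a single middle element, so both approaches agree there.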