pyspark.pandas.groupby.GroupBy.filter
GroupBy.filter(func: Callable[[FrameLike], FrameLike]) → FrameLike
Return a copy of a DataFrame excluding elements from groups that do not satisfy the boolean criterion specified by func.
- Parameters
- func : function
Function to apply to each subframe. Should return True or False.
- dropna : bool, default True
Drop groups that do not pass the filter. If False, groups that evaluate False are filled with NaNs.
- Returns
- filtered : DataFrame or Series
Notes
Each subframe is endowed with the attribute ‘name’ in case you need to know which group you are working on.
Examples
>>> df = ps.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
...                           'foo', 'bar'],
...                    'B' : [1, 2, 3, 4, 5, 6],
...                    'C' : [2.0, 5., 8., 1., 2., 9.]}, columns=['A', 'B', 'C'])
>>> grouped = df.groupby('A')
>>> grouped.filter(lambda x: x['B'].mean() > 3.)
     A  B    C
1  bar  2  5.0
3  bar  4  1.0
5  bar  6  9.0

>>> df.B.groupby(df.A).filter(lambda x: x.mean() > 3.)
1    2
3    4
5    6
Name: B, dtype: int64
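
The selection rule behind the examples above can be sketched in plain Python. This is only a conceptual illustration and not how pandas-on-Spark implements filter: the helper names filter_groups and mean_b_above_3, and the explicit "idx" field standing in for the index, are hypothetical and introduced here for the sketch. The predicate is evaluated once per group, and every row of a passing group is kept in its original order.

```python
from collections import defaultdict


def filter_groups(rows, key, predicate):
    """Keep only rows whose whole group satisfies `predicate`.

    Hypothetical helper for illustration; not part of pyspark.pandas.
    """
    # Collect the members of each group.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    # Evaluate the predicate once per group, not once per row.
    passing = {k for k, members in groups.items() if predicate(members)}

    # Return surviving rows in their original order.
    return [row for row in rows if row[key] in passing]


# Same data as the example above; "idx" stands in for the index.
rows = [
    {"idx": 0, "A": "foo", "B": 1, "C": 2.0},
    {"idx": 1, "A": "bar", "B": 2, "C": 5.0},
    {"idx": 2, "A": "foo", "B": 3, "C": 8.0},
    {"idx": 3, "A": "bar", "B": 4, "C": 1.0},
    {"idx": 4, "A": "foo", "B": 5, "C": 2.0},
    {"idx": 5, "A": "bar", "B": 6, "C": 9.0},
]


def mean_b_above_3(members):
    """Group-level criterion: mean of column B strictly greater than 3."""
    return sum(r["B"] for r in members) / len(members) > 3.0


kept = filter_groups(rows, "A", mean_b_above_3)
print([r["idx"] for r in kept])
```

The 'foo' group has a mean of exactly 3.0 in column B, so the strict comparison drops it, while the 'bar' group (mean 4.0) is kept in full.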