pyspark.pandas.groupby.GroupBy.filter
- GroupBy.filter(func)[source]
- Return a copy of a DataFrame excluding elements from groups that do not satisfy the boolean criterion specified by func.
- Parameters
  - func : function
    - Function to apply to each subframe. Should return True or False.
  - dropna : bool
    - Drop groups that do not pass the filter. True by default; if False, groups that evaluate False are filled with NaNs.
 
- Returns
  - filtered : DataFrame or Series
 
- Notes
  - Each subframe is endowed with the attribute ‘name’ in case you need to know which group you are working on.
- Examples
  - >>> df = ps.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
    ...                           'foo', 'bar'],
    ...                    'B' : [1, 2, 3, 4, 5, 6],
    ...                    'C' : [2.0, 5., 8., 1., 2., 9.]}, columns=['A', 'B', 'C'])
    >>> grouped = df.groupby('A')
    >>> grouped.filter(lambda x: x['B'].mean() > 3.)
         A  B    C
    1  bar  2  5.0
    3  bar  4  1.0
    5  bar  6  9.0
  - >>> df.B.groupby(df.A).filter(lambda x: x.mean() > 3.)
    1    2
    3    4
    5    6
    Name: B, dtype: int64
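- The following is a supplementary sketch (not part of the original docstring) illustrating the ‘name’ attribute described in the Notes: it reuses df from the Examples above and keeps or drops each group based on its key rather than an aggregate. The sort_index() call is added only to make the row order deterministic; under pandas-compatible semantics the result keeps the rows of group 'foo'.
  - >>> # x.name holds the group key ('foo' or 'bar') for each subframe
    >>> df.groupby('A').filter(lambda x: x.name == 'foo').sort_index()
         A  B    C
    0  foo  1  2.0
    2  foo  3  8.0
    4  foo  5  2.0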