pyspark.pandas.DataFrame.filter

DataFrame.filter(items: Optional[Sequence[Any]] = None, like: Optional[str] = None, regex: Optional[str] = None, axis: Union[int, str, None] = None) → pyspark.pandas.frame.DataFrame

Subset rows or columns of the DataFrame according to labels in the specified index.

Note that this routine does not filter a DataFrame on its contents. The filter is applied to the labels of the index (axis 0) or of the columns (axis 1).
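To make the label-versus-content distinction concrete, here is a plain-Python sketch (no Spark involved) that models the small table from the examples as a dict mapping each index label to its row. The names `rows`, `filter_by_label`, and `filter_by_content` are hypothetical and exist only for illustration; they are not part of the pyspark.pandas API.

```python
# Hypothetical plain-Python model of a small table: index label -> row values.
# This is NOT pyspark.pandas; it only illustrates what filter() looks at.
rows = {
    "mouse": {"one": 1, "two": 2, "three": 3},
    "rabbit": {"one": 4, "two": 5, "three": 6},
}


def filter_by_label(table, substring):
    """Keep rows whose index *label* contains `substring` (like filter(like=..., axis=0))."""
    return {label: row for label, row in table.items() if substring in label}


def filter_by_content(table, column, threshold):
    """Keep rows whose *values* satisfy a predicate (something filter() does not do)."""
    return {label: row for label, row in table.items() if row[column] > threshold}


by_label = filter_by_label(rows, "ouse")      # looks only at 'mouse' / 'rabbit'
by_content = filter_by_content(rows, "one", 1)  # looks only at the numbers
print(list(by_label))    # ['mouse']
print(list(by_content))  # ['rabbit']
```

In pandas-on-Spark itself, selecting rows by their values is typically done with boolean indexing such as `df[df['one'] > 1]`, or with `DataFrame.loc`.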

Parameters
items : list-like

Keep labels from axis which are in items.

like : string

Keep labels from axis for which `like in label` is True.

regex : string (regular expression)

Keep labels from axis for which re.search(regex, label) finds a match.

axis : int or string axis name

The axis to filter on: 0 or ‘index’ for rows, 1 or ‘columns’ for columns. By default this is the info axis: ‘index’ for Series, ‘columns’ for DataFrame.
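The three selection modes can be sketched in plain Python over a list of labels. `select_labels` is a hypothetical helper that follows the semantics stated above (membership in `items`, substring test for `like`, `re.search` for `regex`). It converts labels with `str()` before matching, which is an assumption, keeps labels in their original order, and does not check mutual exclusivity; the real method may order, validate, or reject inputs differently.

```python
import re
from typing import Any, List, Optional, Sequence


def select_labels(
    labels: Sequence[Any],
    items: Optional[Sequence[Any]] = None,
    like: Optional[str] = None,
    regex: Optional[str] = None,
) -> List[Any]:
    """Hypothetical model of how filter() picks labels along one axis."""
    if items is not None:
        wanted = set(items)  # assumes labels are hashable
        return [label for label in labels if label in wanted]
    if like is not None:
        # Substring test; str() conversion is an assumption for non-string labels.
        return [label for label in labels if like in str(label)]
    if regex is not None:
        pattern = re.compile(regex)
        # re.search matches anywhere in the label, so anchors such as '$' matter.
        return [label for label in labels if pattern.search(str(label))]
    raise TypeError("pass one of items, like, or regex")


columns = ["one", "two", "three"]
print(select_labels(columns, items=["one", "three"]))  # ['one', 'three']
print(select_labels(columns, like="t"))                # ['two', 'three']
print(select_labels(columns, regex="e$"))              # ['one', 'three']
```

Because `re.search` is unanchored, `regex='e'` would keep every label containing an "e" anywhere, while `regex='e$'` keeps only labels ending in "e".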

Returns
same type as input object

See also

DataFrame.loc

Notes

The items, like, and regex parameters are mutually exclusive: pass only one of them.

axis defaults to the info axis that is used when indexing with [].
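The two notes above can be sketched as an argument-validation step. `resolve_filter_args` is a hypothetical helper, not part of pyspark.pandas: it rejects more than one of `items`, `like`, and `regex`, and maps `axis` to 'index' or 'columns', defaulting to the info axis ('columns' for a DataFrame, 'index' for a Series). Treating "none of the three given" as an error, and the exception types and messages used here, are assumptions of this sketch rather than a description of the real method.

```python
from typing import Any, Optional, Sequence, Union


def resolve_filter_args(
    is_dataframe: bool,
    items: Optional[Sequence[Any]] = None,
    like: Optional[str] = None,
    regex: Optional[str] = None,
    axis: Union[int, str, None] = None,
) -> str:
    """Hypothetical validation: return the axis name that filtering applies to."""
    given = sum(arg is not None for arg in (items, like, regex))
    if given > 1:
        raise TypeError("items, like, and regex are mutually exclusive")
    if given == 0:
        # Assumption: at least one selector is required.
        raise TypeError("one of items, like, or regex is required")
    if axis is None:
        # Info axis: the one used when indexing with [].
        return "columns" if is_dataframe else "index"
    if axis in (0, "index"):
        return "index"
    if axis in (1, "columns"):
        if not is_dataframe:
            raise ValueError("a Series has no columns axis")
        return "columns"
    raise ValueError(f"unknown axis: {axis!r}")


print(resolve_filter_args(True, items=["one"]))       # columns
print(resolve_filter_args(False, regex="e$"))         # index
print(resolve_filter_args(True, like="bbi", axis=0))  # index
```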

Examples

>>> import numpy as np
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame(np.array(([1, 2, 3], [4, 5, 6])),
...                   index=['mouse', 'rabbit'],
...                   columns=['one', 'two', 'three'])
>>> # select columns by name
>>> df.filter(items=['one', 'three'])
        one  three
mouse     1      3
rabbit    4      6
>>> # select columns by regular expression
>>> df.filter(regex='e$', axis=1)
        one  three
mouse     1      3
rabbit    4      6
>>> # select rows whose index label contains 'bbi'
>>> df.filter(like='bbi', axis=0)
        one  two  three
rabbit    4    5      6

For a Series:

>>> # select rows by name
>>> df.one.filter(items=['rabbit'])
rabbit    4
Name: one, dtype: int64
>>> # select rows by regular expression
>>> df.one.filter(regex='e$')
mouse    1
Name: one, dtype: int64
>>> # select rows whose index label contains 'bbi'
>>> df.one.filter(like='bbi')
rabbit    4
Name: one, dtype: int64