pyspark.pandas.DataFrame.filter
DataFrame.filter(items: Optional[Sequence[Any]] = None, like: Optional[str] = None, regex: Optional[str] = None, axis: Union[int, str, None] = None) → pyspark.pandas.frame.DataFrame
Subset rows or columns of dataframe according to labels in the specified index.
Note that this routine does not filter a dataframe on its contents. The filter is applied to the labels of the index.
- Parameters
- items: list-like
Keep labels from axis which are in items.
- like: string
Keep labels from axis for which “like in label == True”.
- regex: string (regular expression)
Keep labels from axis for which re.search(regex, label) == True.
- axis: int or string axis name
The axis to filter on. By default this is the info axis, ‘index’ for Series, ‘columns’ for DataFrame.
- Returns
- same type as input object
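The three label selectors described under Parameters can be pictured with a small plain-Python model. This is only a sketch of the matching rules, not the pandas-on-Spark implementation: `select_labels` is a hypothetical helper, it works on a plain list of labels rather than a DataFrame axis, it does not model the ordering of the result, and the error raised when no selector is given is an assumption of the sketch.

```python
import re
from typing import Any, List, Optional, Sequence


def select_labels(
    labels: Sequence[Any],
    items: Optional[Sequence[Any]] = None,
    like: Optional[str] = None,
    regex: Optional[str] = None,
) -> List[Any]:
    """Hypothetical helper: return the labels that a selector would keep."""
    if items is not None:
        # items: keep labels that appear in the given list
        wanted = set(items)
        return [label for label in labels if label in wanted]
    if like is not None:
        # like: keep labels that contain the substring
        return [label for label in labels if like in str(label)]
    if regex is not None:
        # regex: keep labels where re.search finds a match anywhere
        pattern = re.compile(regex)
        return [label for label in labels if pattern.search(str(label))]
    # Assumption of this sketch: a missing selector is an error.
    raise TypeError("pass one of items, like, or regex")


columns = ["one", "two", "three"]
by_items = select_labels(columns, items=["one", "three"])
by_like = select_labels(columns, like="t")
by_regex = select_labels(columns, regex="e$")
```

Because the `regex` rule uses `re.search` rather than a full match, a pattern such as `'e$'` only needs to match at the end of the label; anchor it with `^` and `$` when the whole label must match.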
Notes
The items, like, and regex parameters are enforced to be mutually exclusive. axis defaults to the info axis that is used when indexing with [].
Examples
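The mutual-exclusivity rule from the Notes amounts to an argument check performed before any labels are inspected. A minimal sketch of such a check follows; `check_exclusive` is a hypothetical name, and the exception type and message are assumptions of the sketch rather than the library's documented behavior.

```python
from typing import Any, Optional, Sequence


def check_exclusive(
    items: Optional[Sequence[Any]] = None,
    like: Optional[str] = None,
    regex: Optional[str] = None,
) -> str:
    """Hypothetical check: return the name of the single selector in use."""
    candidates = (("items", items), ("like", like), ("regex", regex))
    given = [name for name, value in candidates if value is not None]
    if len(given) != 1:
        # Assumption of this sketch: zero or several selectors is a TypeError.
        raise TypeError(f"expected exactly one selector, got {given}")
    return given[0]


chosen = check_exclusive(regex="e$")

try:
    check_exclusive(items=["one"], like="o")
    rejected = False
except TypeError:
    rejected = True
```

To combine criteria, for example a substring match on column labels followed by an explicit list of row labels, chain two filter calls instead of passing two selectors to one call.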
>>> import numpy as np
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame(np.array(([1, 2, 3], [4, 5, 6])),
...                   index=['mouse', 'rabbit'],
...                   columns=['one', 'two', 'three'])
>>> # select columns by name
>>> df.filter(items=['one', 'three'])
        one  three
mouse     1      3
rabbit    4      6
>>> # select columns by regular expression
>>> df.filter(regex='e$', axis=1)
        one  three
mouse     1      3
rabbit    4      6
>>> # select rows containing 'bbi'
>>> df.filter(like='bbi', axis=0)
        one  two  three
rabbit    4    5      6
For a Series,
>>> # select rows by name
>>> df.one.filter(items=['rabbit'])
rabbit    4
Name: one, dtype: int64
>>> # select rows by regular expression
>>> df.one.filter(regex='e$')
mouse    1
Name: one, dtype: int64
>>> # select rows containing 'bbi'
>>> df.one.filter(like='bbi')
rabbit    4
Name: one, dtype: int64