pyspark.pandas.DataFrame.mode

DataFrame.mode(axis: Union[int, str] = 0, numeric_only: bool = False, dropna: bool = True) → pyspark.pandas.frame.DataFrame

Get the mode(s) of each element along the selected axis.

The mode of a set of values is the value that appears most often. It can be multiple values.
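
As an illustration of how a set of values can have more than one mode, here is a minimal sketch using only the Python standard library (`statistics.multimode`), not pandas-on-Spark itself. The lists `legs` and `wings` are illustrative data. Note that `multimode` returns modes in first-seen order, which may differ from the row order that `DataFrame.mode` returns.

```python
from statistics import multimode

# One value is clearly the most frequent: a single mode.
legs = [2, 4, 8, 2]
print(multimode(legs))   # [2]

# Every value appears once, so each of them is a mode.
wings = [2.0, 0.0]
print(multimode(wings))  # [2.0, 0.0]
```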

New in version 3.4.0.

Parameters
axis : {0 or ‘index’}, default 0

Axis for the function to be applied on.

numeric_only : bool, default False

If True, only apply to numeric columns.

dropna : bool, default True

Don’t consider counts of NaN/NaT.

Returns
DataFrame

The modes of each column or row.

See also

Series.mode

Return the highest frequency value in a Series.

Series.value_counts

Return the counts of values in a Series.

Examples

>>> df = ps.DataFrame([('bird', 2, 2),
...                    ('mammal', 4, np.nan),
...                    ('arthropod', 8, 0),
...                    ('bird', 2, np.nan)],
...                   index=('falcon', 'horse', 'spider', 'ostrich'),
...                   columns=('species', 'legs', 'wings'))
>>> df
           species  legs  wings
falcon        bird     2    2.0
horse       mammal     4    NaN
spider   arthropod     8    0.0
ostrich       bird     2    NaN

By default, missing values are not considered, and the modes of wings are both 0 and 2. Because the resulting DataFrame has two rows, the second row of species and legs is filled with a missing value (None or NaN).

>>> df.mode()
  species  legs  wings
0    bird   2.0    0.0
1    None   NaN    2.0
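
The padding seen in this result can be reproduced in plain Python. The following is a minimal sketch of the idea only, not the pandas-on-Spark implementation. The helper names `column_modes` and `pad_modes` are hypothetical, and missing values are represented by `None` as an assumption of the sketch.

```python
from collections import Counter
from itertools import zip_longest


def column_modes(values):
    """Return the sorted modes of one column, ignoring None (missing)."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def pad_modes(columns):
    """Compute per-column modes and pad shorter columns with None."""
    modes = {name: column_modes(vals) for name, vals in columns.items()}
    # zip_longest fills the gaps left by columns that have fewer modes.
    rows = zip_longest(*modes.values(), fillvalue=None)
    return [dict(zip(modes.keys(), row)) for row in rows]


data = {
    "species": ["bird", "mammal", "arthropod", "bird"],
    "legs": [2, 4, 8, 2],
    "wings": [2.0, None, 0.0, None],
}
result = pad_modes(data)
for row in result:
    print(row)
```

Here wings has two modes while species and legs have one each, so the second row carries `None` for the two shorter columns.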

With dropna=False, NaN values are counted as well, so NaN can itself be the mode (as it is for wings).

>>> df.mode(dropna=False)
  species  legs  wings
0    bird     2    NaN
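
The effect of the dropna flag can be sketched in plain Python as follows. This is an illustrative assumption about the counting logic, not the library's code: the `modes` helper is hypothetical, and `None` stands in for NaN/NaT.

```python
from collections import Counter


def modes(values, dropna=True):
    """Return the most frequent values; None stands in for NaN/NaT."""
    if dropna:
        # Missing values are excluded before counting.
        values = [v for v in values if v is not None]
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]


wings = [2.0, None, 0.0, None]
print(modes(wings))                # [2.0, 0.0]
print(modes(wings, dropna=False))  # [None]
```

With missing values excluded, 2.0 and 0.0 tie at one occurrence each. Once they are counted, the missing value occurs twice and becomes the only mode.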

With numeric_only=True, only the modes of numeric columns are computed; columns of other types are ignored.

>>> df.mode(numeric_only=True)
   legs  wings
0   2.0    0.0
1   NaN    2.0