pyspark.pandas.DataFrame.count

DataFrame.count(axis: Union[int, str, None] = None, numeric_only: bool = False) → Union[int, float, bool, str, bytes, decimal.Decimal, datetime.date, datetime.datetime, None, Series]

Count non-NA cells for each column or row.

The values None and NaN are considered NA.

Parameters
axis: {0 or ‘index’, 1 or ‘columns’}, default 0

If 0 or ‘index’, counts are generated for each column. If 1 or ‘columns’, counts are generated for each row.

numeric_only: bool, default False

If True, include only float, int, and boolean columns. This parameter is mainly for pandas compatibility.

Returns
count: scalar for a Series, and a Series for a DataFrame.

See also

DataFrame.shape

Number of DataFrame rows and columns (including NA elements).

DataFrame.isna

Boolean same-sized DataFrame showing places of NA elements.

Examples

Constructing DataFrame from a dictionary:

>>> df = ps.DataFrame({"Person":
...                    ["John", "Myla", "Lewis", "John", "Myla"],
...                    "Age": [24., np.nan, 21., 33, 26],
...                    "Single": [False, True, True, True, False]},
...                   columns=["Person", "Age", "Single"])
>>> df
  Person   Age  Single
0   John  24.0   False
1   Myla   NaN    True
2  Lewis  21.0    True
3   John  33.0    True
4   Myla  26.0   False

Notice the uncounted NA values:

>>> df.count()
Person    5
Age       4
Single    5
dtype: int64
>>> df.count(axis=1)
0    3
1    2
2    3
3    3
4    3
dtype: int64
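
The effect of numeric_only can be sketched as follows. This is a minimal illustration under one loud assumption: it uses plain pandas as a stand-in so that it runs without a Spark session. Since the parameter exists mainly for pandas compatibility, the same call on a pyspark.pandas DataFrame is expected to give the same counts, but that is not verified here.

```python
import numpy as np
import pandas as pd  # assumption: plain pandas stands in for pyspark.pandas

# Same data as the example above, rebuilt so the snippet is self-contained.
df = pd.DataFrame({
    "Person": ["John", "Myla", "Lewis", "John", "Myla"],
    "Age": [24.0, np.nan, 21.0, 33.0, 26.0],
    "Single": [False, True, True, True, False],
})

# With numeric_only=True, only float, int, and boolean columns are counted,
# so the string column "Person" is left out of the result.
numeric_counts = df.count(numeric_only=True)
print(numeric_counts)
```

The result keeps only the Age and Single columns; the missing value in Age is still excluded from its count.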

On a Series:

>>> df['Person'].count()
5
>>> df['Age'].count()
4
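
The relationship between count, DataFrame.shape, and DataFrame.isna listed under "See also" can be checked with a short sketch: for every column, the non-NA count plus the number of NA cells should equal the number of rows. As an assumption made only for runnability, the snippet uses plain pandas in place of pyspark.pandas; the three members it calls (count, isna, shape) are the ones this page references.

```python
import numpy as np
import pandas as pd  # assumption: plain pandas stands in for pyspark.pandas

df = pd.DataFrame({
    "Person": ["John", "Myla", "Lewis", "John", "Myla"],
    "Age": [24.0, np.nan, 21.0, 33.0, 26.0],
    "Single": [False, True, True, True, False],
})

non_na = df.count()       # non-NA cells per column
na = df.isna().sum()      # NA cells per column
n_rows = df.shape[0]      # total rows, NA cells included

# Every cell is either NA or non-NA, so the two tallies add up to the row count.
consistent = bool((non_na + na == n_rows).all())
print(consistent)
```

This kind of check is a quick way to find which columns contain missing values: any column whose count is lower than shape[0] has at least one NA cell.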