pyspark.pandas.DataFrame.quantile

DataFrame.quantile(q: Union[float, Iterable[float]] = 0.5, axis: Union[int, str] = 0, numeric_only: bool = True, accuracy: int = 10000) → Union[DataFrame, Series]

Return the value(s) at the given quantile(s).

Note

Unlike pandas, pandas-on-Spark returns an approximate quantile, based on approximate percentile computation, because computing exact quantiles across a large dataset is extremely expensive.
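To make the difference concrete, here is a minimal pure-Python sketch that does not use Spark at all. It contrasts the linearly interpolated quantile that pandas computes by default with a simple rank-based selection that always returns one of the observed values. The helper names `interpolated_quantile` and `observed_value_quantile` are hypothetical, and the rank-based rule is only an assumed simplification of how an approximate percentile behaves, not Spark's actual algorithm.

```python
import math


def interpolated_quantile(values, q):
    """Exact quantile with linear interpolation (the pandas default method)."""
    data = sorted(values)
    pos = q * (len(data) - 1)          # fractional position in the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def observed_value_quantile(values, q):
    """Hypothetical rank-based rule: return the smallest observed value
    whose cumulative share of the data reaches q (no interpolation)."""
    data = sorted(values)
    rank = max(math.ceil(q * len(data)), 1)
    return data[rank - 1]


even = [1, 2, 3, 4]
print(interpolated_quantile(even, 0.5))    # midpoint between 2 and 3
print(observed_value_quantile(even, 0.5))  # an element of the data
```

With an even number of rows the two rules can disagree, so results from pandas-on-Spark should not be expected to match pandas exactly.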

Parameters
q : float or array-like, default 0.5 (50% quantile)

0 <= q <= 1, the quantile(s) to compute.

axis : int or str, default 0 or ‘index’

Can only be set to 0 at the moment.

numeric_only : bool, default True

If False, the quantile of datetime and timedelta data will be computed as well. Can only be set to True at the moment.

accuracy : int, optional

Accuracy of the approximation (default 10000). A larger value gives better accuracy. The relative error is 1.0 / accuracy.
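As a rough guide to choosing accuracy, the following sketch turns the 1.0 / accuracy relationship into numbers. It assumes the relative error applies to the rank of the returned value, so that on N rows the result may sit up to about N / accuracy positions away from the exact quantile in sorted order. That interpretation and the helper names `relative_error` and `rank_error_bound` are assumptions made for illustration, not part of the pandas-on-Spark API.

```python
import math


def relative_error(accuracy: int = 10000) -> float:
    """Relative error implied by a given accuracy setting."""
    if accuracy <= 0:
        raise ValueError("accuracy must be a positive integer")
    return 1.0 / accuracy


def rank_error_bound(n_rows: int, accuracy: int = 10000) -> int:
    """Assumed upper bound on how many sorted positions the result may be off."""
    if accuracy <= 0:
        raise ValueError("accuracy must be a positive integer")
    return math.ceil(n_rows / accuracy)


print(relative_error())                  # default accuracy of 10000
print(rank_error_bound(1_000_000))       # one million rows, default accuracy
print(rank_error_bound(1_000_000, 100))  # same data, much lower accuracy
```

Under that assumption, raising accuracy tightens the bound proportionally, typically at the cost of more memory and computation.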

Returns
Series or DataFrame

If q is an array, a DataFrame will be returned where the index is q, the columns are the columns of self, and the values are the quantiles. If q is a float, a Series will be returned where the index is the columns of self and the values are the quantiles.

Examples

>>> import pyspark.pandas as ps
>>> psdf = ps.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 0]})
>>> psdf
   a  b
0  1  6
1  2  7
2  3  8
3  4  9
4  5  0
>>> psdf.quantile(.5)
a    3.0
b    7.0
Name: 0.5, dtype: float64
>>> psdf.quantile([.25, .5, .75])
        a    b
0.25  2.0  6.0
0.50  3.0  7.0
0.75  4.0  8.0