pyspark.pandas.DataFrame.boxplot

DataFrame.boxplot(**kwds)

Make a box plot of the Series columns.

Parameters
**kwds : optional

Additional keyword arguments are documented in pyspark.pandas.Series.plot().

precision: scalar, default = 0.01

This argument is used by pandas-on-Spark to compute approximate statistics for building a boxplot. Use smaller values to get more precise statistics (matplotlib-only).

Returns
plotly.graph_objs.Figure

Returns a custom object when backend != plotly. Returns an ndarray when subplots=True (matplotlib-only).

Notes

There are behavior differences between pandas-on-Spark and pandas.

  • pandas-on-Spark computes approximate statistics, so expect differences between pandas and pandas-on-Spark boxplots, especially in the 1st and 3rd quartiles.

  • The whis argument is only supported as a single number.

  • pandas-on-Spark doesn’t support the following arguments (matplotlib-only):

    • bootstrap argument is not supported

    • autorange argument is not supported
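To illustrate what a single numeric whis controls, here is a minimal sketch in plain Python. It assumes the common box plot convention in which the whisker fences sit at Q1 - whis * IQR and Q3 + whis * IQR. The helper name whisker_fences and the sample data are hypothetical, and the quartiles here are exact, whereas pandas-on-Spark computes approximate ones, so its results may differ slightly.

```python
from statistics import quantiles


def whisker_fences(values, whis=1.5):
    """Return (lower, upper) fences for a single numeric ``whis``.

    Hypothetical helper for illustration only; it is not part of
    pandas-on-Spark. Quartiles are exact here, not approximate.
    """
    # method="inclusive" interpolates linearly between data points
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - whis * iqr, q3 + whis * iqr


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
lower, upper = whisker_fences(sample, whis=1.5)

# Points outside the fences would be drawn as outliers (fliers)
outliers = [v for v in sample if v < lower or v > upper]
print(lower, upper, outliers)
```

A larger whis widens the fences, so fewer points are treated as outliers; a smaller one narrows them.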

Examples

Draw a box plot from a DataFrame with four columns of randomly generated data.

For Series:

>>> import numpy as np
>>> import pyspark.pandas as ps
>>> data = np.random.randn(25, 4)
>>> df = ps.DataFrame(data, columns=list('ABCD'))
>>> df['A'].plot.box()

This function is not supported for the DataFrame type.