pyspark.sql.plot.core.PySparkPlotAccessor.hist

PySparkPlotAccessor.hist(column=None, bins=10, **kwargs)

Draw one histogram of the DataFrame’s columns.

A histogram is a representation of the distribution of data.

Parameters

column: str or list of str, optional: Column name or list of names to be used for creating the histogram plot. If None (default), all numeric columns will be used. If no numeric columns exist, behavior may depend on the plot backend.
bins: integer, default 10: Number of histogram bins to be used.
**kwargs: Additional keyword arguments.
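
To make the effect of bins concrete, the following is a minimal pure-Python sketch of equal-width binning. It is an illustration only and is not claimed to be how PySpark or the plot backend computes a histogram. The helper name equal_width_counts is hypothetical, and the sketch assumes bins split the range from the column minimum to the column maximum into intervals of equal width, with the maximum value counted in the last bin.

```python
from typing import List, Sequence


def equal_width_counts(values: Sequence[float], bins: int = 10) -> List[int]:
    """Count values per bin for `bins` equal-width intervals over [min, max].

    Hypothetical helper for illustration; not part of the PySpark API.
    """
    if bins < 1:
        raise ValueError("bins must be a positive integer")
    if not values:
        return [0] * bins
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # If all values are identical, width is 0 and everything goes in the first bin.
        idx = 0 if width == 0 else int((v - lo) / width)
        # The maximum sits on the right edge of the range; fold it into the last bin.
        counts[min(idx, bins - 1)] += 1
    return counts


# Values of the "length" column used in the Examples section.
lengths = [5.1, 4.9, 7.0, 6.4, 5.9]
counts = equal_width_counts(lengths, bins=4)
print(counts)
```

With bins=4, the five length values are spread over four intervals of width 0.525 starting at 4.9, and the sketch prints [2, 1, 1, 1]. A larger bins value gives narrower intervals and a finer view of the distribution, at the cost of fewer values per bin.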

Returns

plotly.graph_objs.Figure

Examples

>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.getOrCreate()
>>> data = [(5.1, 3.5, 0), (4.9, 3.0, 0), (7.0, 3.2, 1), (6.4, 3.2, 1), (5.9, 3.0, 2)]
>>> columns = ["length", "width", "species"]
>>> df = spark.createDataFrame(data, columns)
>>> df.plot.hist(bins=4)