pyspark.sql.plot.core.PySparkPlotAccessor.kde
- PySparkPlotAccessor.kde(bw_method, column=None, ind=None, **kwargs)
Generate a Kernel Density Estimate plot using Gaussian kernels.
In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function (PDF) of a random variable. This function uses Gaussian kernels and includes automatic bandwidth determination.
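To make the idea concrete, the following is a minimal NumPy sketch of a Gaussian kernel density estimate. It is not the PySpark implementation. The function name `gaussian_kde_sketch` is hypothetical, and the sketch assumes that the bandwidth acts as the standard deviation of each Gaussian kernel.

```python
import numpy as np


def gaussian_kde_sketch(samples, bandwidth, points):
    """Evaluate a fixed-bandwidth Gaussian KDE at the given points.

    Hypothetical helper for illustration only; it assumes `bandwidth`
    is the standard deviation of each Gaussian kernel.
    """
    samples = np.asarray(samples, dtype=float)
    points = np.asarray(points, dtype=float)
    # Standardized distance between every evaluation point and every sample.
    z = (points[:, None] - samples[None, :]) / bandwidth
    # Standard normal density evaluated at each standardized distance.
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    # Average the kernels over the samples and rescale by the bandwidth.
    return kernel.sum(axis=1) / (samples.size * bandwidth)


lengths = [5.1, 4.9, 7.0, 6.4, 5.9]
grid = np.linspace(0.0, 12.0, 2001)
density = gaussian_kde_sketch(lengths, bandwidth=0.3, points=grid)

# A density estimate should integrate to roughly 1 over a wide enough grid.
area = float(np.sum(density) * (grid[1] - grid[0]))
```

Because the grid extends well beyond the samples, `area` should come out close to 1, which is a quick sanity check that the result behaves like a probability density.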
- Parameters
- bw_method: int or float
The method used to calculate the estimator bandwidth. See KernelDensity in PySpark for more information.
- column: str or list of str, optional
Column name or list of names to be used for creating the kde plot. If None (default), all numeric columns will be used.
- ind: list of float, NumPy array or int, optional
Evaluation points for the estimated PDF. If None (default), 1000 equally spaced points are used. If ind is a list or NumPy array, the KDE is evaluated at the points passed. If ind is an integer, that many equally spaced points are used.
- **kwargs: optional
Additional keyword arguments.
- Returns
plotly.graph_objs.Figure
Examples
>>> data = [(5.1, 3.5, 0), (4.9, 3.0, 0), (7.0, 3.2, 1), (6.4, 3.2, 1), (5.9, 3.0, 2)]
>>> columns = ["length", "width", "species"]
>>> df = spark.createDataFrame(data, columns)
>>> df.plot.kde(bw_method=0.3, ind=100)
>>> df.plot.kde(column=["length", "width"], bw_method=0.3, ind=100)
>>> df.plot.kde(column="length", bw_method=0.3, ind=100)
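The three accepted forms of `ind` can be illustrated with a small standalone sketch. The helper `resolve_ind` is hypothetical and is not part of PySpark. In particular, spacing the points between the data minimum and maximum is an assumption made for illustration; the actual range used by the plotting code may differ.

```python
import numpy as np


def resolve_ind(ind, data_min, data_max, default_points=1000):
    """Hypothetical helper: turn `ind` into an array of evaluation points.

    Assumption: equally spaced points span [data_min, data_max].
    """
    if ind is None:
        # Default case: 1000 equally spaced points.
        return np.linspace(data_min, data_max, default_points)
    if isinstance(ind, (int, np.integer)):
        # Integer case: that many equally spaced points.
        return np.linspace(data_min, data_max, int(ind))
    # List or NumPy array case: use the points exactly as passed.
    return np.asarray(ind, dtype=float)


default_grid = resolve_ind(None, 4.9, 7.0)
coarse_grid = resolve_ind(100, 4.9, 7.0)
explicit_grid = resolve_ind([5.0, 6.0], 4.9, 7.0)
```

Passing an explicit list or array is the option to reach for when the density must be compared at the same points across several plots.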