pyspark.sql.functions.width_bucket

pyspark.sql.functions.width_bucket(v: ColumnOrName, min: ColumnOrName, max: ColumnOrName, numBucket: Union[ColumnOrName, int]) → pyspark.sql.column.Column

Returns the bucket number into which the value of this expression would fall in an equi-width histogram with numBucket buckets, in the range min to max. Values below the range map to bucket 0 and values above it map to bucket numBucket + 1. If the input arguments are invalid (for example, a non-positive number of buckets), the function returns null.

New in version 3.5.0.
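
For a value inside an ascending range, the bucket is found by dividing [min, max) into numBucket equal-width intervals. As a rough sketch of that rule (ignoring the descending-range and null-returning cases that Spark also handles; lo and hi stand in for min and max to avoid shadowing Python built-ins), the computation for the first row of the example below is:

>>> import math
>>> v, lo, hi, n = 5.3, 0.2, 10.6, 5            # first row of the Examples section
>>> math.floor((v - lo) / ((hi - lo) / n)) + 1  # buckets are 1-based
3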

Parameters
v : str or Column

value to compute a bucket number in the histogram

min : str or Column

minimum value of the histogram

max : str or Column

maximum value of the histogram

numBucket : str, Column or int

the number of buckets

Returns
Column

the bucket number into which the value would fall after being evaluated

Examples

>>> from pyspark.sql.functions import width_bucket
>>> df = spark.createDataFrame([
...     (5.3, 0.2, 10.6, 5),
...     (-2.1, 1.3, 3.4, 3),
...     (8.1, 0.0, 5.7, 4),
...     (-0.9, 5.2, 0.5, 2)],
...     ['v', 'min', 'max', 'n'])
>>> df.select(width_bucket('v', 'min', 'max', 'n')).show()
+----------------------------+
|width_bucket(v, min, max, n)|
+----------------------------+
|                           3|
|                           0|
|                           5|
|                           3|
+----------------------------+