pyspark.sql.functions.bucket

pyspark.sql.functions.bucket(numBuckets: Union[pyspark.sql.column.Column, int], col: ColumnOrName) → pyspark.sql.column.Column

Partition transform function: A transform for any type that partitions by a hash of the input column.
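Conceptually, bucketing maps each row to one of numBuckets partitions by hashing the column value and taking it modulo the bucket count. A minimal Python sketch of that idea follows; note it is only an illustration, since Spark uses its own hash function (not Python's built-in hash) and evaluates the transform inside the engine:

```python
def assign_bucket(value, num_buckets):
    # Hash the value, then map the hash into one of num_buckets partitions.
    # Illustrative only: Spark's actual hashing differs from Python's hash().
    return hash(value) % num_buckets

rows = ["2024-01-01", "2024-01-02", "2024-01-03"]
buckets = [assign_bucket(v, 4) for v in rows]

# Every bucket id falls in the range [0, 4).
assert all(0 <= b < 4 for b in buckets)
```

Rows with equal column values always hash to the same bucket, which is what makes bucketed layouts useful for joins and point lookups.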

New in version 3.1.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
numBuckets : Column or int

the number of buckets into which rows are partitioned.

col : Column or str

target column to work on.

Returns
Column

data partitioned by given columns.

Notes

This function can be used only in combination with the partitionedBy() method of DataFrameWriterV2.

Examples

>>> df.writeTo("catalog.db.table").partitionedBy(  
...     bucket(42, "ts")
... ).createOrReplace()