pyspark.sql.functions.spark_partition_id

pyspark.sql.functions.spark_partition_id() → pyspark.sql.column.Column

A column for partition ID.

New in version 1.6.0.

Changed in version 3.4.0: Supports Spark Connect.

Returns
Column

Partition ID the record belongs to.

Notes

This is non-deterministic because it depends on data partitioning and task scheduling.
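
For instance (a minimal sketch, assuming an active SparkSession bound to the name spark, as in the example below), repartitioning the same rows can change which IDs they receive:

from pyspark.sql.functions import spark_partition_id

df = spark.range(4)

# The IDs assigned to the same rows depend on how the data is partitioned
# and on task scheduling, so the two results below can differ between runs.
two_parts = df.repartition(2).select(spark_partition_id().alias("pid")).collect()
four_parts = df.repartition(4).select(spark_partition_id().alias("pid")).collect()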

Examples

>>> from pyspark.sql.functions import spark_partition_id
>>> df = spark.range(2)
>>> df.repartition(1).select(spark_partition_id().alias("pid")).collect()
[Row(pid=0), Row(pid=0)]
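
A common use (a hedged sketch, again assuming an active SparkSession named spark) is to group on spark_partition_id() to see how many records each partition holds, for example when checking for data skew:

from pyspark.sql.functions import spark_partition_id

df = spark.range(100).repartition(4)

# Count records per partition; the exact counts depend on how Spark
# distributes the rows, so they are not fixed across runs.
per_partition = df.groupBy(spark_partition_id().alias("pid")).count().orderBy("pid")
per_partition.show()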