New in version 2.4.0.
This API is experimental.
Example: set a barrier, and execute it with an RDD.
>>> from pyspark import BarrierTaskContext
>>> def block_and_do_something(itr):
...     taskcontext = BarrierTaskContext.get()
...     # Do something.
...     ...
...     # Wait until all tasks finished.
...     taskcontext.barrier()
...     ...
...     return itr
...
>>> rdd = spark.sparkContext.parallelize(range(10), 4)
>>> rdd.barrier().mapPartitions(block_and_do_something).collect()
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
This function blocks until all tasks in the same stage have reached this routine.
How many times this task has been attempted. The first task attempt will be assigned attemptNumber = 0, and subsequent attempts will have increasing attempt numbers.
Sets a global barrier and waits until all tasks in this stage hit this barrier.
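The blocking semantics are conceptually similar to Python's threading.Barrier, except that Spark coordinates tasks across executors rather than local threads. A minimal pure-Python sketch of the same two-phase pattern (not Spark's implementation):

```python
import threading

NUM_TASKS = 4
barrier = threading.Barrier(NUM_TASKS)
results = []

def task(partition_id):
    # Phase 1: do some per-partition work.
    local = partition_id * partition_id
    # Block until every task has finished phase 1,
    # analogous to BarrierTaskContext.barrier().
    barrier.wait()
    # Phase 2: every task is guaranteed to have completed phase 1.
    results.append(local)

threads = [threading.Thread(target=task, args=(i,)) for i in range(NUM_TASKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [0, 1, 4, 9]
```

Note that, unlike a local barrier, a Spark barrier stage requires enough free slots to launch all tasks of the stage at once; otherwise the stage cannot proceed.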
CPUs allocated to the task.
Return the currently active BarrierTaskContext. This can be called inside of user functions to access contextual information about running tasks.
Get a local property set upstream in the driver, or None if it is missing.
Returns BarrierTaskInfo for all tasks in this barrier stage, ordered by partition ID.
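Each BarrierTaskInfo carries the address of the worker running the corresponding task. The sketch below uses a local stand-in class (not the real pyspark.BarrierTaskInfo) and made-up addresses, only to show the calling pattern: because the list is ordered by partition ID, entry i describes the task computing partition i, which lets tasks build a peer list for direct coordination.

```python
from dataclasses import dataclass

# Hypothetical stand-in for pyspark.BarrierTaskInfo.
@dataclass
class BarrierTaskInfo:
    address: str

# In a real barrier task this would be taskcontext.getTaskInfos();
# the addresses here are illustrative only.
infos = [BarrierTaskInfo("10.0.0.1:4040"), BarrierTaskInfo("10.0.0.2:4040")]

# infos[i] corresponds to partition i, so a peer list derived from it
# has a stable, well-defined order across all tasks in the stage.
peers = [info.address for info in infos]
print(peers)  # ['10.0.0.1:4040', '10.0.0.2:4040']
```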
The ID of the RDD partition that is computed by this task.
Resources allocated to the task.
The ID of the stage that this task belongs to.
An ID that is unique to this task attempt (within the same
SparkContext, no two task attempts will share the same attempt ID).
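To illustrate the distinction between the two counters (a pure-Python sketch of scheduler-like bookkeeping, not Spark's implementation): attemptNumber restarts at 0 for each task, while the task attempt ID is drawn from a single counter and never reused within one context.

```python
from itertools import count

# Hypothetical bookkeeping: one global ID stream, plus a per-task
# attempt count keyed by partition ID.
_global_attempt_ids = count()
_attempts_per_task = {}

def launch_attempt(partition_id):
    attempt_number = _attempts_per_task.get(partition_id, 0)
    _attempts_per_task[partition_id] = attempt_number + 1
    task_attempt_id = next(_global_attempt_ids)
    return attempt_number, task_attempt_id

print(launch_attempt(0))  # (0, 0)  first attempt of partition 0
print(launch_attempt(1))  # (0, 1)  first attempt of partition 1
print(launch_attempt(0))  # (1, 2)  retry: attemptNumber restarts per task,
                          #         but the attempt ID keeps increasing
```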