New in version 2.4.0.
This API is experimental.
Example: set a barrier, and execute it with an RDD.
>>> from pyspark import BarrierTaskContext
>>> def block_and_do_something(itr):
...     taskcontext = BarrierTaskContext.get()
...     # Do something.
...     ...
...     # Wait until all tasks finished.
...     taskcontext.barrier()
...     ...
...     return itr
...
>>> rdd = spark.sparkContext.parallelize(range(10), 4)
>>> rdd.barrier().mapPartitions(block_and_do_something).collect()
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
This function blocks until all tasks in the same stage have reached this routine.
How many times this task has been attempted. The first task attempt will be assigned attemptNumber = 0, and subsequent attempts will have increasing attempt numbers.
Sets a global barrier and waits until all tasks in this stage hit this barrier.
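The blocking semantics are conceptually similar to Python's threading.Barrier, except that Spark coordinates tasks across executors rather than local threads. A minimal pure-Python sketch of the same two-phase pattern (not Spark's implementation):

```python
import threading

NUM_TASKS = 4
barrier = threading.Barrier(NUM_TASKS)
results = []

def task(partition_id):
    # Phase 1: do some per-partition work.
    local = partition_id * partition_id
    # Block until every task has finished phase 1,
    # analogous to BarrierTaskContext.barrier().
    barrier.wait()
    # Phase 2: every task is guaranteed to have completed phase 1.
    results.append(local)

threads = [threading.Thread(target=task, args=(i,)) for i in range(NUM_TASKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [0, 1, 4, 9]
```

Note that, unlike a local barrier, a Spark barrier stage requires enough free slots to launch all tasks of the stage at once; otherwise the stage cannot proceed.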
CPUs allocated to the task.
Return the currently active BarrierTaskContext. This can be called inside of user functions to access contextual information about running tasks.
Get a local property set upstream in the driver, or None if it is missing.
Returns BarrierTaskInfo for all tasks in this barrier stage, ordered by partition ID.
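Each BarrierTaskInfo carries the address of the worker running the corresponding task. The sketch below uses a local stand-in class (not the real pyspark.BarrierTaskInfo) and made-up addresses, only to show the calling pattern: because the list is ordered by partition ID, entry i describes the task computing partition i, which lets tasks build a peer list for direct coordination.

```python
from dataclasses import dataclass

# Hypothetical stand-in for pyspark.BarrierTaskInfo.
@dataclass
class BarrierTaskInfo:
    address: str

# In a real barrier task this would be taskcontext.getTaskInfos();
# the addresses here are illustrative only.
infos = [BarrierTaskInfo("10.0.0.1:4040"), BarrierTaskInfo("10.0.0.2:4040")]

# infos[i] corresponds to partition i, so a peer list derived from it
# has a stable, well-defined order across all tasks in the stage.
peers = [info.address for info in infos]
print(peers)  # ['10.0.0.1:4040', '10.0.0.2:4040']
```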
The ID of the RDD partition that is computed by this task.
Resources allocated to the task.
The ID of the stage that this task belongs to.
An ID that is unique to this task attempt (within the same
SparkContext, no two task attempts will share the same attempt ID).
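To illustrate the distinction between the two counters (a pure-Python sketch of scheduler-like bookkeeping, not Spark's implementation): attemptNumber restarts at 0 for each task, while the task attempt ID is drawn from a single counter and never reused within one context.

```python
from itertools import count

# Hypothetical bookkeeping: one global ID stream, plus a per-task
# attempt count keyed by partition ID.
_global_attempt_ids = count()
_attempts_per_task = {}

def launch_attempt(partition_id):
    attempt_number = _attempts_per_task.get(partition_id, 0)
    _attempts_per_task[partition_id] = attempt_number + 1
    task_attempt_id = next(_global_attempt_ids)
    return attempt_number, task_attempt_id

print(launch_attempt(0))  # (0, 0)  first attempt of partition 0
print(launch_attempt(1))  # (0, 1)  first attempt of partition 1
print(launch_attempt(0))  # (1, 2)  retry: attemptNumber restarts per task,
                          #         but the attempt ID keeps increasing
```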