public class BarrierTaskContext extends TaskContext implements org.apache.spark.internal.Logging

A TaskContext with extra contextual info and tooling for tasks in a barrier stage.
Use get() to obtain the barrier context for a running barrier task.

| Modifier and Type | Method and Description |
|---|---|
| `BarrierTaskContext` | `addTaskCompletionListener(TaskCompletionListener listener)`: Adds a (Java friendly) listener to be executed on task completion. |
| `BarrierTaskContext` | `addTaskFailureListener(TaskFailureListener listener)`: Adds a listener to be executed on task failure (which includes completion listener failure, if the task body did not already fail). |
| `String[]` | `allGather(String message)`: :: Experimental :: Blocks until all tasks in the same stage have reached this routine. |
| `int` | `attemptNumber()`: How many times this task has been attempted. |
| `void` | `barrier()`: :: Experimental :: Sets a global barrier and waits until all tasks in this stage hit this barrier. |
| `int` | `cpus()`: CPUs allocated to the task. |
| `static BarrierTaskContext` | `get()`: :: Experimental :: Returns the currently active BarrierTaskContext. |
| `String` | `getLocalProperty(String key)`: Get a local property set upstream in the driver, or null if it is missing. |
| `scala.collection.Seq<Source>` | `getMetricsSources(String sourceName)`: :: DeveloperApi :: Returns all metrics sources with the given name which are associated with the instance that runs the task. |
| `BarrierTaskInfo[]` | `getTaskInfos()`: :: Experimental :: Returns BarrierTaskInfo for all tasks in this barrier stage, ordered by partition ID. |
| `boolean` | `isCompleted()`: Returns true if the task has completed. |
| `boolean` | `isFailed()`: Returns true if the task has failed. |
| `boolean` | `isInterrupted()`: Returns true if the task has been killed. |
| `int` | `numPartitions()`: Total number of partitions in the stage that this task belongs to. |
| `int` | `partitionId()`: The ID of the RDD partition that is computed by this task. |
| `scala.collection.immutable.Map<String,ResourceInformation>` | `resources()`: Resources allocated to the task. |
| `java.util.Map<String,ResourceInformation>` | `resourcesJMap()`: (Java-specific) Resources allocated to the task. |
| `int` | `stageAttemptNumber()`: How many times the stage that this task belongs to has been attempted. |
| `int` | `stageId()`: The ID of the stage that this task belongs to. |
| `long` | `taskAttemptId()`: An ID that is unique to this task attempt (within the same SparkContext, no two task attempts will share the same attempt ID). |
| `org.apache.spark.executor.TaskMetrics` | `taskMetrics()` |
Methods inherited from class org.apache.spark.TaskContext: addTaskCompletionListener, addTaskFailureListener, getPartitionId

Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.spark.internal.Logging: $init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize

public static BarrierTaskContext get()

:: Experimental :: Returns the currently active BarrierTaskContext.
public void barrier()

:: Experimental :: Sets a global barrier and waits until all tasks in this stage hit this barrier.

CAUTION! In a barrier stage, each task must make the same number of barrier() calls, in all possible code branches. Otherwise, the job may hang, or a SparkException may be thrown after a timeout. Some examples of misuse are listed below:

1. Calling barrier() in only a subset of the tasks in the barrier stage leads to a timeout of the function call:

```scala
rdd.barrier().mapPartitions { iter =>
  val context = BarrierTaskContext.get()
  if (context.partitionId() == 0) {
    // Do nothing.
  } else {
    context.barrier()
  }
  iter
}
```
2. Including barrier() in a try-catch code block may lead to a timeout of the second barrier() call:

```scala
rdd.barrier().mapPartitions { iter =>
  val context = BarrierTaskContext.get()
  try {
    // Do something that might throw an Exception.
    doSomething()
    context.barrier()
  } catch {
    case e: Exception => logWarning("...", e)
  }
  context.barrier()
  iter
}
```
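For contrast, a minimal sketch of a correct pattern: every task calls barrier() exactly once, on every code path. Here preparePartition is a hypothetical user function, not part of this API:

```scala
rdd.barrier().mapPartitions { iter =>
  val context = BarrierTaskContext.get()
  // Hypothetical per-partition setup step supplied by the user.
  preparePartition(context.partitionId())
  // Every task reaches this exact call once, on every code path,
  // so no task can strand the others at the barrier.
  context.barrier()
  // Past this point, every partition has finished preparePartition().
  iter
}
```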
public String[] allGather(String message)

:: Experimental :: Blocks until all tasks in the same stage have reached this routine.

CAUTION! The allGather method requires the same precautions as the barrier method.

The message is of type String rather than Array[Byte] because it is more convenient for the user, at the cost of worse performance.
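As an illustrative sketch (not from this page), each task might use allGather to share its hostname with the rest of the stage; the result array is assumed here to be ordered by partition ID:

```scala
rdd.barrier().mapPartitions { iter =>
  val context = BarrierTaskContext.get()
  // Every task contributes one message and blocks until all tasks
  // in the stage have called allGather; each then receives the
  // full set of messages.
  val hosts: Array[String] =
    context.allGather(java.net.InetAddress.getLocalHost.getHostName)
  // e.g. treat hosts(0) as a rendezvous address for the stage.
  iter
}
```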
Parameters: message - (undocumented)

public BarrierTaskInfo[] getTaskInfos()

:: Experimental :: Returns BarrierTaskInfo for all tasks in this barrier stage, ordered by partition ID.

public boolean isCompleted()
Returns true if the task has completed.
Specified by: isCompleted in class TaskContext

public boolean isFailed()
Returns true if the task has failed.
Specified by: isFailed in class TaskContext

public boolean isInterrupted()
Returns true if the task has been killed.
Specified by: isInterrupted in class TaskContext

public BarrierTaskContext addTaskCompletionListener(TaskCompletionListener listener)
Adds a (Java friendly) listener to be executed on task completion. Two listeners registered in the same thread will be invoked in reverse order of registration if the task completes after both are registered. There are no ordering guarantees for listeners registered in different threads, or for listeners registered after the task completes. Listeners are guaranteed to execute sequentially.
An example use is for HadoopRDD to register a callback to close the input stream.
Exceptions thrown by the listener will result in failure of the task.
Specified by: addTaskCompletionListener in class TaskContext
Parameters: listener - (undocumented)

public BarrierTaskContext addTaskFailureListener(TaskFailureListener listener)
Adds a listener to be executed on task failure (which includes completion listener failure, if the task body did not already fail).
Note: Prior to Spark 3.4.0, failure listeners were only invoked if the main task body failed.
Specified by: addTaskFailureListener in class TaskContext
Parameters: listener - (undocumented)

public int stageId()
The ID of the stage that this task belongs to.
Specified by: stageId in class TaskContext

public int stageAttemptNumber()
How many times the stage that this task belongs to has been attempted.
Specified by: stageAttemptNumber in class TaskContext

public int partitionId()
The ID of the RDD partition that is computed by this task.
Specified by: partitionId in class TaskContext

public int numPartitions()
Total number of partitions in the stage that this task belongs to.
Specified by: numPartitions in class TaskContext

public int attemptNumber()
How many times this task has been attempted.
Specified by: attemptNumber in class TaskContext

public long taskAttemptId()
An ID that is unique to this task attempt (within the same SparkContext, no two task attempts will share the same attempt ID).
Specified by: taskAttemptId in class TaskContext

public String getLocalProperty(String key)
Get a local property set upstream in the driver, or null if it is missing. See org.apache.spark.SparkContext.setLocalProperty.
Specified by: getLocalProperty in class TaskContext
Parameters: key - (undocumented)

public org.apache.spark.executor.TaskMetrics taskMetrics()
Specified by: taskMetrics in class TaskContext

public scala.collection.Seq<Source> getMetricsSources(String sourceName)
Returns all metrics sources with the given name which are associated with the instance that runs the task. See org.apache.spark.metrics.MetricsSystem.
Specified by: getMetricsSources in class TaskContext
Parameters: sourceName - (undocumented)

public int cpus()
CPUs allocated to the task.
Specified by: cpus in class TaskContext

public scala.collection.immutable.Map<String,ResourceInformation> resources()
Resources allocated to the task. See ResourceInformation for specifics.
Specified by: resources in class TaskContext

public java.util.Map<String,ResourceInformation> resourcesJMap()
(Java-specific) Resources allocated to the task. See ResourceInformation for specifics.
Specified by: resourcesJMap in class TaskContext
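As a hedged sketch of registering a completion listener inside a barrier task (openSomeResource is hypothetical; TaskCompletionListener is the org.apache.spark.util interface named above):

```scala
import org.apache.spark.TaskContext
import org.apache.spark.util.TaskCompletionListener

rdd.barrier().mapPartitions { iter =>
  val context = BarrierTaskContext.get()
  // Hypothetical closeable resource opened per task.
  val handle = openSomeResource()
  // The listener runs when the task completes, whether it succeeded
  // or failed, so the resource is always released.
  context.addTaskCompletionListener(new TaskCompletionListener {
    override def onTaskCompletion(ctx: TaskContext): Unit = handle.close()
  })
  context.barrier()
  iter
}
```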