pyspark.TaskContext

class pyspark.TaskContext
Contextual information about a task which can be read or mutated during
execution. To access the TaskContext for a running task, use
TaskContext.get().

New in version 2.2.0.

Examples

>>> from pyspark import TaskContext

Get a task context instance from an RDD.

>>> spark.sparkContext.setLocalProperty("key1", "value")
>>> taskcontext = spark.sparkContext.parallelize([1]).map(lambda _: TaskContext.get()).first()
>>> isinstance(taskcontext.attemptNumber(), int)
True
>>> isinstance(taskcontext.partitionId(), int)
True
>>> isinstance(taskcontext.stageId(), int)
True
>>> isinstance(taskcontext.taskAttemptId(), int)
True
>>> taskcontext.getLocalProperty("key1")
'value'
>>> isinstance(taskcontext.cpus(), int)
True

Get a task context instance from a DataFrame via a Python UDF.

>>> from pyspark.sql import Row
>>> from pyspark.sql.functions import udf
>>> @udf("STRUCT<anum: INT, partid: INT, stageid: INT, taskaid: INT, prop: STRING, cpus: INT>")
... def taskcontext_as_row():
...     taskcontext = TaskContext.get()
...     return Row(
...         anum=taskcontext.attemptNumber(),
...         partid=taskcontext.partitionId(),
...         stageid=taskcontext.stageId(),
...         taskaid=taskcontext.taskAttemptId(),
...         prop=taskcontext.getLocalProperty("key2"),
...         cpus=taskcontext.cpus())
...
>>> spark.sparkContext.setLocalProperty("key2", "value")
>>> [(anum, partid, stageid, taskaid, prop, cpus)] = (
...     spark.range(1).select(taskcontext_as_row()).first()
... )
>>> isinstance(anum, int)
True
>>> isinstance(partid, int)
True
>>> isinstance(stageid, int)
True
>>> isinstance(taskaid, int)
True
>>> prop
'value'
>>> isinstance(cpus, int)
True

Get a task context instance from a DataFrame via a Pandas UDF.

>>> import pandas as pd
>>> from pyspark.sql.functions import pandas_udf
>>> @pandas_udf("STRUCT<"
...     "anum: INT, partid: INT, stageid: INT, taskaid: INT, prop: STRING, cpus: INT>")
... def taskcontext_as_row(_):
...     taskcontext = TaskContext.get()
...     return pd.DataFrame({
...         "anum": [taskcontext.attemptNumber()],
...         "partid": [taskcontext.partitionId()],
...         "stageid": [taskcontext.stageId()],
...         "taskaid": [taskcontext.taskAttemptId()],
...         "prop": [taskcontext.getLocalProperty("key3")],
...         "cpus": [taskcontext.cpus()]
...     })
...
>>> spark.sparkContext.setLocalProperty("key3", "value")
>>> [(anum, partid, stageid, taskaid, prop, cpus)] = (
...     spark.range(1).select(taskcontext_as_row("id")).first()
... )
>>> isinstance(anum, int)
True
>>> isinstance(partid, int)
True
>>> isinstance(stageid, int)
True
>>> isinstance(taskaid, int)
True
>>> prop
'value'
>>> isinstance(cpus, int)
True

Methods

attemptNumber()
    How many times this task has been attempted.

cpus()
    CPUs allocated to the task.

get()
    Return the currently active TaskContext.

getLocalProperty(key)
    Get a local property set upstream in the driver, or None if it is missing.

partitionId()
    The ID of the RDD partition that is computed by this task.

resources()
    Resources allocated to the task.

stageId()
    The ID of the stage that this task belongs to.

taskAttemptId()
    An ID that is unique to this task attempt (within the same SparkContext,
    no two task attempts will share the same attempt ID).