public class Stage extends Object implements Logging
Each Stage can either be a shuffle map stage, in which case its tasks' results are input for another stage, or a result stage, in which case its tasks directly compute the action that initiated a job (e.g. count(), save(), etc). For shuffle map stages, we also track the nodes that each output partition is on.
Each Stage also has a jobId, identifying the job that first submitted the stage. When FIFO scheduling is used, this allows Stages from earlier jobs to be computed first or recovered faster on failure.
The callSite provides a location in user code which relates to the stage. For a shuffle map stage, the callSite gives the user code that created the RDD being shuffled. For a result stage, the callSite gives the user code that executes the associated action (e.g. count()).
A single stage can consist of multiple attempts. In that case, the latestInfo field will be updated for each attempt.
Constructor and Description |
---|
Stage(int id,
RDD<?> rdd,
int numTasks,
scala.Option<ShuffleDependency<?,?,?>> shuffleDep,
scala.collection.immutable.List<Stage> parents,
int jobId,
CallSite callSite) |
Modifier and Type | Method and Description |
---|---|
void |
addOutputLoc(int partition,
MapStatus status) |
int |
attemptId() |
CallSite |
callSite() |
String |
details() |
boolean |
equals(Object other) |
int |
hashCode() |
int |
id() |
boolean |
isAvailable() |
boolean |
isShuffleMap() |
int |
jobId() |
scala.collection.mutable.HashSet<Object> |
jobIds()
Set of jobs that this stage belongs to.
|
StageInfo |
latestInfo()
Pointer to the latest [StageInfo] object, set by DAGScheduler.
|
String |
name() |
int |
newAttemptId()
Return a new attempt id, starting with 0.
|
int |
numAvailableOutputs() |
int |
numPartitions() |
int |
numTasks() |
scala.collection.immutable.List<MapStatus>[] |
outputLocs() |
scala.collection.immutable.List<Stage> |
parents() |
scala.collection.mutable.HashSet<Task<?>> |
pendingTasks() |
Object |
rdd() |
void |
removeOutputLoc(int partition,
BlockManagerId bmAddress) |
void |
removeOutputsOnExecutor(String execId)
Removes all shuffle outputs associated with this executor.
|
scala.Option<ActiveJob> |
resultOfJob()
For stages that are the final (consists of only ResultTasks), link to the ActiveJob.
|
scala.Option<ShuffleDependency<?,?,?>> |
shuffleDep() |
String |
toString() |
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public Stage(int id, RDD<?> rdd, int numTasks, scala.Option<ShuffleDependency<?,?,?>> shuffleDep, scala.collection.immutable.List<Stage> parents, int jobId, CallSite callSite)
public int id()
public Object rdd()
public int numTasks()
public scala.Option<ShuffleDependency<?,?,?>> shuffleDep()
public scala.collection.immutable.List<Stage> parents()
public int jobId()
public CallSite callSite()
public boolean isShuffleMap()
public int numPartitions()
public scala.collection.immutable.List<MapStatus>[] outputLocs()
public int numAvailableOutputs()
public scala.collection.mutable.HashSet<Object> jobIds()
public scala.Option<ActiveJob> resultOfJob()
public scala.collection.mutable.HashSet<Task<?>> pendingTasks()
public String name()
public String details()
public StageInfo latestInfo()
public boolean isAvailable()
public void addOutputLoc(int partition, MapStatus status)
public void removeOutputLoc(int partition, BlockManagerId bmAddress)
public void removeOutputsOnExecutor(String execId)
public int newAttemptId()
public int attemptId()
public String toString()
toString
in class Object
public int hashCode()
hashCode
in class Object
public boolean equals(Object other)
equals
in class Object