public class Execution extends Object implements AccessExecution, Archiveable<ArchivedExecution>, LogicalSlot.Payload
ExecutionVertex
can be executed multiple times
(for recovery, re-computation, re-configuration), this class tracks the state of a single execution
of that vertex and the resources.
In several points of the code, we need to deal with possible concurrent state changes and actions. For example, while the call to deploy a task (send it to the TaskManager) happens, the task gets cancelled.
We could lock the entire portion of the code (decision to deploy, deploy, set state to running) such that it is guaranteed that any "cancel command" will only pick up after deployment is done and that the "cancel command" call will never overtake the deploying call.
This blocks the threads big time, because the remote calls may take long. Depending of their locking behavior, it may even result in distributed deadlocks (unless carefully avoided). We therefore use atomic state updates and occasional double-checking to ensure that the state after a completed call is as expected, and trigger correcting actions if it is not. Many actions are also idempotent (like canceling).
Constructor and Description |
---|
Execution(Executor executor,
ExecutionVertex vertex,
int attemptNumber,
long globalModVersion,
long startTimestamp,
Time rpcTimeout)
Creates a new Execution attempt.
|
Modifier and Type | Method and Description |
---|---|
CompletableFuture<Execution> |
allocateAndAssignSlotForExecution(SlotProvider slotProvider,
boolean queued,
LocationPreferenceConstraint locationPreferenceConstraint,
Time allocationTimeout)
Allocates and assigns a slot obtained from the slot provider to the execution.
|
ArchivedExecution |
archive() |
CompletableFuture<Collection<TaskManagerLocation>> |
calculatePreferredLocations(LocationPreferenceConstraint locationPreferenceConstraint)
Calculates the preferred locations based on the location preference constraint.
|
void |
cancel() |
void |
deploy()
Deploys the execution to the previously assigned resource.
|
void |
fail(Throwable t)
This method fails the vertex due to an external condition.
|
AllocationID |
getAssignedAllocationID() |
LogicalSlot |
getAssignedResource() |
TaskManagerLocation |
getAssignedResourceLocation()
Returns the
TaskManagerLocation for this execution. |
ExecutionAttemptID |
getAttemptId()
Returns the
ExecutionAttemptID for this Execution. |
int |
getAttemptNumber()
Returns the attempt number for this execution.
|
Throwable |
getFailureCause() |
String |
getFailureCauseAsString()
Returns the exception that caused the job to fail.
|
long |
getGlobalModVersion()
Gets the global modification version of the execution graph when this execution was created.
|
IOMetrics |
getIOMetrics() |
int |
getParallelSubtaskIndex()
Returns the subtask index of this execution.
|
CompletableFuture<?> |
getReleaseFuture()
Gets the release future which is completed once the execution reaches a terminal
state and the assigned resource has been released.
|
ExecutionState |
getState()
Returns the current
ExecutionState for this execution. |
long |
getStateTimestamp(ExecutionState state)
Returns the timestamp for the given
ExecutionState . |
long[] |
getStateTimestamps()
Returns the timestamps for every
ExecutionState . |
CompletableFuture<TaskManagerLocation> |
getTaskManagerLocationFuture() |
JobManagerTaskRestore |
getTaskRestore() |
CompletableFuture<ExecutionState> |
getTerminalStateFuture()
Gets a future that completes once the task execution reaches a terminal state.
|
Map<String,Accumulator<?,?>> |
getUserAccumulators() |
StringifiedAccumulatorResult[] |
getUserAccumulatorsStringified()
Returns the user-defined accumulators as strings.
|
ExecutionVertex |
getVertex() |
String |
getVertexWithAttempt() |
boolean |
isFinished() |
void |
notifyCheckpointComplete(long checkpointId,
long timestamp)
Notify the task of this execution about a completed checkpoint.
|
CompletableFuture<StackTraceSampleResponse> |
requestStackTraceSample(int sampleId,
int numSamples,
Time delayBetweenSamples,
int maxStackTraceDepth,
Time timeout)
Request a stack trace sample from the task of this execution.
|
CompletableFuture<Void> |
scheduleForExecution() |
CompletableFuture<Void> |
scheduleForExecution(SlotProvider slotProvider,
boolean queued,
LocationPreferenceConstraint locationPreferenceConstraint)
NOTE: This method only throws exceptions if it is in an illegal state to be scheduled, or if the tasks needs
to be scheduled immediately and no resource is available.
|
void |
setAccumulators(Map<String,Accumulator<?,?>> userAccumulators)
Update accumulators (discarded when the Execution has already been terminated).
|
void |
setInitialState(JobManagerTaskRestore taskRestore)
Sets the initial state for the execution.
|
void |
stop()
Sends stop RPC call.
|
String |
toString() |
void |
triggerCheckpoint(long checkpointId,
long timestamp,
CheckpointOptions checkpointOptions)
Trigger a new checkpoint on the task of this execution.
|
public Execution(Executor executor, ExecutionVertex vertex, int attemptNumber, long globalModVersion, long startTimestamp, Time rpcTimeout)
executor
- The executor used to dispatch callbacks from futures and asynchronous RPC calls.vertex
- The execution vertex to which this Execution belongsattemptNumber
- The execution attempt number.globalModVersion
- The global modification version of the execution graph when this execution was createdstartTimestamp
- The timestamp that marks the creation of this ExecutionrpcTimeout
- The rpcTimeout for RPC calls like deploy/cancel/stop.public ExecutionVertex getVertex()
public ExecutionAttemptID getAttemptId()
AccessExecution
ExecutionAttemptID
for this Execution.getAttemptId
in interface AccessExecution
public int getAttemptNumber()
AccessExecution
getAttemptNumber
in interface AccessExecution
public ExecutionState getState()
AccessExecution
ExecutionState
for this execution.getState
in interface AccessExecution
@Nullable public AllocationID getAssignedAllocationID()
public long getGlobalModVersion()
This version is bumped in the ExecutionGraph whenever a global failover happens. It is used to resolve conflicts between concurrent modification by global and local failover actions.
public CompletableFuture<TaskManagerLocation> getTaskManagerLocationFuture()
public LogicalSlot getAssignedResource()
public TaskManagerLocation getAssignedResourceLocation()
AccessExecution
TaskManagerLocation
for this execution.getAssignedResourceLocation
in interface AccessExecution
public Throwable getFailureCause()
public String getFailureCauseAsString()
AccessExecution
getFailureCauseAsString
in interface AccessExecution
"(null)"
public long[] getStateTimestamps()
AccessExecution
ExecutionState
.getStateTimestamps
in interface AccessExecution
public long getStateTimestamp(ExecutionState state)
AccessExecution
ExecutionState
.getStateTimestamp
in interface AccessExecution
state
- state for which the timestamp should be returnedpublic boolean isFinished()
@Nullable public JobManagerTaskRestore getTaskRestore()
public void setInitialState(@Nullable JobManagerTaskRestore taskRestore)
TaskDeploymentDescriptor
to the TaskManagers.taskRestore
- information to restore the statepublic CompletableFuture<ExecutionState> getTerminalStateFuture()
getTerminalStateFuture
in interface LogicalSlot.Payload
public CompletableFuture<?> getReleaseFuture()
public CompletableFuture<Void> scheduleForExecution()
public CompletableFuture<Void> scheduleForExecution(SlotProvider slotProvider, boolean queued, LocationPreferenceConstraint locationPreferenceConstraint)
slotProvider
- The slot provider to use to allocate slot for this execution attempt.queued
- Flag to indicate whether the scheduler may queue this task if it cannot
immediately deploy it.locationPreferenceConstraint
- constraint for the location preferencespublic CompletableFuture<Execution> allocateAndAssignSlotForExecution(SlotProvider slotProvider, boolean queued, LocationPreferenceConstraint locationPreferenceConstraint, Time allocationTimeout) throws IllegalExecutionStateException
slotProvider
- to obtain a new slot fromqueued
- if the allocation can be queuedlocationPreferenceConstraint
- constraint for the location preferencesallocationTimeout
- rpcTimeout for allocating a new slotIllegalExecutionStateException
- if this method has been called while not being in the CREATED statepublic void deploy() throws JobException
JobException
- if the execution cannot be deployed to the assigned resourcepublic void stop()
public void cancel()
public void fail(Throwable t)
fail
in interface LogicalSlot.Payload
t
- The exception that caused the task to fail.public CompletableFuture<StackTraceSampleResponse> requestStackTraceSample(int sampleId, int numSamples, Time delayBetweenSamples, int maxStackTraceDepth, Time timeout)
sampleId
- of the stack trace samplenumSamples
- the sample should containdelayBetweenSamples
- to waitmaxStackTraceDepth
- of the samplestimeout
- until the request times outpublic void notifyCheckpointComplete(long checkpointId, long timestamp)
checkpointId
- of the completed checkpointtimestamp
- of the completed checkpointpublic void triggerCheckpoint(long checkpointId, long timestamp, CheckpointOptions checkpointOptions)
checkpointId
- of th checkpoint to triggertimestamp
- of the checkpoint to triggercheckpointOptions
- of the checkpoint to trigger@VisibleForTesting public CompletableFuture<Collection<TaskManagerLocation>> calculatePreferredLocations(LocationPreferenceConstraint locationPreferenceConstraint)
locationPreferenceConstraint
- constraint for the location preferencepublic String getVertexWithAttempt()
public void setAccumulators(Map<String,Accumulator<?,?>> userAccumulators)
userAccumulators
- the user accumulatorspublic Map<String,Accumulator<?,?>> getUserAccumulators()
public StringifiedAccumulatorResult[] getUserAccumulatorsStringified()
AccessExecution
getUserAccumulatorsStringified
in interface AccessExecution
public int getParallelSubtaskIndex()
AccessExecution
getParallelSubtaskIndex
in interface AccessExecution
public IOMetrics getIOMetrics()
getIOMetrics
in interface AccessExecution
public ArchivedExecution archive()
archive
in interface Archiveable<ArchivedExecution>
Copyright © 2014–2019 The Apache Software Foundation. All rights reserved.