public class Execution extends Object implements AccessExecution, Archiveable<ArchivedExecution>, LogicalSlot.Payload
ExecutionVertex
can be executed multiple times
(for recovery, re-computation, re-configuration), this class tracks the state of a single
execution of that vertex and the resources.
In several points of the code, we need to deal with possible concurrent state changes and actions. For example, while the call to deploy a task (send it to the TaskManager) happens, the task gets cancelled.
We could lock the entire portion of the code (decision to deploy, deploy, set state to running) such that it is guaranteed that any "cancel command" will only pick up after deployment is done and that the "cancel command" call will never overtake the deploying call.
This blocks the threads big time, because the remote calls may take long. Depending of their locking behavior, it may even result in distributed deadlocks (unless carefully avoided). We therefore use atomic state updates and occasional double-checking to ensure that the state after a completed call is as expected, and trigger correcting actions if it is not. Many actions are also idempotent (like canceling).
Constructor and Description |
---|
Execution(Executor executor,
ExecutionVertex vertex,
int attemptNumber,
long globalModVersion,
long startTimestamp,
Time rpcTimeout)
Creates a new Execution attempt.
|
Modifier and Type | Method and Description |
---|---|
ArchivedExecution |
archive() |
CompletableFuture<Collection<TaskManagerLocation>> |
calculatePreferredLocations(LocationPreferenceConstraint locationPreferenceConstraint)
Calculates the preferred locations based on the location preference constraint.
|
void |
cancel() |
void |
deploy()
Deploys the execution to the previously assigned resource.
|
void |
fail(Throwable t)
This method fails the vertex due to an external condition.
|
AllocationID |
getAssignedAllocationID() |
LogicalSlot |
getAssignedResource() |
TaskManagerLocation |
getAssignedResourceLocation()
Returns the
TaskManagerLocation for this execution. |
ExecutionAttemptID |
getAttemptId()
Returns the
ExecutionAttemptID for this Execution. |
int |
getAttemptNumber()
Returns the attempt number for this execution.
|
Throwable |
getFailureCause() |
String |
getFailureCauseAsString()
Returns the exception that caused the job to fail.
|
long |
getGlobalModVersion()
Gets the global modification version of the execution graph when this execution was created.
|
IOMetrics |
getIOMetrics() |
InputSplit |
getNextInputSplit() |
int |
getParallelSubtaskIndex()
Returns the subtask index of this execution.
|
CompletableFuture<?> |
getReleaseFuture()
Gets the release future which is completed once the execution reaches a terminal state and
the assigned resource has been released.
|
Optional<ResultPartitionDeploymentDescriptor> |
getResultPartitionDeploymentDescriptor(IntermediateResultPartitionID id) |
ExecutionState |
getState()
Returns the current
ExecutionState for this execution. |
long |
getStateTimestamp(ExecutionState state)
Returns the timestamp for the given
ExecutionState . |
long[] |
getStateTimestamps()
Returns the timestamps for every
ExecutionState . |
CompletableFuture<TaskManagerLocation> |
getTaskManagerLocationFuture() |
JobManagerTaskRestore |
getTaskRestore() |
CompletableFuture<ExecutionState> |
getTerminalStateFuture()
Gets a future that completes once the task execution reaches a terminal state.
|
Map<String,Accumulator<?,?>> |
getUserAccumulators() |
StringifiedAccumulatorResult[] |
getUserAccumulatorsStringified()
Returns the user-defined accumulators as strings.
|
ExecutionVertex |
getVertex() |
String |
getVertexWithAttempt() |
boolean |
isFinished() |
void |
notifyCheckpointAborted(long abortCheckpointId,
long timestamp)
Notify the task of this execution about a aborted checkpoint.
|
void |
notifyCheckpointComplete(long checkpointId,
long timestamp)
Notify the task of this execution about a completed checkpoint.
|
CompletableFuture<Execution> |
registerProducedPartitions(TaskManagerLocation location) |
CompletableFuture<Execution> |
registerProducedPartitions(TaskManagerLocation location,
boolean sendScheduleOrUpdateConsumersMessage) |
CompletableFuture<TaskBackPressureResponse> |
requestBackPressure(int requestId,
Time timeout)
Request the back pressure ratio from the task of this execution.
|
CompletableFuture<Void> |
scheduleForExecution() |
CompletableFuture<Void> |
scheduleForExecution(SlotProviderStrategy slotProviderStrategy,
LocationPreferenceConstraint locationPreferenceConstraint,
Set<AllocationID> allPreviousExecutionGraphAllocationIds)
NOTE: This method only throws exceptions if it is in an illegal state to be scheduled, or if
the tasks needs to be scheduled immediately and no resource is available.
|
CompletableFuture<Acknowledge> |
sendOperatorEvent(OperatorID operatorId,
SerializedValue<OperatorEvent> event)
Sends the operator event to the Task on the Task Executor.
|
void |
setAccumulators(Map<String,Accumulator<?,?>> userAccumulators)
Update accumulators (discarded when the Execution has already been terminated).
|
void |
setInitialState(JobManagerTaskRestore taskRestore)
Sets the initial state for the execution.
|
CompletableFuture<?> |
suspend() |
String |
toString() |
void |
transitionState(ExecutionState targetState) |
void |
triggerCheckpoint(long checkpointId,
long timestamp,
CheckpointOptions checkpointOptions)
Trigger a new checkpoint on the task of this execution.
|
void |
triggerSynchronousSavepoint(long checkpointId,
long timestamp,
CheckpointOptions checkpointOptions)
Trigger a new checkpoint on the task of this execution.
|
boolean |
tryAssignResource(LogicalSlot logicalSlot)
Tries to assign the given slot to the execution.
|
public Execution(Executor executor, ExecutionVertex vertex, int attemptNumber, long globalModVersion, long startTimestamp, Time rpcTimeout)
executor
- The executor used to dispatch callbacks from futures and asynchronous RPC
calls.vertex
- The execution vertex to which this Execution belongsattemptNumber
- The execution attempt number.globalModVersion
- The global modification version of the execution graph when this
execution was createdstartTimestamp
- The timestamp that marks the creation of this ExecutionrpcTimeout
- The rpcTimeout for RPC calls like deploy/cancel/stop.public ExecutionVertex getVertex()
public ExecutionAttemptID getAttemptId()
AccessExecution
ExecutionAttemptID
for this Execution.getAttemptId
in interface AccessExecution
public int getAttemptNumber()
AccessExecution
getAttemptNumber
in interface AccessExecution
public ExecutionState getState()
AccessExecution
ExecutionState
for this execution.getState
in interface AccessExecution
@Nullable public AllocationID getAssignedAllocationID()
public long getGlobalModVersion()
This version is bumped in the ExecutionGraph whenever a global failover happens. It is used to resolve conflicts between concurrent modification by global and local failover actions.
public CompletableFuture<TaskManagerLocation> getTaskManagerLocationFuture()
public LogicalSlot getAssignedResource()
public Optional<ResultPartitionDeploymentDescriptor> getResultPartitionDeploymentDescriptor(IntermediateResultPartitionID id)
public boolean tryAssignResource(LogicalSlot logicalSlot)
logicalSlot
- to assign to this executionpublic InputSplit getNextInputSplit()
public TaskManagerLocation getAssignedResourceLocation()
AccessExecution
TaskManagerLocation
for this execution.getAssignedResourceLocation
in interface AccessExecution
public Throwable getFailureCause()
public String getFailureCauseAsString()
AccessExecution
getFailureCauseAsString
in interface AccessExecution
"(null)"
public long[] getStateTimestamps()
AccessExecution
ExecutionState
.getStateTimestamps
in interface AccessExecution
public long getStateTimestamp(ExecutionState state)
AccessExecution
ExecutionState
.getStateTimestamp
in interface AccessExecution
state
- state for which the timestamp should be returnedpublic boolean isFinished()
@Nullable public JobManagerTaskRestore getTaskRestore()
public void setInitialState(@Nullable JobManagerTaskRestore taskRestore)
TaskDeploymentDescriptor
to the TaskManagers.taskRestore
- information to restore the statepublic CompletableFuture<ExecutionState> getTerminalStateFuture()
getTerminalStateFuture
in interface LogicalSlot.Payload
public CompletableFuture<?> getReleaseFuture()
public CompletableFuture<Void> scheduleForExecution()
public CompletableFuture<Void> scheduleForExecution(SlotProviderStrategy slotProviderStrategy, LocationPreferenceConstraint locationPreferenceConstraint, @Nonnull Set<AllocationID> allPreviousExecutionGraphAllocationIds)
slotProviderStrategy
- The slot provider strategy to use to allocate slot for this
execution attempt.locationPreferenceConstraint
- constraint for the location preferencesallPreviousExecutionGraphAllocationIds
- set with all previous allocation ids in the job
graph. Can be empty if the allocation ids are not required for scheduling.public CompletableFuture<Execution> registerProducedPartitions(TaskManagerLocation location)
public CompletableFuture<Execution> registerProducedPartitions(TaskManagerLocation location, boolean sendScheduleOrUpdateConsumersMessage)
public void deploy() throws JobException
JobException
- if the execution cannot be deployed to the assigned resourcepublic void cancel()
public CompletableFuture<?> suspend()
public void fail(Throwable t)
fail
in interface LogicalSlot.Payload
t
- The exception that caused the task to fail.public CompletableFuture<TaskBackPressureResponse> requestBackPressure(int requestId, Time timeout)
requestId
- id of the request.timeout
- the request times out.public void notifyCheckpointComplete(long checkpointId, long timestamp)
checkpointId
- of the completed checkpointtimestamp
- of the completed checkpointpublic void notifyCheckpointAborted(long abortCheckpointId, long timestamp)
abortCheckpointId
- of the subsumed checkpointtimestamp
- of the subsumed checkpointpublic void triggerCheckpoint(long checkpointId, long timestamp, CheckpointOptions checkpointOptions)
checkpointId
- of th checkpoint to triggertimestamp
- of the checkpoint to triggercheckpointOptions
- of the checkpoint to triggerpublic void triggerSynchronousSavepoint(long checkpointId, long timestamp, CheckpointOptions checkpointOptions)
checkpointId
- of th checkpoint to triggertimestamp
- of the checkpoint to triggercheckpointOptions
- of the checkpoint to triggerpublic CompletableFuture<Acknowledge> sendOperatorEvent(OperatorID operatorId, SerializedValue<OperatorEvent> event)
@VisibleForTesting public CompletableFuture<Collection<TaskManagerLocation>> calculatePreferredLocations(LocationPreferenceConstraint locationPreferenceConstraint)
locationPreferenceConstraint
- constraint for the location preferencepublic void transitionState(ExecutionState targetState)
public String getVertexWithAttempt()
public void setAccumulators(Map<String,Accumulator<?,?>> userAccumulators)
userAccumulators
- the user accumulatorspublic Map<String,Accumulator<?,?>> getUserAccumulators()
public StringifiedAccumulatorResult[] getUserAccumulatorsStringified()
AccessExecution
getUserAccumulatorsStringified
in interface AccessExecution
public int getParallelSubtaskIndex()
AccessExecution
getParallelSubtaskIndex
in interface AccessExecution
public IOMetrics getIOMetrics()
getIOMetrics
in interface AccessExecution
public ArchivedExecution archive()
archive
in interface Archiveable<ArchivedExecution>
Copyright © 2014–2021 The Apache Software Foundation. All rights reserved.