public interface JobMasterGateway extends CheckpointCoordinatorGateway, FencedRpcGateway<JobMasterId>, KvStateLocationOracle, KvStateRegistryGateway
JobMaster
rpc gateway interface.Modifier and Type | Method and Description |
---|---|
CompletableFuture<Acknowledge> |
cancel(Time timeout)
Cancels the currently executed job.
|
void |
disconnectResourceManager(ResourceManagerId resourceManagerId,
Exception cause)
Disconnects the resource manager from the job manager because of the given cause.
|
CompletableFuture<Acknowledge> |
disconnectTaskManager(ResourceID resourceID,
Exception cause)
Disconnects the given
TaskExecutor from the
JobMaster . |
void |
failSlot(ResourceID taskManagerId,
AllocationID allocationId,
Exception cause)
Fails the slot with the given allocation id and cause.
|
void |
heartbeatFromResourceManager(ResourceID resourceID)
Sends heartbeat request from the resource manager.
|
void |
heartbeatFromTaskManager(ResourceID resourceID,
AccumulatorReport accumulatorReport)
Sends the heartbeat to job manager from task manager.
|
void |
notifyAllocationFailure(AllocationID allocationID,
Exception cause)
Notifies that the allocation has failed.
|
CompletableFuture<Collection<SlotOffer>> |
offerSlots(ResourceID taskManagerId,
Collection<SlotOffer> slots,
Time timeout)
Offers the given slots to the job manager.
|
CompletableFuture<RegistrationResponse> |
registerTaskManager(String taskManagerRpcAddress,
TaskManagerLocation taskManagerLocation,
Time timeout)
Registers the task manager at the job manager.
|
CompletableFuture<ClassloadingProps> |
requestClassloadingProps()
Request the classloading props of this job.
|
CompletableFuture<ArchivedExecutionGraph> |
requestJob(Time timeout)
Requests the
ArchivedExecutionGraph of the executed job. |
CompletableFuture<JobDetails> |
requestJobDetails(Time timeout)
Request the details of the executed job.
|
CompletableFuture<JobStatus> |
requestJobStatus(Time timeout)
Requests the current job status.
|
CompletableFuture<SerializedInputSplit> |
requestNextInputSplit(JobVertexID vertexID,
ExecutionAttemptID executionAttempt)
Requests the next input split for the
ExecutionJobVertex . |
CompletableFuture<OperatorBackPressureStatsResponse> |
requestOperatorBackPressureStats(JobVertexID jobVertexId)
Requests the statistics on operator back pressure.
|
CompletableFuture<ExecutionState> |
requestPartitionState(IntermediateDataSetID intermediateResultId,
ResultPartitionID partitionId)
Requests the current state of the partition.
|
CompletableFuture<Acknowledge> |
rescaleJob(int newParallelism,
RescalingBehaviour rescalingBehaviour,
Time timeout)
Triggers rescaling of the executed job.
|
CompletableFuture<Acknowledge> |
rescaleOperators(Collection<JobVertexID> operators,
int newParallelism,
RescalingBehaviour rescalingBehaviour,
Time timeout)
Triggers rescaling of the given set of operators.
|
CompletableFuture<Acknowledge> |
scheduleOrUpdateConsumers(ResultPartitionID partitionID,
Time timeout)
Notifies the JobManager about available data for a produced partition.
|
CompletableFuture<Acknowledge> |
stop(Time timeout)
Cancel the currently executed job.
|
CompletableFuture<String> |
triggerSavepoint(String targetDirectory,
boolean cancelJob,
Time timeout)
Triggers taking a savepoint of the executed job.
|
CompletableFuture<Acknowledge> |
updateTaskExecutionState(TaskExecutionState taskExecutionState)
Updates the task execution state for a given task.
|
acknowledgeCheckpoint, declineCheckpoint
getFencingToken
getAddress, getHostname
requestKvStateLocation
notifyKvStateRegistered, notifyKvStateUnregistered
CompletableFuture<Acknowledge> cancel(Time timeout)
timeout
- of this operationCompletableFuture<Acknowledge> stop(Time timeout)
timeout
- of this operationCompletableFuture<Acknowledge> rescaleJob(int newParallelism, RescalingBehaviour rescalingBehaviour, Time timeout)
newParallelism
- new parallelism of the jobrescalingBehaviour
- defining how strict the rescaling has to be executedtimeout
- of this operationAcknowledge
once the rescaling was successfulCompletableFuture<Acknowledge> rescaleOperators(Collection<JobVertexID> operators, int newParallelism, RescalingBehaviour rescalingBehaviour, Time timeout)
operators
- set of operators which shall be rescalednewParallelism
- new parallelism of the given set of operatorsrescalingBehaviour
- defining how strict the rescaling has to be executedtimeout
- of this operationAcknowledge
once the rescaling was successfulCompletableFuture<Acknowledge> updateTaskExecutionState(TaskExecutionState taskExecutionState)
taskExecutionState
- New task execution state for a given taskCompletableFuture<SerializedInputSplit> requestNextInputSplit(JobVertexID vertexID, ExecutionAttemptID executionAttempt)
ExecutionJobVertex
.
The next input split is sent back to the sender as a
SerializedInputSplit
message.vertexID
- The job vertex idexecutionAttempt
- The execution attempt idCompletableFuture<ExecutionState> requestPartitionState(IntermediateDataSetID intermediateResultId, ResultPartitionID partitionId)
intermediateResultId
- The execution attempt ID of the task requesting the partition state.partitionId
- The partition ID of the partition to request the state of.CompletableFuture<Acknowledge> scheduleOrUpdateConsumers(ResultPartitionID partitionID, Time timeout)
There is a call to this method for each ExecutionVertex
instance once per produced
ResultPartition
instance, either when first producing data (for pipelined executions)
or when all data has been produced (for staged executions).
The JobManager then can decide when to schedule the partition consumers of the given session.
partitionID
- The partition which has already produced datatimeout
- before the rpc call failsCompletableFuture<Acknowledge> disconnectTaskManager(ResourceID resourceID, Exception cause)
TaskExecutor
from the
JobMaster
.resourceID
- identifying the TaskManager to disconnectcause
- for the disconnection of the TaskManagervoid disconnectResourceManager(ResourceManagerId resourceManagerId, Exception cause)
resourceManagerId
- identifying the resource manager leader idcause
- of the disconnectCompletableFuture<ClassloadingProps> requestClassloadingProps()
CompletableFuture<Collection<SlotOffer>> offerSlots(ResourceID taskManagerId, Collection<SlotOffer> slots, Time timeout)
taskManagerId
- identifying the task managerslots
- to offer to the job managertimeout
- for the rpc callvoid failSlot(ResourceID taskManagerId, AllocationID allocationId, Exception cause)
taskManagerId
- identifying the task managerallocationId
- identifying the slot to failcause
- of the failingCompletableFuture<RegistrationResponse> registerTaskManager(String taskManagerRpcAddress, TaskManagerLocation taskManagerLocation, Time timeout)
taskManagerRpcAddress
- the rpc address of the task managertaskManagerLocation
- location of the task managertimeout
- for the rpc callvoid heartbeatFromTaskManager(ResourceID resourceID, AccumulatorReport accumulatorReport)
resourceID
- unique id of the task manageraccumulatorReport
- report containing accumulator updatesvoid heartbeatFromResourceManager(ResourceID resourceID)
resourceID
- unique id of the resource managerCompletableFuture<JobDetails> requestJobDetails(Time timeout)
timeout
- for the rpc callCompletableFuture<JobStatus> requestJobStatus(Time timeout)
timeout
- for the rpc callCompletableFuture<ArchivedExecutionGraph> requestJob(Time timeout)
ArchivedExecutionGraph
of the executed job.timeout
- for the rpc callArchivedExecutionGraph
of the executed jobCompletableFuture<String> triggerSavepoint(@Nullable String targetDirectory, boolean cancelJob, Time timeout)
targetDirectory
- to which to write the savepoint data or null if the
default savepoint directory should be usedtimeout
- for the rpc callCompletableFuture<OperatorBackPressureStatsResponse> requestOperatorBackPressureStats(JobVertexID jobVertexId)
jobVertexId
- JobVertex for which the stats are requested.OperatorBackPressureStatsResponse
or null
if the stats are
not available (yet).void notifyAllocationFailure(AllocationID allocationID, Exception cause)
allocationID
- the failed allocation id.cause
- the reason that the allocation failedCopyright © 2014–2019 The Apache Software Foundation. All rights reserved.