public class JobMaster extends FencedRpcEndpoint<JobMasterId> implements JobMasterGateway, JobMasterService
JobGraph
.
It offers the following methods as part of its rpc interface to interact with the JobMaster remotely:
updateTaskExecutionState(org.apache.flink.runtime.taskmanager.TaskExecutionState)
updates the task execution state for
given taskRpcEndpoint.MainThreadExecutor
Modifier and Type | Field and Description |
---|---|
static String |
ARCHIVE_NAME |
static String |
JOB_MANAGER_NAME
Default names for Flink's distributed components.
|
log, rpcServer
Constructor and Description |
---|
JobMaster(RpcService rpcService,
JobMasterConfiguration jobMasterConfiguration,
ResourceID resourceId,
JobGraph jobGraph,
HighAvailabilityServices highAvailabilityService,
SlotPoolFactory slotPoolFactory,
SchedulerFactory schedulerFactory,
JobManagerSharedServices jobManagerSharedServices,
HeartbeatServices heartbeatServices,
JobManagerJobMetricGroupFactory jobMetricGroupFactory,
OnCompletionActions jobCompletionActions,
FatalErrorHandler fatalErrorHandler,
ClassLoader userCodeLoader) |
Modifier and Type | Method and Description |
---|---|
void |
acknowledgeCheckpoint(JobID jobID,
ExecutionAttemptID executionAttemptID,
long checkpointId,
CheckpointMetrics checkpointMetrics,
TaskStateSnapshot checkpointState) |
CompletableFuture<Acknowledge> |
cancel(Time timeout)
Cancels the currently executed job.
|
void |
declineCheckpoint(DeclineCheckpoint decline) |
void |
disconnectResourceManager(ResourceManagerId resourceManagerId,
Exception cause)
Disconnects the resource manager from the job manager because of the given cause.
|
CompletableFuture<Acknowledge> |
disconnectTaskManager(ResourceID resourceID,
Exception cause)
Disconnects the given
TaskExecutor from the
JobMaster . |
void |
failSlot(ResourceID taskManagerId,
AllocationID allocationId,
Exception cause)
Fails the slot with the given allocation id and cause.
|
JobMasterGateway |
getGateway()
Get the
JobMasterGateway belonging to this service. |
void |
heartbeatFromResourceManager(ResourceID resourceID)
Sends heartbeat request from the resource manager.
|
void |
heartbeatFromTaskManager(ResourceID resourceID,
AccumulatorReport accumulatorReport)
Sends the heartbeat to job manager from task manager.
|
void |
notifyAllocationFailure(AllocationID allocationID,
Exception cause)
Notifies that the allocation has failed.
|
CompletableFuture<Acknowledge> |
notifyKvStateRegistered(JobID jobId,
JobVertexID jobVertexId,
KeyGroupRange keyGroupRange,
String registrationName,
KvStateID kvStateId,
InetSocketAddress kvStateServerAddress)
Notifies that queryable state has been registered.
|
CompletableFuture<Acknowledge> |
notifyKvStateUnregistered(JobID jobId,
JobVertexID jobVertexId,
KeyGroupRange keyGroupRange,
String registrationName)
Notifies that queryable state has been unregistered.
|
CompletableFuture<Collection<SlotOffer>> |
offerSlots(ResourceID taskManagerId,
Collection<SlotOffer> slots,
Time timeout)
Offers the given slots to the job manager.
|
CompletableFuture<Void> |
onStop()
Suspend the job and shutdown all other services including rpc.
|
CompletableFuture<RegistrationResponse> |
registerTaskManager(String taskManagerRpcAddress,
TaskManagerLocation taskManagerLocation,
Time timeout)
Registers the task manager at the job manager.
|
CompletableFuture<ArchivedExecutionGraph> |
requestJob(Time timeout)
Requests the
ArchivedExecutionGraph of the executed job. |
CompletableFuture<JobDetails> |
requestJobDetails(Time timeout)
Request the details of the executed job.
|
CompletableFuture<JobStatus> |
requestJobStatus(Time timeout)
Requests the current job status.
|
CompletableFuture<KvStateLocation> |
requestKvStateLocation(JobID jobId,
String registrationName)
Requests a
KvStateLocation for the specified InternalKvState registration name. |
CompletableFuture<SerializedInputSplit> |
requestNextInputSplit(JobVertexID vertexID,
ExecutionAttemptID executionAttempt)
Requests the next input split for the
ExecutionJobVertex . |
CompletableFuture<OperatorBackPressureStatsResponse> |
requestOperatorBackPressureStats(JobVertexID jobVertexId)
Requests the statistics on operator back pressure.
|
CompletableFuture<ExecutionState> |
requestPartitionState(IntermediateDataSetID intermediateResultId,
ResultPartitionID resultPartitionId)
Requests the current state of the partition.
|
CompletableFuture<Acknowledge> |
rescaleJob(int newParallelism,
RescalingBehaviour rescalingBehaviour,
Time timeout)
Triggers rescaling of the executed job.
|
CompletableFuture<Acknowledge> |
rescaleOperators(Collection<JobVertexID> operators,
int newParallelism,
RescalingBehaviour rescalingBehaviour,
Time timeout)
Triggers rescaling of the given set of operators.
|
CompletableFuture<Acknowledge> |
scheduleOrUpdateConsumers(ResultPartitionID partitionID,
Time timeout)
Notifies the JobManager about available data for a produced partition.
|
CompletableFuture<Acknowledge> |
start(JobMasterId newJobMasterId)
Start the rpc service and begin to run the job.
|
CompletableFuture<Acknowledge> |
stop(Time timeout)
Cancel the currently executed job.
|
CompletableFuture<Acknowledge> |
suspend(Exception cause)
Suspending job, all the running tasks will be cancelled, and communication with other components
will be disposed.
|
CompletableFuture<String> |
triggerSavepoint(String targetDirectory,
boolean cancelJob,
Time timeout)
Triggers taking a savepoint of the executed job.
|
CompletableFuture<Object> |
updateGlobalAggregate(String aggregateName,
Object aggregand,
byte[] serializedAggregateFunction)
Update the aggregate and return the new value.
|
CompletableFuture<Acknowledge> |
updateTaskExecutionState(TaskExecutionState taskExecutionState)
Updates the task execution state for a given task.
|
callAsyncWithoutFencing, getFencingToken, getMainThreadExecutor, getUnfencedMainThreadExecutor, runAsyncWithoutFencing, setFencingToken
callAsync, closeAsync, getAddress, getEndpointId, getHostname, getRpcService, getSelfGateway, getTerminationFuture, onStart, runAsync, scheduleRunAsync, scheduleRunAsync, start, stop, validateRunsInMainThread
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
getFencingToken
getAddress, getHostname
getAddress
close, closeAsync
public static final String JOB_MANAGER_NAME
public static final String ARCHIVE_NAME
public JobMaster(RpcService rpcService, JobMasterConfiguration jobMasterConfiguration, ResourceID resourceId, JobGraph jobGraph, HighAvailabilityServices highAvailabilityService, SlotPoolFactory slotPoolFactory, SchedulerFactory schedulerFactory, JobManagerSharedServices jobManagerSharedServices, HeartbeatServices heartbeatServices, JobManagerJobMetricGroupFactory jobMetricGroupFactory, OnCompletionActions jobCompletionActions, FatalErrorHandler fatalErrorHandler, ClassLoader userCodeLoader) throws Exception
Exception
public CompletableFuture<Acknowledge> start(JobMasterId newJobMasterId) throws Exception
start
in interface JobMasterService
newJobMasterId
- The necessary fencing token to run the jobException
- if the JobMaster service could not be startedpublic CompletableFuture<Acknowledge> suspend(Exception cause)
Mostly job is suspended because of the leadership has been revoked, one can be restart this job by
calling the start(JobMasterId)
method once we take the leadership back again.
This method is executed asynchronously
suspend
in interface JobMasterService
cause
- The reason of why this job been suspended.public CompletableFuture<Void> onStop()
onStop
in class RpcEndpoint
public CompletableFuture<Acknowledge> cancel(Time timeout)
JobMasterGateway
cancel
in interface JobMasterGateway
timeout
- of this operationpublic CompletableFuture<Acknowledge> stop(Time timeout)
JobMasterGateway
stop
in interface JobMasterGateway
timeout
- of this operationpublic CompletableFuture<Acknowledge> rescaleJob(int newParallelism, RescalingBehaviour rescalingBehaviour, Time timeout)
JobMasterGateway
rescaleJob
in interface JobMasterGateway
newParallelism
- new parallelism of the jobrescalingBehaviour
- defining how strict the rescaling has to be executedtimeout
- of this operationAcknowledge
once the rescaling was successfulpublic CompletableFuture<Acknowledge> rescaleOperators(Collection<JobVertexID> operators, int newParallelism, RescalingBehaviour rescalingBehaviour, Time timeout)
JobMasterGateway
rescaleOperators
in interface JobMasterGateway
operators
- set of operators which shall be rescalednewParallelism
- new parallelism of the given set of operatorsrescalingBehaviour
- defining how strict the rescaling has to be executedtimeout
- of this operationAcknowledge
once the rescaling was successfulpublic CompletableFuture<Acknowledge> updateTaskExecutionState(TaskExecutionState taskExecutionState)
updateTaskExecutionState
in interface JobMasterGateway
taskExecutionState
- New task execution state for a given taskpublic CompletableFuture<SerializedInputSplit> requestNextInputSplit(JobVertexID vertexID, ExecutionAttemptID executionAttempt)
JobMasterGateway
ExecutionJobVertex
.
The next input split is sent back to the sender as a
SerializedInputSplit
message.requestNextInputSplit
in interface JobMasterGateway
vertexID
- The job vertex idexecutionAttempt
- The execution attempt idpublic CompletableFuture<ExecutionState> requestPartitionState(IntermediateDataSetID intermediateResultId, ResultPartitionID resultPartitionId)
JobMasterGateway
requestPartitionState
in interface JobMasterGateway
intermediateResultId
- The execution attempt ID of the task requesting the partition state.resultPartitionId
- The partition ID of the partition to request the state of.public CompletableFuture<Acknowledge> scheduleOrUpdateConsumers(ResultPartitionID partitionID, Time timeout)
JobMasterGateway
There is a call to this method for each ExecutionVertex
instance once per produced
ResultPartition
instance, either when first producing data (for pipelined executions)
or when all data has been produced (for staged executions).
The JobManager then can decide when to schedule the partition consumers of the given session.
scheduleOrUpdateConsumers
in interface JobMasterGateway
partitionID
- The partition which has already produced datatimeout
- before the rpc call failspublic CompletableFuture<Acknowledge> disconnectTaskManager(ResourceID resourceID, Exception cause)
JobMasterGateway
TaskExecutor
from the
JobMaster
.disconnectTaskManager
in interface JobMasterGateway
resourceID
- identifying the TaskManager to disconnectcause
- for the disconnection of the TaskManagerpublic void acknowledgeCheckpoint(JobID jobID, ExecutionAttemptID executionAttemptID, long checkpointId, CheckpointMetrics checkpointMetrics, TaskStateSnapshot checkpointState)
acknowledgeCheckpoint
in interface CheckpointCoordinatorGateway
public void declineCheckpoint(DeclineCheckpoint decline)
declineCheckpoint
in interface CheckpointCoordinatorGateway
public CompletableFuture<KvStateLocation> requestKvStateLocation(JobID jobId, String registrationName)
KvStateLocationOracle
KvStateLocation
for the specified InternalKvState
registration name.requestKvStateLocation
in interface KvStateLocationOracle
jobId
- identifying the job for which to request the KvStateLocation
registrationName
- Name under which the KvState has been registered.InternalKvState
locationpublic CompletableFuture<Acknowledge> notifyKvStateRegistered(JobID jobId, JobVertexID jobVertexId, KeyGroupRange keyGroupRange, String registrationName, KvStateID kvStateId, InetSocketAddress kvStateServerAddress)
KvStateRegistryGateway
notifyKvStateRegistered
in interface KvStateRegistryGateway
jobId
- identifying the job for which to register a key value statejobVertexId
- JobVertexID the KvState instance belongs to.keyGroupRange
- Key group range the KvState instance belongs to.registrationName
- Name under which the KvState has been registered.kvStateId
- ID of the registered KvState instance.kvStateServerAddress
- Server address where to find the KvState instance.public CompletableFuture<Acknowledge> notifyKvStateUnregistered(JobID jobId, JobVertexID jobVertexId, KeyGroupRange keyGroupRange, String registrationName)
KvStateRegistryGateway
notifyKvStateUnregistered
in interface KvStateRegistryGateway
jobId
- identifying the job for which to unregister a key value statejobVertexId
- JobVertexID the KvState instance belongs to.keyGroupRange
- Key group index the KvState instance belongs to.registrationName
- Name under which the KvState has been registered.public CompletableFuture<Collection<SlotOffer>> offerSlots(ResourceID taskManagerId, Collection<SlotOffer> slots, Time timeout)
JobMasterGateway
offerSlots
in interface JobMasterGateway
taskManagerId
- identifying the task managerslots
- to offer to the job managertimeout
- for the rpc callpublic void failSlot(ResourceID taskManagerId, AllocationID allocationId, Exception cause)
JobMasterGateway
failSlot
in interface JobMasterGateway
taskManagerId
- identifying the task managerallocationId
- identifying the slot to failcause
- of the failingpublic CompletableFuture<RegistrationResponse> registerTaskManager(String taskManagerRpcAddress, TaskManagerLocation taskManagerLocation, Time timeout)
JobMasterGateway
registerTaskManager
in interface JobMasterGateway
taskManagerRpcAddress
- the rpc address of the task managertaskManagerLocation
- location of the task managertimeout
- for the rpc callpublic void disconnectResourceManager(ResourceManagerId resourceManagerId, Exception cause)
JobMasterGateway
disconnectResourceManager
in interface JobMasterGateway
resourceManagerId
- identifying the resource manager leader idcause
- of the disconnectpublic void heartbeatFromTaskManager(ResourceID resourceID, AccumulatorReport accumulatorReport)
JobMasterGateway
heartbeatFromTaskManager
in interface JobMasterGateway
resourceID
- unique id of the task manageraccumulatorReport
- report containing accumulator updatespublic void heartbeatFromResourceManager(ResourceID resourceID)
JobMasterGateway
heartbeatFromResourceManager
in interface JobMasterGateway
resourceID
- unique id of the resource managerpublic CompletableFuture<JobDetails> requestJobDetails(Time timeout)
JobMasterGateway
requestJobDetails
in interface JobMasterGateway
timeout
- for the rpc callpublic CompletableFuture<JobStatus> requestJobStatus(Time timeout)
JobMasterGateway
requestJobStatus
in interface JobMasterGateway
timeout
- for the rpc callpublic CompletableFuture<ArchivedExecutionGraph> requestJob(Time timeout)
JobMasterGateway
ArchivedExecutionGraph
of the executed job.requestJob
in interface JobMasterGateway
timeout
- for the rpc callArchivedExecutionGraph
of the executed jobpublic CompletableFuture<String> triggerSavepoint(@Nullable String targetDirectory, boolean cancelJob, Time timeout)
JobMasterGateway
triggerSavepoint
in interface JobMasterGateway
targetDirectory
- to which to write the savepoint data or null if the
default savepoint directory should be usedtimeout
- for the rpc callpublic CompletableFuture<OperatorBackPressureStatsResponse> requestOperatorBackPressureStats(JobVertexID jobVertexId)
JobMasterGateway
requestOperatorBackPressureStats
in interface JobMasterGateway
jobVertexId
- JobVertex for which the stats are requested.OperatorBackPressureStatsResponse
.public void notifyAllocationFailure(AllocationID allocationID, Exception cause)
JobMasterGateway
notifyAllocationFailure
in interface JobMasterGateway
allocationID
- the failed allocation id.cause
- the reason that the allocation failedpublic CompletableFuture<Object> updateGlobalAggregate(String aggregateName, Object aggregand, byte[] serializedAggregateFunction)
JobMasterGateway
updateGlobalAggregate
in interface JobMasterGateway
aggregateName
- The name of the aggregate to updateaggregand
- The value to add to the aggregateserializedAggregateFunction
- The function to apply to the current aggregate and aggregand to
obtain the new aggregate value, this should be of type AggregateFunction
public JobMasterGateway getGateway()
JobMasterService
JobMasterGateway
belonging to this service.getGateway
in interface JobMasterService
Copyright © 2014–2020 The Apache Software Foundation. All rights reserved.