public class JobMaster extends FencedRpcEndpoint<JobMasterId> implements JobMasterGateway, JobMasterService
JobGraph
.
It offers the following methods as part of its rpc interface to interact with the JobMaster remotely:
updateTaskExecutionState(org.apache.flink.runtime.taskmanager.TaskExecutionState)
updates the task execution state for given task
RpcEndpoint.MainThreadExecutor
Modifier and Type | Field and Description |
---|---|
static String |
JOB_MANAGER_NAME
Default names for Flink's distributed components.
|
log, rpcServer
Constructor and Description |
---|
JobMaster(RpcService rpcService,
JobMasterId jobMasterId,
JobMasterConfiguration jobMasterConfiguration,
ResourceID resourceId,
JobGraph jobGraph,
HighAvailabilityServices highAvailabilityService,
SlotPoolServiceSchedulerFactory slotPoolServiceSchedulerFactory,
JobManagerSharedServices jobManagerSharedServices,
HeartbeatServices heartbeatServices,
JobManagerJobMetricGroupFactory jobMetricGroupFactory,
OnCompletionActions jobCompletionActions,
FatalErrorHandler fatalErrorHandler,
ClassLoader userCodeLoader,
ShuffleMaster<?> shuffleMaster,
PartitionTrackerFactory partitionTrackerFactory,
ExecutionDeploymentTracker executionDeploymentTracker,
ExecutionDeploymentReconciler.Factory executionDeploymentReconcilerFactory,
BlocklistHandler.Factory blocklistHandlerFactory,
Collection<FailureEnricher> failureEnrichers,
long initializationTimestamp) |
Modifier and Type | Method and Description |
---|---|
void |
acknowledgeCheckpoint(JobID jobID,
ExecutionAttemptID executionAttemptID,
long checkpointId,
CheckpointMetrics checkpointMetrics,
SerializedValue<TaskStateSnapshot> checkpointState) |
CompletableFuture<Acknowledge> |
cancel(Time timeout)
Cancels the currently executed job.
|
void |
declineCheckpoint(DeclineCheckpoint decline) |
CompletableFuture<CoordinationResponse> |
deliverCoordinationRequestToCoordinator(OperatorID operatorId,
SerializedValue<CoordinationRequest> serializedRequest,
Time timeout)
Deliver a coordination request to a specified coordinator and return the response.
|
void |
disconnectResourceManager(ResourceManagerId resourceManagerId,
Exception cause)
Disconnects the resource manager from the job manager because of the given cause.
|
CompletableFuture<Acknowledge> |
disconnectTaskManager(ResourceID resourceID,
Exception cause)
Disconnects the given
TaskExecutor from the
JobMaster . |
void |
failSlot(ResourceID taskManagerId,
AllocationID allocationId,
Exception cause)
Fails the slot with the given allocation id and cause.
|
JobMasterGateway |
getGateway()
Get the
JobMasterGateway belonging to this service. |
CompletableFuture<Collection<PartitionWithMetrics>> |
getPartitionWithMetrics(Duration timeout,
Set<ResultPartitionID> expectedPartitions)
Get specified partitions and their metrics (identified by
expectedPartitions ), the
metrics include sizes of sub-partitions in a result partition. |
CompletableFuture<Void> |
heartbeatFromResourceManager(ResourceID resourceID)
Sends heartbeat request from the resource manager.
|
CompletableFuture<Void> |
heartbeatFromTaskManager(ResourceID resourceID,
TaskExecutorToJobManagerHeartbeatPayload payload)
Sends the heartbeat to job manager from task manager.
|
void |
notifyEndOfData(ExecutionAttemptID executionAttempt)
Notifies that the task has reached the end of data.
|
CompletableFuture<Acknowledge> |
notifyKvStateRegistered(JobID jobId,
JobVertexID jobVertexId,
KeyGroupRange keyGroupRange,
String registrationName,
KvStateID kvStateId,
InetSocketAddress kvStateServerAddress)
Notifies that queryable state has been registered.
|
CompletableFuture<Acknowledge> |
notifyKvStateUnregistered(JobID jobId,
JobVertexID jobVertexId,
KeyGroupRange keyGroupRange,
String registrationName)
Notifies that queryable state has been unregistered.
|
CompletableFuture<Acknowledge> |
notifyNewBlockedNodes(Collection<BlockedNode> newNodes)
Notify new blocked node records.
|
void |
notifyNotEnoughResourcesAvailable(Collection<ResourceRequirement> acquiredResources)
Notifies that not enough resources are available to fulfill the resource requirements of a
job.
|
CompletableFuture<Collection<SlotOffer>> |
offerSlots(ResourceID taskManagerId,
Collection<SlotOffer> slots,
Time timeout)
Offers the given slots to the job manager.
|
protected void |
onStart()
User overridable callback which is called from
RpcEndpoint.internalCallOnStart() . |
CompletableFuture<Void> |
onStop()
Suspend the job and shutdown all other services including rpc.
|
CompletableFuture<RegistrationResponse> |
registerTaskManager(JobID jobId,
TaskManagerRegistrationInformation taskManagerRegistrationInformation,
Time timeout)
Registers the task manager at the job manager.
|
void |
reportCheckpointMetrics(JobID jobID,
ExecutionAttemptID executionAttemptID,
long checkpointId,
CheckpointMetrics checkpointMetrics) |
void |
reportInitializationMetrics(JobID jobId,
SubTaskInitializationMetrics initializationMetrics) |
CompletableFuture<CheckpointStatsSnapshot> |
requestCheckpointStats(Time timeout)
Requests the
CheckpointStatsSnapshot of the job. |
CompletableFuture<ExecutionGraphInfo> |
requestJob(Time timeout)
Requests the
ExecutionGraphInfo of the executed job. |
CompletableFuture<JobResourceRequirements> |
requestJobResourceRequirements()
Read current
job resource requirements . |
CompletableFuture<JobStatus> |
requestJobStatus(Time timeout)
Requests the current job status.
|
CompletableFuture<KvStateLocation> |
requestKvStateLocation(JobID jobId,
String registrationName)
Requests a
KvStateLocation for the specified InternalKvState registration
name. |
CompletableFuture<SerializedInputSplit> |
requestNextInputSplit(JobVertexID vertexID,
ExecutionAttemptID executionAttempt)
Requests the next input split for the
ExecutionJobVertex . |
CompletableFuture<ExecutionState> |
requestPartitionState(IntermediateDataSetID intermediateResultId,
ResultPartitionID resultPartitionId)
Requests the current state of the partition.
|
CompletableFuture<Acknowledge> |
sendOperatorEventToCoordinator(ExecutionAttemptID task,
OperatorID operatorID,
SerializedValue<OperatorEvent> serializedEvent) |
CompletableFuture<CoordinationResponse> |
sendRequestToCoordinator(OperatorID operatorID,
SerializedValue<CoordinationRequest> serializedRequest) |
void |
startFetchAndRetainPartitionWithMetricsOnTaskManager()
Notify jobMaster to fetch and retain partitions on task managers.
|
CompletableFuture<?> |
stopTrackingAndReleasePartitions(Collection<ResultPartitionID> partitionIds)
Notifies the
JobMasterPartitionTracker
to stop tracking the target result partitions and release the locally occupied resources on
TaskExecutor s if any. |
CompletableFuture<String> |
stopWithSavepoint(String targetDirectory,
SavepointFormatType formatType,
boolean terminate,
Time timeout)
Stops the job with a savepoint.
|
CompletableFuture<CompletedCheckpoint> |
triggerCheckpoint(CheckpointType checkpointType,
Time timeout)
Triggers taking a checkpoint of the executed job.
|
CompletableFuture<String> |
triggerSavepoint(String targetDirectory,
boolean cancelJob,
SavepointFormatType formatType,
Time timeout)
Triggers taking a savepoint of the executed job.
|
CompletableFuture<Object> |
updateGlobalAggregate(String aggregateName,
Object aggregand,
byte[] serializedAggregateFunction)
Update the aggregate and return the new value.
|
CompletableFuture<Acknowledge> |
updateJobResourceRequirements(JobResourceRequirements jobResourceRequirements)
Update
job resource requirements . |
CompletableFuture<Acknowledge> |
updateTaskExecutionState(TaskExecutionState taskExecutionState)
Updates the task execution state for a given task.
|
getFencingToken
callAsync, closeAsync, getAddress, getEndpointId, getHostname, getMainThreadExecutor, getMainThreadExecutor, getRpcService, getSelfGateway, getTerminationFuture, internalCallOnStart, internalCallOnStop, isRunning, registerResource, runAsync, scheduleRunAsync, scheduleRunAsync, start, stop, unregisterResource, validateRunsInMainThread
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
triggerCheckpoint
getFencingToken
getAddress, getHostname
getAddress, getTerminationFuture
close, closeAsync
public static final String JOB_MANAGER_NAME
public JobMaster(RpcService rpcService, JobMasterId jobMasterId, JobMasterConfiguration jobMasterConfiguration, ResourceID resourceId, JobGraph jobGraph, HighAvailabilityServices highAvailabilityService, SlotPoolServiceSchedulerFactory slotPoolServiceSchedulerFactory, JobManagerSharedServices jobManagerSharedServices, HeartbeatServices heartbeatServices, JobManagerJobMetricGroupFactory jobMetricGroupFactory, OnCompletionActions jobCompletionActions, FatalErrorHandler fatalErrorHandler, ClassLoader userCodeLoader, ShuffleMaster<?> shuffleMaster, PartitionTrackerFactory partitionTrackerFactory, ExecutionDeploymentTracker executionDeploymentTracker, ExecutionDeploymentReconciler.Factory executionDeploymentReconcilerFactory, BlocklistHandler.Factory blocklistHandlerFactory, Collection<FailureEnricher> failureEnrichers, long initializationTimestamp) throws Exception
Exception
protected void onStart() throws JobMasterException
RpcEndpoint
RpcEndpoint.internalCallOnStart()
.
This method is called when the RpcEndpoint is being started. The method is guaranteed to be executed in the main thread context and can be used to start the rpc endpoint in the context of the rpc endpoint's main thread.
IMPORTANT: This method should never be called directly by the user.
onStart
in class RpcEndpoint
JobMasterException
public CompletableFuture<Void> onStop()
onStop
in class RpcEndpoint
public CompletableFuture<Acknowledge> cancel(Time timeout)
JobMasterGateway
cancel
in interface JobMasterGateway
timeout
- of this operationpublic CompletableFuture<Acknowledge> updateTaskExecutionState(TaskExecutionState taskExecutionState)
updateTaskExecutionState
in interface JobMasterGateway
taskExecutionState
- New task execution state for a given taskpublic void notifyEndOfData(ExecutionAttemptID executionAttempt)
JobMasterGateway
notifyEndOfData
in interface JobMasterGateway
executionAttempt
- The execution attempt id.public CompletableFuture<SerializedInputSplit> requestNextInputSplit(JobVertexID vertexID, ExecutionAttemptID executionAttempt)
JobMasterGateway
ExecutionJobVertex
. The next input split is
sent back to the sender as a SerializedInputSplit
message.requestNextInputSplit
in interface JobMasterGateway
vertexID
- The job vertex idexecutionAttempt
- The execution attempt idpublic CompletableFuture<ExecutionState> requestPartitionState(IntermediateDataSetID intermediateResultId, ResultPartitionID resultPartitionId)
JobMasterGateway
requestPartitionState
in interface JobMasterGateway
intermediateResultId
- The execution attempt ID of the task requesting the partition
state.resultPartitionId
- The partition ID of the partition to request the state of.public CompletableFuture<Acknowledge> disconnectTaskManager(ResourceID resourceID, Exception cause)
JobMasterGateway
TaskExecutor
from the
JobMaster
.disconnectTaskManager
in interface JobMasterGateway
resourceID
- identifying the TaskManager to disconnectcause
- for the disconnection of the TaskManagerpublic void acknowledgeCheckpoint(JobID jobID, ExecutionAttemptID executionAttemptID, long checkpointId, CheckpointMetrics checkpointMetrics, @Nullable SerializedValue<TaskStateSnapshot> checkpointState)
acknowledgeCheckpoint
in interface CheckpointCoordinatorGateway
public void reportCheckpointMetrics(JobID jobID, ExecutionAttemptID executionAttemptID, long checkpointId, CheckpointMetrics checkpointMetrics)
reportCheckpointMetrics
in interface CheckpointCoordinatorGateway
public void reportInitializationMetrics(JobID jobId, SubTaskInitializationMetrics initializationMetrics)
reportInitializationMetrics
in interface CheckpointCoordinatorGateway
public void declineCheckpoint(DeclineCheckpoint decline)
declineCheckpoint
in interface CheckpointCoordinatorGateway
public CompletableFuture<Acknowledge> sendOperatorEventToCoordinator(ExecutionAttemptID task, OperatorID operatorID, SerializedValue<OperatorEvent> serializedEvent)
sendOperatorEventToCoordinator
in interface JobMasterOperatorEventGateway
public CompletableFuture<CoordinationResponse> sendRequestToCoordinator(OperatorID operatorID, SerializedValue<CoordinationRequest> serializedRequest)
sendRequestToCoordinator
in interface JobMasterOperatorEventGateway
public CompletableFuture<KvStateLocation> requestKvStateLocation(JobID jobId, String registrationName)
KvStateLocationOracle
KvStateLocation
for the specified InternalKvState
registration
name.requestKvStateLocation
in interface KvStateLocationOracle
jobId
- identifying the job for which to request the KvStateLocation
registrationName
- Name under which the KvState has been registered.InternalKvState
locationpublic CompletableFuture<Acknowledge> notifyKvStateRegistered(JobID jobId, JobVertexID jobVertexId, KeyGroupRange keyGroupRange, String registrationName, KvStateID kvStateId, InetSocketAddress kvStateServerAddress)
KvStateRegistryGateway
notifyKvStateRegistered
in interface KvStateRegistryGateway
jobId
- identifying the job for which to register a key value statejobVertexId
- JobVertexID the KvState instance belongs to.keyGroupRange
- Key group range the KvState instance belongs to.registrationName
- Name under which the KvState has been registered.kvStateId
- ID of the registered KvState instance.kvStateServerAddress
- Server address where to find the KvState instance.public CompletableFuture<Acknowledge> notifyKvStateUnregistered(JobID jobId, JobVertexID jobVertexId, KeyGroupRange keyGroupRange, String registrationName)
KvStateRegistryGateway
notifyKvStateUnregistered
in interface KvStateRegistryGateway
jobId
- identifying the job for which to unregister a key value statejobVertexId
- JobVertexID the KvState instance belongs to.keyGroupRange
- Key group index the KvState instance belongs to.registrationName
- Name under which the KvState has been registered.public CompletableFuture<Collection<SlotOffer>> offerSlots(ResourceID taskManagerId, Collection<SlotOffer> slots, Time timeout)
JobMasterGateway
offerSlots
in interface JobMasterGateway
taskManagerId
- identifying the task managerslots
- to offer to the job managertimeout
- for the rpc callpublic void failSlot(ResourceID taskManagerId, AllocationID allocationId, Exception cause)
JobMasterGateway
failSlot
in interface JobMasterGateway
taskManagerId
- identifying the task managerallocationId
- identifying the slot to failcause
- of the failingpublic CompletableFuture<RegistrationResponse> registerTaskManager(JobID jobId, TaskManagerRegistrationInformation taskManagerRegistrationInformation, Time timeout)
JobMasterGateway
registerTaskManager
in interface JobMasterGateway
jobId
- jobId specifying the job for which the JobMaster should be responsibletaskManagerRegistrationInformation
- the information for registering a task manager at
the job managertimeout
- for the rpc callpublic void disconnectResourceManager(ResourceManagerId resourceManagerId, Exception cause)
JobMasterGateway
disconnectResourceManager
in interface JobMasterGateway
resourceManagerId
- identifying the resource manager leader idcause
- of the disconnectpublic CompletableFuture<Void> heartbeatFromTaskManager(ResourceID resourceID, TaskExecutorToJobManagerHeartbeatPayload payload)
JobMasterGateway
heartbeatFromTaskManager
in interface JobMasterGateway
resourceID
- unique id of the task managerpayload
- report payloadpublic CompletableFuture<Void> heartbeatFromResourceManager(ResourceID resourceID)
JobMasterGateway
heartbeatFromResourceManager
in interface JobMasterGateway
resourceID
- unique id of the resource managerpublic CompletableFuture<JobStatus> requestJobStatus(Time timeout)
JobMasterGateway
requestJobStatus
in interface JobMasterGateway
timeout
- for the rpc callpublic CompletableFuture<ExecutionGraphInfo> requestJob(Time timeout)
JobMasterGateway
ExecutionGraphInfo
of the executed job.requestJob
in interface JobMasterGateway
timeout
- for the rpc callExecutionGraphInfo
of the executed jobpublic CompletableFuture<CheckpointStatsSnapshot> requestCheckpointStats(Time timeout)
JobMasterGateway
CheckpointStatsSnapshot
of the job.requestCheckpointStats
in interface JobMasterGateway
timeout
- for the rpc callCheckpointStatsSnapshot
of the jobpublic CompletableFuture<CompletedCheckpoint> triggerCheckpoint(CheckpointType checkpointType, Time timeout)
JobMasterGateway
triggerCheckpoint
in interface JobMasterGateway
checkpointType
- to determine how checkpoint should be takentimeout
- for the rpc callpublic CompletableFuture<String> triggerSavepoint(@Nullable String targetDirectory, boolean cancelJob, SavepointFormatType formatType, Time timeout)
JobMasterGateway
triggerSavepoint
in interface JobMasterGateway
targetDirectory
- to which to write the savepoint data or null if the default savepoint
directory should be usedformatType
- binary format for the savepointtimeout
- for the rpc callpublic CompletableFuture<String> stopWithSavepoint(@Nullable String targetDirectory, SavepointFormatType formatType, boolean terminate, Time timeout)
JobMasterGateway
stopWithSavepoint
in interface JobMasterGateway
targetDirectory
- to which to write the savepoint data or null if the default savepoint
directory should be usedterminate
- flag indicating if the job should terminate or just suspendtimeout
- for the rpc callpublic void notifyNotEnoughResourcesAvailable(Collection<ResourceRequirement> acquiredResources)
JobMasterGateway
notifyNotEnoughResourcesAvailable
in interface JobMasterGateway
acquiredResources
- the resources that have been acquired for the jobpublic CompletableFuture<Object> updateGlobalAggregate(String aggregateName, Object aggregand, byte[] serializedAggregateFunction)
JobMasterGateway
updateGlobalAggregate
in interface JobMasterGateway
aggregateName
- The name of the aggregate to updateaggregand
- The value to add to the aggregateserializedAggregateFunction
- The function to apply to the current aggregate and
aggregand to obtain the new aggregate value, this should be of type AggregateFunction
public CompletableFuture<CoordinationResponse> deliverCoordinationRequestToCoordinator(OperatorID operatorId, SerializedValue<CoordinationRequest> serializedRequest, Time timeout)
JobMasterGateway
deliverCoordinationRequestToCoordinator
in interface JobMasterGateway
operatorId
- identifying the coordinator to receive the requestserializedRequest
- serialized request to deliverFlinkException
if the task is not running, or no
operator/coordinator exists for the given ID, or the coordinator cannot handle client
events.public CompletableFuture<?> stopTrackingAndReleasePartitions(Collection<ResultPartitionID> partitionIds)
JobMasterGateway
JobMasterPartitionTracker
to stop tracking the target result partitions and release the locally occupied resources on
TaskExecutor
s if any.stopTrackingAndReleasePartitions
in interface JobMasterGateway
public CompletableFuture<Collection<PartitionWithMetrics>> getPartitionWithMetrics(Duration timeout, Set<ResultPartitionID> expectedPartitions)
JobMasterGateway
expectedPartitions
), the
metrics include sizes of sub-partitions in a result partition.getPartitionWithMetrics
in interface JobMasterGateway
timeout
- The timeout used for retrieve the specified partitions.expectedPartitions
- The set of identifiers for the result partitions whose metrics are
to be fetched.public void startFetchAndRetainPartitionWithMetricsOnTaskManager()
JobMasterGateway
startFetchAndRetainPartitionWithMetricsOnTaskManager
in interface JobMasterGateway
public CompletableFuture<Acknowledge> notifyNewBlockedNodes(Collection<BlockedNode> newNodes)
BlocklistListener
notifyNewBlockedNodes
in interface BlocklistListener
newNodes
- the new blocked node recordspublic CompletableFuture<JobResourceRequirements> requestJobResourceRequirements()
JobMasterGateway
job resource requirements
.requestJobResourceRequirements
in interface JobMasterGateway
public CompletableFuture<Acknowledge> updateJobResourceRequirements(JobResourceRequirements jobResourceRequirements)
JobMasterGateway
job resource requirements
.updateJobResourceRequirements
in interface JobMasterGateway
jobResourceRequirements
- new resource requirementspublic JobMasterGateway getGateway()
JobMasterService
JobMasterGateway
belonging to this service.getGateway
in interface JobMasterService
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.