public abstract class ResourceManager<WorkerType extends ResourceIDRetrievable> extends FencedRpcEndpoint<ResourceManagerId> implements ResourceManagerGateway, LeaderContender
It offers the following methods as part of its rpc interface to interact with him remotely:
registerJobManager(JobMasterId, ResourceID, String, JobID, Time)
registers a
JobMaster
at the resource manager
requestSlot(JobMasterId, SlotRequest, Time)
requests a slot from the resource
manager
RpcEndpoint.MainThreadExecutor
Modifier and Type | Field and Description |
---|---|
protected Executor |
ioExecutor |
static String |
RESOURCE_MANAGER_NAME |
log, rpcServer
Constructor and Description |
---|
ResourceManager(RpcService rpcService,
ResourceID resourceId,
HighAvailabilityServices highAvailabilityServices,
HeartbeatServices heartbeatServices,
SlotManager slotManager,
ResourceManagerPartitionTrackerFactory clusterPartitionTrackerFactory,
JobLeaderIdService jobLeaderIdService,
ClusterInformation clusterInformation,
FatalErrorHandler fatalErrorHandler,
ResourceManagerMetricGroup resourceManagerMetricGroup,
Time rpcTimeout,
Executor ioExecutor) |
Modifier and Type | Method and Description |
---|---|
void |
cancelSlotRequest(AllocationID allocationID)
Cancel the slot allocation requests from the resource manager.
|
protected CompletableFuture<Void> |
clearStateAsync()
This method can be overridden to add a (non-blocking) state clearing routine to the
ResourceManager that will be called when leadership is revoked.
|
protected void |
closeJobManagerConnection(JobID jobId,
Exception cause)
This method should be called by the framework once it detects that a currently registered job
manager has failed.
|
protected void |
closeTaskManagerConnection(ResourceID resourceID,
Exception cause)
This method should be called by the framework once it detects that a currently registered
task executor has failed.
|
CompletableFuture<Acknowledge> |
declareRequiredResources(JobMasterId jobMasterId,
ResourceRequirements resourceRequirements,
Time timeout)
Declares the absolute resource requirements for a job.
|
CompletableFuture<Acknowledge> |
deregisterApplication(ApplicationStatus finalStatus,
String diagnostics)
Cleanup application and shut down cluster.
|
void |
disconnectJobManager(JobID jobId,
JobStatus jobStatus,
Exception cause)
Disconnects a JobManager specified by the given resourceID from the
ResourceManager . |
void |
disconnectTaskManager(ResourceID resourceId,
Exception cause)
Disconnects a TaskManager specified by the given resourceID from the
ResourceManager . |
CompletableFuture<Integer> |
getNumberOfRegisteredTaskManagers()
Gets the currently registered number of TaskManagers.
|
protected int |
getNumberRequiredTaskManagers() |
protected Map<WorkerResourceSpec,Integer> |
getRequiredResources() |
void |
grantLeadership(UUID newLeaderSessionID)
Callback method when current resourceManager is granted leadership.
|
void |
handleError(Exception exception)
Handles error occurring in the leader election service.
|
void |
heartbeatFromJobManager(ResourceID resourceID)
Sends the heartbeat to resource manager from job manager
|
void |
heartbeatFromTaskManager(ResourceID resourceID,
TaskExecutorHeartbeatPayload heartbeatPayload)
Sends the heartbeat to resource manager from task manager
|
protected abstract void |
initialize()
Initializes the framework specific components.
|
protected abstract void |
internalDeregisterApplication(ApplicationStatus finalStatus,
String optionalDiagnostics)
The framework specific code to deregister the application.
|
protected void |
jobLeaderLostLeadership(JobID jobId,
JobMasterId oldJobMasterId) |
CompletableFuture<Map<IntermediateDataSetID,DataSetMetaInfo>> |
listDataSets()
Returns all datasets for which partitions are being tracked.
|
void |
notifySlotAvailable(InstanceID instanceID,
SlotID slotId,
AllocationID allocationId)
Sent by the TaskExecutor to notify the ResourceManager that a slot has become available.
|
protected void |
onFatalError(Throwable t)
Notifies the ResourceManager that a fatal error has occurred and it cannot proceed.
|
protected void |
onLeadership() |
void |
onStart()
User overridable callback which is called from
RpcEndpoint.internalCallOnStart() . |
CompletableFuture<Void> |
onStop()
User overridable callback which is called from
RpcEndpoint.internalCallOnStop() . |
protected void |
onWorkerRegistered(WorkerType worker) |
protected CompletableFuture<Void> |
prepareLeadershipAsync()
This method can be overridden to add a (non-blocking) initialization routine to the
ResourceManager that will be called when leadership is granted but before leadership is
confirmed.
|
CompletableFuture<RegistrationResponse> |
registerJobManager(JobMasterId jobMasterId,
ResourceID jobManagerResourceId,
String jobManagerAddress,
JobID jobId,
Time timeout)
Register a
JobMaster at the resource manager. |
CompletableFuture<RegistrationResponse> |
registerTaskExecutor(TaskExecutorRegistration taskExecutorRegistration,
Time timeout)
Register a
TaskExecutor at the resource manager. |
CompletableFuture<Void> |
releaseClusterPartitions(IntermediateDataSetID dataSetId)
Releases all partitions associated with the given dataset.
|
protected void |
releaseResource(InstanceID instanceId,
Exception cause) |
protected void |
removeJob(JobID jobId,
Exception cause) |
CompletableFuture<ResourceOverview> |
requestResourceOverview(Time timeout)
Requests the resource overview.
|
CompletableFuture<Acknowledge> |
requestSlot(JobMasterId jobMasterId,
SlotRequest slotRequest,
Time timeout)
Requests a slot from the resource manager.
|
CompletableFuture<TransientBlobKey> |
requestTaskManagerFileUploadByName(ResourceID taskManagerId,
String fileName,
Time timeout)
Request the file upload from the given
TaskExecutor to the cluster's BlobServer . |
CompletableFuture<TransientBlobKey> |
requestTaskManagerFileUploadByType(ResourceID taskManagerId,
FileType fileType,
Time timeout)
Request the file upload from the given
TaskExecutor to the cluster's BlobServer . |
CompletableFuture<TaskManagerInfo> |
requestTaskManagerInfo(ResourceID resourceId,
Time timeout)
Requests information about the given
TaskExecutor . |
CompletableFuture<Collection<TaskManagerInfo>> |
requestTaskManagerInfo(Time timeout)
Requests information about the registered
TaskExecutor . |
CompletableFuture<Collection<LogInfo>> |
requestTaskManagerLogList(ResourceID taskManagerId,
Time timeout)
Request log list from the given
TaskExecutor . |
CompletableFuture<Collection<Tuple2<ResourceID,String>>> |
requestTaskManagerMetricQueryServiceAddresses(Time timeout)
Requests the paths for the TaskManager's
MetricQueryService to query. |
CompletableFuture<ThreadDumpInfo> |
requestThreadDump(ResourceID taskManagerId,
Time timeout)
Requests the thread dump from the given
TaskExecutor . |
void |
revokeLeadership()
Callback method when current resourceManager loses leadership.
|
CompletableFuture<Acknowledge> |
sendSlotReport(ResourceID taskManagerResourceId,
InstanceID taskManagerRegistrationId,
SlotReport slotReport,
Time timeout)
Sends the given
SlotReport to the ResourceManager. |
protected void |
setFailUnfulfillableRequest(boolean failUnfulfillableRequest)
Set
SlotManager whether to fail unfulfillable slot requests. |
abstract boolean |
startNewWorker(WorkerResourceSpec workerResourceSpec)
Allocates a resource using the worker resource specification.
|
abstract boolean |
stopWorker(WorkerType worker)
Stops the given worker.
|
protected abstract void |
terminate()
Terminates the framework specific components.
|
protected abstract WorkerType |
workerStarted(ResourceID resourceID)
Callback when a worker was started.
|
callAsyncWithoutFencing, getFencingToken, getMainThreadExecutor, getUnfencedMainThreadExecutor, runAsyncWithoutFencing, setFencingToken
callAsync, closeAsync, getAddress, getEndpointId, getHostname, getRpcService, getSelfGateway, getTerminationFuture, internalCallOnStart, internalCallOnStop, isRunning, runAsync, scheduleRunAsync, scheduleRunAsync, start, stop, validateRunsInMainThread
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
getFencingToken
getAddress, getHostname
getDescription
close
public static final String RESOURCE_MANAGER_NAME
protected final Executor ioExecutor
public ResourceManager(RpcService rpcService, ResourceID resourceId, HighAvailabilityServices highAvailabilityServices, HeartbeatServices heartbeatServices, SlotManager slotManager, ResourceManagerPartitionTrackerFactory clusterPartitionTrackerFactory, JobLeaderIdService jobLeaderIdService, ClusterInformation clusterInformation, FatalErrorHandler fatalErrorHandler, ResourceManagerMetricGroup resourceManagerMetricGroup, Time rpcTimeout, Executor ioExecutor)
public final void onStart() throws Exception
RpcEndpoint
RpcEndpoint.internalCallOnStart()
.
This method is called when the RpcEndpoint is being started. The method is guaranteed to be executed in the main thread context and can be used to start the rpc endpoint in the context of the rpc endpoint's main thread.
IMPORTANT: This method should never be called directly by the user.
onStart
in class RpcEndpoint
Exception
- indicating that the rpc endpoint could not be started. If an exception
occurs, then the rpc endpoint will automatically terminate.public final CompletableFuture<Void> onStop()
RpcEndpoint
RpcEndpoint.internalCallOnStop()
.
This method is called when the RpcEndpoint is being shut down. The method is guaranteed to be executed in the main thread context and can be used to clean up internal state.
IMPORTANT: This method should never be called directly by the user.
onStop
in class RpcEndpoint
public CompletableFuture<RegistrationResponse> registerJobManager(JobMasterId jobMasterId, ResourceID jobManagerResourceId, String jobManagerAddress, JobID jobId, Time timeout)
ResourceManagerGateway
JobMaster
at the resource manager.registerJobManager
in interface ResourceManagerGateway
jobMasterId
- The fencing token for the JobMaster leaderjobManagerResourceId
- The resource ID of the JobMaster that registersjobManagerAddress
- The address of the JobMaster that registersjobId
- The Job ID of the JobMaster that registerstimeout
- Timeout for the future to completepublic CompletableFuture<RegistrationResponse> registerTaskExecutor(TaskExecutorRegistration taskExecutorRegistration, Time timeout)
ResourceManagerGateway
TaskExecutor
at the resource manager.registerTaskExecutor
in interface ResourceManagerGateway
taskExecutorRegistration
- the task executor registration.timeout
- The timeout for the response.public CompletableFuture<Acknowledge> sendSlotReport(ResourceID taskManagerResourceId, InstanceID taskManagerRegistrationId, SlotReport slotReport, Time timeout)
ResourceManagerGateway
SlotReport
to the ResourceManager.sendSlotReport
in interface ResourceManagerGateway
taskManagerRegistrationId
- id identifying the sending TaskManagerslotReport
- which is sent to the ResourceManagertimeout
- for the operationAcknowledge
once the slot report has been
received.protected void onWorkerRegistered(WorkerType worker)
public void heartbeatFromTaskManager(ResourceID resourceID, TaskExecutorHeartbeatPayload heartbeatPayload)
ResourceManagerGateway
heartbeatFromTaskManager
in interface ResourceManagerGateway
resourceID
- unique id of the task managerheartbeatPayload
- payload from the originating TaskManagerpublic void heartbeatFromJobManager(ResourceID resourceID)
ResourceManagerGateway
heartbeatFromJobManager
in interface ResourceManagerGateway
resourceID
- unique id of the job managerpublic void disconnectTaskManager(ResourceID resourceId, Exception cause)
ResourceManagerGateway
ResourceManager
.disconnectTaskManager
in interface ResourceManagerGateway
resourceId
- identifying the TaskManager to disconnectcause
- for the disconnection of the TaskManagerpublic void disconnectJobManager(JobID jobId, JobStatus jobStatus, Exception cause)
ResourceManagerGateway
ResourceManager
.disconnectJobManager
in interface ResourceManagerGateway
jobId
- JobID for which the JobManager was the leaderjobStatus
- status of the job at the time of disconnectioncause
- for the disconnection of the JobManagerpublic CompletableFuture<Acknowledge> requestSlot(JobMasterId jobMasterId, SlotRequest slotRequest, Time timeout)
ResourceManagerGateway
requestSlot
in interface ResourceManagerGateway
jobMasterId
- id of the JobMasterslotRequest
- The slot to requestpublic CompletableFuture<Acknowledge> declareRequiredResources(JobMasterId jobMasterId, ResourceRequirements resourceRequirements, Time timeout)
ResourceManagerGateway
declareRequiredResources
in interface ResourceManagerGateway
jobMasterId
- id of the JobMasterresourceRequirements
- resource requirementspublic void cancelSlotRequest(AllocationID allocationID)
ResourceManagerGateway
cancelSlotRequest
in interface ResourceManagerGateway
allocationID
- The slot to requestpublic void notifySlotAvailable(InstanceID instanceID, SlotID slotId, AllocationID allocationId)
ResourceManagerGateway
notifySlotAvailable
in interface ResourceManagerGateway
instanceID
- TaskExecutor's instance idslotId
- The SlotID of the freed slotallocationId
- to which the slot has been allocatedpublic CompletableFuture<Acknowledge> deregisterApplication(ApplicationStatus finalStatus, @Nullable String diagnostics)
deregisterApplication
in interface ResourceManagerGateway
finalStatus
- of the Flink applicationdiagnostics
- diagnostics message for the Flink application or null
public CompletableFuture<Integer> getNumberOfRegisteredTaskManagers()
ResourceManagerGateway
getNumberOfRegisteredTaskManagers
in interface ResourceManagerGateway
public CompletableFuture<Collection<TaskManagerInfo>> requestTaskManagerInfo(Time timeout)
ResourceManagerGateway
TaskExecutor
.requestTaskManagerInfo
in interface ResourceManagerGateway
timeout
- of the requestpublic CompletableFuture<TaskManagerInfo> requestTaskManagerInfo(ResourceID resourceId, Time timeout)
ResourceManagerGateway
TaskExecutor
.requestTaskManagerInfo
in interface ResourceManagerGateway
resourceId
- identifying the TaskExecutor for which to return informationtimeout
- of the requestpublic CompletableFuture<ResourceOverview> requestResourceOverview(Time timeout)
ResourceManagerGateway
requestResourceOverview
in interface ResourceManagerGateway
timeout
- of the requestpublic CompletableFuture<Collection<Tuple2<ResourceID,String>>> requestTaskManagerMetricQueryServiceAddresses(Time timeout)
ResourceManagerGateway
MetricQueryService
to query.requestTaskManagerMetricQueryServiceAddresses
in interface ResourceManagerGateway
timeout
- for the asynchronous operationpublic CompletableFuture<TransientBlobKey> requestTaskManagerFileUploadByType(ResourceID taskManagerId, FileType fileType, Time timeout)
ResourceManagerGateway
TaskExecutor
to the cluster's BlobServer
. The corresponding TransientBlobKey
is returned.requestTaskManagerFileUploadByType
in interface ResourceManagerGateway
taskManagerId
- identifying the TaskExecutor
to upload the specified filefileType
- type of the file to uploadtimeout
- for the asynchronous operationTransientBlobKey
after uploading the file
to the BlobServer
.public CompletableFuture<TransientBlobKey> requestTaskManagerFileUploadByName(ResourceID taskManagerId, String fileName, Time timeout)
ResourceManagerGateway
TaskExecutor
to the cluster's BlobServer
. The corresponding TransientBlobKey
is returned.requestTaskManagerFileUploadByName
in interface ResourceManagerGateway
taskManagerId
- identifying the TaskExecutor
to upload the specified filefileName
- name of the file to uploadtimeout
- for the asynchronous operationTransientBlobKey
after uploading the file
to the BlobServer
.public CompletableFuture<Collection<LogInfo>> requestTaskManagerLogList(ResourceID taskManagerId, Time timeout)
ResourceManagerGateway
TaskExecutor
.requestTaskManagerLogList
in interface ResourceManagerGateway
taskManagerId
- identifying the TaskExecutor
to get log list fromtimeout
- for the asynchronous operationpublic CompletableFuture<Void> releaseClusterPartitions(IntermediateDataSetID dataSetId)
ClusterPartitionManager
releaseClusterPartitions
in interface ClusterPartitionManager
dataSetId
- dataset for which all associated partitions should be releasedpublic CompletableFuture<Map<IntermediateDataSetID,DataSetMetaInfo>> listDataSets()
ClusterPartitionManager
listDataSets
in interface ClusterPartitionManager
public CompletableFuture<ThreadDumpInfo> requestThreadDump(ResourceID taskManagerId, Time timeout)
ResourceManagerGateway
TaskExecutor
.requestThreadDump
in interface ResourceManagerGateway
taskManagerId
- taskManagerId identifying the TaskExecutor
to get the thread
dump fromtimeout
- timeout of the asynchronous operationprotected void closeJobManagerConnection(JobID jobId, Exception cause)
jobId
- identifying the job whose leader shall be disconnected.cause
- The exception which cause the JobManager failed.protected void closeTaskManagerConnection(ResourceID resourceID, Exception cause)
resourceID
- Id of the TaskManager that has failed.cause
- The exception which cause the TaskManager failed.protected void jobLeaderLostLeadership(JobID jobId, JobMasterId oldJobMasterId)
protected void releaseResource(InstanceID instanceId, Exception cause)
protected void onFatalError(Throwable t)
t
- The exception describing the fatal errorpublic void grantLeadership(UUID newLeaderSessionID)
grantLeadership
in interface LeaderContender
newLeaderSessionID
- unique leadershipIDprotected void onLeadership()
public void revokeLeadership()
revokeLeadership
in interface LeaderContender
public void handleError(Exception exception)
handleError
in interface LeaderContender
exception
- Exception being thrown in the leader election serviceprotected abstract void initialize() throws ResourceManagerException
ResourceManagerException
- which occurs during initialization and causes the resource
manager to fail.protected abstract void terminate() throws Exception
Exception
- which occurs during termination.protected CompletableFuture<Void> prepareLeadershipAsync()
CompletableFuture
that completes when the computation is finished.protected CompletableFuture<Void> clearStateAsync()
CompletableFuture
that completes when the state clearing routine is
finished.protected abstract void internalDeregisterApplication(ApplicationStatus finalStatus, @Nullable String optionalDiagnostics) throws ResourceManagerException
This method also needs to make sure all pending containers that are not registered yet are returned.
finalStatus
- The application status to report.optionalDiagnostics
- A diagnostics message or null
.ResourceManagerException
- if the application could not be shut down.@VisibleForTesting public abstract boolean startNewWorker(WorkerResourceSpec workerResourceSpec)
workerResourceSpec
- workerResourceSpec specifies the size of the to be allocated
resourceprotected abstract WorkerType workerStarted(ResourceID resourceID)
resourceID
- The worker resource idpublic abstract boolean stopWorker(WorkerType worker)
worker
- The worker.protected void setFailUnfulfillableRequest(boolean failUnfulfillableRequest)
SlotManager
whether to fail unfulfillable slot requests.failUnfulfillableRequest
- whether to fail unfulfillable requestsprotected int getNumberRequiredTaskManagers()
protected Map<WorkerResourceSpec,Integer> getRequiredResources()
Copyright © 2014–2021 The Apache Software Foundation. All rights reserved.