public abstract class ResourceManager<WorkerType extends Serializable> extends FencedRpcEndpoint<ResourceManagerId> implements ResourceManagerGateway, LeaderContender
The ResourceManager offers the following methods as part of its RPC interface for remote interaction:
registerJobManager(JobMasterId, ResourceID, String, JobID, Time) registers a JobMaster at the resource manager
requestSlot(JobMasterId, SlotRequest, Time) requests a slot from the resource manager
RpcEndpoint.MainThreadExecutor
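Both RPC methods listed above return their result asynchronously. The following is a minimal sketch of how a caller might chain a registration and a slot request. It uses hypothetical stand-in types (`SketchGateway`, `InMemoryGateway`, plain `String` identifiers) rather than Flink's `ResourceManagerGateway`, `JobMasterId`, or `SlotRequest`, and the rule that only registered JobMasters may request slots is an assumption of the sketch.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for the RPC gateway; NOT Flink's ResourceManagerGateway.
interface SketchGateway {
    CompletableFuture<String> registerJobManager(String jobMasterId, String jobId);
    CompletableFuture<String> requestSlot(String jobMasterId, String slotRequest);
}

// In-memory implementation; assumes slot requests are accepted only from registered JobMasters.
class InMemoryGateway implements SketchGateway {
    private final Set<String> registered = new HashSet<>();

    @Override
    public synchronized CompletableFuture<String> registerJobManager(String jobMasterId, String jobId) {
        registered.add(jobMasterId);
        return CompletableFuture.completedFuture("registered:" + jobId);
    }

    @Override
    public synchronized CompletableFuture<String> requestSlot(String jobMasterId, String slotRequest) {
        CompletableFuture<String> result = new CompletableFuture<>();
        if (registered.contains(jobMasterId)) {
            result.complete("ack:" + slotRequest);
        } else {
            result.completeExceptionally(
                    new IllegalStateException("unknown JobMaster " + jobMasterId));
        }
        return result;
    }
}

class SlotRequestDemo {
    // Registers first, then requests a slot once the registration future has completed.
    static CompletableFuture<String> registerThenRequest(
            SketchGateway gateway, String jobMasterId, String jobId) {
        return gateway.registerJobManager(jobMasterId, jobId)
                .thenCompose(response -> gateway.requestSlot(jobMasterId, "slot-for-" + jobId));
    }

    public static void main(String[] args) {
        System.out.println(registerThenRequest(new InMemoryGateway(), "jm-1", "job-42").join());
    }
}
```

Chaining with `thenCompose` keeps the caller non-blocking; the slot request is only issued after the registration response has arrived.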
Modifier and Type | Field and Description |
---|---|
static String | RESOURCE_MANAGER_NAME |
log, rpcServer
Constructor and Description |
---|
ResourceManager(RpcService rpcService, String resourceManagerEndpointId, ResourceID resourceId, ResourceManagerConfiguration resourceManagerConfiguration, HighAvailabilityServices highAvailabilityServices, HeartbeatServices heartbeatServices, SlotManager slotManager, MetricRegistry metricRegistry, JobLeaderIdService jobLeaderIdService, FatalErrorHandler fatalErrorHandler) |
Modifier and Type | Method and Description |
---|---|
protected void | closeJobManagerConnection(JobID jobId, Exception cause): This method should be called by the framework once it detects that a currently registered job manager has failed. |
protected void | closeTaskManagerConnection(ResourceID resourceID, Exception cause): This method should be called by the framework once it detects that a currently registered task executor has failed. |
void | disconnectJobManager(JobID jobId, Exception cause): Disconnects the JobManager specified by the given jobId from the ResourceManager. |
void | disconnectTaskManager(ResourceID resourceId, Exception cause): Disconnects a TaskManager specified by the given resourceID from the ResourceManager. |
CompletableFuture<Integer> | getNumberOfRegisteredTaskManagers(): Gets the number of currently registered TaskManagers. |
void | grantLeadership(UUID newLeaderSessionID): Callback method invoked when the current ResourceManager is granted leadership. |
void | handleError(Exception exception): Handles errors occurring in the leader election service. |
void | heartbeatFromJobManager(ResourceID resourceID): Sends the heartbeat to the resource manager from the job manager. |
void | heartbeatFromTaskManager(ResourceID resourceID, SlotReport slotReport): Sends the heartbeat to the resource manager from the task manager. |
protected abstract void | initialize(): Initializes the framework specific components. |
protected void | jobLeaderLostLeadership(JobID jobId, JobMasterId oldJobMasterId) |
void | notifySlotAvailable(InstanceID instanceID, SlotID slotId, AllocationID allocationId): Sent by the TaskExecutor to notify the ResourceManager that a slot has become available. |
protected void | onFatalError(Throwable t): Notifies the ResourceManager that a fatal error has occurred and it cannot proceed. |
void | postStop(): User overridable callback. |
void | registerInfoMessageListener(String address): Registers an info message listener. |
CompletableFuture<RegistrationResponse> | registerJobManager(JobMasterId jobMasterId, ResourceID jobManagerResourceId, String jobManagerAddress, JobID jobId, Time timeout): Registers a JobMaster at the resource manager. |
CompletableFuture<RegistrationResponse> | registerTaskExecutor(String taskExecutorAddress, ResourceID taskExecutorResourceId, SlotReport slotReport, Time timeout): Registers a TaskExecutor at the resource manager. |
protected void | releaseResource(InstanceID instanceId) |
protected void | removeJob(JobID jobId) |
CompletableFuture<ResourceOverview> | requestResourceOverview(Time timeout): Requests the resource overview. |
CompletableFuture<Acknowledge> | requestSlot(JobMasterId jobMasterId, SlotRequest slotRequest, Time timeout): Requests a slot from the resource manager. |
CompletableFuture<Collection<Tuple2<ResourceID,String>>> | requestTaskManagerMetricQueryServicePaths(Time timeout): Requests the paths for the TaskManager's MetricQueryService to query. |
void | revokeLeadership(): Callback method invoked when the current ResourceManager loses leadership. |
void | sendInfoMessage(String message) |
protected abstract void | shutDownApplication(ApplicationStatus finalStatus, String optionalDiagnostics): The framework specific code for shutting down the application. |
void | shutDownCluster(ApplicationStatus finalStatus, String optionalDiagnostics): Cleans up the application and shuts down the cluster. |
void | start(): Starts the rpc endpoint. |
abstract void | startNewWorker(ResourceProfile resourceProfile): Allocates a resource using the resource profile. |
abstract boolean | stopWorker(ResourceID resourceID): Stops the given worker. |
void | unRegisterInfoMessageListener(String address): Unregisters an info message listener. |
protected abstract WorkerType | workerStarted(ResourceID resourceID): Callback when a worker was started. |
callAsyncWithoutFencing, getFencingToken, getMainThreadExecutor, runAsyncWithoutFencing, setFencingToken
callAsync, getAddress, getEndpointId, getHostname, getRpcService, getSelfGateway, getTerminationFuture, runAsync, scheduleRunAsync, scheduleRunAsync, shutDown, stop, validateRunsInMainThread
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
getFencingToken
getAddress, getHostname
getAddress
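The abstract methods in the summary above are the hooks a framework-specific subclass has to provide. The following is a rough sketch of that pattern only: `SketchResourceManager` is a simplified stand-in for the class documented here, identifiers are plain `String` values instead of `ResourceID` or `ResourceProfile`, `shutDownApplication` is omitted for brevity, and the map-based bookkeeping is an assumption rather than Flink's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the abstract class documented here; NOT the Flink class.
abstract class SketchResourceManager<WorkerType> {
    protected abstract void initialize() throws Exception;
    public abstract void startNewWorker(String resourceProfile);
    public abstract boolean stopWorker(String resourceId);
    protected abstract WorkerType workerStarted(String resourceId);
}

// Hypothetical framework-specific implementation that tracks workers in a map.
class InMemoryResourceManager extends SketchResourceManager<String> {
    private final Map<String, String> workers = new HashMap<>();
    private int nextId = 0;
    private boolean initialized = false;

    @Override
    protected void initialize() {
        initialized = true;
    }

    @Override
    public void startNewWorker(String resourceProfile) {
        if (!initialized) {
            throw new IllegalStateException("initialize() has not been called");
        }
        // Assumption: worker ids are generated locally as worker-0, worker-1, ...
        workers.put("worker-" + nextId++, resourceProfile);
    }

    @Override
    public boolean stopWorker(String resourceId) {
        // Returns true only if the worker was known and has been removed.
        return workers.remove(resourceId) != null;
    }

    @Override
    protected String workerStarted(String resourceId) {
        // Returns the record held for the worker, or null if it is unknown.
        return workers.get(resourceId);
    }

    int workerCount() {
        return workers.size();
    }
}

class WorkerLifecycleDemo {
    public static void main(String[] args) {
        InMemoryResourceManager rm = new InMemoryResourceManager();
        rm.initialize();
        rm.startNewWorker("1 core, 1024 MB");
        System.out.println(rm.workerStarted("worker-0"));
        System.out.println(rm.stopWorker("worker-0"));
    }
}
```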
public static final String RESOURCE_MANAGER_NAME
public ResourceManager(RpcService rpcService, String resourceManagerEndpointId, ResourceID resourceId, ResourceManagerConfiguration resourceManagerConfiguration, HighAvailabilityServices highAvailabilityServices, HeartbeatServices heartbeatServices, SlotManager slotManager, MetricRegistry metricRegistry, JobLeaderIdService jobLeaderIdService, FatalErrorHandler fatalErrorHandler)
public void start() throws Exception
Description copied from class: RpcEndpoint
Starts the rpc endpoint.
Overrides: start in class RpcEndpoint
Throws: Exception - indicating that something went wrong while starting the RPC endpoint
public void postStop() throws Exception
Description copied from class: RpcEndpoint
User overridable callback. This method is called when the RpcEndpoint is being shut down. The method is guaranteed to be executed in the main thread context and can be used to clean up internal state. IMPORTANT: This method should never be called directly by the user.
Overrides: postStop in class RpcEndpoint
Throws: Exception - if an error occurs. The exception is returned as the result of the termination future.
public CompletableFuture<RegistrationResponse> registerJobManager(JobMasterId jobMasterId, ResourceID jobManagerResourceId, String jobManagerAddress, JobID jobId, Time timeout)
Description copied from interface: ResourceManagerGateway
Registers a JobMaster at the resource manager.
Specified by: registerJobManager in interface ResourceManagerGateway
Parameters:
jobMasterId - The fencing token for the JobMaster leader
jobManagerResourceId - The resource ID of the JobMaster that registers
jobManagerAddress - The address of the JobMaster that registers
jobId - The Job ID of the JobMaster that registers
timeout - Timeout for the future to complete
public CompletableFuture<RegistrationResponse> registerTaskExecutor(String taskExecutorAddress, ResourceID taskExecutorResourceId, SlotReport slotReport, Time timeout)
Description copied from interface: ResourceManagerGateway
Registers a TaskExecutor at the resource manager.
Specified by: registerTaskExecutor in interface ResourceManagerGateway
Parameters:
taskExecutorAddress - The address of the TaskExecutor that registers
taskExecutorResourceId - The resource ID of the TaskExecutor that registers
slotReport - The slot report containing free and allocated task slots
timeout - The timeout for the response.
public void heartbeatFromTaskManager(ResourceID resourceID, SlotReport slotReport)
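Description copied from interface: ResourceManagerGateway
Sends the heartbeat to the resource manager from the task manager.
Specified by: heartbeatFromTaskManager in interface ResourceManagerGateway
Parameters:
resourceID - unique id of the task manager
slotReport - Current slot allocation on the originating TaskManager
public void heartbeatFromJobManager(ResourceID resourceID)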
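Description copied from interface: ResourceManagerGateway
Sends the heartbeat to the resource manager from the job manager.
Specified by: heartbeatFromJobManager in interface ResourceManagerGateway
Parameters:
resourceID - unique id of the job manager
public void disconnectTaskManager(ResourceID resourceId, Exception cause)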
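Description copied from interface: ResourceManagerGateway
Disconnects a TaskManager specified by the given resourceID from the ResourceManager.
Specified by: disconnectTaskManager in interface ResourceManagerGateway
Parameters:
resourceId - identifying the TaskManager to disconnect
cause - for the disconnection of the TaskManager
public void disconnectJobManager(JobID jobId, Exception cause)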
Description copied from interface: ResourceManagerGateway
Disconnects the JobManager specified by the given jobId from the ResourceManager.
Specified by: disconnectJobManager in interface ResourceManagerGateway
Parameters:
jobId - JobID for which the JobManager was the leader
cause - for the disconnection of the JobManager
public CompletableFuture<Acknowledge> requestSlot(JobMasterId jobMasterId, SlotRequest slotRequest, Time timeout)
Description copied from interface: ResourceManagerGateway
Requests a slot from the resource manager.
Specified by: requestSlot in interface ResourceManagerGateway
Parameters:
jobMasterId - id of the JobMaster
slotRequest - The slot to request
public void notifySlotAvailable(InstanceID instanceID, SlotID slotId, AllocationID allocationId)
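Description copied from interface: ResourceManagerGateway
Sent by the TaskExecutor to notify the ResourceManager that a slot has become available.
Specified by: notifySlotAvailable in interface ResourceManagerGateway
Parameters:
instanceID - TaskExecutor's instance id
slotId - The SlotID of the freed slot
allocationId - to which the slot has been allocated
public void registerInfoMessageListener(String address)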
Registers an info message listener.
Specified by: registerInfoMessageListener in interface ResourceManagerGateway
Parameters:
address - address of infoMessage listener to register to this resource manager
public void unRegisterInfoMessageListener(String address)
Unregisters an info message listener.
Specified by: unRegisterInfoMessageListener in interface ResourceManagerGateway
Parameters:
address - of the info message listener to unregister from this resource manager
public void shutDownCluster(ApplicationStatus finalStatus, String optionalDiagnostics)
Cleans up the application and shuts down the cluster.
Specified by: shutDownCluster in interface ResourceManagerGateway
Parameters:
finalStatus - of the Flink application
optionalDiagnostics - for the Flink application
public CompletableFuture<Integer> getNumberOfRegisteredTaskManagers()
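Description copied from interface: ResourceManagerGateway
Gets the number of currently registered TaskManagers.
Specified by: getNumberOfRegisteredTaskManagers in interface ResourceManagerGateway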
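Because getNumberOfRegisteredTaskManagers() returns a CompletableFuture<Integer>, a caller has to decide how long to wait for the answer. The sketch below shows one possible way to bound the wait using only `java.util.concurrent`; the helper `countOrFallback` and the fallback value are hypothetical and not part of the documented API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TaskManagerCountDemo {
    // Waits at most timeoutMillis for the count; returns the fallback on timeout or failure.
    static int countOrFallback(CompletableFuture<Integer> future, long timeoutMillis, int fallback) {
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            return fallback;
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers further up can react to it.
            Thread.currentThread().interrupt();
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Stand-ins for futures a gateway might hand out: one answered, one never answered.
        CompletableFuture<Integer> answered = CompletableFuture.completedFuture(3);
        CompletableFuture<Integer> unanswered = new CompletableFuture<>();
        System.out.println(countOrFallback(answered, 100, -1));
        System.out.println(countOrFallback(unanswered, 100, -1));
    }
}
```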
public CompletableFuture<ResourceOverview> requestResourceOverview(Time timeout)
Description copied from interface: ResourceManagerGateway
Requests the resource overview.
Specified by: requestResourceOverview in interface ResourceManagerGateway
Parameters:
timeout - of the request
public CompletableFuture<Collection<Tuple2<ResourceID,String>>> requestTaskManagerMetricQueryServicePaths(Time timeout)
Description copied from interface: ResourceManagerGateway
Requests the paths for the TaskManager's MetricQueryService to query.
Specified by: requestTaskManagerMetricQueryServicePaths in interface ResourceManagerGateway
Parameters:
timeout - for the asynchronous operation
protected void closeJobManagerConnection(JobID jobId, Exception cause)
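This method should be called by the framework once it detects that a currently registered job manager has failed.
Parameters:
jobId - identifying the job whose leader shall be disconnected.
cause - The exception that caused the JobManager to fail.
protected void closeTaskManagerConnection(ResourceID resourceID, Exception cause)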
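This method should be called by the framework once it detects that a currently registered task executor has failed.
Parameters:
resourceID - Id of the TaskManager that has failed.
cause - The exception that caused the TaskManager to fail.
protected void removeJob(JobID jobId)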
protected void jobLeaderLostLeadership(JobID jobId, JobMasterId oldJobMasterId)
protected void releaseResource(InstanceID instanceId)
public void sendInfoMessage(String message)
protected void onFatalError(Throwable t)
Notifies the ResourceManager that a fatal error has occurred and it cannot proceed.
Parameters:
t - The exception describing the fatal error
public void grantLeadership(UUID newLeaderSessionID)
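Callback method invoked when the current ResourceManager is granted leadership.
Specified by: grantLeadership in interface LeaderContender
Parameters:
newLeaderSessionID - unique leadershipID
public void revokeLeadership()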
Callback method invoked when the current ResourceManager loses leadership.
Specified by: revokeLeadership in interface LeaderContender
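The two leadership callbacks can be read together with the inherited setFencingToken and getFencingToken methods. As a rough illustration of that pattern only, the sketch below stores the session id as a fencing token on grant, clears it on revoke, and rejects messages that carry any other token. `SketchLeaderContender` and its `accepts` method are hypothetical and do not reproduce Flink's `LeaderContender` or `FencedRpcEndpoint`.

```java
import java.util.UUID;

// Hypothetical stand-in showing the callback pattern; NOT a Flink class.
class SketchLeaderContender {
    // Null while this instance is not the leader.
    private UUID fencingToken;

    public synchronized void grantLeadership(UUID newLeaderSessionID) {
        fencingToken = newLeaderSessionID;
    }

    public synchronized void revokeLeadership() {
        fencingToken = null;
    }

    // Accepts a message only if it carries the token of the current leader session.
    public synchronized boolean accepts(UUID messageToken) {
        return fencingToken != null && fencingToken.equals(messageToken);
    }
}

class LeadershipDemo {
    public static void main(String[] args) {
        SketchLeaderContender contender = new SketchLeaderContender();
        UUID session = UUID.randomUUID();
        contender.grantLeadership(session);
        System.out.println(contender.accepts(session));
        contender.revokeLeadership();
        System.out.println(contender.accepts(session));
    }
}
```

Rejecting messages with a stale token keeps requests addressed to a previous leader session from being processed after leadership has changed.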
public void handleError(Exception exception)
Handles errors occurring in the leader election service.
Specified by: handleError in interface LeaderContender
Parameters:
exception - Exception being thrown in the leader election service
protected abstract void initialize() throws ResourceManagerException
Initializes the framework specific components.
Throws: ResourceManagerException - which occurs during initialization and causes the resource manager to fail.
protected abstract void shutDownApplication(ApplicationStatus finalStatus, String optionalDiagnostics) throws ResourceManagerException
The framework specific code for shutting down the application. This method also needs to make sure all pending containers that are not registered yet are returned.
Parameters:
finalStatus - The application status to report.
optionalDiagnostics - An optional diagnostics message.
Throws: ResourceManagerException - if the application could not be shut down.
@VisibleForTesting public abstract void startNewWorker(ResourceProfile resourceProfile)
Allocates a resource using the resource profile.
Parameters:
resourceProfile - The resource description
protected abstract WorkerType workerStarted(ResourceID resourceID)
Callback when a worker was started.
Parameters:
resourceID - The worker resource id
public abstract boolean stopWorker(ResourceID resourceID)
Stops the given worker.
Parameters:
resourceID - identifying the worker to be stopped
Copyright © 2014–2018 The Apache Software Foundation. All rights reserved.