Class ResourceManager<WorkerType extends ResourceIDRetrievable>
- java.lang.Object
-
- org.apache.flink.runtime.rpc.RpcEndpoint
-
- org.apache.flink.runtime.rpc.FencedRpcEndpoint<ResourceManagerId>
-
- org.apache.flink.runtime.resourcemanager.ResourceManager<WorkerType>
-
- All Implemented Interfaces:
AutoCloseable
,BlocklistListener
,ClusterPartitionManager
,ResourceManagerGateway
,FencedRpcGateway<ResourceManagerId>
,RpcGateway
,DelegationTokenManager.Listener
,AutoCloseableAsync
- Direct Known Subclasses:
ActiveResourceManager
,StandaloneResourceManager
public abstract class ResourceManager<WorkerType extends ResourceIDRetrievable> extends FencedRpcEndpoint<ResourceManagerId> implements DelegationTokenManager.Listener, ResourceManagerGateway
ResourceManager implementation. The resource manager is responsible for resource de-/allocation and bookkeeping.It offers the following methods as part of its rpc interface to interact with him remotely:
registerJobMaster(JobMasterId, ResourceID, String, JobID, Duration)
registers aJobMaster
at the resource manager
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.flink.runtime.rpc.RpcEndpoint
RpcEndpoint.MainThreadExecutor
-
-
Field Summary
Fields Modifier and Type Field Description protected BlocklistHandler
blocklistHandler
protected Executor
ioExecutor
static String
RESOURCE_MANAGER_NAME
protected ResourceManagerMetricGroup
resourceManagerMetricGroup
-
Fields inherited from class org.apache.flink.runtime.rpc.RpcEndpoint
log, rpcServer
-
-
Constructor Summary
Constructors Constructor Description ResourceManager(RpcService rpcService, UUID leaderSessionId, ResourceID resourceId, HeartbeatServices heartbeatServices, DelegationTokenManager delegationTokenManager, SlotManager slotManager, ResourceManagerPartitionTrackerFactory clusterPartitionTrackerFactory, BlocklistHandler.Factory blocklistHandlerFactory, JobLeaderIdService jobLeaderIdService, ClusterInformation clusterInformation, FatalErrorHandler fatalErrorHandler, ResourceManagerMetricGroup resourceManagerMetricGroup, Duration rpcTimeout, Executor ioExecutor)
-
Method Summary
All Methods Instance Methods Abstract Methods Concrete Methods Modifier and Type Method Description protected void
closeJobManagerConnection(JobID jobId, org.apache.flink.runtime.resourcemanager.ResourceManager.ResourceRequirementHandling resourceRequirementHandling, Exception cause)
This method should be called by the framework once it detects that a currently registered job manager has failed.protected Optional<WorkerType>
closeTaskManagerConnection(ResourceID resourceID, Exception cause)
This method should be called by the framework once it detects that a currently registered task executor has failed.CompletableFuture<Acknowledge>
declareRequiredResources(JobMasterId jobMasterId, ResourceRequirements resourceRequirements, Duration timeout)
Declares the absolute resource requirements for a job.CompletableFuture<Acknowledge>
deregisterApplication(ApplicationStatus finalStatus, String diagnostics)
Cleanup application and shut down cluster.void
disconnectJobManager(JobID jobId, JobStatus jobStatus, Exception cause)
Disconnects a JobManager specified by the given resourceID from theResourceManager
.void
disconnectTaskManager(ResourceID resourceId, Exception cause)
Disconnects a TaskManager specified by the given resourceID from theResourceManager
.CompletableFuture<List<ShuffleDescriptor>>
getClusterPartitionsShuffleDescriptors(IntermediateDataSetID intermediateDataSetID)
Get the shuffle descriptors of the cluster partitions ordered by partition number.Optional<InstanceID>
getInstanceIdByResourceId(ResourceID resourceID)
CompletableFuture<Integer>
getNumberOfRegisteredTaskManagers()
Gets the currently registered number of TaskManagers.protected abstract CompletableFuture<Void>
getReadyToServeFuture()
Get the ready to serve future of the resource manager.protected abstract ResourceAllocator
getResourceAllocator()
CompletableFuture<Void>
getStartedFuture()
Completion of this future indicates that the resource manager is fully started and is ready to serve.protected WorkerType
getWorkerByInstanceId(InstanceID instanceId)
protected abstract Optional<WorkerType>
getWorkerNodeIfAcceptRegistration(ResourceID resourceID)
Get worker node if the worker resource is accepted.CompletableFuture<Void>
heartbeatFromJobManager(ResourceID resourceID)
Sends the heartbeat to resource manager from job manager.CompletableFuture<Void>
heartbeatFromTaskManager(ResourceID resourceID, TaskExecutorHeartbeatPayload heartbeatPayload)
Sends the heartbeat to resource manager from task manager.protected abstract void
initialize()
Initializes the framework specific components.protected abstract void
internalDeregisterApplication(ApplicationStatus finalStatus, String optionalDiagnostics)
The framework specific code to deregister the application.protected void
jobLeaderLostLeadership(JobID jobId, JobMasterId oldJobMasterId)
CompletableFuture<Map<IntermediateDataSetID,DataSetMetaInfo>>
listDataSets()
Returns all datasets for which partitions are being tracked.CompletableFuture<Acknowledge>
notifyNewBlockedNodes(Collection<BlockedNode> newNodes)
Notify new blocked node records.void
notifySlotAvailable(InstanceID instanceID, SlotID slotId, AllocationID allocationId)
Sent by the TaskExecutor to notify the ResourceManager that a slot has become available.protected void
onFatalError(Throwable t)
Notifies the ResourceManager that a fatal error has occurred and it cannot proceed.void
onNewTokensObtained(byte[] tokens)
Callback function when new delegation tokens obtained.void
onStart()
User overridable callback which is called fromRpcEndpoint.internalCallOnStart()
.CompletableFuture<Void>
onStop()
User overridable callback which is called fromRpcEndpoint.internalCallOnStop()
.protected void
onWorkerRegistered(WorkerType worker, WorkerResourceSpec workerResourceSpec)
CompletableFuture<RegistrationResponse>
registerJobMaster(JobMasterId jobMasterId, ResourceID jobManagerResourceId, String jobManagerAddress, JobID jobId, Duration timeout)
Register aJobMaster
at the resource manager.protected void
registerMetrics()
CompletableFuture<RegistrationResponse>
registerTaskExecutor(TaskExecutorRegistration taskExecutorRegistration, Duration timeout)
Register aTaskExecutor
at the resource manager.CompletableFuture<Void>
releaseClusterPartitions(IntermediateDataSetID dataSetId)
Releases all partitions associated with the given dataset.protected void
removeJob(JobID jobId, Exception cause)
CompletableFuture<Void>
reportClusterPartitions(ResourceID taskExecutorId, ClusterPartitionReport clusterPartitionReport)
Report the cluster partitions status in the task executor.CompletableFuture<ProfilingInfo>
requestProfiling(ResourceID taskManagerId, int duration, ProfilingInfo.ProfilingMode mode, Duration timeout)
Requests the profiling instance from the givenTaskExecutor
.CompletableFuture<ResourceOverview>
requestResourceOverview(Duration timeout)
Requests the resource overview.CompletableFuture<TaskExecutorThreadInfoGateway>
requestTaskExecutorThreadInfoGateway(ResourceID taskManagerId, Duration timeout)
Requests theTaskExecutorGateway
.CompletableFuture<TaskManagerInfoWithSlots>
requestTaskManagerDetailsInfo(ResourceID resourceId, Duration timeout)
Requests detail information about the givenTaskExecutor
.CompletableFuture<TransientBlobKey>
requestTaskManagerFileUploadByNameAndType(ResourceID taskManagerId, String fileName, FileType fileType, Duration timeout)
Request the file upload from the givenTaskExecutor
to the cluster'sBlobServer
.CompletableFuture<TransientBlobKey>
requestTaskManagerFileUploadByType(ResourceID taskManagerId, FileType fileType, Duration timeout)
Request the file upload from the givenTaskExecutor
to the cluster'sBlobServer
.CompletableFuture<Collection<TaskManagerInfo>>
requestTaskManagerInfo(Duration timeout)
Requests information about the registeredTaskExecutor
.CompletableFuture<Collection<LogInfo>>
requestTaskManagerLogList(ResourceID taskManagerId, Duration timeout)
Request log list from the givenTaskExecutor
.CompletableFuture<Collection<Tuple2<ResourceID,String>>>
requestTaskManagerMetricQueryServiceAddresses(Duration timeout)
Requests the paths for the TaskManager'sMetricQueryService
to query.CompletableFuture<Collection<ProfilingInfo>>
requestTaskManagerProfilingList(ResourceID taskManagerId, Duration timeout)
Request profiling list from the givenTaskExecutor
.CompletableFuture<ThreadDumpInfo>
requestThreadDump(ResourceID taskManagerId, Duration timeout)
Requests the thread dump from the givenTaskExecutor
.CompletableFuture<Acknowledge>
sendSlotReport(ResourceID taskManagerResourceId, InstanceID taskManagerRegistrationId, SlotReport slotReport, Duration timeout)
Sends the givenSlotReport
to the ResourceManager.protected void
setFailUnfulfillableRequest(boolean failUnfulfillableRequest)
SetSlotManager
whether to fail unfulfillable slot requests.void
stopWorkerIfSupported(WorkerType worker)
Stops the given worker if supported.protected abstract void
terminate()
Terminates the framework specific components.-
Methods inherited from class org.apache.flink.runtime.rpc.FencedRpcEndpoint
getFencingToken
-
Methods inherited from class org.apache.flink.runtime.rpc.RpcEndpoint
callAsync, closeAsync, getAddress, getEndpointId, getHostname, getMainThreadExecutor, getMainThreadExecutor, getRpcService, getSelfGateway, getTerminationFuture, internalCallOnStart, internalCallOnStop, isRunning, registerResource, runAsync, scheduleRunAsync, scheduleRunAsync, start, stop, unregisterResource, validateRunsInMainThread
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.flink.util.AutoCloseableAsync
close
-
Methods inherited from interface org.apache.flink.runtime.rpc.FencedRpcGateway
getFencingToken
-
Methods inherited from interface org.apache.flink.runtime.rpc.RpcGateway
getAddress, getHostname
-
-
-
-
Field Detail
-
RESOURCE_MANAGER_NAME
public static final String RESOURCE_MANAGER_NAME
- See Also:
- Constant Field Values
-
resourceManagerMetricGroup
protected final ResourceManagerMetricGroup resourceManagerMetricGroup
-
ioExecutor
protected final Executor ioExecutor
-
blocklistHandler
protected final BlocklistHandler blocklistHandler
-
-
Constructor Detail
-
ResourceManager
public ResourceManager(RpcService rpcService, UUID leaderSessionId, ResourceID resourceId, HeartbeatServices heartbeatServices, DelegationTokenManager delegationTokenManager, SlotManager slotManager, ResourceManagerPartitionTrackerFactory clusterPartitionTrackerFactory, BlocklistHandler.Factory blocklistHandlerFactory, JobLeaderIdService jobLeaderIdService, ClusterInformation clusterInformation, FatalErrorHandler fatalErrorHandler, ResourceManagerMetricGroup resourceManagerMetricGroup, Duration rpcTimeout, Executor ioExecutor)
-
-
Method Detail
-
onStart
public final void onStart() throws Exception
Description copied from class:RpcEndpoint
User overridable callback which is called fromRpcEndpoint.internalCallOnStart()
.This method is called when the RpcEndpoint is being started. The method is guaranteed to be executed in the main thread context and can be used to start the rpc endpoint in the context of the rpc endpoint's main thread.
IMPORTANT: This method should never be called directly by the user.
- Overrides:
onStart
in classRpcEndpoint
- Throws:
Exception
- indicating that the rpc endpoint could not be started. If an exception occurs, then the rpc endpoint will automatically terminate.
-
getStartedFuture
public CompletableFuture<Void> getStartedFuture()
Completion of this future indicates that the resource manager is fully started and is ready to serve.
-
onStop
public final CompletableFuture<Void> onStop()
Description copied from class:RpcEndpoint
User overridable callback which is called fromRpcEndpoint.internalCallOnStop()
.This method is called when the RpcEndpoint is being shut down. The method is guaranteed to be executed in the main thread context and can be used to clean up internal state.
IMPORTANT: This method should never be called directly by the user.
- Overrides:
onStop
in classRpcEndpoint
- Returns:
- Future which is completed once all post stop actions are completed. If an error occurs this future is completed exceptionally
-
registerJobMaster
public CompletableFuture<RegistrationResponse> registerJobMaster(JobMasterId jobMasterId, ResourceID jobManagerResourceId, String jobManagerAddress, JobID jobId, Duration timeout)
Description copied from interface:ResourceManagerGateway
Register aJobMaster
at the resource manager.- Specified by:
registerJobMaster
in interfaceResourceManagerGateway
- Parameters:
jobMasterId
- The fencing token for the JobMaster leaderjobManagerResourceId
- The resource ID of the JobMaster that registersjobManagerAddress
- The address of the JobMaster that registersjobId
- The Job ID of the JobMaster that registerstimeout
- Timeout for the future to complete- Returns:
- Future registration response
-
registerTaskExecutor
public CompletableFuture<RegistrationResponse> registerTaskExecutor(TaskExecutorRegistration taskExecutorRegistration, Duration timeout)
Description copied from interface:ResourceManagerGateway
Register aTaskExecutor
at the resource manager.- Specified by:
registerTaskExecutor
in interfaceResourceManagerGateway
- Parameters:
taskExecutorRegistration
- the task executor registration.timeout
- The timeout for the response.- Returns:
- The future to the response by the ResourceManager.
-
sendSlotReport
public CompletableFuture<Acknowledge> sendSlotReport(ResourceID taskManagerResourceId, InstanceID taskManagerRegistrationId, SlotReport slotReport, Duration timeout)
Description copied from interface:ResourceManagerGateway
Sends the givenSlotReport
to the ResourceManager.- Specified by:
sendSlotReport
in interfaceResourceManagerGateway
- Parameters:
taskManagerResourceId
- The resource ID of the sending TaskManagertaskManagerRegistrationId
- id identifying the sending TaskManagerslotReport
- which is sent to the ResourceManagertimeout
- for the operation- Returns:
- Future which is completed with
Acknowledge
once the slot report has been received.
-
onWorkerRegistered
protected void onWorkerRegistered(WorkerType worker, WorkerResourceSpec workerResourceSpec)
-
heartbeatFromTaskManager
public CompletableFuture<Void> heartbeatFromTaskManager(ResourceID resourceID, TaskExecutorHeartbeatPayload heartbeatPayload)
Description copied from interface:ResourceManagerGateway
Sends the heartbeat to resource manager from task manager.- Specified by:
heartbeatFromTaskManager
in interfaceResourceManagerGateway
- Parameters:
resourceID
- unique id of the task managerheartbeatPayload
- payload from the originating TaskManager- Returns:
- future which is completed exceptionally if the operation fails
-
heartbeatFromJobManager
public CompletableFuture<Void> heartbeatFromJobManager(ResourceID resourceID)
Description copied from interface:ResourceManagerGateway
Sends the heartbeat to resource manager from job manager.- Specified by:
heartbeatFromJobManager
in interfaceResourceManagerGateway
- Parameters:
resourceID
- unique id of the job manager- Returns:
- future which is completed exceptionally if the operation fails
-
disconnectTaskManager
public void disconnectTaskManager(ResourceID resourceId, Exception cause)
Description copied from interface:ResourceManagerGateway
Disconnects a TaskManager specified by the given resourceID from theResourceManager
.- Specified by:
disconnectTaskManager
in interfaceResourceManagerGateway
- Parameters:
resourceId
- identifying the TaskManager to disconnectcause
- for the disconnection of the TaskManager
-
disconnectJobManager
public void disconnectJobManager(JobID jobId, JobStatus jobStatus, Exception cause)
Description copied from interface:ResourceManagerGateway
Disconnects a JobManager specified by the given resourceID from theResourceManager
.- Specified by:
disconnectJobManager
in interfaceResourceManagerGateway
- Parameters:
jobId
- JobID for which the JobManager was the leaderjobStatus
- status of the job at the time of disconnectioncause
- for the disconnection of the JobManager
-
declareRequiredResources
public CompletableFuture<Acknowledge> declareRequiredResources(JobMasterId jobMasterId, ResourceRequirements resourceRequirements, Duration timeout)
Description copied from interface:ResourceManagerGateway
Declares the absolute resource requirements for a job.- Specified by:
declareRequiredResources
in interfaceResourceManagerGateway
- Parameters:
jobMasterId
- id of the JobMasterresourceRequirements
- resource requirements- Returns:
- The confirmation that the requirements have been processed
-
notifySlotAvailable
public void notifySlotAvailable(InstanceID instanceID, SlotID slotId, AllocationID allocationId)
Description copied from interface:ResourceManagerGateway
Sent by the TaskExecutor to notify the ResourceManager that a slot has become available.- Specified by:
notifySlotAvailable
in interfaceResourceManagerGateway
- Parameters:
instanceID
- TaskExecutor's instance idslotId
- The SlotID of the freed slotallocationId
- to which the slot has been allocated
-
deregisterApplication
public CompletableFuture<Acknowledge> deregisterApplication(ApplicationStatus finalStatus, @Nullable String diagnostics)
Cleanup application and shut down cluster.- Specified by:
deregisterApplication
in interfaceResourceManagerGateway
- Parameters:
finalStatus
- of the Flink applicationdiagnostics
- diagnostics message for the Flink application ornull
-
getNumberOfRegisteredTaskManagers
public CompletableFuture<Integer> getNumberOfRegisteredTaskManagers()
Description copied from interface:ResourceManagerGateway
Gets the currently registered number of TaskManagers.- Specified by:
getNumberOfRegisteredTaskManagers
in interfaceResourceManagerGateway
- Returns:
- The future to the number of registered TaskManagers.
-
requestTaskManagerInfo
public CompletableFuture<Collection<TaskManagerInfo>> requestTaskManagerInfo(Duration timeout)
Description copied from interface:ResourceManagerGateway
Requests information about the registeredTaskExecutor
.- Specified by:
requestTaskManagerInfo
in interfaceResourceManagerGateway
- Parameters:
timeout
- of the request- Returns:
- Future collection of TaskManager information
-
requestTaskManagerDetailsInfo
public CompletableFuture<TaskManagerInfoWithSlots> requestTaskManagerDetailsInfo(ResourceID resourceId, Duration timeout)
Description copied from interface:ResourceManagerGateway
Requests detail information about the givenTaskExecutor
.- Specified by:
requestTaskManagerDetailsInfo
in interfaceResourceManagerGateway
- Parameters:
resourceId
- identifying the TaskExecutor for which to return informationtimeout
- of the request- Returns:
- Future TaskManager information and its allocated slots
-
requestResourceOverview
public CompletableFuture<ResourceOverview> requestResourceOverview(Duration timeout)
Description copied from interface:ResourceManagerGateway
Requests the resource overview. The resource overview provides information about the connected TaskManagers, the total number of slots and the number of available slots.- Specified by:
requestResourceOverview
in interfaceResourceManagerGateway
- Parameters:
timeout
- of the request- Returns:
- Future containing the resource overview
-
requestTaskManagerMetricQueryServiceAddresses
public CompletableFuture<Collection<Tuple2<ResourceID,String>>> requestTaskManagerMetricQueryServiceAddresses(Duration timeout)
Description copied from interface:ResourceManagerGateway
Requests the paths for the TaskManager'sMetricQueryService
to query.- Specified by:
requestTaskManagerMetricQueryServiceAddresses
in interfaceResourceManagerGateway
- Parameters:
timeout
- for the asynchronous operation- Returns:
- Future containing the collection of resource ids and the corresponding metric query service path
-
requestTaskManagerFileUploadByType
public CompletableFuture<TransientBlobKey> requestTaskManagerFileUploadByType(ResourceID taskManagerId, FileType fileType, Duration timeout)
Description copied from interface:ResourceManagerGateway
Request the file upload from the givenTaskExecutor
to the cluster'sBlobServer
. The correspondingTransientBlobKey
is returned.- Specified by:
requestTaskManagerFileUploadByType
in interfaceResourceManagerGateway
- Parameters:
taskManagerId
- identifying theTaskExecutor
to upload the specified filefileType
- type of the file to uploadtimeout
- for the asynchronous operation- Returns:
- Future which is completed with the
TransientBlobKey
after uploading the file to theBlobServer
.
-
requestTaskManagerFileUploadByNameAndType
public CompletableFuture<TransientBlobKey> requestTaskManagerFileUploadByNameAndType(ResourceID taskManagerId, String fileName, FileType fileType, Duration timeout)
Description copied from interface:ResourceManagerGateway
Request the file upload from the givenTaskExecutor
to the cluster'sBlobServer
. The correspondingTransientBlobKey
is returned.- Specified by:
requestTaskManagerFileUploadByNameAndType
in interfaceResourceManagerGateway
- Parameters:
taskManagerId
- identifying theTaskExecutor
to upload the specified filefileName
- name of the file to uploadfileType
- type of the file to uploadtimeout
- for the asynchronous operation- Returns:
- Future which is completed with the
TransientBlobKey
after uploading the file to theBlobServer
.
-
requestTaskManagerLogList
public CompletableFuture<Collection<LogInfo>> requestTaskManagerLogList(ResourceID taskManagerId, Duration timeout)
Description copied from interface:ResourceManagerGateway
Request log list from the givenTaskExecutor
.- Specified by:
requestTaskManagerLogList
in interfaceResourceManagerGateway
- Parameters:
taskManagerId
- identifying theTaskExecutor
to get log list fromtimeout
- for the asynchronous operation- Returns:
- Future which is completed with the historical log list
-
releaseClusterPartitions
public CompletableFuture<Void> releaseClusterPartitions(IntermediateDataSetID dataSetId)
Description copied from interface:ClusterPartitionManager
Releases all partitions associated with the given dataset.- Specified by:
releaseClusterPartitions
in interfaceClusterPartitionManager
- Parameters:
dataSetId
- dataset for which all associated partitions should be released- Returns:
- future that is completed once all partitions have been released
-
reportClusterPartitions
public CompletableFuture<Void> reportClusterPartitions(ResourceID taskExecutorId, ClusterPartitionReport clusterPartitionReport)
Description copied from interface:ClusterPartitionManager
Report the cluster partitions status in the task executor.- Specified by:
reportClusterPartitions
in interfaceClusterPartitionManager
- Parameters:
taskExecutorId
- The id of the task executor.clusterPartitionReport
- The status of the cluster partitions.- Returns:
- future that is completed once the report have been processed.
-
getClusterPartitionsShuffleDescriptors
public CompletableFuture<List<ShuffleDescriptor>> getClusterPartitionsShuffleDescriptors(IntermediateDataSetID intermediateDataSetID)
Description copied from interface:ClusterPartitionManager
Get the shuffle descriptors of the cluster partitions ordered by partition number.- Specified by:
getClusterPartitionsShuffleDescriptors
in interfaceClusterPartitionManager
- Parameters:
intermediateDataSetID
- The id of the dataset.- Returns:
- shuffle descriptors of the cluster partitions.
-
listDataSets
public CompletableFuture<Map<IntermediateDataSetID,DataSetMetaInfo>> listDataSets()
Description copied from interface:ClusterPartitionManager
Returns all datasets for which partitions are being tracked.- Specified by:
listDataSets
in interfaceClusterPartitionManager
- Returns:
- tracked datasets
-
requestThreadDump
public CompletableFuture<ThreadDumpInfo> requestThreadDump(ResourceID taskManagerId, Duration timeout)
Description copied from interface:ResourceManagerGateway
Requests the thread dump from the givenTaskExecutor
.- Specified by:
requestThreadDump
in interfaceResourceManagerGateway
- Parameters:
taskManagerId
- taskManagerId identifying theTaskExecutor
to get the thread dump fromtimeout
- timeout of the asynchronous operation- Returns:
- Future containing the thread dump information
-
requestTaskManagerProfilingList
public CompletableFuture<Collection<ProfilingInfo>> requestTaskManagerProfilingList(ResourceID taskManagerId, Duration timeout)
Description copied from interface:ResourceManagerGateway
Request profiling list from the givenTaskExecutor
.- Specified by:
requestTaskManagerProfilingList
in interfaceResourceManagerGateway
- Parameters:
taskManagerId
- identifying theTaskExecutor
to get profiling list fromtimeout
- for the asynchronous operation- Returns:
- Future which is completed with the historical profiling list
-
requestProfiling
public CompletableFuture<ProfilingInfo> requestProfiling(ResourceID taskManagerId, int duration, ProfilingInfo.ProfilingMode mode, Duration timeout)
Description copied from interface:ResourceManagerGateway
Requests the profiling instance from the givenTaskExecutor
.- Specified by:
requestProfiling
in interfaceResourceManagerGateway
- Parameters:
taskManagerId
- taskManagerId identifying theTaskExecutor
to get the profiling fromduration
- profiling durationmode
- profiling modeProfilingInfo.ProfilingMode
timeout
- timeout of the asynchronous operation- Returns:
- Future containing the created profiling information
-
requestTaskExecutorThreadInfoGateway
public CompletableFuture<TaskExecutorThreadInfoGateway> requestTaskExecutorThreadInfoGateway(ResourceID taskManagerId, Duration timeout)
Description copied from interface:ResourceManagerGateway
Requests theTaskExecutorGateway
.- Specified by:
requestTaskExecutorThreadInfoGateway
in interfaceResourceManagerGateway
- Parameters:
taskManagerId
- identifying theTaskExecutor
.- Returns:
- Future containing the task executor gateway.
-
notifyNewBlockedNodes
public CompletableFuture<Acknowledge> notifyNewBlockedNodes(Collection<BlockedNode> newNodes)
Description copied from interface:BlocklistListener
Notify new blocked node records.- Specified by:
notifyNewBlockedNodes
in interfaceBlocklistListener
- Parameters:
newNodes
- the new blocked node records- Returns:
- Future acknowledge once the new nodes have successfully notified.
-
registerMetrics
protected void registerMetrics()
-
closeJobManagerConnection
protected void closeJobManagerConnection(JobID jobId, org.apache.flink.runtime.resourcemanager.ResourceManager.ResourceRequirementHandling resourceRequirementHandling, Exception cause)
This method should be called by the framework once it detects that a currently registered job manager has failed.- Parameters:
jobId
- identifying the job whose leader shall be disconnected.resourceRequirementHandling
- indicating how existing resource requirements for the corresponding job should be handledcause
- The exception which cause the JobManager failed.
-
closeTaskManagerConnection
protected Optional<WorkerType> closeTaskManagerConnection(ResourceID resourceID, Exception cause)
This method should be called by the framework once it detects that a currently registered task executor has failed.- Parameters:
resourceID
- Id of the TaskManager that has failed.cause
- The exception which cause the TaskManager failed.- Returns:
- The
ResourceManager
of the closed connection, or empty if already removed.
-
jobLeaderLostLeadership
protected void jobLeaderLostLeadership(JobID jobId, JobMasterId oldJobMasterId)
-
getInstanceIdByResourceId
@VisibleForTesting public Optional<InstanceID> getInstanceIdByResourceId(ResourceID resourceID)
-
getWorkerByInstanceId
protected WorkerType getWorkerByInstanceId(InstanceID instanceId)
-
onFatalError
protected void onFatalError(Throwable t)
Notifies the ResourceManager that a fatal error has occurred and it cannot proceed.- Parameters:
t
- The exception describing the fatal error
-
initialize
protected abstract void initialize() throws ResourceManagerException
Initializes the framework specific components.- Throws:
ResourceManagerException
- which occurs during initialization and causes the resource manager to fail.
-
terminate
protected abstract void terminate() throws Exception
Terminates the framework specific components.- Throws:
Exception
- which occurs during termination.
-
internalDeregisterApplication
protected abstract void internalDeregisterApplication(ApplicationStatus finalStatus, @Nullable String optionalDiagnostics) throws ResourceManagerException
The framework specific code to deregister the application. This should report the application's final status and shut down the resource manager cleanly.This method also needs to make sure all pending containers that are not registered yet are returned.
- Parameters:
finalStatus
- The application status to report.optionalDiagnostics
- A diagnostics message ornull
.- Throws:
ResourceManagerException
- if the application could not be shut down.
-
getWorkerNodeIfAcceptRegistration
protected abstract Optional<WorkerType> getWorkerNodeIfAcceptRegistration(ResourceID resourceID)
Get worker node if the worker resource is accepted.- Parameters:
resourceID
- The worker resource id
-
stopWorkerIfSupported
public void stopWorkerIfSupported(WorkerType worker)
Stops the given worker if supported.- Parameters:
worker
- The worker.
-
getReadyToServeFuture
protected abstract CompletableFuture<Void> getReadyToServeFuture()
Get the ready to serve future of the resource manager.- Returns:
- The ready to serve future of the resource manager, which indicated whether it is ready to serve.
-
getResourceAllocator
protected abstract ResourceAllocator getResourceAllocator()
-
setFailUnfulfillableRequest
protected void setFailUnfulfillableRequest(boolean failUnfulfillableRequest)
SetSlotManager
whether to fail unfulfillable slot requests.- Parameters:
failUnfulfillableRequest
- whether to fail unfulfillable requests
-
onNewTokensObtained
public void onNewTokensObtained(byte[] tokens) throws Exception
Description copied from interface:DelegationTokenManager.Listener
Callback function when new delegation tokens obtained.- Specified by:
onNewTokensObtained
in interfaceDelegationTokenManager.Listener
- Throws:
Exception
-
-