Class ActiveResourceManager<WorkerType extends ResourceIDRetrievable>
- java.lang.Object
-
- org.apache.flink.runtime.rpc.RpcEndpoint
-
- org.apache.flink.runtime.rpc.FencedRpcEndpoint<ResourceManagerId>
-
- org.apache.flink.runtime.resourcemanager.ResourceManager<WorkerType>
-
- org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager<WorkerType>
-
- All Implemented Interfaces:
AutoCloseable
,BlocklistListener
,ClusterPartitionManager
,ResourceEventHandler<WorkerType>
,ResourceManagerGateway
,FencedRpcGateway<ResourceManagerId>
,RpcGateway
,DelegationTokenManager.Listener
,AutoCloseableAsync
public class ActiveResourceManager<WorkerType extends ResourceIDRetrievable> extends ResourceManager<WorkerType> implements ResourceEventHandler<WorkerType>
An active implementation ofResourceManager
.This resource manager actively requests and releases resources from/to the external resource management frameworks. With different
ResourceManagerDriver
provided, this resource manager can work with various frameworks.
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.flink.runtime.rpc.RpcEndpoint
RpcEndpoint.MainThreadExecutor
-
-
Field Summary
Fields Modifier and Type Field Description protected Configuration
flinkConfig
-
Fields inherited from class org.apache.flink.runtime.resourcemanager.ResourceManager
blocklistHandler, ioExecutor, RESOURCE_MANAGER_NAME, resourceManagerMetricGroup
-
Fields inherited from class org.apache.flink.runtime.rpc.RpcEndpoint
log, rpcServer
-
-
Constructor Summary
Constructors Constructor Description ActiveResourceManager(ResourceManagerDriver<WorkerType> resourceManagerDriver, Configuration flinkConfig, RpcService rpcService, UUID leaderSessionId, ResourceID resourceId, HeartbeatServices heartbeatServices, DelegationTokenManager delegationTokenManager, SlotManager slotManager, ResourceManagerPartitionTrackerFactory clusterPartitionTrackerFactory, BlocklistHandler.Factory blocklistHandlerFactory, JobLeaderIdService jobLeaderIdService, ClusterInformation clusterInformation, FatalErrorHandler fatalErrorHandler, ResourceManagerMetricGroup resourceManagerMetricGroup, ThresholdMeter startWorkerFailureRater, Duration retryInterval, Duration workerRegistrationTimeout, Duration previousWorkerRecoverTimeout, Executor ioExecutor)
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description void
declareResourceNeeded(Collection<ResourceDeclaration> resourceDeclarations)
CompletableFuture<Void>
getReadyToServeFuture()
Get the ready to serve future of the resource manager.protected ResourceAllocator
getResourceAllocator()
protected Optional<WorkerType>
getWorkerNodeIfAcceptRegistration(ResourceID resourceID)
Get worker node if the worker resource is accepted.protected void
initialize()
Initializes the framework specific components.protected void
internalDeregisterApplication(ApplicationStatus finalStatus, String optionalDiagnostics)
The framework specific code to deregister the application.void
onError(Throwable exception)
Notifies that an error has occurred that the process cannot proceed.void
onPreviousAttemptWorkersRecovered(Collection<WorkerType> recoveredWorkers)
Notifies that workers of previous attempt have been recovered from the external resource manager.protected void
onWorkerRegistered(WorkerType worker, WorkerResourceSpec workerResourceSpec)
void
onWorkerTerminated(ResourceID resourceId, String diagnostics)
Notifies that the worker has been terminated.protected void
registerMetrics()
void
requestNewWorker(WorkerResourceSpec workerResourceSpec)
Allocates a resource using the worker resource specification.protected void
terminate()
Terminates the framework specific components.-
Methods inherited from class org.apache.flink.runtime.resourcemanager.ResourceManager
closeJobManagerConnection, closeTaskManagerConnection, declareRequiredResources, deregisterApplication, disconnectJobManager, disconnectTaskManager, getClusterPartitionsShuffleDescriptors, getInstanceIdByResourceId, getNumberOfRegisteredTaskManagers, getStartedFuture, getWorkerByInstanceId, heartbeatFromJobManager, heartbeatFromTaskManager, jobLeaderLostLeadership, listDataSets, notifyNewBlockedNodes, notifySlotAvailable, onFatalError, onNewTokensObtained, onStart, onStop, registerJobMaster, registerTaskExecutor, releaseClusterPartitions, removeJob, reportClusterPartitions, requestProfiling, requestResourceOverview, requestTaskExecutorThreadInfoGateway, requestTaskManagerDetailsInfo, requestTaskManagerFileUploadByNameAndType, requestTaskManagerFileUploadByType, requestTaskManagerInfo, requestTaskManagerLogList, requestTaskManagerMetricQueryServiceAddresses, requestTaskManagerProfilingList, requestThreadDump, sendSlotReport, setFailUnfulfillableRequest, stopWorkerIfSupported
-
Methods inherited from class org.apache.flink.runtime.rpc.FencedRpcEndpoint
getFencingToken
-
Methods inherited from class org.apache.flink.runtime.rpc.RpcEndpoint
callAsync, closeAsync, getAddress, getEndpointId, getHostname, getMainThreadExecutor, getMainThreadExecutor, getRpcService, getSelfGateway, getTerminationFuture, internalCallOnStart, internalCallOnStop, isRunning, registerResource, runAsync, scheduleRunAsync, scheduleRunAsync, start, stop, unregisterResource, validateRunsInMainThread
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.flink.util.AutoCloseableAsync
close
-
Methods inherited from interface org.apache.flink.runtime.rpc.FencedRpcGateway
getFencingToken
-
Methods inherited from interface org.apache.flink.runtime.rpc.RpcGateway
getAddress, getHostname
-
-
-
-
Field Detail
-
flinkConfig
protected final Configuration flinkConfig
-
-
Constructor Detail
-
ActiveResourceManager
public ActiveResourceManager(ResourceManagerDriver<WorkerType> resourceManagerDriver, Configuration flinkConfig, RpcService rpcService, UUID leaderSessionId, ResourceID resourceId, HeartbeatServices heartbeatServices, DelegationTokenManager delegationTokenManager, SlotManager slotManager, ResourceManagerPartitionTrackerFactory clusterPartitionTrackerFactory, BlocklistHandler.Factory blocklistHandlerFactory, JobLeaderIdService jobLeaderIdService, ClusterInformation clusterInformation, FatalErrorHandler fatalErrorHandler, ResourceManagerMetricGroup resourceManagerMetricGroup, ThresholdMeter startWorkerFailureRater, Duration retryInterval, Duration workerRegistrationTimeout, Duration previousWorkerRecoverTimeout, Executor ioExecutor)
-
-
Method Detail
-
initialize
protected void initialize() throws ResourceManagerException
Description copied from class:ResourceManager
Initializes the framework specific components.- Specified by:
initialize
in classResourceManager<WorkerType extends ResourceIDRetrievable>
- Throws:
ResourceManagerException
- which occurs during initialization and causes the resource manager to fail.
-
terminate
protected void terminate() throws ResourceManagerException
Description copied from class:ResourceManager
Terminates the framework specific components.- Specified by:
terminate
in classResourceManager<WorkerType extends ResourceIDRetrievable>
- Throws:
ResourceManagerException
-
internalDeregisterApplication
protected void internalDeregisterApplication(ApplicationStatus finalStatus, @Nullable String optionalDiagnostics) throws ResourceManagerException
Description copied from class:ResourceManager
The framework specific code to deregister the application. This should report the application's final status and shut down the resource manager cleanly.This method also needs to make sure all pending containers that are not registered yet are returned.
- Specified by:
internalDeregisterApplication
in classResourceManager<WorkerType extends ResourceIDRetrievable>
- Parameters:
finalStatus
- The application status to report.optionalDiagnostics
- A diagnostics message ornull
.- Throws:
ResourceManagerException
- if the application could not be shut down.
-
getWorkerNodeIfAcceptRegistration
protected Optional<WorkerType> getWorkerNodeIfAcceptRegistration(ResourceID resourceID)
Description copied from class:ResourceManager
Get worker node if the worker resource is accepted.- Specified by:
getWorkerNodeIfAcceptRegistration
in classResourceManager<WorkerType extends ResourceIDRetrievable>
- Parameters:
resourceID
- The worker resource id
-
declareResourceNeeded
@VisibleForTesting public void declareResourceNeeded(Collection<ResourceDeclaration> resourceDeclarations)
-
onWorkerRegistered
protected void onWorkerRegistered(WorkerType worker, WorkerResourceSpec workerResourceSpec)
- Overrides:
onWorkerRegistered
in classResourceManager<WorkerType extends ResourceIDRetrievable>
-
registerMetrics
protected void registerMetrics()
- Overrides:
registerMetrics
in classResourceManager<WorkerType extends ResourceIDRetrievable>
-
onPreviousAttemptWorkersRecovered
public void onPreviousAttemptWorkersRecovered(Collection<WorkerType> recoveredWorkers)
Description copied from interface:ResourceEventHandler
Notifies that workers of previous attempt have been recovered from the external resource manager.- Specified by:
onPreviousAttemptWorkersRecovered
in interfaceResourceEventHandler<WorkerType extends ResourceIDRetrievable>
- Parameters:
recoveredWorkers
- Collection of worker nodes, in the deployment specific type.
-
onWorkerTerminated
public void onWorkerTerminated(ResourceID resourceId, String diagnostics)
Description copied from interface:ResourceEventHandler
Notifies that the worker has been terminated.- Specified by:
onWorkerTerminated
in interfaceResourceEventHandler<WorkerType extends ResourceIDRetrievable>
- Parameters:
resourceId
- Identifier of the terminated worker.diagnostics
- Diagnostic message about the worker termination.
-
onError
public void onError(Throwable exception)
Description copied from interface:ResourceEventHandler
Notifies that an error has occurred that the process cannot proceed.- Specified by:
onError
in interfaceResourceEventHandler<WorkerType extends ResourceIDRetrievable>
- Parameters:
exception
- Exception that describes the error.
-
requestNewWorker
@VisibleForTesting public void requestNewWorker(WorkerResourceSpec workerResourceSpec)
Allocates a resource using the worker resource specification.- Parameters:
workerResourceSpec
- workerResourceSpec specifies the size of the to be allocated resource
-
getReadyToServeFuture
public CompletableFuture<Void> getReadyToServeFuture()
Description copied from class:ResourceManager
Get the ready to serve future of the resource manager.- Specified by:
getReadyToServeFuture
in classResourceManager<WorkerType extends ResourceIDRetrievable>
- Returns:
- The ready to serve future of the resource manager, which indicated whether it is ready to serve.
-
getResourceAllocator
protected ResourceAllocator getResourceAllocator()
- Specified by:
getResourceAllocator
in classResourceManager<WorkerType extends ResourceIDRetrievable>
-
-