Interface HighAvailabilityServices
-
- All Superinterfaces:
AutoCloseable
,ClientHighAvailabilityServices
,GloballyCleanableResource
- All Known Implementing Classes:
AbstractHaServices
,AbstractNonHaServices
,EmbeddedHaServices
,EmbeddedHaServicesWithLeadershipControl
,KubernetesLeaderElectionHaServices
,StandaloneHaServices
,ZooKeeperLeaderElectionHaServices
public interface HighAvailabilityServices extends ClientHighAvailabilityServices, GloballyCleanableResource
The HighAvailabilityServices give access to all services needed for a highly-available setup. In particular, the services provide access to highly available storage and registries, as well as distributed counters and leader election.- ResourceManager leader election and leader retrieval
- JobManager leader election and leader retrieval
- Persistence for checkpoint metadata
- Registering the latest completed checkpoint(s)
- Persistence for the BLOB store
- Registry that marks a job's status
- Naming of RPC endpoints
-
-
Field Summary
Fields Modifier and Type Field Description static JobID
DEFAULT_JOB_ID
This JobID should be used to identify the old JobManager when using theHighAvailabilityServices
.static UUID
DEFAULT_LEADER_ID
This UUID should be used when no proper leader election happens, but a simple pre-configured leader is used.
-
Method Summary
All Methods Instance Methods Abstract Methods Default Methods Deprecated Methods Modifier and Type Method Description void
cleanupAllData()
Deletes all data stored by high availability services in external stores.void
close()
Closes the high availability services, releasing all resources.default void
closeWithOptionalClean(boolean cleanupData)
BlobStore
createBlobStore()
Creates the BLOB store in which BLOBs are stored in a highly-available fashion.CheckpointRecoveryFactory
getCheckpointRecoveryFactory()
Gets the checkpoint recovery factory for the job manager.default LeaderElection
getClusterRestEndpointLeaderElection()
Gets theLeaderElection
for the cluster's rest endpoint.default LeaderRetrievalService
getClusterRestEndpointLeaderRetriever()
Get the leader retriever for the cluster's rest endpoint.LeaderElection
getDispatcherLeaderElection()
Gets theLeaderElection
for the cluster's dispatcher.LeaderRetrievalService
getDispatcherLeaderRetriever()
Gets the leader retriever for the dispatcher.ExecutionPlanStore
getExecutionPlanStore()
Gets the submitted execution plan store for the job manager.LeaderElection
getJobManagerLeaderElection(JobID jobID)
Gets theLeaderElection
for the job with the givenJobID
.LeaderRetrievalService
getJobManagerLeaderRetriever(JobID jobID)
Deprecated.This method should only be used by the legacy code where the JobManager acts as the master.LeaderRetrievalService
getJobManagerLeaderRetriever(JobID jobID, String defaultJobManagerAddress)
Gets the leader retriever for the job JobMaster which is responsible for the given job.JobResultStore
getJobResultStore()
Gets the store that holds information about the state of finished jobs.LeaderElection
getResourceManagerLeaderElection()
Gets theLeaderElection
for the cluster's resource manager.LeaderRetrievalService
getResourceManagerLeaderRetriever()
Gets the leader retriever for the cluster's resource manager.default LeaderElection
getWebMonitorLeaderElection()
Deprecated.UsegetClusterRestEndpointLeaderElection()
instead.default LeaderRetrievalService
getWebMonitorLeaderRetriever()
Deprecated.just usegetClusterRestEndpointLeaderRetriever()
instead of this method.default CompletableFuture<Void>
globalCleanupAsync(JobID jobId, Executor executor)
globalCleanupAsync
is expected to be called from the main thread.
-
-
-
Field Detail
-
DEFAULT_LEADER_ID
static final UUID DEFAULT_LEADER_ID
This UUID should be used when no proper leader election happens, but a simple pre-configured leader is used. That is for example the case in non-highly-available standalone setups.
-
DEFAULT_JOB_ID
static final JobID DEFAULT_JOB_ID
This JobID should be used to identify the old JobManager when using theHighAvailabilityServices
. With the new mode every JobMaster will have a distinct JobID assigned.
-
-
Method Detail
-
getResourceManagerLeaderRetriever
LeaderRetrievalService getResourceManagerLeaderRetriever()
Gets the leader retriever for the cluster's resource manager.
-
getDispatcherLeaderRetriever
LeaderRetrievalService getDispatcherLeaderRetriever()
Gets the leader retriever for the dispatcher. This leader retrieval service is not always accessible.
-
getJobManagerLeaderRetriever
@Deprecated LeaderRetrievalService getJobManagerLeaderRetriever(JobID jobID)
Deprecated.This method should only be used by the legacy code where the JobManager acts as the master.Gets the leader retriever for the job JobMaster which is responsible for the given job.- Parameters:
jobID
- The identifier of the job.- Returns:
- Leader retrieval service to retrieve the job manager for the given job
-
getJobManagerLeaderRetriever
LeaderRetrievalService getJobManagerLeaderRetriever(JobID jobID, String defaultJobManagerAddress)
Gets the leader retriever for the job JobMaster which is responsible for the given job.- Parameters:
jobID
- The identifier of the job.defaultJobManagerAddress
- JobManager address which will be returned by a static leader retrieval service.- Returns:
- Leader retrieval service to retrieve the job manager for the given job
-
getWebMonitorLeaderRetriever
@Deprecated default LeaderRetrievalService getWebMonitorLeaderRetriever()
Deprecated.just usegetClusterRestEndpointLeaderRetriever()
instead of this method.This retriever should no longer be used on the cluster side. The web monitor retriever is only required on the client-side and we have a dedicated high-availability services for the client, namedClientHighAvailabilityServices
. See also FLINK-13750.- Returns:
- the leader retriever for web monitor
-
getResourceManagerLeaderElection
LeaderElection getResourceManagerLeaderElection()
Gets theLeaderElection
for the cluster's resource manager.
-
getDispatcherLeaderElection
LeaderElection getDispatcherLeaderElection()
Gets theLeaderElection
for the cluster's dispatcher.
-
getJobManagerLeaderElection
LeaderElection getJobManagerLeaderElection(JobID jobID)
Gets theLeaderElection
for the job with the givenJobID
.
-
getWebMonitorLeaderElection
@Deprecated default LeaderElection getWebMonitorLeaderElection()
Deprecated.UsegetClusterRestEndpointLeaderElection()
instead.Gets theLeaderElection
for the cluster's rest endpoint.
-
getCheckpointRecoveryFactory
CheckpointRecoveryFactory getCheckpointRecoveryFactory() throws Exception
Gets the checkpoint recovery factory for the job manager.- Returns:
- Checkpoint recovery factory
- Throws:
Exception
-
getExecutionPlanStore
ExecutionPlanStore getExecutionPlanStore() throws Exception
Gets the submitted execution plan store for the job manager.- Returns:
- Submitted execution plan store
- Throws:
Exception
- if the submitted execution plan store could not be created
-
getJobResultStore
JobResultStore getJobResultStore() throws Exception
Gets the store that holds information about the state of finished jobs.- Returns:
- Store of finished job results
- Throws:
Exception
- if job result store could not be created
-
createBlobStore
BlobStore createBlobStore() throws IOException
Creates the BLOB store in which BLOBs are stored in a highly-available fashion.- Returns:
- Blob store
- Throws:
IOException
- if the blob store could not be created
-
getClusterRestEndpointLeaderElection
default LeaderElection getClusterRestEndpointLeaderElection()
Gets theLeaderElection
for the cluster's rest endpoint.
-
getClusterRestEndpointLeaderRetriever
default LeaderRetrievalService getClusterRestEndpointLeaderRetriever()
Description copied from interface:ClientHighAvailabilityServices
Get the leader retriever for the cluster's rest endpoint.- Specified by:
getClusterRestEndpointLeaderRetriever
in interfaceClientHighAvailabilityServices
- Returns:
- the leader retriever for cluster's rest endpoint.
-
close
void close() throws Exception
Closes the high availability services, releasing all resources.This method does not delete or clean up any data stored in external stores (file systems, ZooKeeper, etc). Another instance of the high availability services will be able to recover the job.
If an exception occurs during closing services, this method will attempt to continue closing other services and report exceptions only after all services have been attempted to be closed.
- Specified by:
close
in interfaceAutoCloseable
- Throws:
Exception
- Thrown, if an exception occurred while closing these services.
-
cleanupAllData
void cleanupAllData() throws Exception
Deletes all data stored by high availability services in external stores.After this method was called, any job or session that was managed by these high availability services will be unrecoverable.
If an exception occurs during cleanup, this method will attempt to continue the cleanup and report exceptions only after all cleanup steps have been attempted.
- Throws:
Exception
- if an error occurred while cleaning up data stored by them.
-
closeWithOptionalClean
default void closeWithOptionalClean(boolean cleanupData) throws Exception
CallscleanupAllData()
(iftrue
is passed as a parameter) before callingclose()
on this instance. Any error that appeared during the cleanup will be propagated after callingclose()
.- Throws:
Exception
-
globalCleanupAsync
default CompletableFuture<Void> globalCleanupAsync(JobID jobId, Executor executor)
Description copied from interface:GloballyCleanableResource
globalCleanupAsync
is expected to be called from the main thread. Heavy IO tasks should be outsourced into the passedcleanupExecutor
. Thread-safety must be ensured.- Specified by:
globalCleanupAsync
in interfaceGloballyCleanableResource
- Parameters:
jobId
- TheJobID
of the job for which the local data should be cleaned up.executor
- The fallback executor for IO-heavy operations.- Returns:
- The cleanup result future.
-
-