public class YarnIntraNonHaMasterServices extends AbstractYarnNonHaServices
Internally, these services put their recovery data into YARN's working directory, except for checkpoints, which are stored in the configured checkpoint directory. That way, checkpoints can be resumed with a new job/application, even if the complete YARN application is killed and cleaned up.
Because the ResourceManager and JobManager both run in the same process (the ApplicationMaster), they use an embedded leader election service to find each other.
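The idea of an embedded, in-process leader election can be illustrated with a minimal sketch. Everything here is hypothetical and greatly simplified: `InProcessLeaderService`, `Contender`, `Listener`, and the endpoint string are names made up for this example and are not Flink's actual classes or API. The sketch only shows the principle that a contender in the same process is granted leadership immediately and that retrievers learn the leader's address and session ID without any external coordination service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/** Hypothetical in-process election service; NOT Flink's real implementation. */
final class InProcessLeaderService {

    /** A component that wants to become leader (for example, a resource manager). */
    interface Contender {
        String address();
        void grantLeadership(UUID sessionId);
    }

    /** A component that wants to learn who the current leader is. */
    interface Listener {
        void notifyLeader(String address, UUID sessionId);
    }

    private final List<Listener> listeners = new ArrayList<>();
    private String leaderAddress;
    private UUID leaderSessionId;

    /** The first contender becomes leader right away; later contenders are ignored. */
    synchronized void addContender(Contender contender) {
        if (leaderAddress != null) {
            return; // a leader already exists in this process
        }
        leaderSessionId = UUID.randomUUID();
        leaderAddress = contender.address();
        contender.grantLeadership(leaderSessionId);
        for (Listener listener : listeners) {
            listener.notifyLeader(leaderAddress, leaderSessionId);
        }
    }

    /** Registers a listener and replays the current leader, if one is known. */
    synchronized void addListener(Listener listener) {
        listeners.add(listener);
        if (leaderAddress != null) {
            listener.notifyLeader(leaderAddress, leaderSessionId);
        }
    }

    public static void main(String[] args) {
        InProcessLeaderService service = new InProcessLeaderService();
        // The listener is registered before any leader exists.
        service.addListener((address, id) ->
                System.out.println("leader at " + address + ", session " + id));
        service.addContender(new Contender() {
            @Override
            public String address() {
                return "rm-endpoint"; // placeholder address, an assumption
            }

            @Override
            public void grantLeadership(UUID sessionId) {
                System.out.println("granted leadership, session " + sessionId);
            }
        });
    }
}
```

Because the election state lives only in this process, it is lost together with the process, which is consistent with the non-HA nature of these services.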
A typical YARN setup that uses these HA services first starts the ResourceManager inside the ApplicationMaster and puts its RPC endpoint address into the configuration with which the TaskManagers are started. Because of this static addressing scheme, the setup cannot handle failures of the JobManager and ResourceManager, which run as part of the ApplicationMaster.
HighAvailabilityServices
blobStoreService, FLINK_RECOVERY_DATA_DIR, flinkFileSystem, haDataDirectory, hadoopFileSystem, LOG, workingDirectory
DEFAULT_JOB_ID, DEFAULT_LEADER_ID
Constructor and Description |
---|
YarnIntraNonHaMasterServices(Configuration config, Configuration hadoopConf): Creates a new YarnIntraNonHaMasterServices for the given Flink and YARN configuration. |
Modifier and Type | Method and Description |
---|---|
void | close(): Closes the high availability services, releasing all resources. |
LeaderElectionService | getDispatcherLeaderElectionService(): Gets the leader election service for the cluster's dispatcher. |
LeaderRetrievalService | getDispatcherLeaderRetriever(): Gets the leader retriever for the dispatcher. |
LeaderElectionService | getJobManagerLeaderElectionService(JobID jobID): Gets the leader election service for the given job. |
LeaderRetrievalService | getJobManagerLeaderRetriever(JobID jobID): Gets the leader retriever for the JobMaster that is responsible for the given job. |
LeaderRetrievalService | getJobManagerLeaderRetriever(JobID jobID, String defaultJobManagerAddress): Gets the leader retriever for the JobMaster that is responsible for the given job. |
LeaderElectionService | getResourceManagerLeaderElectionService(): Gets the leader election service for the cluster's resource manager. |
LeaderRetrievalService | getResourceManagerLeaderRetriever(): Gets the leader retriever for the cluster's resource manager. |
LeaderElectionService | getWebMonitorLeaderElectionService() |
LeaderRetrievalService | getWebMonitorLeaderRetriever() |
getCheckpointRecoveryFactory, getRunningJobsRegistry, getSubmittedJobGraphStore
closeAndCleanupAllData, createBlobStore, forSingleJobAppMaster, forYarnTaskManager, isClosed
public YarnIntraNonHaMasterServices(Configuration config, Configuration hadoopConf) throws IOException
This constructor initializes access to the HDFS to store recovery data, and creates the embedded leader election services through which ResourceManager and JobManager find and confirm each other.
Parameters:
config - The Flink configuration of this component / process.
hadoopConf - The Hadoop configuration for the YARN cluster.
Throws:
IOException - Thrown if the initialization of the Hadoop file system used by YARN fails.
IllegalConfigurationException - Thrown if the Flink configuration does not properly describe the ResourceManager address and port.
public LeaderRetrievalService getResourceManagerLeaderRetriever()
HighAvailabilityServices
public LeaderRetrievalService getDispatcherLeaderRetriever()
HighAvailabilityServices
public LeaderElectionService getResourceManagerLeaderElectionService()
HighAvailabilityServices
public LeaderElectionService getDispatcherLeaderElectionService()
HighAvailabilityServices
public LeaderElectionService getJobManagerLeaderElectionService(JobID jobID)
HighAvailabilityServices
Parameters:
jobID - The identifier of the job running the election.
public LeaderElectionService getWebMonitorLeaderElectionService()
public LeaderRetrievalService getJobManagerLeaderRetriever(JobID jobID)
HighAvailabilityServices
Parameters:
jobID - The identifier of the job.
public LeaderRetrievalService getJobManagerLeaderRetriever(JobID jobID, String defaultJobManagerAddress)
HighAvailabilityServices
Parameters:
jobID - The identifier of the job.
defaultJobManagerAddress - JobManager address which will be returned by a static leader retrieval service.
public LeaderRetrievalService getWebMonitorLeaderRetriever()
public void close() throws Exception
HighAvailabilityServices
This method does not delete or clean up any data stored in external stores (file systems, ZooKeeper, etc). Another instance of the high availability services will be able to recover the job.
If an exception occurs while closing a service, this method continues closing the remaining services and reports exceptions only after it has attempted to close all of them.
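The "close everything, then report" behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the code of this class: `CloseAllSketch`, `closeAll`, `resource`, and the resource names are hypothetical. The sketch keeps the first exception as the primary failure and attaches later ones as suppressed exceptions, which is one common way to report several close failures at once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Flink: closes every resource, then reports failures. */
final class CloseAllSketch {

    /** Names of resources whose close() was attempted, in order (for illustration only). */
    static final List<String> attempted = new ArrayList<>();

    /** Attempts to close all resources; rethrows the first failure with the rest suppressed. */
    static void closeAll(List<? extends AutoCloseable> resources) throws Exception {
        Exception first = null;
        for (AutoCloseable resource : resources) {
            try {
                resource.close();
            } catch (Exception e) {
                if (first == null) {
                    first = e;              // remember the first failure
                } else {
                    first.addSuppressed(e); // attach later failures to it
                }
            }
        }
        if (first != null) {
            throw first; // reported only after every close() was attempted
        }
    }

    /** Builds a resource that records the close attempt and optionally fails. */
    static AutoCloseable resource(String name, boolean fail) {
        return () -> {
            attempted.add(name);
            if (fail) {
                throw new IllegalStateException(name + " failed to close");
            }
        };
    }

    public static void main(String[] args) {
        try {
            closeAll(Arrays.asList(
                    resource("leaderElection", true),
                    resource("blobStore", false),
                    resource("fileSystem", true)));
        } catch (Exception e) {
            System.out.println("attempted: " + attempted);
            System.out.println("primary: " + e.getMessage());
            System.out.println("suppressed: " + e.getSuppressed().length);
        }
    }
}
```

In the demo, all three resources are attempted even though the first one fails, and the caller receives a single exception that carries the second failure as a suppressed exception.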
Specified by:
close in interface AutoCloseable
close in interface HighAvailabilityServices
Overrides:
close in class YarnHighAvailabilityServices
Throws:
Exception - Thrown if an exception occurred while closing these services.

Copyright © 2014–2020 The Apache Software Foundation. All rights reserved.