Start a YARN session with 4 Task Managers (each with 4 GB of heap space):
# get the hadoop2 package from the Flink download page at
# http://flink.apache.org/downloads.html
curl -O <flink_hadoop2_download_url>
tar xvzf flink-1.0.3-bin-hadoop2.tgz
cd flink-1.0.3/
./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096
Specify the -s flag to set the number of processing slots per Task Manager. We recommend setting the number of slots to the number of processors per machine.
Once the session has been started, you can submit jobs to the cluster using the ./bin/flink tool.
# get the hadoop2 package from the Flink download page at
# http://flink.apache.org/downloads.html
curl -O <flink_hadoop2_download_url>
tar xvzf flink-1.0.3-bin-hadoop2.tgz
cd flink-1.0.3/
./bin/flink run -m yarn-cluster -yn 4 -yjm 1024 -ytm 4096 ./examples/batch/WordCount.jar
Apache Hadoop YARN is a cluster resource management framework. It allows running various distributed applications on top of a cluster. Flink runs on YARN next to other applications. Users do not have to set up or install anything if there is already a YARN setup.
If you have trouble using the Flink YARN client, have a look at the FAQ section.
Follow these instructions to learn how to launch a Flink Session within your YARN cluster.
A session will start all required Flink services (JobManager and TaskManagers) so that you can submit programs to the cluster. Note that you can run multiple programs per session.
Download a Flink package for Hadoop >= 2 from the download page. It contains the required files.
Extract the package using:
tar xvzf flink-1.0.3-bin-hadoop2.tgz
cd flink-1.0.3/
Use the following command to start a session:
./bin/yarn-session.sh
This command will show you the following overview:
Usage:
   Required
     -n,--container <arg>            Number of YARN container to allocate (=Number of Task Managers)
   Optional
     -D <arg>                        Dynamic properties
     -d,--detached                   Start detached
     -jm,--jobManagerMemory <arg>    Memory for JobManager Container [in MB]
     -nm,--name                      Set a custom name for the application on YARN
     -q,--query                      Display available YARN resources (memory, cores)
     -qu,--queue <arg>               Specify YARN queue.
     -s,--slots <arg>                Number of slots per TaskManager
     -tm,--taskManagerMemory <arg>   Memory per TaskManager Container [in MB]
Please note that the client requires the HADOOP_CONF_DIR environment variable to be set to read the YARN and HDFS configuration.
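Because the client needs HADOOP_CONF_DIR, it can help to check the variable before starting a session. The following is a minimal sketch; check_hadoop_conf_dir is a hypothetical helper written for this guide, not part of Flink or Hadoop.

```shell
# Sketch: verify the Hadoop configuration directory before starting a session.
# check_hadoop_conf_dir is a hypothetical helper, not part of Flink.
check_hadoop_conf_dir() {
    if [ -z "${HADOOP_CONF_DIR:-}" ]; then
        echo "HADOOP_CONF_DIR is not set" >&2
        return 1
    fi
    if [ ! -d "$HADOOP_CONF_DIR" ]; then
        echo "HADOOP_CONF_DIR is not a directory: $HADOOP_CONF_DIR" >&2
        return 1
    fi
    echo "Using Hadoop configuration from $HADOOP_CONF_DIR"
}

# Usage:
# check_hadoop_conf_dir && ./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096
```

The check only confirms that the directory exists; it does not validate the YARN or HDFS settings inside it.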
Example: Issue the following command to allocate 10 Task Managers, with 8 GB of memory and 32 processing slots each:
./bin/yarn-session.sh -n 10 -tm 8192 -s 32
The system will use the configuration in conf/flink-conf.yaml. Please follow our configuration guide if you want to change something.
Flink on YARN will overwrite the following configuration parameters:
jobmanager.rpc.address (because the JobManager is always allocated on different machines),
taskmanager.tmp.dirs (we are using the tmp directories given by YARN) and
parallelism.default if the number of slots has been specified.
If you don’t want to change the configuration file to set configuration parameters, you can pass dynamic properties via the -D flag, in the form -D<key>=<value>, when starting the session.
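As an illustration of dynamic properties, the sketch below turns key=value pairs into -D flags for yarn-session.sh. dynamic_props is a hypothetical helper, and the two keys in the usage comment are simply parameters mentioned elsewhere in this guide; the values are arbitrary.

```shell
# Sketch: turn key=value pairs into -D flags for yarn-session.sh.
# dynamic_props is a hypothetical helper; it does not handle values with spaces.
dynamic_props() {
    flags=""
    for kv in "$@"; do
        case "$kv" in
            ?*=*) flags="$flags -D$kv" ;;
            *) echo "expected key=value, got: $kv" >&2; return 1 ;;
        esac
    done
    # Strip the leading space before printing.
    echo "${flags# }"
}

# Usage:
# ./bin/yarn-session.sh -n 10 $(dynamic_props yarn.application-attempts=2 yarn.reallocate-failed=false)
```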
The example invocation above (-n 10) starts 11 containers, since there is one additional container for the ApplicationMaster and Job Manager.
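To make the arithmetic explicit, the following sketch estimates what a session request asks YARN for. session_footprint is a hypothetical helper; its arguments mirror the -n, -tm and -s options, and the memory figure covers the TaskManager containers only.

```shell
# Sketch: estimate the footprint of a yarn-session.sh request.
# session_footprint is a hypothetical helper; arguments mirror -n, -tm and -s.
session_footprint() {
    n=$1
    tm_mb=$2
    slots=$3
    # n TaskManager containers plus one for the ApplicationMaster and Job Manager.
    echo "containers=$((n + 1))"
    echo "slots=$((n * slots))"
    echo "taskmanager_memory_mb=$((n * tm_mb))"
}

# Usage, for ./bin/yarn-session.sh -n 10 -tm 8192 -s 32:
# session_footprint 10 8192 32
# prints containers=11, slots=320 and taskmanager_memory_mb=81920
```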
Once Flink is deployed in your YARN cluster, it will show you the connection details of the Job Manager.
Stop the YARN session by stopping the Unix process (using CTRL+C) or by entering ‘stop’ into the client.
If you do not want to keep the Flink YARN client running all the time, it is also possible to start a detached YARN session. The parameter for that is called -d or --detached.
In that case, the Flink YARN client will only submit Flink to the cluster and then close itself. Note that in this case it is not possible to stop the YARN session using Flink.
Use the YARN utilities (yarn application -kill <appId>) to stop the YARN session.
Use the following command to submit a Flink program to the YARN cluster:
./bin/flink
Please refer to the documentation of the command-line client.
The command will show you a help menu like this:
[...]
Action "run" compiles and runs a program.

  Syntax: run [OPTIONS] <jar-file> <arguments>
  "run" action arguments:
     -c,--class <classname>           Class with the program entry point ("main"
                                      method or "getPlan()" method. Only needed
                                      if the JAR file does not specify the class
                                      in its manifest.
     -m,--jobmanager <host:port>      Address of the JobManager (master) to
                                      which to connect. Use this flag to connect
                                      to a different JobManager than the one
                                      specified in the configuration.
     -p,--parallelism <parallelism>   The parallelism with which to run the
                                      program. Optional flag to override the
                                      default value specified in the
                                      configuration
Use the run action to submit a job to YARN. The client is able to determine the address of the JobManager. In the rare event of a problem, you can also pass the JobManager address using the -m argument. The JobManager address is visible in the YARN console.
wget -O LICENSE-2.0.txt http://www.apache.org/licenses/LICENSE-2.0.txt
hadoop fs -copyFromLocal LICENSE-2.0.txt hdfs:/// ...
./bin/flink run ./examples/batch/WordCount.jar \
        hdfs:///..../LICENSE-2.0.txt hdfs:///.../wordcount-result.txt
If you see the following error, make sure that all TaskManagers have started:
Exception in thread "main" org.apache.flink.compiler.CompilerException: Available instances could not be determined from job manager: Connection timed out.
You can check the number of TaskManagers in the JobManager web interface. The address of this interface is printed in the YARN session console.
If the TaskManagers do not show up after a minute, you should investigate the issue using the log files.
The documentation above describes how to start a Flink cluster within a Hadoop YARN environment. It is also possible to launch Flink within YARN only for executing a single job.
Please note that the client then expects the -yn value to be set (number of TaskManagers).
./bin/flink run -m yarn-cluster -yn 2 ./examples/batch/WordCount.jar
The command line options of the YARN session are also available with the ./bin/flink tool. They are prefixed with a y (for the short options) or yarn (for the long argument options).
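The prefix rule can be sketched mechanically, consistent with the -yn, -yjm and -ytm flags used in the examples of this guide. to_flink_opt is a hypothetical helper that only illustrates the naming scheme; it assumes long options follow the rule literally and does not check that a given option exists.

```shell
# Sketch: derive the ./bin/flink spelling of a yarn-session.sh option.
# to_flink_opt is a hypothetical helper illustrating the prefix rule only.
to_flink_opt() {
    case "$1" in
        --*) echo "--yarn${1#--}" ;;   # long option, e.g. --container
        -*)  echo "-y${1#-}" ;;        # short option, e.g. -n
        *)   echo "not an option: $1" >&2; return 1 ;;
    esac
}

# Usage:
# to_flink_opt -tm          # prints -ytm
# to_flink_opt --container  # prints --yarncontainer
```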
Note: You can use a different configuration directory per job by setting the environment variable FLINK_CONF_DIR. To use this, copy the conf directory from the Flink distribution and modify, for example, the logging settings on a per-job basis.
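A per-job configuration directory could be prepared along the following lines. This is a sketch: make_job_conf is a hypothetical helper, and the target path in the usage comment is a placeholder chosen for illustration.

```shell
# Sketch: create a per-job copy of the Flink conf directory.
# make_job_conf is a hypothetical helper; both paths are supplied by the caller.
make_job_conf() {
    flink_home=$1
    job_conf=$2
    if [ ! -d "$flink_home/conf" ]; then
        echo "no conf directory under $flink_home" >&2
        return 1
    fi
    mkdir -p "$job_conf" || return 1
    # Copy the contents of conf/ (including hidden files) into the job directory.
    cp -R "$flink_home/conf/." "$job_conf/" || return 1
    echo "$job_conf"
}

# Usage, from the Flink distribution directory (the target path is a placeholder):
# export FLINK_CONF_DIR="$(make_job_conf . /tmp/wordcount-conf)"
# (edit the logging settings in "$FLINK_CONF_DIR", then submit the job)
# ./bin/flink run -m yarn-cluster -yn 2 ./examples/batch/WordCount.jar
```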
Note: It is possible to combine -m yarn-cluster with a detached YARN submission (-yd) to “fire and forget” a Flink job to the YARN cluster. In this case, your application will not get any accumulator results or exceptions from the ExecutionEnvironment.execute() call!
Flink’s YARN client has the following configuration parameters to control how it behaves in case of container failures. These parameters can be set either in conf/flink-conf.yaml or, when starting the YARN session, using -D parameters.
yarn.reallocate-failed: This parameter controls whether Flink should reallocate failed TaskManager containers. Default: true
yarn.maximum-failed-containers: The maximum number of failed containers the ApplicationMaster accepts until it fails the YARN session. Default: The number of initially requested TaskManagers (-n).
yarn.application-attempts: The number of ApplicationMaster (+ its TaskManager containers) attempts. If this value is set to 1 (default), the entire YARN session will fail when the ApplicationMaster fails. Higher values specify the number of restarts of the ApplicationMaster by YARN.
There are many reasons why a Flink YARN session deployment can fail, such as a misconfigured Hadoop setup (HDFS permissions, YARN configuration), version incompatibilities (running Flink with vanilla Hadoop dependencies on Cloudera Hadoop), or other errors.
In cases where the Flink YARN session fails during the deployment itself, users have to rely on the logging capabilities of Hadoop YARN. The most useful feature for that is the YARN log aggregation.
To enable it, users have to set the yarn.log-aggregation-enable property to true in the yarn-site.xml file.
Once that is enabled, users can use the following command to retrieve all log files of a (failed) YARN session.
yarn logs -applicationId <application ID>
Note that it takes a few seconds after the session has finished until the logs show up.
The Flink YARN client also prints error messages in the terminal if errors occur during runtime (for example if a TaskManager stops working after some time).
In addition to that, there is the YARN Resource Manager web interface (by default on port 8088). The port of the Resource Manager web interface is determined by the yarn.resourcemanager.webapp.address configuration value.
It provides access to log files for running YARN applications and shows diagnostics for failed apps.
Users using Hadoop distributions from companies like Hortonworks, Cloudera or MapR might have to build Flink against their specific versions of Hadoop (HDFS) and YARN. Please read the build instructions for more details.
Some YARN clusters use firewalls for controlling the network traffic between the cluster and the rest of the network. In those setups, Flink jobs can only be submitted to a YARN session from within the cluster’s network (behind the firewall). If this is not feasible for production use, Flink allows you to configure a port range for all relevant services. With these ranges configured, users can also submit jobs to Flink across the firewall.
Currently, two services are needed to submit a job: the JobManager and the BlobServer.
When submitting a job to Flink, the BlobServer will distribute the jars with the user code to all worker nodes (TaskManagers). The JobManager receives the job itself and triggers the execution.
The two configuration parameters for specifying the ports are the following:
These two configuration options accept single ports (for example: “50010”), ranges (“50000-50025”), or a combination of both (“50010,50011,50020-50025,50050-50075”).
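To see which ports such a specification covers (for example, when writing firewall rules), it can be expanded into individual ports. The sketch below is a hypothetical helper named expand_ports; it is not Flink's own parser and does no validation of its input.

```shell
# Sketch: expand a port specification such as "50010,50020-50022"
# into one port per line. expand_ports is a hypothetical helper.
expand_ports() {
    # Split the specification on commas into positional parameters.
    old_ifs=$IFS
    IFS=,
    set -- $1
    IFS=$old_ifs
    for part in "$@"; do
        case "$part" in
            *-*)
                first=${part%-*}
                last=${part#*-}
                while [ "$first" -le "$last" ]; do
                    echo "$first"
                    first=$((first + 1))
                done
                ;;
            *) echo "$part" ;;
        esac
    done
}

# Usage:
# expand_ports "50010,50020-50022"   # prints 50010, 50020, 50021, 50022 (one per line)
```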
(Hadoop is using a similar mechanism, there the configuration parameter is called
This section briefly describes how Flink and YARN interact.
The YARN client needs to access the Hadoop configuration to connect to the YARN resource manager and to HDFS. It determines the Hadoop configuration using the following strategy:
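The client first tests whether the Hadoop configuration environment variables, including HADOOP_CONF_DIR and HADOOP_CONF_PATH, are set (in that order). If one of these variables is set, it is used to read the configuration.
If that fails, the client uses the HADOOP_HOME environment variable. If it is set, the client tries to access $HADOOP_HOME/etc/hadoop (Hadoop 2) and $HADOOP_HOME/conf (Hadoop 1).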
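The lookup order can be approximated in a few lines of shell. This is a sketch under stated assumptions: find_hadoop_conf is a hypothetical helper, it checks only HADOOP_CONF_DIR, HADOOP_CONF_PATH and the Hadoop 2 layout under HADOOP_HOME, and the real client may consult further variables or locations.

```shell
# Sketch of the lookup order; find_hadoop_conf is a hypothetical helper.
# Assumption: only the variables and the Hadoop 2 path named in this guide are checked.
find_hadoop_conf() {
    # 1. Explicit configuration variables, in order.
    for dir in "${HADOOP_CONF_DIR:-}" "${HADOOP_CONF_PATH:-}"; do
        if [ -n "$dir" ]; then
            echo "$dir"
            return 0
        fi
    done
    # 2. Fall back to the Hadoop 2 layout under HADOOP_HOME.
    if [ -n "${HADOOP_HOME:-}" ] && [ -d "$HADOOP_HOME/etc/hadoop" ]; then
        echo "$HADOOP_HOME/etc/hadoop"
        return 0
    fi
    echo "no Hadoop configuration found" >&2
    return 1
}

# Usage:
# conf_dir=$(find_hadoop_conf) && echo "client would read $conf_dir"
```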
When starting a new Flink YARN session, the client first checks if the requested resources (containers and memory) are available. After that, it uploads a jar that contains Flink and the configuration to HDFS (step 1).
The next step of the client is to request (step 2) a YARN container to start the ApplicationMaster (step 3). Since the client registered the configuration and jar-file as a resource for the container, the NodeManager of YARN running on that particular machine will take care of preparing the container (e.g. downloading the files). Once that has finished, the ApplicationMaster (AM) is started.
The JobManager and AM run in the same container. Once they have started successfully, the AM knows the address of the JobManager (its own host). It then generates a new Flink configuration file for the TaskManagers (so that they can connect to the JobManager). The file is also uploaded to HDFS. Additionally, the AM container serves Flink’s web interface. The ports Flink uses for its services are the standard ports configured by the user plus the application id as an offset. This allows users to execute multiple Flink YARN sessions in parallel.
After that, the AM starts allocating the containers for Flink’s TaskManagers, which will download the jar file and the modified configuration from HDFS. Once these steps are completed, Flink is set up and ready to accept jobs.