Cluster Execution #
Flink programs can run distributed on clusters of many machines. There are two ways to send a program to a cluster for execution:
Command Line Interface #
The command line interface lets you submit packaged programs (JARs) to a cluster (or single-machine setup).
Please refer to the Command Line Interface documentation for details.
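For example, a packaged program can be submitted with the flink run command. This is only an illustration; the entry class and JAR path below are placeholders:
./bin/flink run -c org.example.MyJob /path/to/my-job.jar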
Remote Environment #
The remote environment lets you execute Flink Java programs on a cluster directly. It points to the cluster on which you want to execute the program.
Maven Dependency #
If you are developing your program as a Maven project, you have to add the flink-clients module using this dependency:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients</artifactId>
  <version>1.16.2</version>
</dependency>
Example #
The following illustrates the use of the RemoteEnvironment:
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class RemoteEnvironmentExample {
    public static void main(String[] args) throws Exception {
        // Connect to the remote JobManager and ship the JAR containing the user code.
        ExecutionEnvironment env = ExecutionEnvironment
                .createRemoteEnvironment("flink-jobmanager", 8081, "/home/user/udfs.jar");

        DataSet<String> data = env.readTextFile("hdfs://path/to/file");

        // Keep only lines that start with "http://" and write the result back to HDFS.
        data
            .filter(new FilterFunction<String>() {
                @Override
                public boolean filter(String value) {
                    return value.startsWith("http://");
                }
            })
            .writeAsText("hdfs://path/to/result");

        env.execute();
    }
}
Note that the program contains custom user code and therefore requires a JAR file with the classes of that code. The constructor of the remote environment takes the path(s) to the JAR file(s).
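If the program depends on more than one JAR (for example, the packaged job plus a separate library of user-defined functions), several paths can be passed, since the JAR file parameter is a vararg. A minimal sketch, using placeholder paths:
ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(
        "flink-jobmanager",           // JobManager host
        8081,                         // JobManager port, as in the example above
        "/home/user/my-job.jar",      // placeholder: the packaged program
        "/home/user/udf-library.jar"  // placeholder: an additional dependency
);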