Start working on your Flink Java program in a few simple steps.
The only requirements are working Maven 3.0.4 (or higher) and Java 7.x (or higher) installations.
Use one of the following commands to create a project:
$ mvn archetype:generate \
-DarchetypeGroupId=org.apache.flink \
-DarchetypeArtifactId=flink-quickstart-java \
-DarchetypeVersion=1.1.5
$ curl https://flink.apache.org/q/quickstart.sh | bash
There will be a new directory in your working directory. If you used the curl approach, the directory is called quickstart. Otherwise, it has the name of your artifactId.
The sample project is a Maven project that contains four classes: StreamingJob and BatchJob are basic skeleton programs, SocketTextStreamWordCount is a working streaming example, and WordCountJob is a working batch example. Please note that the main method of each class allows you to start Flink in a development/testing mode.
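For illustration, a streaming skeleton such as StreamingJob typically looks roughly like the following. This is a sketch rather than the exact generated file, and the job name string is just a placeholder.

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StreamingJob {

    public static void main(String[] args) throws Exception {
        // set up the streaming execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // define your streaming program here, e.g. by reading from a socket or a file

        // execute the program (the job name is a placeholder)
        env.execute("Flink Streaming Java API Skeleton");
    }
}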
We recommend you import this project into your IDE to develop and test it. If you use Eclipse, the m2e plugin allows you to import Maven projects. Some Eclipse bundles include that plugin by default; others require you to install it manually. The IntelliJ IDE also supports Maven projects out of the box.
A note to Mac OS X users: The default JVM heap size for Java is too small for Flink. You have to increase it manually. In Eclipse, choose "Run Configurations" -> Arguments and write "-Xmx800m" into the "VM Arguments" box.
If you want to build your project, go to your project directory and issue the mvn clean install -Pbuild-jar command. You will find a jar that runs on every Flink cluster in target/your-artifact-id-1.0-SNAPSHOT.jar. There is also a fat jar, target/your-artifact-id-1.0-SNAPSHOT-flink-fat-jar.jar, which also contains all dependencies that were added to the Maven project.
Write your application!
The quickstart project contains a WordCount implementation, the “Hello World” of Big Data processing systems. The goal of WordCount is to determine the frequencies of words in a text, e.g., how often do the terms “the” or “house” occur in all Wikipedia texts.
Sample Input:
big data is big
Sample Output:
big 2
data 1
is 1
The following code shows the WordCount implementation from the Quickstart which processes some text lines with two operators (FlatMap and Reduce), and prints the resulting words and counts to std-out.
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.aggregation.Aggregations;
import org.apache.flink.api.java.tuple.Tuple2;

public class WordCount {

    public static void main(String[] args) throws Exception {

        // set up the execution environment
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // get input data
        DataSet<String> text = env.fromElements(
            "To be, or not to be,--that is the question:--",
            "Whether 'tis nobler in the mind to suffer",
            "The slings and arrows of outrageous fortune",
            "Or to take arms against a sea of troubles,"
        );

        DataSet<Tuple2<String, Integer>> counts =
            // split up the lines in pairs (2-tuples) containing: (word, 1)
            text.flatMap(new LineSplitter())
                // group by the tuple field "0" and sum up tuple field "1"
                .groupBy(0)
                .aggregate(Aggregations.SUM, 1);

        // emit result
        counts.print();
    }
}
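As a side note, the DataSet API also offers a sum shorthand on grouped data sets. Assuming the same text input and LineSplitter as above, the aggregation step could equivalently be written like this:

// equivalent shorthand for grouping by the word and summing the counts
DataSet<Tuple2<String, Integer>> counts =
    text.flatMap(new LineSplitter())
        .groupBy(0)
        .sum(1);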
The operations are defined by specialized classes; in this case, the LineSplitter class.
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {

    @Override
    public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
        // normalize and split the line into words
        String[] tokens = value.toLowerCase().split("\\W+");

        // emit the pairs
        for (String token : tokens) {
            if (token.length() > 0) {
                out.collect(new Tuple2<String, Integer>(token, 1));
            }
        }
    }
}
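If you prefer not to define a separate class, the same splitting logic can also be written inline as an anonymous FlatMapFunction. The following is a sketch equivalent to using LineSplitter:

DataSet<Tuple2<String, Integer>> counts =
    text.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
            // normalize, split, and emit (word, 1) pairs
            for (String token : value.toLowerCase().split("\\W+")) {
                if (token.length() > 0) {
                    out.collect(new Tuple2<String, Integer>(token, 1));
                }
            }
        }
    })
    .groupBy(0)
    .aggregate(Aggregations.SUM, 1);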
Check GitHub for the full example code.
For a complete overview of our API, have a look at the Programming Guide and the further example programs. If you have any trouble, ask on our Mailing List. We are happy to provide help.