This documentation is for an unreleased version of Apache Flink Machine Learning Library. We recommend you use the latest stable version.

Quick Start #

This document provides a quick introduction to using Flink ML. It guides you through submitting a simple Flink job that trains a machine learning model and uses it to make predictions.

Help, I’m Stuck! #

If you get stuck, check out the community support resources. In particular, Apache Flink’s user mailing list is consistently ranked as one of the most active of any Apache project and is a great way to get help quickly.

Prerequisites #

Make sure Java 8 or higher is installed on your local machine. To check the installed Java version, run the following in your terminal:

$ java -version
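If you want to script this check, the version string can be reduced to its major number. The following is a minimal sketch, not part of Flink ML: the helper name `java_major_version` is hypothetical, and it assumes the usual `java -version` banner, whose first line contains a quoted version such as `"1.8.0_292"` or `"17.0.1"`.

```shell
# Hypothetical helper: print the major Java version for a version string.
# "1.8.0_292" yields 8 (legacy 1.x scheme), "17.0.1" yields 17.
java_major_version() {
    v="$1"
    # Java 8 and earlier report themselves as 1.x, so drop the "1." prefix
    case "$v" in
        1.*) v="${v#1.}" ;;
    esac
    # Keep only the leading run of digits
    printf '%s\n' "${v%%[!0-9]*}"
}

# Usage: only runs when a java binary is on PATH.
# java -version writes its banner to stderr, hence the 2>&1.
if command -v java >/dev/null 2>&1; then
    ver=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
    major=$(java_major_version "$ver")
    if [ "${major:-0}" -ge 8 ]; then
        echo "Java $major detected: OK"
    else
        echo "Java 8 or higher is required"
    fi
fi
```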

Download Flink #

Download Flink 1.17, then extract the archive:

$ tar -xzf flink-*.tgz

Set Up Flink Environment Variables #

After downloading and extracting Flink, run the following commands, replacing ${path_to_flink} with the path to the extracted Flink directory:

cd ${path_to_flink}
export FLINK_HOME=`pwd`
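A quick sanity check at this point can catch a wrong directory before the later steps fail. The sketch below is an illustration only: `check_flink_home` is a hypothetical helper, and it assumes the layout used elsewhere in this guide, namely a `bin/flink` launcher and a `lib` folder under `$FLINK_HOME`.

```shell
# Hypothetical helper: verify that a directory looks like a Flink installation.
# Uses the first argument if given, otherwise falls back to $FLINK_HOME.
check_flink_home() {
    dir="${1:-${FLINK_HOME:-}}"
    if [ -z "$dir" ]; then
        echo "FLINK_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$dir/bin/flink" ]; then
        echo "No executable bin/flink found under $dir" >&2
        return 1
    fi
    if [ ! -d "$dir/lib" ]; then
        echo "No lib directory found under $dir" >&2
        return 1
    fi
    echo "FLINK_HOME looks valid: $dir"
}
```

Calling `check_flink_home` with no argument checks the currently exported `FLINK_HOME`.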

Add Flink ML Library to Flink’s Library Folder #

Flink ML’s library files must be copied into Flink’s library folder so that Flink can load them.

First, follow the build guideline to build Flink ML’s Java SDK. Then copy the generated library files into Flink’s library folder with the following commands, where ${path_to_flink_ml} is the root of your Flink ML source directory.

cd ${path_to_flink_ml}
cp ./flink-ml-dist/target/flink-ml-*-bin/flink-ml*/lib/*.jar $FLINK_HOME/lib/

Run Flink ML Example Job #

Please start a Flink standalone cluster in your local environment with the following command.

$FLINK_HOME/bin/start-cluster.sh

You should be able to navigate to the web UI at localhost:8081 to view the Flink dashboard and see that the cluster is up and running.
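The same check can be made from the terminal, which is handy in scripts that need to wait for the cluster before submitting a job. This is a minimal sketch under stated assumptions: `wait_for_flink` is a hypothetical helper, `curl` is assumed to be installed, and the cluster is assumed to serve Flink's REST API (here the `/overview` endpoint) on the same default port 8081 as the web UI.

```shell
# Hypothetical helper: poll the Flink REST endpoint until it answers.
# $1: URL to poll (default assumes the local cluster on port 8081)
# $2: maximum number of attempts, one second apart (default 10)
wait_for_flink() {
    url="${1:-http://localhost:8081/overview}"
    attempts="${2:-10}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s silences progress output, -f turns HTTP errors into a non-zero exit
        if curl -sf "$url" >/dev/null 2>&1; then
            echo "Flink cluster is reachable at $url"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "Flink cluster did not respond at $url" >&2
    return 1
}
```

For example, `wait_for_flink && echo "ready to submit"` blocks for at most about ten seconds before giving up.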

Then you may submit Flink ML examples to the cluster as follows.

$FLINK_HOME/bin/flink run -c org.apache.flink.ml.examples.clustering.KMeansExample $FLINK_HOME/lib/flink-ml-examples*.jar

The command above submits and executes Flink ML’s KMeansExample job. Example jobs for other Flink ML algorithms are available in the flink-ml-examples module.

A sample output in your terminal is as follows.

Features: [9.0, 0.0]    Cluster ID: 1
Features: [0.3, 0.0]    Cluster ID: 0
Features: [0.0, 0.3]    Cluster ID: 0
Features: [9.6, 0.0]    Cluster ID: 1
Features: [0.0, 0.0]    Cluster ID: 0
Features: [9.0, 0.6]    Cluster ID: 1
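To summarize output like this, the lines can be tallied per cluster. The snippet below is only a sketch: `count_clusters` is a hypothetical helper that assumes each result line contains the literal text `Cluster ID: ` followed by a number, as in the sample above.

```shell
# Hypothetical helper: read result lines on stdin and print
# "<cluster id> <number of points>" for each cluster, sorted by id.
count_clusters() {
    awk -F'Cluster ID: ' '
        NF > 1 { count[$2 + 0]++ }   # lines without "Cluster ID: " are skipped
        END { for (id in count) print id, count[id] }
    ' | sort -n
}

# Usage with the sample output shown above
count_clusters <<'EOF'
Features: [9.0, 0.0]    Cluster ID: 1
Features: [0.3, 0.0]    Cluster ID: 0
Features: [0.0, 0.3]    Cluster ID: 0
Features: [9.6, 0.0]    Cluster ID: 1
Features: [0.0, 0.0]    Cluster ID: 0
Features: [9.0, 0.6]    Cluster ID: 1
EOF
# prints:
# 0 3
# 1 3
```

Here the three points near the origin share one cluster and the three points near x = 9 share the other.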

Now you have successfully run a Flink ML job.

Finally, you can stop the Flink standalone cluster with the following command.

$FLINK_HOME/bin/stop-cluster.sh