Quick Start

This document provides a quick introduction to Flink ML. It guides you through submitting a simple Flink job that trains a machine learning model and uses it to serve predictions.

Help, I’m Stuck! #

If you get stuck, check out the community support resources. In particular, Apache Flink’s user mailing list is consistently ranked as one of the most active of any Apache project and a great way to get help quickly.

Prerequisites #

Make sure Java 8 or a higher version is installed on your local machine. To check the installed Java version, run the following in your terminal:

java -version
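
The version string needs some care when checked in a script: Java 8 reports itself as 1.8.x, while later releases report 11.x, 17.x, and so on. The following is a minimal sketch, not part of Flink or Flink ML, of one way to extract the major version from that output. The function name java_major_version is hypothetical.

```shell
# Print the major Java version found in the output of `java -version`.
# Handles the legacy scheme ("1.8.0_292" -> 8) and the modern one
# ("11.0.2" -> 11, "17-ea" -> 17). Returns 1 if no version string is found.
java_major_version() {
    local version
    # Pull the quoted version out of a line such as: openjdk version "11.0.2" 2019-01-15
    version=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    case "$version" in
        "")  return 1 ;;
        1.*) printf '%s\n' "$version" | cut -d. -f2 ;;          # legacy: second field
        *)   printf '%s\n' "$version" | sed 's/[^0-9].*//' ;;   # modern: leading digits
    esac
}

# Usage (assumes `java` is on PATH; `java -version` writes to stderr):
#   major=$(java_major_version "$(java -version 2>&1)")
#   [ "$major" -ge 8 ] && echo "Java $major is new enough"
```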

Download Flink 1.15 or a higher version, then extract the archive:

tar -xzf flink-*.tgz

After downloading Flink, register $FLINK_HOME as an environment variable in your local environment, replacing ${path_to_flink} below with the path of the extracted Flink directory.

cd ${path_to_flink}
export FLINK_HOME=`pwd`
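
A mistyped path here tends to surface only later, as a "No such file or directory" error. As a rough sanity check, the sketch below tests whether a directory contains the two scripts this guide uses. The function name is_flink_home is hypothetical, and the check is a heuristic rather than an official validation.

```shell
# Return 0 if the given directory looks like an extracted Flink distribution.
# Only the two scripts used later in this guide are checked.
is_flink_home() {
    [ -n "$1" ] && [ -x "$1/bin/flink" ] && [ -x "$1/bin/start-cluster.sh" ]
}

# Usage:
#   is_flink_home "$FLINK_HOME" || echo "FLINK_HOME does not point to a Flink directory" >&2
```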

You need to copy Flink ML’s library files to Flink’s folder for proper initialization.

Please download the corresponding binary release of Flink ML, then extract the archive:

tar -xzf flink-ml-*.tgz

Then you may copy the extracted library files to Flink’s folder with the following commands.

cd ${path_to_flink_ml}
cp ./lib/*.jar $FLINK_HOME/lib/
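
Since the example job in this guide is launched from a flink-ml-examples*.jar file under $FLINK_HOME/lib, it can be worth confirming that the copy placed such a file there. The following sketch does so. The function name has_examples_jar is hypothetical.

```shell
# Return 0 if <dir>/lib contains at least one flink-ml-examples*.jar file.
has_examples_jar() {
    local jar
    for jar in "$1"/lib/flink-ml-examples*.jar; do
        # If the glob matched nothing, $jar is the literal pattern and -f fails.
        [ -f "$jar" ] && return 0
    done
    return 1
}

# Usage:
#   has_examples_jar "$FLINK_HOME" || echo "Flink ML jars were not copied" >&2
```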

Please start a Flink standalone cluster in your local environment with the following command.

$FLINK_HOME/bin/start-cluster.sh

You should be able to navigate to the web UI at localhost:8081 to view the Flink dashboard and see that the cluster is up and running.
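
The same check can be scripted, which helps when the cluster needs a moment to come up. The sketch below polls the /overview endpoint of Flink's REST API, which is served on the same port as the web UI. It assumes curl is installed, and the function name wait_for_flink and its defaults are hypothetical.

```shell
# Poll Flink's REST endpoint until it answers or the attempts run out.
# $1: base URL of the web UI (default: http://localhost:8081)
# $2: number of attempts, one second apart (default: 10)
wait_for_flink() {
    local url="${1:-http://localhost:8081}" attempts="${2:-10}" i=1
    while [ "$i" -le "$attempts" ]; do
        # --fail makes curl return non-zero on HTTP errors as well as on refused connections
        if curl --silent --fail --max-time 2 "$url/overview" >/dev/null; then
            return 0
        fi
        [ "$i" -lt "$attempts" ] && sleep 1
        i=$((i + 1))
    done
    return 1
}

# Usage:
#   wait_for_flink && echo "Cluster is up" || echo "Cluster did not respond" >&2
```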

Then you may submit Flink ML examples to the cluster as follows.

$FLINK_HOME/bin/flink run -c org.apache.flink.ml.examples.clustering.KMeansExample $FLINK_HOME/lib/flink-ml-examples*.jar

The command above submits and executes Flink ML’s KMeansExample job. Example jobs for other Flink ML algorithms are available in the flink-ml-examples module.

A sample output in your terminal is as follows.

Features: [9.0, 0.0]    Cluster ID: 1
Features: [0.3, 0.0]    Cluster ID: 0
Features: [0.0, 0.3]    Cluster ID: 0
Features: [9.6, 0.0]    Cluster ID: 1
Features: [0.0, 0.0]    Cluster ID: 0
Features: [9.0, 0.6]    Cluster ID: 1
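
To compare a run against the sample, it can help to tally how many points landed in each cluster. The following sketch does this for output in the format shown above. The function name count_clusters is hypothetical, and lines that do not contain "Cluster ID: " are ignored.

```shell
# Count how many points were assigned to each cluster.
# Reads lines such as "Features: [9.0, 0.0]    Cluster ID: 1" on stdin and
# prints "<cluster id> <count>" pairs sorted by cluster id.
count_clusters() {
    awk -F'Cluster ID: ' '
        NF == 2 { n[$2 + 0]++ }                     # $2 + 0 turns the id into a number
        END     { for (id in n) print id, n[id] }
    ' | sort -n
}

# Usage:
#   $FLINK_HOME/bin/flink run -c org.apache.flink.ml.examples.clustering.KMeansExample \
#       $FLINK_HOME/lib/flink-ml-examples*.jar | count_clusters
```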

Now you have successfully run a Flink ML job.