Quick Start

Quick Start #

This document provides a quick introduction to using Flink ML. Readers of this document will be guided to submit a simple Flink job that trains a Machine Learning Model and uses it to provide prediction service.

Help, I’m Stuck! #

If you get stuck, check out the community support resources. In particular, Apache Flink’s user mailing list is consistently ranked as one of the most active of any Apache project and a great way to get help quickly.

Prerequisites #

Make sure Java 8 or a higher version has been installed in your local machine. To check the Java version installed, type in your terminal:

$ java -version

Download Flink 1.17, then extract the archive:

$ tar -xzf flink-*.tgz

Run the following commands after having downloaded Flink:

cd ${path_to_flink}
export FLINK_HOME=`pwd`

You need to copy Flink ML’s library files to Flink’s folder for proper initialization.

Please download Flink ML Python source and extract the jar files into Flink’s library folder.

tar -xzf apache-flink-ml*.tar.gz
cp apache-flink-ml-*/deps/lib/* $FLINK_HOME/lib/

Please start a Flink standalone cluster in your local environment with the following command.

$FLINK_HOME/bin/start-cluster.sh

You should be able to navigate to the web UI at localhost:8081 to view the Flink dashboard and see that the cluster is up and running.

Then you may submit Flink ML examples to the cluster as follows.

$FLINK_HOME/bin/flink run -c org.apache.flink.ml.examples.clustering.KMeansExample $FLINK_HOME/lib/flink-ml-examples*.jar

The command above would submit and execute Flink ML’s KMeansExample job. There are also example jobs for other Flink ML algorithms, and you can find them in flink-ml-examples module.

A sample output in your terminal is as follows.

Features: [9.0, 0.0]    Cluster ID: 1
Features: [0.3, 0.0]    Cluster ID: 0
Features: [0.0, 0.3]    Cluster ID: 0
Features: [9.6, 0.0]    Cluster ID: 1
Features: [0.0, 0.0]    Cluster ID: 0
Features: [9.0, 0.6]    Cluster ID: 1

Now you have successfully run a Flink ML job.

Finally, you can stop the Flink standalone cluster with the following command.

$FLINK_HOME/bin/stop-cluster.sh