Quick Start #
This document provides a quick introduction to using Flink ML. It guides you through submitting a simple Flink job that trains a machine learning model and uses it to make predictions.
Help, I’m Stuck! #
If you get stuck, check out the community support resources. In particular, Apache Flink’s user mailing list is consistently ranked as one of the most active of any Apache project and a great way to get help quickly.
Prerequisites #
Make sure Java 8 or a higher version is installed on your local machine. To check the installed Java version, run the following in your terminal:
$ java -version
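The version check can also be scripted. The following is a minimal sketch, not part of Flink ML: the helper name `java_major_version` is hypothetical, and it assumes the version string follows either the legacy `1.8.0_292` scheme or the newer `17.0.2` scheme.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the major Java version from a version string
# such as "1.8.0_292" (Java 8) or "17.0.2" (Java 17).
java_major_version() {
    version="$1"
    major="${version%%.*}"
    if [ "$major" = "1" ]; then
        # Legacy scheme: the major version is the second component.
        rest="${version#*.}"
        major="${rest%%.*}"
    fi
    # Drop any non-numeric suffix such as "-ea".
    major="${major%%[!0-9]*}"
    echo "$major"
}

# `java -version` writes to stderr; the version is the quoted token.
if command -v java >/dev/null 2>&1; then
    detected="$(java -version 2>&1 | awk -F '"' '/version/ { print $2; exit }')"
    major="$(java_major_version "$detected")"
    if [ -n "$major" ] && [ "$major" -ge 8 ]; then
        echo "Java $detected is sufficient"
    else
        echo "Java 8 or higher is required (detected: ${detected:-unknown})" >&2
    fi
fi
```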
Download Flink #
Download Flink 1.17, then extract the archive:
$ tar -xzf flink-*.tgz
Set Up Flink Environment Variables #
After downloading and extracting Flink, run the following commands, replacing ${path_to_flink} with the path of the extracted directory:
cd ${path_to_flink}
export FLINK_HOME=`pwd`
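Before continuing, it may help to confirm that FLINK_HOME really points at the extracted distribution. The sketch below is an illustration only: `is_flink_home` is a hypothetical helper, and it assumes that the `bin/flink` launcher and the `lib` folder used later in this guide are enough to identify a Flink directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the directory looks like an extracted
# Flink distribution (executable bin/flink plus a lib folder).
is_flink_home() {
    dir="$1"
    [ -n "$dir" ] && [ -x "$dir/bin/flink" ] && [ -d "$dir/lib" ]
}

if is_flink_home "${FLINK_HOME:-}"; then
    echo "FLINK_HOME looks valid: $FLINK_HOME"
else
    echo "FLINK_HOME is unset or does not point to a Flink distribution" >&2
fi
```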
Add Flink ML library to Flink’s library folder #
Flink ML’s library files must be copied into Flink’s library folder so that Flink can load them at startup.
Download the Flink ML Python source package and copy the jar files it contains into Flink’s library folder.
tar -xzf apache-flink-ml*.tar.gz
cp apache-flink-ml-*/deps/lib/* $FLINK_HOME/lib/
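To confirm the copy worked, you can list the Flink ML jars that ended up in the library folder. This is a rough sketch: `list_flink_ml_jars` is a hypothetical helper, and the `flink-ml*.jar` pattern is an assumption based on the `flink-ml-examples*.jar` name used later in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Flink ML jars found in a directory,
# one file name per line. The "flink-ml*.jar" pattern is an assumption.
list_flink_ml_jars() {
    lib_dir="$1"
    for jar in "$lib_dir"/flink-ml*.jar; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$jar" ] && basename "$jar"
    done
    return 0
}

list_flink_ml_jars "${FLINK_HOME:-.}/lib"
```

An empty listing suggests the jars were not copied, or that FLINK_HOME points somewhere else.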
Run Flink ML example job #
Please start a Flink standalone cluster in your local environment with the following command.
$FLINK_HOME/bin/start-cluster.sh
You should be able to navigate to the web UI at localhost:8081 to view the Flink dashboard and see that the cluster is up and running.
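The dashboard may take a few seconds to become reachable after the start script returns. If you prefer to check from the terminal, one possible approach is to poll the web UI address with curl. The `wait_for_url` function below is a hypothetical helper and assumes curl is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it responds successfully
# or the attempts run out. Arguments: URL, attempts, delay in seconds.
wait_for_url() {
    url="$1"
    attempts="${2:-10}"
    delay="${3:-1}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s silences progress output, -f treats HTTP errors as failures.
        if curl -sf --max-time 5 -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, `wait_for_url http://localhost:8081 && echo "Flink web UI is up"` tries the dashboard address up to ten times, one second apart.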
Then you may submit Flink ML examples to the cluster as follows.
$FLINK_HOME/bin/flink run -c org.apache.flink.ml.examples.clustering.KMeansExample $FLINK_HOME/lib/flink-ml-examples*.jar
The command above submits and executes Flink ML’s KMeansExample job. There are also example jobs for other Flink ML algorithms; you can find them in the flink-ml-examples module.
Sample output in your terminal looks like the following.
Features: [9.0, 0.0] Cluster ID: 1
Features: [0.3, 0.0] Cluster ID: 0
Features: [0.0, 0.3] Cluster ID: 0
Features: [9.6, 0.0] Cluster ID: 1
Features: [0.0, 0.0] Cluster ID: 0
Features: [9.0, 0.6] Cluster ID: 1
Now you have successfully run a Flink ML job.
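If you want to check the result programmatically, output in the format shown above can be summarized per cluster. The following sketch is not part of Flink ML: `count_per_cluster` is a hypothetical helper that assumes each result line ends with `Cluster ID: <n>`. It is fed the sample output from this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how many points were assigned to each cluster ID.
# Lines without "Cluster ID:" (for example job submission messages) are ignored.
count_per_cluster() {
    awk '/Cluster ID:/ { counts[$NF]++ }
         END { for (id in counts) print "cluster=" id " count=" counts[id] }' | sort
}

count_per_cluster <<'EOF'
Features: [9.0, 0.0] Cluster ID: 1
Features: [0.3, 0.0] Cluster ID: 0
Features: [0.0, 0.3] Cluster ID: 0
Features: [9.6, 0.0] Cluster ID: 1
Features: [0.0, 0.0] Cluster ID: 0
Features: [9.0, 0.6] Cluster ID: 1
EOF
# prints:
# cluster=0 count=3
# cluster=1 count=3
```

In practice, the output of the `flink run` command could be piped into the same function instead of the here-document.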
Finally, you can stop the Flink standalone cluster with the following command.
$FLINK_HOME/bin/stop-cluster.sh