Quick Start
This documentation is for an unreleased version of Apache Flink Machine Learning Library. We recommend you use the latest stable version.

This document provides a quick introduction to Flink ML. It guides you through submitting a simple Flink job that trains a machine learning model and uses it to provide a prediction service.

Prerequisites

Flink ML requires Python 3.6, 3.7, or 3.8. Run the following command to check that your Python version meets this requirement:

$ python --version
# the version printed here must be 3.6, 3.7 or 3.8
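
If you prefer to check the version from inside Python (for example, in a setup script), the same test can be sketched as below. This is a minimal illustration using only the standard library; the names SUPPORTED_VERSIONS and is_supported are hypothetical helpers, not part of Flink ML.

```python
import sys

# The (major, minor) versions listed in the prerequisites above.
SUPPORTED_VERSIONS = {(3, 6), (3, 7), (3, 8)}


def is_supported(version_info=None):
    """Return True if the (major, minor) pair is one of the supported versions."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    status = "supported" if is_supported() else "not supported"
    print("Python {}.{} is {}".format(major, minor, status))
```

Only the major and minor numbers are compared, so any patch release of a listed version passes the check.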

The Flink ML Python SDK is available on PyPI and can be installed as follows:

$ python -m pip install apache-flink-ml
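
To confirm that the installation is visible to your interpreter, you can look up the modules without running them. The sketch below uses only the standard library. It assumes the package provides the pyflink and pyflink.examples.ml modules, which is inferred from the example command in this guide, and is_module_available is a hypothetical helper.

```python
import importlib.util


def is_module_available(name):
    """Return True if the named module can be located by the import system."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        return False


# Module names assumed from the example command in this guide.
for module_name in ("pyflink", "pyflink.examples.ml"):
    found = "found" if is_module_available(module_name) else "missing"
    print("{}: {}".format(module_name, found))
```

If either module is reported as missing, check that pip installed the package into the same interpreter that the python command starts.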

You can also build the Flink ML Python SDK from source by following the development guide.

After setting up the Flink ML Python SDK, you can run a Flink ML example job as follows.

$ python -m pyflink.examples.ml.clustering.kmeans_example

The command above creates a Flink mini-cluster and executes Flink ML’s kmeans_example job. Example jobs for other Flink ML algorithms are available in the pyflink.examples.ml module.

A sample output in your terminal is as follows.

Features: [9.6,0.0]     Cluster Id: 0
Features: [9.0,0.6]     Cluster Id: 0
Features: [0.0,0.3]     Cluster Id: 1
Features: [0.0,0.0]     Cluster Id: 1
Features: [0.3,3.0]     Cluster Id: 1
Features: [9.0,0.0]     Cluster Id: 0
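
To see why the points are grouped this way, the following plain-Python sketch runs a basic k-means (Lloyd's algorithm) on the six feature vectors from the sample output. It is an illustration only and is not Flink ML's implementation: it uses no Flink APIs, the function names are hypothetical, and it seeds the centroids with the first k points, which is a simplifying assumption.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Return a cluster id for each point using basic Lloyd iterations."""
    # Simplifying assumption: use the first k points as initial centroids.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


# Feature vectors copied from the sample output above, in the same order.
points = [
    [9.6, 0.0], [9.0, 0.6], [0.0, 0.3],
    [0.0, 0.0], [0.3, 3.0], [9.0, 0.0],
]
assignments = kmeans(points, k=2)  # [0, 0, 1, 1, 1, 0]
for features, cluster_id in zip(points, assignments):
    print("Features: {}\tCluster Id: {}".format(features, cluster_id))
```

The three points with a first coordinate near 9 end up in one cluster and the three points near the origin in the other. Cluster ids are arbitrary labels, so a different initialization could swap 0 and 1 without changing the grouping.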

Now you have successfully run a Flink ML job.