public class KMeans extends Object
KMeans is an iterative clustering algorithm and works as follows:
KMeans is given a set of data points to be clustered and an initial set of K cluster
centers. In each iteration, the algorithm computes the distance of each data point to each
cluster center. Each point is assigned to the cluster center which is closest to it.
Subsequently, each cluster center is moved to the center (mean) of all points that have
been assigned to it. The moved cluster centers are fed into the next iteration. The algorithm
terminates after a fixed number of iterations (as in this implementation) or if cluster centers
do not (significantly) move in an iteration.
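The iteration described above can be sketched in plain Java without any Flink APIs. This is only an illustrative sketch, not the code of this class: the name KMeansSketch, the array-based point representation, and the choice to leave a center in place when no point is assigned to it are all assumptions made for the example.

```java
import java.util.Arrays;

/** Plain-Java sketch of the described iteration; hypothetical, not the Flink implementation. */
final class KMeansSketch {

    /** Returns the index of the center closest to point p (Euclidean distance). */
    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double dx = p[0] - centers[c][0];
            double dy = p[1] - centers[c][1];
            double dist = Math.sqrt(dx * dx + dy * dy);
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /** Runs a fixed number of iterations and returns the moved centers. */
    static double[][] cluster(double[][] points, double[][] initialCenters, int iterations) {
        // Copy the initial centers so the caller's array is not modified.
        double[][] centers = new double[initialCenters.length][];
        for (int c = 0; c < centers.length; c++) {
            centers[c] = initialCenters[c].clone();
        }
        for (int it = 0; it < iterations; it++) {
            double[][] sums = new double[centers.length][2];
            long[] counts = new long[centers.length];
            // Assignment step: add each point to the sum of its closest center.
            for (double[] p : points) {
                int c = nearest(p, centers);
                sums[c][0] += p[0];
                sums[c][1] += p[1];
                counts[c]++;
            }
            // Update step: move each center to the mean of its assigned points.
            // Assumption of this sketch: a center without points stays where it is.
            for (int c = 0; c < centers.length; c++) {
                if (counts[c] > 0) {
                    centers[c][0] = sums[c][0] / counts[c];
                    centers[c][1] = sums[c][1] / counts[c];
                }
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        double[][] points = {{1.0, 1.0}, {1.0, 3.0}, {9.0, 9.0}, {11.0, 9.0}};
        double[][] start = {{0.0, 0.0}, {10.0, 10.0}};
        // prints [[1.0, 2.0], [10.0, 9.0]]
        System.out.println(Arrays.deepToString(cluster(points, start, 10)));
    }
}
```

In this small example the centers stop moving after the first iteration, so the remaining iterations leave the result unchanged.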
For background, see the Wikipedia entry on the k-means clustering algorithm.
This implementation works on two-dimensional data points.
It computes an assignment of data points to cluster centers, i.e., each data point is annotated
with the id of the final cluster (center) it belongs to.
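Input files are plain text files and must be formatted as follows:
Data points: one point per line, given as two values separated by a space. For example,
"1.2 2.3\n5.3 7.2\n" gives two data points (x=1.2, y=2.3) and (x=5.3, y=7.2).
Cluster centers: one center per line, given as an integer id followed by the two coordinates. For example,
"1 6.2 3.2\n2 2.9 5.7\n" gives two centers (id=1, x=6.2, y=3.2) and (id=2, x=2.9, y=5.7).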
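The two text formats shown above could be read with a few lines of plain Java. The sketch below is hypothetical and not part of this class: KMeansInputSketch, Center, parsePoints, and parseCenters are names invented for illustration, and the parser assumes well-formed lines with whitespace-separated fields.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for the two input formats; not part of the KMeans class. */
final class KMeansInputSketch {

    /** A cluster center: an integer id plus two coordinates. */
    static final class Center {
        final int id;
        final double x;
        final double y;

        Center(int id, double x, double y) {
            this.id = id;
            this.x = x;
            this.y = y;
        }
    }

    /** Parses "x y" lines into {x, y} arrays, skipping blank lines. */
    static List<double[]> parsePoints(String text) {
        List<double[]> points = new ArrayList<>();
        for (String line : text.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] fields = trimmed.split("\\s+");
            points.add(new double[] {
                Double.parseDouble(fields[0]), Double.parseDouble(fields[1])
            });
        }
        return points;
    }

    /** Parses "id x y" lines into Center objects, skipping blank lines. */
    static List<Center> parseCenters(String text) {
        List<Center> centers = new ArrayList<>();
        for (String line : text.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] fields = trimmed.split("\\s+");
            centers.add(new Center(
                Integer.parseInt(fields[0]),
                Double.parseDouble(fields[1]),
                Double.parseDouble(fields[2])));
        }
        return centers;
    }

    public static void main(String[] args) {
        List<double[]> points = parsePoints("1.2 2.3\n5.3 7.2\n");
        List<Center> centers = parseCenters("1 6.2 3.2\n2 2.9 5.7\n");
        System.out.println(points.size() + " points, " + centers.size() + " centers");
    }
}
```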
Usage:
KMeans --points <path> --centroids <path> --output <path> --iterations <n>
If no parameters are provided, the program is run with default data from KMeansData
and 10 iterations.
This example shows how to use:
Note: All Flink DataSet APIs are deprecated since Flink 1.18 and will be removed in a future Flink major version. You can still build your application with the DataSet API, but you should move to the DataStream API and/or the Table API. This class is retained for testing purposes.
Modifier and Type  Class and Description
static class       KMeans.Centroid
                   A simple two-dimensional centroid, basically a point with an ID.
static class       KMeans.CentroidAccumulator
                   Sums and counts point coordinates.
static class       KMeans.CentroidAverager
                   Computes new centroid from coordinate sum and count of points.
static class       KMeans.CountAppender
                   Appends a count variable to the tuple.
static class       KMeans.Point
                   A simple two-dimensional point.
static class       KMeans.SelectNearestCenter
                   Determines the closest cluster center for a data point.
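The descriptions of CountAppender, CentroidAccumulator, and CentroidAverager suggest a count-then-sum-then-divide pattern for recomputing centers. The following plain-Java sketch illustrates that pattern under stated assumptions; it is not the code of these nested classes, and CentroidUpdateSketch, Acc, and average are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical plain-Java analogue of the sum, count, and average steps. */
final class CentroidUpdateSketch {

    /** Running coordinate sum and point count for one cluster. */
    static final class Acc {
        double sumX;
        double sumY;
        long count;
    }

    /**
     * Takes points together with the id of the center each point was assigned to
     * and returns, per id, the mean position {meanX, meanY} of its points.
     */
    static Map<Integer, double[]> average(int[] assignedIds, double[][] points) {
        Map<Integer, Acc> accs = new TreeMap<>();
        for (int i = 0; i < assignedIds.length; i++) {
            Acc acc = accs.computeIfAbsent(assignedIds[i], id -> new Acc());
            // Each point contributes its coordinates and a count of 1.
            acc.sumX += points[i][0];
            acc.sumY += points[i][1];
            acc.count += 1;
        }
        Map<Integer, double[]> means = new TreeMap<>();
        for (Map.Entry<Integer, Acc> e : accs.entrySet()) {
            Acc acc = e.getValue();
            // Divide the coordinate sums by the count to get the new center.
            means.put(e.getKey(), new double[] {acc.sumX / acc.count, acc.sumY / acc.count});
        }
        return means;
    }

    public static void main(String[] args) {
        int[] ids = {1, 1, 2};
        double[][] points = {{1.0, 1.0}, {3.0, 5.0}, {4.0, 4.0}};
        Map<Integer, double[]> means = average(ids, points);
        // prints 1 -> (2.0, 3.0) and 2 -> (4.0, 4.0), one per line
        for (Map.Entry<Integer, double[]> e : means.entrySet()) {
            System.out.println(e.getKey() + " -> (" + e.getValue()[0] + ", " + e.getValue()[1] + ")");
        }
    }
}
```

Keeping the sum and the count separate until the end means partial results from different groups of points can be combined by simple addition before the final division.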

Constructor and Description 

KMeans() 
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.