public class ConnectedComponents extends Object
Initially, the algorithm assigns each vertex an unique ID. In each step, a vertex picks the minimum of its own ID and its neighbors' IDs, as its new ID and tells its neighbors about its new ID. After the algorithm has completed, all vertices in the same component will have the same ID.
A vertex whose component ID did not change needs not propagate its information in the next
step. Because of that, the algorithm is easily expressible via a delta iteration. We here model
the solution set as the vertices with their current component ids, and the workset as the changed
vertices. Because we see all vertices initially as changed, the initial workset and the initial
solution set are identical. Also, the delta to the solution set is consequently also the next
workset.
Input files are plain text files and must be formatted as follows:
"1\n2\n12\n42\n63"
gives five vertices (1), (2), (12), (42), and
(63).
"1 2\n2 12\n1 12\n42 63"
gives four (undirected) edges (1)-(2),
(2)-(12), (1)-(12), and (42)-(63).
Usage:
ConnectedComponents --vertices <path> --edges <path> --output <path> --iterations <n>
If no parameters are provided, the program is run with default data from ConnectedComponentsData
and 10 iterations.
This example shows how to use:
Note: All Flink DataSet APIs are deprecated since Flink 1.18 and will be removed in a future Flink major version. You can still build your application in DataSet, but you should move to either the DataStream and/or Table API. This class is retained for testing purposes.
Modifier and Type | Class and Description |
---|---|
static class |
ConnectedComponents.ComponentIdFilter
Emit the candidate (Vertex-ID, Component-ID) pair if and only if the candidate component ID
is less than the vertex's current component ID.
|
static class |
ConnectedComponents.DuplicateValue<T>
Function that turns a value into a 2-tuple where both fields are that value.
|
static class |
ConnectedComponents.NeighborWithComponentIDJoin
UDF that joins a (Vertex-ID, Component-ID) pair that represents the current component that a
vertex is associated with, with a (Source-Vertex-ID, Target-VertexID) edge.
|
static class |
ConnectedComponents.UndirectEdge
Undirected edges by emitting for each input edge the input edges itself and an inverted
version.
|
Constructor and Description |
---|
ConnectedComponents() |
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.