A few basic data sources and sinks are built into Flink and are always available. The predefined data sources include reading from files, directories, and sockets, and ingesting data from collections and iterators. The predefined data sinks support writing to files, to stdout and stderr, and to sockets.
Connectors provide code for interfacing with various third-party systems. Currently these systems are supported:
Keep in mind that to use one of these connectors in an application, additional third party components are usually required, e.g. servers for the data stores or message queues. Note also that while the streaming connectors listed in this section are part of the Flink project and are included in source releases, they are not included in the binary distributions. Further instructions can be found in the corresponding subsections.
Additional streaming connectors for Flink are being released through Apache Bahir, including:
Using a connector isn’t the only way to get data in and out of Flink.
One common pattern is to query an external database or web service in a Map
or FlatMap
in order to enrich the primary datastream.
Flink offers an API for Asynchronous I/O
to make it easier to do this kind of enrichment efficiently and robustly.
When a Flink application pushes a lot of data to an external data store, this can become an I/O bottleneck. If the data involved has many fewer reads than writes, a better approach can be for an external application to pull from Flink the data it needs. The Queryable State interface enables this by allowing the state being managed by Flink to be queried on demand.