DataStream Connectors
Predefined Sources and Sinks
A few basic data sources and sinks are built into Flink and are always available. The predefined data sources include reading from files, directories, and sockets, and ingesting data from collections and iterators. The predefined data sinks support writing to files, to stdout and stderr, and to sockets.
Flink Project Connectors
Connectors provide code for interfacing with various third-party systems. Currently these systems are supported as part of the Apache Flink project:
- Apache Kafka (source/sink)
- Apache Cassandra (source/sink)
- Amazon DynamoDB (sink)
- Amazon Kinesis Data Streams (source/sink)
- Amazon Kinesis Data Firehose (sink)
- DataGen (source)
- Elasticsearch (sink)
- Opensearch (sink)
- FileSystem (source/sink)
- RabbitMQ (source/sink)
- Google PubSub (source/sink)
- Hybrid Source (source)
- Apache Pulsar (source)
- JDBC (sink)
- MongoDB (source/sink)
Keep in mind that using one of these connectors in an application usually requires additional third-party components, e.g. servers for the data stores or message queues. Note also that while the streaming connectors listed in this section are part of the Flink project and are included in source releases, they are not included in the binary distributions. Further instructions can be found in the corresponding subsections.
Connectors in Apache Bahir
Additional streaming connectors for Flink are being released through Apache Bahir, including:
- Apache ActiveMQ (source/sink)
- Apache Flume (sink)
- Redis (sink)
- Akka (sink)
- Netty (source)
Other Ways to Connect to Flink
Data Enrichment via Async I/O
Using a connector isn’t the only way to get data in and out of Flink. One common pattern is to query an external database or web service in a Map or FlatMap in order to enrich the primary datastream. Flink offers an API for Asynchronous I/O to make it easier to do this kind of enrichment efficiently and robustly.
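To illustrate why asynchronous lookups help with enrichment, here is a minimal sketch that uses only the JDK. It is not Flink's Async I/O API. The class `AsyncEnrichmentSketch`, the method names, the simulated 50 ms latency, and the one-second timeout are all hypothetical and chosen for illustration. The idea is that every lookup is issued before any result is awaited, so the waiting times overlap instead of adding up as they would with a blocking call inside a Map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncEnrichmentSketch {

    // Hypothetical stand-in for a query against an external database or web service.
    static CompletableFuture<String> lookupAsync(int userId, Executor executor) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(50); // simulated network latency (assumption)
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new CompletionException(e);
            }
            return "user-" + userId;
        }, executor);
    }

    // Enriches each id with the looked-up name, preserving input order.
    static List<String> enrichAll(List<Integer> ids, ExecutorService executor) {
        // Issue every request first so the lookups run concurrently.
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (int id : ids) {
            pending.add(
                lookupAsync(id, executor)
                    .orTimeout(1, TimeUnit.SECONDS) // fail instead of waiting forever
                    .thenApply(name -> id + "," + name));
        }
        // Collect the results in the order the inputs arrived.
        List<String> enriched = new ArrayList<>();
        for (CompletableFuture<String> future : pending) {
            enriched.add(future.join());
        }
        return enriched;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(enrichAll(List.of(1, 2, 3), pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real job, the Asynchronous I/O API plays the role of `enrichAll`: it tracks the in-flight requests and handles timeouts and result ordering for the stream. See the Asynchronous I/O page for the actual interfaces.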