Stateful Functions offers an Apache Kafka I/O Module for reading from and writing to Kafka topics.
It is based on Apache Flink’s universal Kafka connector and provides exactly-once processing semantics.
The Kafka I/O Module is configurable in YAML or Java.
To use the Kafka I/O Module in Java, please include the following dependency in your pom.
Kafka Ingress Spec
A KafkaIngressSpec declares an ingress spec for consuming from a Kafka cluster.
It accepts the following arguments:
The ingress identifier associated with this ingress
The topic name / list of topic names
The address of the bootstrap servers
The consumer group id to use
A KafkaIngressDeserializer for deserializing data from Kafka (Java only)
The position to start consuming from
The ingress also accepts properties to directly configure the Kafka client, using KafkaIngressBuilder#withProperties(Properties).
Please refer to the Kafka consumer configuration documentation for the full list of available properties.
Note that configuration passed using named methods, such as KafkaIngressBuilder#withConsumerGroupId(String), takes precedence and overwrites the corresponding settings in the provided properties.
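The precedence rule can be illustrated with a minimal sketch in plain Java. The class IngressConfigSketch and its resolve() method are hypothetical stand-ins written for this example and are not part of the Stateful Functions API; only the Kafka property keys group.id and max.poll.records are real consumer settings.

```java
import java.util.Properties;

// Hypothetical stand-in for an ingress builder; NOT the Stateful Functions API.
final class IngressConfigSketch {
    private final Properties properties = new Properties();
    private String consumerGroupId; // value set through the "named method"

    // Models passing raw Kafka client properties.
    IngressConfigSketch withProperties(Properties userProperties) {
        properties.putAll(userProperties);
        return this;
    }

    // Models a named method such as withConsumerGroupId(String).
    IngressConfigSketch withConsumerGroupId(String groupId) {
        this.consumerGroupId = groupId;
        return this;
    }

    // Named settings are applied last, so they overwrite any matching key
    // from the raw properties, regardless of the order of the calls above.
    Properties resolve() {
        Properties resolved = new Properties();
        resolved.putAll(properties);
        if (consumerGroupId != null) {
            resolved.setProperty("group.id", consumerGroupId);
        }
        return resolved;
    }

    public static void main(String[] args) {
        Properties raw = new Properties();
        raw.setProperty("group.id", "from-properties");
        raw.setProperty("max.poll.records", "100");

        Properties resolved = new IngressConfigSketch()
                .withConsumerGroupId("from-named-method")
                .withProperties(raw)
                .resolve();

        System.out.println(resolved.getProperty("group.id"));         // from-named-method
        System.out.println(resolved.getProperty("max.poll.records")); // 100
    }
}
```

Properties that have no named-method counterpart, such as max.poll.records here, pass through unchanged.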
Startup Position
The ingress allows configuring the startup position to be one of the following:
From Group Offset (default)
Starts from offsets that were committed to Kafka for the specified consumer group.
Earliest
Starts from the earliest offset.
Latest
Starts from the latest offset.
Specific Offsets
Starts from specific offsets, defined as a map of partitions to their target starting offset.
Date
Starts from offsets whose ingestion time is at or after a specified date.
On startup, if the specified startup offset for a partition is out of range or does not exist (which may be the case if the ingress is configured to start from group offsets, specific offsets, or from a date), then the ingress falls back to the position configured using KafkaIngressBuilder#withAutoOffsetResetPosition(KafkaIngressAutoResetPosition).
By default, this is set to be the latest position.
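The fallback behavior described above can be sketched in plain Java as follows. This is an illustrative model only: StartupOffsetSketch, its AutoReset enum, and the idea of passing a partition's earliest and end offsets as arguments are assumptions made for this example, not the module's implementation or API.

```java
// Hypothetical model of startup offset resolution; NOT the Stateful Functions API.
final class StartupOffsetSketch {
    // Models the configured auto offset reset position.
    enum AutoReset { EARLIEST, LATEST }

    /**
     * Returns the offset to start reading from for one partition.
     *
     * @param requestedOffset offset from group offsets, specific offsets, or a date
     *                        lookup; null when no such offset exists
     * @param earliestOffset  first offset still available in the partition
     * @param endOffset       offset of the next record to be written
     * @param reset           fallback position used when the request is unusable
     */
    static long resolve(Long requestedOffset, long earliestOffset, long endOffset, AutoReset reset) {
        boolean usable = requestedOffset != null
                && requestedOffset >= earliestOffset
                && requestedOffset <= endOffset;
        if (usable) {
            return requestedOffset;
        }
        // Missing or out-of-range offset: fall back to the reset position.
        return reset == AutoReset.EARLIEST ? earliestOffset : endOffset;
    }

    public static void main(String[] args) {
        System.out.println(resolve(42L, 0L, 100L, AutoReset.LATEST));    // 42
        System.out.println(resolve(null, 0L, 100L, AutoReset.LATEST));   // 100
        System.out.println(resolve(500L, 10L, 100L, AutoReset.EARLIEST)); // 10
    }
}
```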
Kafka Deserializer
When using the Java API, the Kafka ingress needs to know how to turn the binary data in Kafka into Java objects.
The KafkaIngressDeserializer allows users to specify such a schema.
The T deserialize(ConsumerRecord<byte[], byte[]> record) method gets called for each Kafka message, passing the key, value, and metadata from Kafka.
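The shape of such a deserializer can be sketched in plain Java. To keep the example free of external dependencies, RawRecord below is a hypothetical stand-in for Kafka's ConsumerRecord<byte[], byte[]>, and Greeting is a hypothetical application type; a real implementation would implement KafkaIngressDeserializer<T> and receive the actual Kafka record. The UTF-8 encoding is also an assumption of this example.

```java
import java.nio.charset.StandardCharsets;

// Dependency-free sketch; the nested types are hypothetical stand-ins.
final class DeserializerSketch {
    // Stand-in for ConsumerRecord<byte[], byte[]>: raw key, raw value, and metadata.
    static final class RawRecord {
        final String topic;
        final byte[] key;   // may be null: Kafka records do not require a key
        final byte[] value;

        RawRecord(String topic, byte[] key, byte[] value) {
            this.topic = topic;
            this.key = key;
            this.value = value;
        }
    }

    // Hypothetical application-level message type.
    static final class Greeting {
        final String userId;
        final String text;

        Greeting(String userId, String text) {
            this.userId = userId;
            this.text = text;
        }
    }

    // Decodes UTF-8 bytes, mapping a missing field to an empty string.
    private static String decode(byte[] bytes) {
        return bytes == null ? "" : new String(bytes, StandardCharsets.UTF_8);
    }

    // Called once per record: turns the binary key and value into a Java object.
    static Greeting deserialize(RawRecord record) {
        return new Greeting(decode(record.key), decode(record.value));
    }

    public static void main(String[] args) {
        RawRecord record = new RawRecord(
                "greetings",
                "user-1".getBytes(StandardCharsets.UTF_8),
                "hello".getBytes(StandardCharsets.UTF_8));
        Greeting greeting = deserialize(record);
        System.out.println(greeting.userId + ": " + greeting.text); // user-1: hello
    }
}
```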
Kafka Egress Spec
A KafkaEgressBuilder declares an egress spec for writing data out to a Kafka cluster.
It accepts the following arguments:
The egress identifier associated with this egress
The address of the bootstrap servers
A KafkaEgressSerializer for serializing data into Kafka (Java only)
The fault tolerance semantic
Properties for the Kafka producer
Please refer to the Kafka producer configuration documentation for the full list of available properties.
Kafka Egress and Fault Tolerance
With fault tolerance enabled, the Kafka egress can provide exactly-once delivery guarantees.
You can choose among three modes of operation.
None
Nothing is guaranteed; produced records can be lost or duplicated.
At Least Once
Stateful Functions guarantees that no records are lost, but records can be duplicated.
Exactly Once
Stateful Functions uses Kafka transactions to provide exactly-once semantics.
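To illustrate why transactions prevent duplicates, the following toy model in plain Java keeps written records invisible until their transaction is committed. It is neither Kafka nor the Stateful Functions implementation: TransactionalEgressSketch and its methods are hypothetical, and the sketch assumes that a transaction is committed when a checkpoint completes and aborted when a failure occurs first.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model of transactional writes; NOT Kafka and NOT the egress implementation.
final class TransactionalEgressSketch {
    private final List<String> committed = new ArrayList<>(); // visible to readers
    private final List<String> pending = new ArrayList<>();   // current open transaction

    // Writes a record into the open transaction; it is not yet visible.
    void send(String record) {
        pending.add(record);
    }

    // Assumed to happen when a checkpoint completes: pending records become visible.
    void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    // Assumed to happen on failure: records of the open transaction are discarded.
    void abort() {
        pending.clear();
    }

    // What a reader that only sees committed data would observe.
    List<String> visibleRecords() {
        return new ArrayList<>(committed);
    }

    public static void main(String[] args) {
        TransactionalEgressSketch egress = new TransactionalEgressSketch();

        egress.send("a");
        egress.send("b");
        egress.commit();  // checkpoint completes

        egress.send("c"); // written, but a failure occurs before the next checkpoint
        egress.abort();

        egress.send("c"); // the record is produced again after recovery
        egress.commit();

        System.out.println(egress.visibleRecords()); // [a, b, c]
    }
}
```

Without the abort step, the replayed record would appear twice, which corresponds to the at-least-once mode.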
Kafka Serializer
When using the Java API, the Kafka egress needs to know how to turn Java objects into binary data.
The KafkaEgressSerializer allows users to specify such a schema.
The ProducerRecord<byte[], byte[]> serialize(T out) method gets called for each message, allowing users to set a key, value, and other metadata.
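A serializer of this shape can be sketched in plain Java. To avoid external dependencies, RawOutput below is a hypothetical stand-in for Kafka's ProducerRecord<byte[], byte[]>, and Greeting, the topic name greetings, and the UTF-8 encoding are assumptions made for this example; a real implementation would implement KafkaEgressSerializer<T> and return an actual ProducerRecord.

```java
import java.nio.charset.StandardCharsets;

// Dependency-free sketch; the nested types are hypothetical stand-ins.
final class SerializerSketch {
    // Stand-in for ProducerRecord<byte[], byte[]>: target topic, raw key, raw value.
    static final class RawOutput {
        final String topic;
        final byte[] key;
        final byte[] value;

        RawOutput(String topic, byte[] key, byte[] value) {
            this.topic = topic;
            this.key = key;
            this.value = value;
        }
    }

    // Hypothetical application-level message type.
    static final class Greeting {
        final String userId;
        final String text;

        Greeting(String userId, String text) {
            this.userId = userId;
            this.text = text;
        }
    }

    // Called once per outgoing message: chooses the topic and encodes key and value.
    static RawOutput serialize(Greeting out) {
        byte[] key = out.userId.getBytes(StandardCharsets.UTF_8);
        byte[] value = out.text.getBytes(StandardCharsets.UTF_8);
        return new RawOutput("greetings", key, value);
    }

    public static void main(String[] args) {
        RawOutput record = serialize(new Greeting("user-1", "hello"));
        System.out.println(record.topic);                                    // greetings
        System.out.println(new String(record.key, StandardCharsets.UTF_8));   // user-1
        System.out.println(new String(record.value, StandardCharsets.UTF_8)); // hello
    }
}
```

Using the user id as the key means that, under Kafka's default partitioning, all messages for one user go to the same partition.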