pyflink.datastream.connectors.kafka.FlinkKafkaConsumer#
- class FlinkKafkaConsumer(topics: Union[str, List[str]], deserialization_schema: pyflink.common.serialization.DeserializationSchema, properties: Dict)[source]#
The Flink Kafka Consumer is a streaming data source that pulls a parallel data stream from Apache Kafka. The consumer can run in multiple parallel instances, each of which will pull data from one or more Kafka partitions.
The Flink Kafka Consumer participates in checkpointing and guarantees that no data is lost during a failure, and that the computation processes elements ‘exactly once. (These guarantees naturally assume that Kafka itself does not lose any data.)
Please note that Flink snapshots the offsets internally as part of its distributed checkpoints. The offsets committed to Kafka / Zookeeper are only to bring the outside view of progress in sync with Flink’s view of the progress. That way, monitoring and other jobs can get a view of how far the Flink Kafka consumer has consumed a topic.
Please refer to Kafka’s documentation for the available configuration properties: http://kafka.apache.org/documentation.html#newconsumerconfigs
Methods
disable_filter_restored_partitions_with_subscribed_topics
()By default, when restoring from a checkpoint / savepoint, the consumer always ignores restored partitions that are no longer associated with the current specified topics or topic pattern to subscribe to.
get_java_function
()get_produced_type
()set_commit_offsets_on_checkpoints
(...)Specifies whether or not the consumer should commit offsets back to kafka on checkpoints.
set_start_from_earliest
()Specifies the consumer to start reading from the earliest offset for all partitions.
set_start_from_group_offsets
()Specifies the consumer to start reading from any committed group offsets found in Zookeeper/ Kafka brokers.
set_start_from_latest
()Specifies the consuer to start reading from the latest offset for all partitions.
set_start_from_timestamp
(...)Specifies the consumer to start reading partitions from a specified timestamp.