@PublicEvolving public class FlinkKafkaProducer<IN> extends TwoPhaseCommitSinkFunction<IN,org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState,FlinkKafkaProducer.KafkaTransactionContext>
By default, the producer uses the FlinkKafkaProducer.Semantic.AT_LEAST_ONCE semantic. Before using FlinkKafkaProducer.Semantic.EXACTLY_ONCE, please refer to Flink's Kafka connector documentation.

| Modifier and Type | Class and Description |
|---|---|
| static class | FlinkKafkaProducer.ContextStateSerializer: TypeSerializer for FlinkKafkaProducer.KafkaTransactionContext. |
| static class | FlinkKafkaProducer.KafkaTransactionContext: Context associated with this instance of the FlinkKafkaProducer. |
| static class | FlinkKafkaProducer.NextTransactionalIdHint: Keeps the information required to deduce the next safe-to-use transactional id. |
| static class | FlinkKafkaProducer.Semantic: Semantics that can be chosen. |
| static class | FlinkKafkaProducer.TransactionStateSerializer: TypeSerializer for FlinkKafkaProducer.KafkaTransactionState. |

Nested classes/interfaces inherited from class TwoPhaseCommitSinkFunction: TwoPhaseCommitSinkFunction.State<TXN,CONTEXT>, TwoPhaseCommitSinkFunction.StateSerializer<TXN,CONTEXT>, TwoPhaseCommitSinkFunction.StateSerializerConfigSnapshot<TXN,CONTEXT>, TwoPhaseCommitSinkFunction.StateSerializerSnapshot<TXN,CONTEXT>, TwoPhaseCommitSinkFunction.TransactionHolder<TXN>

Nested classes/interfaces inherited from interface SinkFunction: SinkFunction.Context<T>
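A minimal usage sketch, assuming a local broker at localhost:9092 and a hypothetical topic name; it wires the producer into a DataStream using the key-less SerializationSchema constructor and the default AT_LEAST_ONCE semantic:

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class KafkaSinkExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties producerConfig = new Properties();
        // 'bootstrap.servers' is the only required producer property.
        producerConfig.setProperty("bootstrap.servers", "localhost:9092");

        DataStream<String> stream = env.fromElements("a", "b", "c");

        // No semantic given, so the sink runs with AT_LEAST_ONCE by default.
        stream.addSink(new FlinkKafkaProducer<>(
                "my-topic",               // hypothetical topic
                new SimpleStringSchema(), // key-less serialization schema
                producerConfig));

        env.execute("kafka-sink-example");
    }
}
```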
| Modifier and Type | Field and Description |
|---|---|
| static int | DEFAULT_KAFKA_PRODUCERS_POOL_SIZE: Default number of KafkaProducers in the pool. |
| static Time | DEFAULT_KAFKA_TRANSACTION_TIMEOUT: Default value for the Kafka transaction timeout. |
| static String | KEY_DISABLE_METRICS: Configuration key for disabling metrics reporting. |
| protected Properties | producerConfig: User-defined properties for the Producer. |
| static int | SAFE_SCALE_DOWN_FACTOR: This coefficient determines the safe scale-down factor. |
| protected FlinkKafkaProducer.Semantic | semantic: Semantic chosen for this instance. |

Fields inherited from class TwoPhaseCommitSinkFunction: pendingCommitTransactions, state, userContext
| Constructor and Description |
|---|
| FlinkKafkaProducer(String topicId, KeyedSerializationSchema<IN> serializationSchema, Properties producerConfig): Creates a FlinkKafkaProducer for a given topic. |
| FlinkKafkaProducer(String topicId, KeyedSerializationSchema<IN> serializationSchema, Properties producerConfig, FlinkKafkaProducer.Semantic semantic): Creates a FlinkKafkaProducer for a given topic. |
| FlinkKafkaProducer(String defaultTopicId, KeyedSerializationSchema<IN> serializationSchema, Properties producerConfig, Optional<FlinkKafkaPartitioner<IN>> customPartitioner): Creates a FlinkKafkaProducer for a given topic. |
| FlinkKafkaProducer(String defaultTopicId, KeyedSerializationSchema<IN> serializationSchema, Properties producerConfig, Optional<FlinkKafkaPartitioner<IN>> customPartitioner, FlinkKafkaProducer.Semantic semantic, int kafkaProducersPoolSize): Creates a FlinkKafkaProducer for a given topic. |
| FlinkKafkaProducer(String topicId, SerializationSchema<IN> serializationSchema, Properties producerConfig): Creates a FlinkKafkaProducer for a given topic. |
| FlinkKafkaProducer(String topicId, SerializationSchema<IN> serializationSchema, Properties producerConfig, Optional<FlinkKafkaPartitioner<IN>> customPartitioner): Creates a FlinkKafkaProducer for a given topic. |
| FlinkKafkaProducer(String brokerList, String topicId, KeyedSerializationSchema<IN> serializationSchema): Creates a FlinkKafkaProducer for a given topic. |
| FlinkKafkaProducer(String brokerList, String topicId, SerializationSchema<IN> serializationSchema): Creates a FlinkKafkaProducer for a given topic. |
| Modifier and Type | Method and Description |
|---|---|
| protected void | abort(org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState transaction): Abort a transaction. |
| protected org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState | beginTransaction(): Method that starts a new transaction. |
| void | close(): Tear-down method for the user code. |
| protected void | commit(org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState transaction): Commit a pre-committed transaction. |
| protected FlinkKafkaInternalProducer<byte[],byte[]> | createProducer() |
| protected void | finishRecoveringContext() |
| FlinkKafkaProducer<IN> | ignoreFailuresAfterTransactionTimeout(): Disables the propagation of exceptions thrown when committing presumably timed-out Kafka transactions during recovery of the job. |
| void | initializeState(FunctionInitializationContext context): This method is called when the parallel function instance is created during distributed execution. |
| protected Optional<FlinkKafkaProducer.KafkaTransactionContext> | initializeUserContext() |
| void | invoke(org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState transaction, IN next, SinkFunction.Context context): Write value within a transaction. |
| void | open(Configuration configuration): Initializes the connection to Kafka. |
| protected void | preCommit(org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState transaction): Pre-commit a previously created transaction. |
| protected void | recoverAndAbort(org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState transaction): Abort a transaction that was rejected by a coordinator after a failure. |
| protected void | recoverAndCommit(org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState transaction): Invoked on recovered transactions after a failure. |
| void | setLogFailuresOnly(boolean logFailuresOnly): Defines whether the producer should fail on errors, or only log them. |
| void | setWriteTimestampToKafka(boolean writeTimestampToKafka): If set to true, Flink will write the (event time) timestamp attached to each record into Kafka. |
| void | snapshotState(FunctionSnapshotContext context): This method is called when a snapshot for a checkpoint is requested. |

Methods inherited from class TwoPhaseCommitSinkFunction: currentTransaction, enableTransactionTimeoutWarnings, getUserContext, invoke, invoke, notifyCheckpointComplete, pendingTransactions, setTransactionTimeout

Methods inherited from class AbstractRichFunction: getIterationRuntimeContext, getRuntimeContext, setRuntimeContext
public static final int SAFE_SCALE_DOWN_FACTOR

If the Flink application previously failed before the first checkpoint completed, or if we are starting a new batch of FlinkKafkaProducer from scratch without a clean shutdown of the previous one, FlinkKafkaProducer doesn't know what the set of previously used Kafka transactionalIds was. In that case, it will play safe and abort all of the possible transactionalIds in the range:

[0, getNumberOfParallelSubtasks() * kafkaProducersPoolSize * SAFE_SCALE_DOWN_FACTOR)

The range of transactional ids available for use is:

[0, getNumberOfParallelSubtasks() * kafkaProducersPoolSize)

This means that if we decrease getNumberOfParallelSubtasks() by a factor larger than SAFE_SCALE_DOWN_FACTOR, we may leave some lingering transactions.
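As a concrete illustration of these two ranges, here is a small arithmetic sketch; the parallelism and pool size are made-up numbers, and the factor is a stand-in for whatever value this class actually defines:

```java
// Hypothetical numbers, purely to illustrate the ranges described above.
int numberOfParallelSubtasks = 4; // getNumberOfParallelSubtasks()
int kafkaProducersPoolSize = 5;   // producers pooled per subtask
int safeScaleDownFactor = 5;      // stand-in for SAFE_SCALE_DOWN_FACTOR

// Transactional ids actually available for use: [0, 20)
int usableIds = numberOfParallelSubtasks * kafkaProducersPoolSize;

// Transactional ids defensively aborted after an unclean start: [0, 100)
int abortedIds = usableIds * safeScaleDownFactor;

// Scaling the parallelism down by more than safeScaleDownFactor means some
// previously usable ids fall outside [0, abortedIds), so their transactions
// would never be aborted and could linger.
```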
public static final int DEFAULT_KAFKA_PRODUCERS_POOL_SIZE

Default number of KafkaProducers in the pool. See FlinkKafkaProducer.Semantic.EXACTLY_ONCE.

public static final Time DEFAULT_KAFKA_TRANSACTION_TIMEOUT

Default value for the Kafka transaction timeout.

public static final String KEY_DISABLE_METRICS

Configuration key for disabling metrics reporting.

protected final Properties producerConfig

User-defined properties for the Producer.

protected FlinkKafkaProducer.Semantic semantic

Semantic chosen for this instance.
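A sketch of preparing producerConfig for EXACTLY_ONCE; the broker address, topic, and timeout value are illustrative. One Kafka constraint worth noting: with EXACTLY_ONCE, the producer's transaction.timeout.ms must not exceed the broker's transaction.max.timeout.ms.

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper;

public class ExactlyOnceConfigExample {
    public static FlinkKafkaProducer<String> buildProducer() {
        Properties producerConfig = new Properties();
        producerConfig.setProperty("bootstrap.servers", "localhost:9092"); // illustrative broker

        // Keep the producer transaction timeout within the broker's
        // transaction.max.timeout.ms; one hour here is only an example value.
        producerConfig.setProperty("transaction.timeout.ms", String.valueOf(60 * 60 * 1000));

        return new FlinkKafkaProducer<>(
                "my-topic", // illustrative topic
                new KeyedSerializationSchemaWrapper<>(new SimpleStringSchema()),
                producerConfig,
                FlinkKafkaProducer.Semantic.EXACTLY_ONCE);
    }
}
```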
public FlinkKafkaProducer(String brokerList, String topicId, SerializationSchema<IN> serializationSchema)

Creates a FlinkKafkaProducer for a given topic.

Parameters:
brokerList - Comma-separated addresses of the brokers
topicId - ID of the Kafka topic.
serializationSchema - User-defined (keyless) serialization schema.
public FlinkKafkaProducer(String topicId, SerializationSchema<IN> serializationSchema, Properties producerConfig)

Creates a FlinkKafkaProducer for a given topic.

Using this constructor, the default FlinkFixedPartitioner will be used as the partitioner. This default partitioner maps each sink subtask to a single Kafka partition (i.e. all records received by a sink subtask will end up in the same Kafka partition).

To use a custom partitioner, please use FlinkKafkaProducer(String, SerializationSchema, Properties, Optional) instead.

Parameters:
topicId - ID of the Kafka topic.
serializationSchema - User-defined key-less serialization schema.
producerConfig - Properties with the producer configuration.
public FlinkKafkaProducer(String topicId, SerializationSchema<IN> serializationSchema, Properties producerConfig, Optional<FlinkKafkaPartitioner<IN>> customPartitioner)

Creates a FlinkKafkaProducer for a given topic, using the provided SerializationSchema and possibly a custom FlinkKafkaPartitioner.

Since a key-less SerializationSchema is used, records sent to Kafka will not have an attached key. Therefore, if a partitioner is also not provided, records will be distributed to Kafka partitions in a round-robin fashion.

Parameters:
topicId - The topic to write data to
serializationSchema - A key-less serializable serialization schema for turning user objects into a Kafka-consumable byte[]
producerConfig - Configuration properties for the KafkaProducer. 'bootstrap.servers' is the only required argument.
customPartitioner - A serializable partitioner for assigning messages to Kafka partitions. If a partitioner is not provided, records will be distributed to Kafka partitions in a round-robin fashion.
public FlinkKafkaProducer(String brokerList, String topicId, KeyedSerializationSchema<IN> serializationSchema)

Creates a FlinkKafkaProducer for a given topic.

Using this constructor, the default FlinkFixedPartitioner will be used as the partitioner. This default partitioner maps each sink subtask to a single Kafka partition (i.e. all records received by a sink subtask will end up in the same Kafka partition).

To use a custom partitioner, please use FlinkKafkaProducer(String, KeyedSerializationSchema, Properties, Optional) instead.

Parameters:
brokerList - Comma-separated addresses of the brokers
topicId - ID of the Kafka topic.
serializationSchema - User-defined serialization schema supporting key/value messages
public FlinkKafkaProducer(String topicId, KeyedSerializationSchema<IN> serializationSchema, Properties producerConfig)

Creates a FlinkKafkaProducer for a given topic.

Using this constructor, the default FlinkFixedPartitioner will be used as the partitioner. This default partitioner maps each sink subtask to a single Kafka partition (i.e. all records received by a sink subtask will end up in the same Kafka partition).

To use a custom partitioner, please use FlinkKafkaProducer(String, KeyedSerializationSchema, Properties, Optional) instead.

Parameters:
topicId - ID of the Kafka topic.
serializationSchema - User-defined serialization schema supporting key/value messages
producerConfig - Properties with the producer configuration.
public FlinkKafkaProducer(String topicId, KeyedSerializationSchema<IN> serializationSchema, Properties producerConfig, FlinkKafkaProducer.Semantic semantic)

Creates a FlinkKafkaProducer for a given topic.

Using this constructor, the default FlinkFixedPartitioner will be used as the partitioner. This default partitioner maps each sink subtask to a single Kafka partition (i.e. all records received by a sink subtask will end up in the same Kafka partition).

To use a custom partitioner, please use FlinkKafkaProducer(String, KeyedSerializationSchema, Properties, Optional, FlinkKafkaProducer.Semantic, int) instead.

Parameters:
topicId - ID of the Kafka topic.
serializationSchema - User-defined serialization schema supporting key/value messages
producerConfig - Properties with the producer configuration.
semantic - Defines the semantic that will be used by this producer (see FlinkKafkaProducer.Semantic).
public FlinkKafkaProducer(String defaultTopicId, KeyedSerializationSchema<IN> serializationSchema, Properties producerConfig, Optional<FlinkKafkaPartitioner<IN>> customPartitioner)

Creates a FlinkKafkaProducer for a given topic, using the provided KeyedSerializationSchema and possibly a custom FlinkKafkaPartitioner.

If a partitioner is not provided, written records will be partitioned by the attached key of each record (as determined by KeyedSerializationSchema.serializeKey(Object)). If written records do not have a key (i.e., KeyedSerializationSchema.serializeKey(Object) returns null), they will be distributed to Kafka partitions in a round-robin fashion.

Parameters:
defaultTopicId - The default topic to write data to
serializationSchema - A serializable serialization schema for turning user objects into a Kafka-consumable byte[] supporting key/value messages
producerConfig - Configuration properties for the KafkaProducer. 'bootstrap.servers' is the only required argument.
customPartitioner - A serializable partitioner for assigning messages to Kafka partitions. If a partitioner is not provided, records will be partitioned by the key of each record (determined by KeyedSerializationSchema.serializeKey(Object)). If the keys are null, then records will be distributed to Kafka partitions in a round-robin fashion.
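To make the partitioning options concrete, here is a hypothetical custom partitioner; the class name and hashing scheme are inventions for illustration, built on FlinkKafkaPartitioner's partition(record, key, value, targetTopic, partitions) contract:

```java
import java.util.Arrays;

import org.apache.flink.streaming.connectors.kafka.partitioner.FlinkKafkaPartitioner;

// Hypothetical partitioner: route each record by the hash of its serialized
// key, falling back to the first available partition when the key is null.
public class KeyHashPartitioner<T> extends FlinkKafkaPartitioner<T> {

    @Override
    public int partition(T record, byte[] key, byte[] value, String targetTopic, int[] partitions) {
        if (key == null) {
            return partitions[0];
        }
        // The remainder's magnitude is below partitions.length, so Math.abs is safe here.
        return partitions[Math.abs(Arrays.hashCode(key) % partitions.length)];
    }
}
```

It would then be passed as Optional.of(new KeyHashPartitioner<>()) in the customPartitioner argument.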
public FlinkKafkaProducer(String defaultTopicId, KeyedSerializationSchema<IN> serializationSchema, Properties producerConfig, Optional<FlinkKafkaPartitioner<IN>> customPartitioner, FlinkKafkaProducer.Semantic semantic, int kafkaProducersPoolSize)

Creates a FlinkKafkaProducer for a given topic, using the provided KeyedSerializationSchema and possibly a custom FlinkKafkaPartitioner.

If a partitioner is not provided, written records will be partitioned by the attached key of each record (as determined by KeyedSerializationSchema.serializeKey(Object)). If written records do not have a key (i.e., KeyedSerializationSchema.serializeKey(Object) returns null), they will be distributed to Kafka partitions in a round-robin fashion.

Parameters:
defaultTopicId - The default topic to write data to
serializationSchema - A serializable serialization schema for turning user objects into a Kafka-consumable byte[] supporting key/value messages
producerConfig - Configuration properties for the KafkaProducer. 'bootstrap.servers' is the only required argument.
customPartitioner - A serializable partitioner for assigning messages to Kafka partitions. If a partitioner is not provided, records will be partitioned by the key of each record (determined by KeyedSerializationSchema.serializeKey(Object)). If the keys are null, then records will be distributed to Kafka partitions in a round-robin fashion.
semantic - Defines the semantic that will be used by this producer (see FlinkKafkaProducer.Semantic).
kafkaProducersPoolSize - Overwrites the default KafkaProducers pool size (see FlinkKafkaProducer.Semantic.EXACTLY_ONCE).
public void setWriteTimestampToKafka(boolean writeTimestampToKafka)

If set to true, Flink will write the (event time) timestamp attached to each record into Kafka.

Parameters:
writeTimestampToKafka - Flag indicating whether Flink's internal timestamps are written to Kafka.

public void setLogFailuresOnly(boolean logFailuresOnly)

Defines whether the producer should fail on errors, or only log them.

Parameters:
logFailuresOnly - The flag to indicate logging-only on exceptions.

public FlinkKafkaProducer<IN> ignoreFailuresAfterTransactionTimeout()
Disables the propagation of exceptions thrown when committing presumably timed-out Kafka transactions during recovery of the job.

Note that we use System.currentTimeMillis() to track the age of a transaction. Moreover, only exceptions thrown during recovery are caught; i.e., the producer will attempt at least one commit of the transaction before giving up.

Overrides:
ignoreFailuresAfterTransactionTimeout in class TwoPhaseCommitSinkFunction<IN,org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState,FlinkKafkaProducer.KafkaTransactionContext>
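Because the method returns the producer itself, it can be chained at construction time; a minimal sketch with illustrative broker and topic names:

```java
Properties producerConfig = new Properties();
producerConfig.setProperty("bootstrap.servers", "localhost:9092"); // illustrative broker

// Chained call: commit failures of timed-out transactions during recovery
// are logged instead of failing the restored job.
FlinkKafkaProducer<String> producer =
        new FlinkKafkaProducer<>("my-topic", new SimpleStringSchema(), producerConfig)
                .ignoreFailuresAfterTransactionTimeout();
```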
public void open(Configuration configuration) throws Exception

Initializes the connection to Kafka.

Specified by:
open in interface RichFunction
Overrides:
open in class AbstractRichFunction
Parameters:
configuration - The configuration containing the parameters attached to the contract.
Throws:
Exception - Implementations may forward exceptions, which are caught by the runtime. When the runtime catches an exception, it aborts the task and lets the fail-over logic decide whether to retry the task execution.
See Also:
Configuration
public void invoke(org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState transaction, IN next, SinkFunction.Context context) throws FlinkKafkaException

Description copied from class: TwoPhaseCommitSinkFunction
Write value within a transaction.
Specified by:
invoke in class TwoPhaseCommitSinkFunction<IN,org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState,FlinkKafkaProducer.KafkaTransactionContext>
Throws:
FlinkKafkaException
public void close() throws FlinkKafkaException

Description copied from interface: RichFunction
Tear-down method for the user code. This method can be used for clean-up work.
Specified by:
close in interface RichFunction
Overrides:
close in class TwoPhaseCommitSinkFunction<IN,org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState,FlinkKafkaProducer.KafkaTransactionContext>
Throws:
FlinkKafkaException
protected org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState beginTransaction() throws FlinkKafkaException

Description copied from class: TwoPhaseCommitSinkFunction
Method that starts a new transaction.
Specified by:
beginTransaction in class TwoPhaseCommitSinkFunction<IN,org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState,FlinkKafkaProducer.KafkaTransactionContext>
Throws:
FlinkKafkaException
protected void preCommit(org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState transaction) throws FlinkKafkaException

Description copied from class: TwoPhaseCommitSinkFunction
Pre-commit a previously created transaction. Usually the implementation involves flushing the data.
Specified by:
preCommit in class TwoPhaseCommitSinkFunction<IN,org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState,FlinkKafkaProducer.KafkaTransactionContext>
Throws:
FlinkKafkaException
protected void commit(org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState transaction)

Description copied from class: TwoPhaseCommitSinkFunction
Commit a pre-committed transaction. If this method fails, the Flink application will be restarted and TwoPhaseCommitSinkFunction.recoverAndCommit(Object) will be called again for the same transaction.
Specified by:
commit in class TwoPhaseCommitSinkFunction<IN,org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState,FlinkKafkaProducer.KafkaTransactionContext>
protected void recoverAndCommit(org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState transaction)

Description copied from class: TwoPhaseCommitSinkFunction
Invoked on recovered transactions after a failure.
Overrides:
recoverAndCommit in class TwoPhaseCommitSinkFunction<IN,org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState,FlinkKafkaProducer.KafkaTransactionContext>
protected void abort(org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState transaction)

Description copied from class: TwoPhaseCommitSinkFunction
Abort a transaction.
Specified by:
abort in class TwoPhaseCommitSinkFunction<IN,org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState,FlinkKafkaProducer.KafkaTransactionContext>
protected void recoverAndAbort(org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState transaction)

Description copied from class: TwoPhaseCommitSinkFunction
Abort a transaction that was rejected by a coordinator after a failure.
Overrides:
recoverAndAbort in class TwoPhaseCommitSinkFunction<IN,org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState,FlinkKafkaProducer.KafkaTransactionContext>
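To tie the transaction hooks together, here is an approximate, comment-only sketch of the order in which TwoPhaseCommitSinkFunction drives them; the exact call sites live in that base class, so treat this as an outline rather than its literal control flow:

```java
// Happy path, per checkpoint interval:
//   txn = beginTransaction();
//   invoke(txn, record, context);   // once per incoming record
//   preCommit(txn);                 // during snapshotState(), e.g. flush buffered records
//   txn = beginTransaction();       // the next transaction starts right away
//   commit(previousTxn);            // after the checkpoint completes
//
// Recovery path, after a failure:
//   recoverAndCommit(txn);          // for transactions that were already pre-committed
//   recoverAndAbort(txn);           // for transactions that were still open
```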
public void snapshotState(FunctionSnapshotContext context) throws Exception

Description copied from interface: CheckpointedFunction
This method is called when a snapshot for a checkpoint is requested. It acts as a hook to the function to ensure that all state is exposed by means previously offered through FunctionInitializationContext when the Function was initialized, or offered now by FunctionSnapshotContext itself.
Specified by:
snapshotState in interface CheckpointedFunction
Overrides:
snapshotState in class TwoPhaseCommitSinkFunction<IN,org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState,FlinkKafkaProducer.KafkaTransactionContext>
Parameters:
context - the context for drawing a snapshot of the operator
Throws:
Exception
public void initializeState(FunctionInitializationContext context) throws Exception

Description copied from interface: CheckpointedFunction
This method is called when the parallel function instance is created during distributed execution.
Specified by:
initializeState in interface CheckpointedFunction
Overrides:
initializeState in class TwoPhaseCommitSinkFunction<IN,org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState,FlinkKafkaProducer.KafkaTransactionContext>
Parameters:
context - the context for initializing the operator
Throws:
Exception
protected Optional<FlinkKafkaProducer.KafkaTransactionContext> initializeUserContext()

Overrides:
initializeUserContext in class TwoPhaseCommitSinkFunction<IN,org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState,FlinkKafkaProducer.KafkaTransactionContext>
protected void finishRecoveringContext()

Overrides:
finishRecoveringContext in class TwoPhaseCommitSinkFunction<IN,org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.KafkaTransactionState,FlinkKafkaProducer.KafkaTransactionContext>
protected FlinkKafkaInternalProducer<byte[],byte[]> createProducer()