@PublicEvolving public interface DataStreamSinkProvider extends DynamicTableSink.SinkRuntimeProvider, ParallelismProvider
DataStream
as a runtime implementation for DynamicTableSink
.
Note: This provider is only meant for advanced connector developers. Usually, a sink should
consist of a single entity expressed via SinkProvider
, SinkFunctionProvider
, or
OutputFormatProvider
. When using a DataStream
an implementer needs to pay
attention to how changes are shuffled to not mess up the changelog per parallel subtask.
Modifier and Type | Method and Description |
---|---|
default DataStreamSink<?> |
consumeDataStream(DataStream<RowData> dataStream)
Deprecated.
Use
consumeDataStream(ProviderContext, DataStream)
and correctly set a unique identifier for each data stream transformation. |
default DataStreamSink<?> |
consumeDataStream(ProviderContext providerContext,
DataStream<RowData> dataStream)
Consumes the given Java
DataStream and returns the sink transformation DataStreamSink . |
default Optional<Integer> |
getParallelism()
Returns the parallelism for this instance.
|
default DataStreamSink<?> consumeDataStream(ProviderContext providerContext, DataStream<RowData> dataStream)
DataStream
and returns the sink transformation DataStreamSink
.
Note: If the CompiledPlan
feature should be supported, this method MUST set a
unique identifier for each transformation/operator in the data stream. This enables stateful
Flink version upgrades for streaming jobs. The identifier is used to map state back from a
savepoint to an actual operator in the topology. The framework can generate topology-wide
unique identifiers with ProviderContext.generateUid(String)
.
SingleOutputStreamOperator.uid(String)
@Deprecated default DataStreamSink<?> consumeDataStream(DataStream<RowData> dataStream)
consumeDataStream(ProviderContext, DataStream)
and correctly set a unique identifier for each data stream transformation.DataStream
and returns the sink transformation DataStreamSink
.default Optional<Integer> getParallelism()
The parallelism denotes how many parallel instances of a source or sink will be spawned during the execution.
Enforcing a different parallelism for sources/sinks might mess up the changelog if the
output/input is not ChangelogMode.insertOnly()
. Therefore, a primary key is required
by which the output/input will be shuffled after/before records leave/enter the ScanTableSource.ScanRuntimeProvider
/DynamicTableSink.SinkRuntimeProvider
implementation.
Note: If a custom parallelism is returned and consumeDataStream(ProviderContext,
DataStream)
applies multiple transformations, make sure to set the same custom parallelism
to each operator to not mess up the changelog.
getParallelism
in interface ParallelismProvider
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.