public class KeyedPartitionStreamImpl<K,V> extends AbstractDataStream<V> implements KeyedPartitionStream<K,V>
KeyedPartitionStream
.KeyedPartitionStream.ProcessConfigurableAndKeyedPartitionStream<K,T>, KeyedPartitionStream.TwoKeyedPartitionStreams<K,T1,T2>
environment, requestedSideOutputs, transformation
Constructor and Description |
---|
KeyedPartitionStreamImpl(AbstractDataStream<V> dataStream,
KeySelector<V,K> keySelector) |
KeyedPartitionStreamImpl(AbstractDataStream<V> dataStream,
KeySelector<V,K> keySelector,
TypeInformation<K> keyType) |
KeyedPartitionStreamImpl(AbstractDataStream<V> dataStream,
Transformation<V> partitionTransformation,
KeySelector<V,K> keySelector,
TypeInformation<K> keyType)
This can construct a keyed stream directly without partitionTransformation to avoid shuffle.
|
getEnvironment, getSideOutputTransform, getTransformation, getType
public KeyedPartitionStreamImpl(AbstractDataStream<V> dataStream, KeySelector<V,K> keySelector)
public KeyedPartitionStreamImpl(AbstractDataStream<V> dataStream, KeySelector<V,K> keySelector, TypeInformation<K> keyType)
public KeyedPartitionStreamImpl(AbstractDataStream<V> dataStream, Transformation<V> partitionTransformation, KeySelector<V,K> keySelector, TypeInformation<K> keyType)
public <OUT> NonKeyedPartitionStream.ProcessConfigurableAndNonKeyedPartitionStream<OUT> process(OneInputStreamProcessFunction<V,OUT> processFunction)
KeyedPartitionStream
KeyedPartitionStream
.
Generally, apply an operation to a KeyedPartitionStream
will result in a NonKeyedPartitionStream
, and you can manually generate a KeyedPartitionStream
via
keyBy partitioning. In some cases, you can guarantee that the partition on which the data is
processed will not change, then you can use KeyedPartitionStream.process(OneInputStreamProcessFunction,
KeySelector)
to avoid shuffling.
process
in interface KeyedPartitionStream<K,V>
processFunction
- to perform operation.NonKeyedPartitionStream
with this operation.public <OUT> KeyedPartitionStream.ProcessConfigurableAndKeyedPartitionStream<K,OUT> process(OneInputStreamProcessFunction<V,OUT> processFunction, KeySelector<OUT,K> newKeySelector)
KeyedPartitionStream
KeyedPartitionStream
.
This method is used to avoid shuffle after applying the process function. It is required
that for the same record, the new KeySelector
must extract the same key as the
original KeySelector
on this KeyedPartitionStream
. Otherwise, the partition
of data will be messy.
process
in interface KeyedPartitionStream<K,V>
processFunction
- to perform operation.newKeySelector
- to select the key after process.KeyedPartitionStream
with this operation.public <OUT1,OUT2> KeyedPartitionStream.TwoKeyedPartitionStreams<K,OUT1,OUT2> process(TwoOutputStreamProcessFunction<V,OUT1,OUT2> processFunction, KeySelector<OUT1,K> keySelector1, KeySelector<OUT2,K> keySelector2)
KeyedPartitionStream
KeyedPartitionStream
.
This method is used to avoid shuffle after applying the process function. It is required
that for the same record, these new two KeySelector
s must extract the same key as the
original KeySelector
s on this KeyedPartitionStream
. Otherwise, the partition
of data will be messy.
process
in interface KeyedPartitionStream<K,V>
processFunction
- to perform two output operation.keySelector1
- to select the key of first output.keySelector2
- to select the key of second output.KeyedPartitionStream.TwoKeyedPartitionStreams
with this operation.public <OUT1,OUT2> NonKeyedPartitionStream.TwoNonKeyedPartitionStreams<OUT1,OUT2> process(TwoOutputStreamProcessFunction<V,OUT1,OUT2> processFunction)
KeyedPartitionStream
KeyedPartitionStream
.process
in interface KeyedPartitionStream<K,V>
processFunction
- to perform two output operation.NonKeyedPartitionStream.TwoNonKeyedPartitionStreams
with this operation.public <T_OTHER,OUT> NonKeyedPartitionStream.ProcessConfigurableAndNonKeyedPartitionStream<OUT> connectAndProcess(KeyedPartitionStream<K,T_OTHER> other, TwoInputNonBroadcastStreamProcessFunction<V,T_OTHER,OUT> processFunction)
KeyedPartitionStream
KeyedPartitionStream
. The two keyed
streams must have the same partitions, otherwise it makes no sense to connect them.
Generally, concatenating two KeyedPartitionStream
will result in a NonKeyedPartitionStream
, and you can manually generate a KeyedPartitionStream
via
keyBy partitioning. In some cases, you can guarantee that the partition on which the data is
processed will not change, then you can use KeyedPartitionStream.connectAndProcess(KeyedPartitionStream,
TwoInputNonBroadcastStreamProcessFunction, KeySelector)
to avoid shuffling.
connectAndProcess
in interface KeyedPartitionStream<K,V>
other
- KeyedPartitionStream
to perform operation with two input.processFunction
- to perform operation.NonKeyedPartitionStream
with this operation.public <T_OTHER,OUT> KeyedPartitionStream.ProcessConfigurableAndKeyedPartitionStream<K,OUT> connectAndProcess(KeyedPartitionStream<K,T_OTHER> other, TwoInputNonBroadcastStreamProcessFunction<V,T_OTHER,OUT> processFunction, KeySelector<OUT,K> newKeySelector)
KeyedPartitionStream
KeyedPartitionStream
.The two keyed
streams must have the same partitions, otherwise it makes no sense to connect them.
This method is used to avoid shuffle after applying the process function. It is required
that for the same record, the new KeySelector
must extract the same key as the
original KeySelector
s on these two KeyedPartitionStream
s. Otherwise, the
partition of data will be messy.
connectAndProcess
in interface KeyedPartitionStream<K,V>
other
- KeyedPartitionStream
to perform operation with two input.processFunction
- to perform operation.newKeySelector
- to select the key after process.KeyedPartitionStream
with this operation.public <T_OTHER,OUT> NonKeyedPartitionStream.ProcessConfigurableAndNonKeyedPartitionStream<OUT> connectAndProcess(BroadcastStream<T_OTHER> other, TwoInputBroadcastStreamProcessFunction<V,T_OTHER,OUT> processFunction)
KeyedPartitionStream
BroadcastStream
.
Generally, concatenating KeyedPartitionStream
and BroadcastStream
will
result in a NonKeyedPartitionStream
, and you can manually generate a KeyedPartitionStream
via keyBy partitioning. In some cases, you can guarantee that the
partition on which the data is processed will not change, then you can use KeyedPartitionStream.connectAndProcess(BroadcastStream, TwoInputBroadcastStreamProcessFunction, KeySelector)
to
avoid shuffling.
connectAndProcess
in interface KeyedPartitionStream<K,V>
processFunction
- to perform operation.public <T_OTHER,OUT> KeyedPartitionStream.ProcessConfigurableAndKeyedPartitionStream<K,OUT> connectAndProcess(BroadcastStream<T_OTHER> other, TwoInputBroadcastStreamProcessFunction<V,T_OTHER,OUT> processFunction, KeySelector<OUT,K> newKeySelector)
KeyedPartitionStream
BroadcastStream
.
This method is used to avoid shuffle after applying the process function. It is required
that for the record from non-broadcast input, the new KeySelector
must extract the
same key as the original KeySelector
s on the KeyedPartitionStream
. Otherwise,
the partition of data will be messy. As for the record from broadcast input, the output key
from keyed partition itself instead of the new key selector, so the data it outputs will not
affect the partition.
connectAndProcess
in interface KeyedPartitionStream<K,V>
other
- BroadcastStream
to perform operation with two input.processFunction
- to perform operation.newKeySelector
- to select the key after process.KeyedPartitionStream
with this operation.public TypeInformation<K> getKeyType()
public KeySelector<V,K> getKeySelector()
public ProcessConfigurable<?> toSink(Sink<V> sink)
toSink
in interface KeyedPartitionStream<K,V>
public GlobalStream<V> global()
KeyedPartitionStream
GlobalStream
.global
in interface KeyedPartitionStream<K,V>
public <NEW_KEY> KeyedPartitionStream<NEW_KEY,V> keyBy(KeySelector<V,NEW_KEY> keySelector)
KeyedPartitionStream
KeyedPartitionStream
.keyBy
in interface KeyedPartitionStream<K,V>
keySelector
- to decide how to map data to partition.public NonKeyedPartitionStream<V> shuffle()
KeyedPartitionStream
NonKeyedPartitionStream
, data will be shuffled between
these two streams.shuffle
in interface KeyedPartitionStream<K,V>
public BroadcastStream<V> broadcast()
KeyedPartitionStream
BroadcastStream
.broadcast
in interface KeyedPartitionStream<K,V>
BroadcastStream
.Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.