T
- The type of the elements in this stream.@Public public class DataStream<T> extends Object
Modifier and Type | Class and Description |
---|---|
static class |
DataStream.Collector<T>
This class acts as an accessor to elements collected via
collectAsync(Collector) . |
Modifier and Type | Field and Description |
---|---|
protected StreamExecutionEnvironment |
environment |
protected Transformation<T> |
transformation |
Constructor and Description |
---|
DataStream(StreamExecutionEnvironment environment,
Transformation<T> transformation)
Create a new
DataStream in the given execution environment with partitioning set to
forward by default. |
Modifier and Type | Method and Description |
---|---|
DataStreamSink<T> |
addSink(SinkFunction<T> sinkFunction)
Adds the given sink to this DataStream.
|
SingleOutputStreamOperator<T> |
assignTimestampsAndWatermarks(AssignerWithPeriodicWatermarks<T> timestampAndWatermarkAssigner)
Deprecated.
Please use
assignTimestampsAndWatermarks(WatermarkStrategy) instead. |
SingleOutputStreamOperator<T> |
assignTimestampsAndWatermarks(AssignerWithPunctuatedWatermarks<T> timestampAndWatermarkAssigner)
Deprecated.
Please use
assignTimestampsAndWatermarks(WatermarkStrategy) instead. |
SingleOutputStreamOperator<T> |
assignTimestampsAndWatermarks(WatermarkStrategy<T> watermarkStrategy)
Assigns timestamps to the elements in the data stream and generates watermarks to signal
event time progress.
|
DataStream<T> |
broadcast()
Sets the partitioning of the
DataStream so that the output elements are broadcasted
to every parallel instance of the next operation. |
BroadcastStream<T> |
broadcast(MapStateDescriptor<?,?>... broadcastStateDescriptors)
Sets the partitioning of the
DataStream so that the output elements are broadcasted
to every parallel instance of the next operation. |
protected <F> F |
clean(F f)
Invokes the
ClosureCleaner on the given function if closure
cleaning is enabled in the ExecutionConfig . |
<T2> CoGroupedStreams<T,T2> |
coGroup(DataStream<T2> otherStream)
Creates a join operation.
|
CloseableIterator<T> |
collectAsync()
Sets up the collection of the elements in this
DataStream , and returns an iterator
over the collected elements that can be used to retrieve elements once the job execution has
started. |
void |
collectAsync(DataStream.Collector<T> collector)
Sets up the collection of the elements in this
DataStream , which can be retrieved
later via the given DataStream.Collector . |
<R> BroadcastConnectedStream<T,R> |
connect(BroadcastStream<R> broadcastStream)
Creates a new
BroadcastConnectedStream by connecting the current DataStream
or KeyedStream with a BroadcastStream . |
<R> ConnectedStreams<T,R> |
connect(DataStream<R> dataStream)
Creates a new
ConnectedStreams by connecting DataStream outputs of (possible)
different types with each other. |
AllWindowedStream<T,GlobalWindow> |
countWindowAll(long size)
Windows this
DataStream into tumbling count windows. |
AllWindowedStream<T,GlobalWindow> |
countWindowAll(long size,
long slide)
Windows this
DataStream into sliding count windows. |
protected <R> SingleOutputStreamOperator<R> |
doTransform(String operatorName,
TypeInformation<R> outTypeInfo,
StreamOperatorFactory<R> operatorFactory) |
CloseableIterator<T> |
executeAndCollect()
Triggers the distributed execution of the streaming dataflow and returns an iterator over the
elements of the given DataStream.
|
List<T> |
executeAndCollect(int limit)
Triggers the distributed execution of the streaming dataflow and returns an iterator over the
elements of the given DataStream.
|
CloseableIterator<T> |
executeAndCollect(String jobExecutionName)
Triggers the distributed execution of the streaming dataflow and returns an iterator over the
elements of the given DataStream.
|
List<T> |
executeAndCollect(String jobExecutionName,
int limit)
Triggers the distributed execution of the streaming dataflow and returns an iterator over the
elements of the given DataStream.
|
SingleOutputStreamOperator<T> |
filter(FilterFunction<T> filter)
Applies a Filter transformation on a
DataStream . |
<R> SingleOutputStreamOperator<R> |
flatMap(FlatMapFunction<T,R> flatMapper)
Applies a FlatMap transformation on a
DataStream . |
<R> SingleOutputStreamOperator<R> |
flatMap(FlatMapFunction<T,R> flatMapper,
TypeInformation<R> outputType)
Applies a FlatMap transformation on a
DataStream . |
DataStream<T> |
forward()
Sets the partitioning of the
DataStream so that the output elements are forwarded to
the local subtask of the next operation. |
PartitionWindowedStream<T> |
fullWindowPartition()
Collect records from each partition into a separate full window.
|
ExecutionConfig |
getExecutionConfig() |
StreamExecutionEnvironment |
getExecutionEnvironment()
Returns the
StreamExecutionEnvironment that was used to create this DataStream . |
int |
getId()
Returns the ID of the
DataStream in the current StreamExecutionEnvironment . |
ResourceSpec |
getMinResources()
Gets the minimum resources for this operator.
|
int |
getParallelism()
Gets the parallelism for this operator.
|
ResourceSpec |
getPreferredResources()
Gets the preferred resources for this operator.
|
Transformation<T> |
getTransformation()
Returns the
Transformation that represents the operation that logically creates this
DataStream . |
TypeInformation<T> |
getType()
Gets the type of the stream.
|
DataStream<T> |
global()
Sets the partitioning of the
DataStream so that the output values all go to the first
instance of the next processing operator. |
IterativeStream<T> |
iterate()
Deprecated.
This method is deprecated since Flink 1.19. The only known use case of this
Iteration API comes from Flink ML, which already has its own implementation of iteration
and no longer uses this API. If there's any use cases other than Flink ML that needs
iteration support, please reach out to dev@flink.apache.org and we can consider making
the Flink ML iteration implementation a separate common library.
|
IterativeStream<T> |
iterate(long maxWaitTimeMillis)
Deprecated.
This method is deprecated since Flink 1.19. The only known use case of this
Iteration API comes from Flink ML, which already has its own implementation of iteration
and no longer uses this API. If there's any use cases other than Flink ML that needs
iteration support, please reach out to dev@flink.apache.org and we can consider making
the Flink ML iteration implementation a separate common library.
|
<T2> JoinedStreams<T,T2> |
join(DataStream<T2> otherStream)
Creates a join operation.
|
KeyedStream<T,Tuple> |
keyBy(int... fields)
Deprecated.
Use
keyBy(KeySelector) . |
<K> KeyedStream<T,K> |
keyBy(KeySelector<T,K> key)
It creates a new
KeyedStream that uses the provided key for partitioning its operator
states. |
<K> KeyedStream<T,K> |
keyBy(KeySelector<T,K> key,
TypeInformation<K> keyType)
It creates a new
KeyedStream that uses the provided key with explicit type
information for partitioning its operator states. |
KeyedStream<T,Tuple> |
keyBy(String... fields)
Deprecated.
Use
keyBy(KeySelector) . |
<R> SingleOutputStreamOperator<R> |
map(MapFunction<T,R> mapper)
Applies a Map transformation on a
DataStream . |
<R> SingleOutputStreamOperator<R> |
map(MapFunction<T,R> mapper,
TypeInformation<R> outputType)
Applies a Map transformation on a
DataStream . |
<K> DataStream<T> |
partitionCustom(Partitioner<K> partitioner,
int field)
Deprecated.
|
<K> DataStream<T> |
partitionCustom(Partitioner<K> partitioner,
KeySelector<T,K> keySelector)
Partitions a DataStream on the key returned by the selector, using a custom partitioner.
|
<K> DataStream<T> |
partitionCustom(Partitioner<K> partitioner,
String field)
Deprecated.
|
DataStreamSink<T> |
print()
Writes a DataStream to the standard output stream (stdout).
|
DataStreamSink<T> |
print(String sinkIdentifier)
Writes a DataStream to the standard output stream (stdout).
|
DataStreamSink<T> |
printToErr()
Writes a DataStream to the standard error stream (stderr).
|
DataStreamSink<T> |
printToErr(String sinkIdentifier)
Writes a DataStream to the standard error stream (stderr).
|
<R> SingleOutputStreamOperator<R> |
process(ProcessFunction<T,R> processFunction)
Applies the given
ProcessFunction on the input stream, thereby creating a transformed
output stream. |
<R> SingleOutputStreamOperator<R> |
process(ProcessFunction<T,R> processFunction,
TypeInformation<R> outputType)
Applies the given
ProcessFunction on the input stream, thereby creating a transformed
output stream. |
<R extends Tuple> |
project(int... fieldIndexes)
Initiates a Project transformation on a
Tuple DataStream . |
DataStream<T> |
rebalance()
Sets the partitioning of the
DataStream so that the output elements are distributed
evenly to instances of the next operation in a round-robin fashion. |
DataStream<T> |
rescale()
Sets the partitioning of the
DataStream so that the output elements are distributed
evenly to a subset of instances of the next operation in a round-robin fashion. |
protected DataStream<T> |
setConnectionType(StreamPartitioner<T> partitioner)
Internal function for setting the partitioner for the DataStream.
|
DataStream<T> |
shuffle()
Sets the partitioning of the
DataStream so that the output elements are shuffled
uniformly randomly to the next operation. |
DataStreamSink<T> |
sinkTo(Sink<T,?,?,?> sink)
Adds the given
Sink to this DataStream. |
DataStreamSink<T> |
sinkTo(Sink<T,?,?,?> sink,
CustomSinkOperatorUidHashes customSinkOperatorUidHashes)
Adds the given
Sink to this DataStream. |
DataStreamSink<T> |
sinkTo(Sink<T> sink)
Adds the given
Sink to this DataStream. |
DataStreamSink<T> |
sinkTo(Sink<T> sink,
CustomSinkOperatorUidHashes customSinkOperatorUidHashes)
Adds the given
Sink to this DataStream. |
AllWindowedStream<T,TimeWindow> |
timeWindowAll(Time size)
Deprecated.
Please use
windowAll(WindowAssigner) with either TumblingEventTimeWindows or TumblingProcessingTimeWindows . For more information,
see the deprecation notice on TimeCharacteristic |
AllWindowedStream<T,TimeWindow> |
timeWindowAll(Time size,
Time slide)
Deprecated.
Please use
windowAll(WindowAssigner) with either SlidingEventTimeWindows or SlidingProcessingTimeWindows . For more information,
see the deprecation notice on TimeCharacteristic |
<R> SingleOutputStreamOperator<R> |
transform(String operatorName,
TypeInformation<R> outTypeInfo,
OneInputStreamOperator<T,R> operator)
Method for passing user defined operators along with the type information that will transform
the DataStream.
|
<R> SingleOutputStreamOperator<R> |
transform(String operatorName,
TypeInformation<R> outTypeInfo,
OneInputStreamOperatorFactory<T,R> operatorFactory)
Method for passing user defined operators created by the given factory along with the type
information that will transform the DataStream.
|
DataStream<T> |
union(DataStream<T>... streams)
Creates a new
DataStream by merging DataStream outputs of the same type with
each other. |
<W extends Window> |
windowAll(WindowAssigner<? super T,W> assigner)
Windows this data stream to a
AllWindowedStream , which evaluates windows over a non
key grouped stream. |
DataStreamSink<T> |
writeAsCsv(String path)
Deprecated.
Please use the
StreamingFileSink explicitly
using the addSink(SinkFunction) method. |
DataStreamSink<T> |
writeAsCsv(String path,
FileSystem.WriteMode writeMode)
Deprecated.
Please use the
StreamingFileSink explicitly
using the addSink(SinkFunction) method. |
<X extends Tuple> |
writeAsCsv(String path,
FileSystem.WriteMode writeMode,
String rowDelimiter,
String fieldDelimiter)
Deprecated.
Please use the
StreamingFileSink explicitly
using the addSink(SinkFunction) method. |
DataStreamSink<T> |
writeAsText(String path)
Deprecated.
Please use the
StreamingFileSink explicitly
using the addSink(SinkFunction) method. |
DataStreamSink<T> |
writeAsText(String path,
FileSystem.WriteMode writeMode)
Deprecated.
Please use the
StreamingFileSink explicitly
using the addSink(SinkFunction) method. |
DataStreamSink<T> |
writeToSocket(String hostName,
int port,
SerializationSchema<T> schema)
Writes the DataStream to a socket as a byte array.
|
DataStreamSink<T> |
writeUsingOutputFormat(OutputFormat<T> format)
Deprecated.
Please use the
StreamingFileSink explicitly
using the addSink(SinkFunction) method. |
protected final StreamExecutionEnvironment environment
protected final Transformation<T> transformation
public DataStream(StreamExecutionEnvironment environment, Transformation<T> transformation)
DataStream
in the given execution environment with partitioning set to
forward by default.environment
- The StreamExecutionEnvironment@Internal public int getId()
DataStream
in the current StreamExecutionEnvironment
.public int getParallelism()
@PublicEvolving public ResourceSpec getMinResources()
@PublicEvolving public ResourceSpec getPreferredResources()
public TypeInformation<T> getType()
protected <F> F clean(F f)
ClosureCleaner
on the given function if closure
cleaning is enabled in the ExecutionConfig
.public StreamExecutionEnvironment getExecutionEnvironment()
StreamExecutionEnvironment
that was used to create this DataStream
.public ExecutionConfig getExecutionConfig()
@SafeVarargs public final DataStream<T> union(DataStream<T>... streams)
DataStream
by merging DataStream
outputs of the same type with
each other. The DataStreams merged using this operator will be transformed simultaneously.streams
- The DataStreams to union output with.DataStream
.public <R> ConnectedStreams<T,R> connect(DataStream<R> dataStream)
ConnectedStreams
by connecting DataStream
outputs of (possible)
different types with each other. The DataStreams connected using this operator can be used
with CoFunctions to apply joint transformations.dataStream
- The DataStream with which this stream will be connected.ConnectedStreams
.@PublicEvolving public <R> BroadcastConnectedStream<T,R> connect(BroadcastStream<R> broadcastStream)
BroadcastConnectedStream
by connecting the current DataStream
or KeyedStream
with a BroadcastStream
.
The latter can be created using the broadcast(MapStateDescriptor[])
method.
The resulting stream can be further processed using the BroadcastConnectedStream.process(MyFunction)
method, where MyFunction
can be either
a KeyedBroadcastProcessFunction
or a BroadcastProcessFunction
depending on the current stream being a KeyedStream
or not.
broadcastStream
- The broadcast stream with the broadcast state to be connected with
this stream.BroadcastConnectedStream
.public <K> KeyedStream<T,K> keyBy(KeySelector<T,K> key)
KeyedStream
that uses the provided key for partitioning its operator
states.key
- The KeySelector to be used for extracting the key for partitioningDataStream
with partitioned state (i.e. KeyedStream)public <K> KeyedStream<T,K> keyBy(KeySelector<T,K> key, TypeInformation<K> keyType)
KeyedStream
that uses the provided key with explicit type
information for partitioning its operator states.key
- The KeySelector to be used for extracting the key for partitioning.keyType
- The type information describing the key type.DataStream
with partitioned state (i.e. KeyedStream)@Deprecated public KeyedStream<T,Tuple> keyBy(int... fields)
keyBy(KeySelector)
.DataStream
by the given key positions.fields
- The position of the fields on which the DataStream
will be grouped.DataStream
with partitioned state (i.e. KeyedStream)@Deprecated public KeyedStream<T,Tuple> keyBy(String... fields)
keyBy(KeySelector)
.DataStream
using field expressions. A field
expression is either the name of a public field or a getter method with parentheses of the
DataStream
's underlying type. A dot can be used to drill down into objects, as in
"field1.getInnerField2()"
.fields
- One or more field expressions on which the state of the DataStream
operators will be partitioned.DataStream
with partitioned state (i.e. KeyedStream)@Deprecated public <K> DataStream<T> partitionCustom(Partitioner<K> partitioner, int field)
partitionCustom(Partitioner, KeySelector)
.Note: This method works only on single field keys.
partitioner
- The partitioner to assign partitions to keys.field
- The field index on which the DataStream is partitioned.@Deprecated public <K> DataStream<T> partitionCustom(Partitioner<K> partitioner, String field)
partitionCustom(Partitioner, KeySelector)
.Note: This method works only on single field keys.
partitioner
- The partitioner to assign partitions to keys.field
- The expression for the field on which the DataStream is partitioned.public <K> DataStream<T> partitionCustom(Partitioner<K> partitioner, KeySelector<T,K> keySelector)
Note: This method works only on single field keys, i.e. the selector cannot return tuples of fields.
partitioner
- The partitioner to assign partitions to keys.keySelector
- The KeySelector with which the DataStream is partitioned.KeySelector
public DataStream<T> broadcast()
DataStream
so that the output elements are broadcasted
to every parallel instance of the next operation.@PublicEvolving public BroadcastStream<T> broadcast(MapStateDescriptor<?,?>... broadcastStateDescriptors)
DataStream
so that the output elements are broadcasted
to every parallel instance of the next operation. In addition, it implicitly as many broadcast states
as the specified
descriptors which can be used to store the element of the stream.broadcastStateDescriptors
- the descriptors of the broadcast states to create.BroadcastStream
which can be used in the connect(BroadcastStream)
to create a BroadcastConnectedStream
for further processing of the elements.@PublicEvolving public DataStream<T> shuffle()
DataStream
so that the output elements are shuffled
uniformly randomly to the next operation.public DataStream<T> forward()
DataStream
so that the output elements are forwarded to
the local subtask of the next operation.public DataStream<T> rebalance()
DataStream
so that the output elements are distributed
evenly to instances of the next operation in a round-robin fashion.@PublicEvolving public DataStream<T> rescale()
DataStream
so that the output elements are distributed
evenly to a subset of instances of the next operation in a round-robin fashion.
The subset of downstream operations to which the upstream operation sends elements depends on the degree of parallelism of both the upstream and downstream operation. For example, if the upstream operation has parallelism 2 and the downstream operation has parallelism 4, then one upstream operation would distribute elements to two downstream operations while the other upstream operation would distribute to the other two downstream operations. If, on the other hand, the downstream operation has parallelism 2 while the upstream operation has parallelism 4 then two upstream operations will distribute to one downstream operation while the other two upstream operations will distribute to the other downstream operations.
In cases where the different parallelisms are not multiples of each other one or several downstream operations will have a differing number of inputs from upstream operations.
@PublicEvolving public DataStream<T> global()
DataStream
so that the output values all go to the first
instance of the next processing operator. Use this setting with care since it might cause a
serious performance bottleneck in the application.@Deprecated public IterativeStream<T> iterate()
IterativeStream.closeWith(DataStream)
. The
transformation of this IterativeStream will be the iteration head. The data stream given to
the IterativeStream.closeWith(DataStream)
method is the data stream that will be fed
back and used as the input for the iteration head. The user can also use different feedback
type than the input of the iteration and treat the input and feedback streams as a ConnectedStreams
be calling IterativeStream.withFeedbackType(TypeInformation)
A common usage pattern for streaming iterations is to use output splitting to send a part
of the closing data stream to the head. Refer to ProcessFunction.Context#output(OutputTag, Object)
for more information.
The iteration edge will be partitioned the same way as the first input of the iteration
head unless it is changed in the IterativeStream.closeWith(DataStream)
call.
By default a DataStream with iteration will never terminate, but the user can use the maxWaitTime parameter to set a max waiting time for the iteration head. If no data received in the set time, the stream terminates.
@Deprecated public IterativeStream<T> iterate(long maxWaitTimeMillis)
IterativeStream.closeWith(DataStream)
. The
transformation of this IterativeStream will be the iteration head. The data stream given to
the IterativeStream.closeWith(DataStream)
method is the data stream that will be fed
back and used as the input for the iteration head. The user can also use different feedback
type than the input of the iteration and treat the input and feedback streams as a ConnectedStreams
be calling IterativeStream.withFeedbackType(TypeInformation)
A common usage pattern for streaming iterations is to use output splitting to send a part
of the closing data stream to the head. Refer to ProcessFunction.Context#output(OutputTag, Object)
for more information.
The iteration edge will be partitioned the same way as the first input of the iteration
head unless it is changed in the IterativeStream.closeWith(DataStream)
call.
By default a DataStream with iteration will never terminate, but the user can use the maxWaitTime parameter to set a max waiting time for the iteration head. If no data received in the set time, the stream terminates.
maxWaitTimeMillis
- Number of milliseconds to wait between inputs before shutting downpublic <R> SingleOutputStreamOperator<R> map(MapFunction<T,R> mapper)
DataStream
. The transformation calls a MapFunction
for each element of the DataStream. Each MapFunction call returns exactly one
element. The user can also extend RichMapFunction
to gain access to other features
provided by the RichFunction
interface.R
- output typemapper
- The MapFunction that is called for each element of the DataStream.DataStream
.public <R> SingleOutputStreamOperator<R> map(MapFunction<T,R> mapper, TypeInformation<R> outputType)
DataStream
. The transformation calls a MapFunction
for each element of the DataStream. Each MapFunction call returns exactly one
element. The user can also extend RichMapFunction
to gain access to other features
provided by the RichFunction
interface.R
- output typemapper
- The MapFunction that is called for each element of the DataStream.outputType
- TypeInformation
for the result type of the function.DataStream
.public <R> SingleOutputStreamOperator<R> flatMap(FlatMapFunction<T,R> flatMapper)
DataStream
. The transformation calls a FlatMapFunction
for each element of the DataStream. Each FlatMapFunction call can return any
number of elements including none. The user can also extend RichFlatMapFunction
to
gain access to other features provided by the RichFunction
interface.R
- output typeflatMapper
- The FlatMapFunction that is called for each element of the DataStreamDataStream
.public <R> SingleOutputStreamOperator<R> flatMap(FlatMapFunction<T,R> flatMapper, TypeInformation<R> outputType)
DataStream
. The transformation calls a FlatMapFunction
for each element of the DataStream. Each FlatMapFunction call can return any
number of elements including none. The user can also extend RichFlatMapFunction
to
gain access to other features provided by the RichFunction
interface.R
- output typeflatMapper
- The FlatMapFunction that is called for each element of the DataStreamoutputType
- TypeInformation
for the result type of the function.DataStream
.@PublicEvolving public <R> SingleOutputStreamOperator<R> process(ProcessFunction<T,R> processFunction)
ProcessFunction
on the input stream, thereby creating a transformed
output stream.
The function will be called for every element in the input streams and can produce zero or more output elements.
R
- The type of elements emitted by the ProcessFunction
.processFunction
- The ProcessFunction
that is called for each element in the
stream.DataStream
.@Internal public <R> SingleOutputStreamOperator<R> process(ProcessFunction<T,R> processFunction, TypeInformation<R> outputType)
ProcessFunction
on the input stream, thereby creating a transformed
output stream.
The function will be called for every element in the input streams and can produce zero or more output elements.
R
- The type of elements emitted by the ProcessFunction
.processFunction
- The ProcessFunction
that is called for each element in the
stream.outputType
- TypeInformation
for the result type of the function.DataStream
.public SingleOutputStreamOperator<T> filter(FilterFunction<T> filter)
DataStream
. The transformation calls a FilterFunction
for each element of the DataStream and retains only those element for which
the function returns true. Elements for which the function returns false are filtered. The
user can also extend RichFilterFunction
to gain access to other features provided by
the RichFunction
interface.filter
- The FilterFunction that is called for each element of the DataStream.@PublicEvolving public <R extends Tuple> SingleOutputStreamOperator<R> project(int... fieldIndexes)
Tuple
DataStream
.The transformation projects each Tuple of the DataSet onto a (sub)set of fields.
fieldIndexes
- The field indexes of the input tuples that are retained. The order of
fields in the output tuple corresponds to the order of field indexes.Tuple
,
DataStream
public <T2> CoGroupedStreams<T,T2> coGroup(DataStream<T2> otherStream)
CoGroupedStreams
for an example of how the keys and
window can be specified.public <T2> JoinedStreams<T,T2> join(DataStream<T2> otherStream)
JoinedStreams
for an example of how the keys and window
can be specified.@Deprecated public AllWindowedStream<T,TimeWindow> timeWindowAll(Time size)
windowAll(WindowAssigner)
with either TumblingEventTimeWindows
or TumblingProcessingTimeWindows
. For more information,
see the deprecation notice on TimeCharacteristic
DataStream
into tumbling time windows.
This is a shortcut for either .window(TumblingEventTimeWindows.of(size))
or .window(TumblingProcessingTimeWindows.of(size))
depending on the time characteristic set
using
Note: This operation is inherently non-parallel since all elements have to pass through the same operator instance.
size
- The size of the window.@Deprecated public AllWindowedStream<T,TimeWindow> timeWindowAll(Time size, Time slide)
windowAll(WindowAssigner)
with either SlidingEventTimeWindows
or SlidingProcessingTimeWindows
. For more information,
see the deprecation notice on TimeCharacteristic
DataStream
into sliding time windows.
This is a shortcut for either .window(SlidingEventTimeWindows.of(size, slide))
or
.window(SlidingProcessingTimeWindows.of(size, slide))
depending on the time
characteristic set using StreamExecutionEnvironment.setStreamTimeCharacteristic(org.apache.flink.streaming.api.TimeCharacteristic)
Note: This operation is inherently non-parallel since all elements have to pass through the same operator instance.
size
- The size of the window.public AllWindowedStream<T,GlobalWindow> countWindowAll(long size)
DataStream
into tumbling count windows.
Note: This operation is inherently non-parallel since all elements have to pass through the same operator instance.
size
- The size of the windows in number of elements.public AllWindowedStream<T,GlobalWindow> countWindowAll(long size, long slide)
DataStream
into sliding count windows.
Note: This operation is inherently non-parallel since all elements have to pass through the same operator instance.
size
- The size of the windows in number of elements.slide
- The slide interval in number of elements.@PublicEvolving public <W extends Window> AllWindowedStream<T,W> windowAll(WindowAssigner<? super T,W> assigner)
AllWindowedStream
, which evaluates windows over a non
key grouped stream. Elements are put into windows by a WindowAssigner
. The grouping of elements
is done by window.
A Trigger
can be defined to
specify when windows are evaluated. However, WindowAssigners
have a default Trigger
that is used if a Trigger
is not specified.
Note: This operation is inherently non-parallel since all elements have to pass through the same operator instance.
assigner
- The WindowAssigner
that assigns elements to windows.public SingleOutputStreamOperator<T> assignTimestampsAndWatermarks(WatermarkStrategy<T> watermarkStrategy)
WatermarkStrategy
is used to create a TimestampAssigner
and WatermarkGenerator
.
For each element in the data stream, the TimestampAssigner.extractTimestamp(Object,
long)
method is called to assign an event timestamp.
For each event in the data stream, the WatermarkGenerator.onEvent(Object, long,
WatermarkOutput)
will be called.
Periodically (defined by the ExecutionConfig.getAutoWatermarkInterval()
), the
WatermarkGenerator.onPeriodicEmit(WatermarkOutput)
method will be called.
Common watermark generation patterns can be found as static methods in the WatermarkStrategy
class.
watermarkStrategy
- The strategy to generate watermarks based on event timestamps.@Deprecated public SingleOutputStreamOperator<T> assignTimestampsAndWatermarks(AssignerWithPeriodicWatermarks<T> timestampAndWatermarkAssigner)
assignTimestampsAndWatermarks(WatermarkStrategy)
instead.This method uses the deprecated watermark generator interfaces. Please switch to assignTimestampsAndWatermarks(WatermarkStrategy)
to use the new interfaces instead. The new
interfaces support watermark idleness and no longer need to differentiate between "periodic"
and "punctuated" watermarks.
@Deprecated public SingleOutputStreamOperator<T> assignTimestampsAndWatermarks(AssignerWithPunctuatedWatermarks<T> timestampAndWatermarkAssigner)
assignTimestampsAndWatermarks(WatermarkStrategy)
instead.This method uses the deprecated watermark generator interfaces. Please switch to assignTimestampsAndWatermarks(WatermarkStrategy)
to use the new interfaces instead. The new
interfaces support watermark idleness and no longer need to differentiate between "periodic"
and "punctuated" watermarks.
@PublicEvolving public DataStreamSink<T> print()
For each element of the DataStream the result of Object.toString()
is written.
NOTE: This will print to stdout on the machine where the code is executed, i.e. the Flink worker.
@PublicEvolving public DataStreamSink<T> printToErr()
For each element of the DataStream the result of Object.toString()
is written.
NOTE: This will print to stderr on the machine where the code is executed, i.e. the Flink worker.
@PublicEvolving public DataStreamSink<T> print(String sinkIdentifier)
For each element of the DataStream the result of Object.toString()
is written.
NOTE: This will print to stdout on the machine where the code is executed, i.e. the Flink worker.
sinkIdentifier
- The string to prefix the output with.@PublicEvolving public DataStreamSink<T> printToErr(String sinkIdentifier)
For each element of the DataStream the result of Object.toString()
is written.
NOTE: This will print to stderr on the machine where the code is executed, i.e. the Flink worker.
sinkIdentifier
- The string to prefix the output with.@Deprecated @PublicEvolving public DataStreamSink<T> writeAsText(String path)
StreamingFileSink
explicitly
using the addSink(SinkFunction)
method.For every element of the DataStream the result of Object.toString()
is written.
path
- The path pointing to the location the text file is written to.@Deprecated @PublicEvolving public DataStreamSink<T> writeAsText(String path, FileSystem.WriteMode writeMode)
StreamingFileSink
explicitly
using the addSink(SinkFunction)
method.For every element of the DataStream the result of Object.toString()
is written.
path
- The path pointing to the location the text file is written towriteMode
- Controls the behavior for existing files. Options are NO_OVERWRITE and
OVERWRITE.@Deprecated @PublicEvolving public DataStreamSink<T> writeAsCsv(String path)
StreamingFileSink
explicitly
using the addSink(SinkFunction)
method.For every field of an element of the DataStream the result of Object.toString()
is
written. This method can only be used on data streams of tuples.
path
- the path pointing to the location the text file is written to@Deprecated @PublicEvolving public DataStreamSink<T> writeAsCsv(String path, FileSystem.WriteMode writeMode)
StreamingFileSink
explicitly
using the addSink(SinkFunction)
method.For every field of an element of the DataStream the result of Object.toString()
is
written. This method can only be used on data streams of tuples.
path
- the path pointing to the location the text file is written towriteMode
- Controls the behavior for existing files. Options are NO_OVERWRITE and
OVERWRITE.@Deprecated @PublicEvolving public <X extends Tuple> DataStreamSink<T> writeAsCsv(String path, FileSystem.WriteMode writeMode, String rowDelimiter, String fieldDelimiter)
StreamingFileSink
explicitly
using the addSink(SinkFunction)
method.For every field of an element of the DataStream the result of Object.toString()
is
written. This method can only be used on data streams of tuples.
path
- the path pointing to the location the text file is written towriteMode
- Controls the behavior for existing files. Options are NO_OVERWRITE and
OVERWRITE.rowDelimiter
- the delimiter for two rowsfieldDelimiter
- the delimiter for two fields@PublicEvolving public DataStreamSink<T> writeToSocket(String hostName, int port, SerializationSchema<T> schema)
SerializationSchema
.hostName
- host of the socketport
- port of the socketschema
- schema for serialization@Deprecated @PublicEvolving public DataStreamSink<T> writeUsingOutputFormat(OutputFormat<T> format)
StreamingFileSink
explicitly
using the addSink(SinkFunction)
method.The output is not participating in Flink's checkpointing!
For writing to a file system periodically, the use of the StreamingFileSink
is recommended.
format
- The output format@PublicEvolving public <R> SingleOutputStreamOperator<R> transform(String operatorName, TypeInformation<R> outTypeInfo, OneInputStreamOperator<T,R> operator)
R
- type of the return streamoperatorName
- name of the operator, for logging purposesoutTypeInfo
- the output type of the operatoroperator
- the object containing the transformation logictransform(String, TypeInformation, OneInputStreamOperatorFactory)
@PublicEvolving public <R> SingleOutputStreamOperator<R> transform(String operatorName, TypeInformation<R> outTypeInfo, OneInputStreamOperatorFactory<T,R> operatorFactory)
This method uses the rather new operator factories and should only be used when custom factories are needed.
R
- type of the return streamoperatorName
- name of the operator, for logging purposesoutTypeInfo
- the output type of the operatoroperatorFactory
- the factory for the operator.protected <R> SingleOutputStreamOperator<R> doTransform(String operatorName, TypeInformation<R> outTypeInfo, StreamOperatorFactory<R> operatorFactory)
protected DataStream<T> setConnectionType(StreamPartitioner<T> partitioner)
partitioner
- Partitioner to set.public DataStreamSink<T> addSink(SinkFunction<T> sinkFunction)
StreamExecutionEnvironment.execute()
method is called.sinkFunction
- The object containing the sink's invoke function.@PublicEvolving public DataStreamSink<T> sinkTo(Sink<T,?,?,?> sink)
Sink
to this DataStream. Only streams with sinks added will be
executed once the StreamExecutionEnvironment.execute()
method is called.sink
- The user defined sink.@PublicEvolving public DataStreamSink<T> sinkTo(Sink<T,?,?,?> sink, CustomSinkOperatorUidHashes customSinkOperatorUidHashes)
Sink
to this DataStream. Only streams with sinks added will be
executed once the StreamExecutionEnvironment.execute()
method is called.
This method is intended to be used only to recover a snapshot where no uids have been set before taking the snapshot.
sink
- The user defined sink.@PublicEvolving public DataStreamSink<T> sinkTo(Sink<T> sink)
Sink
to this DataStream. Only streams with sinks added will be
executed once the StreamExecutionEnvironment.execute()
method is called.sink
- The user defined sink.@PublicEvolving public DataStreamSink<T> sinkTo(Sink<T> sink, CustomSinkOperatorUidHashes customSinkOperatorUidHashes)
Sink
to this DataStream. Only streams with sinks added will be
executed once the StreamExecutionEnvironment.execute()
method is called.
This method is intended to be used only to recover a snapshot where no uids have been set before taking the snapshot.
customSinkOperatorUidHashes
- operator hashes to support state bindingsink
- The user defined sink.public CloseableIterator<T> executeAndCollect() throws Exception
The DataStream application is executed in the regular distributed manner on the target environment, and the events from the stream are polled back to this application process and thread through Flink's REST API.
IMPORTANT The returned iterator must be closed to free all cluster resources.
Exception
public CloseableIterator<T> executeAndCollect(String jobExecutionName) throws Exception
The DataStream application is executed in the regular distributed manner on the target environment, and the events from the stream are polled back to this application process and thread through Flink's REST API.
IMPORTANT The returned iterator must be closed to free all cluster resources.
Exception
public List<T> executeAndCollect(int limit) throws Exception
The DataStream application is executed in the regular distributed manner on the target environment, and the events from the stream are polled back to this application process and thread through Flink's REST API.
Exception
public List<T> executeAndCollect(String jobExecutionName, int limit) throws Exception
The DataStream application is executed in the regular distributed manner on the target environment, and the events from the stream are polled back to this application process and thread through Flink's REST API.
Exception
@Experimental public CloseableIterator<T> collectAsync()
DataStream
, and returns an iterator
over the collected elements that can be used to retrieve elements once the job execution has
started.
Caution: When multiple streams are being collected it is recommended to consume all streams in parallel to not back-pressure the job.
Caution: Closing the returned iterator cancels the job! It is recommended to close all iterators once you are no longer interested in any of the collected streams.
This method is functionally equivalent to collectAsync(Collector)
.
@Experimental public void collectAsync(DataStream.Collector<T> collector)
DataStream
, which can be retrieved
later via the given DataStream.Collector
.
Caution: When multiple streams are being collected it is recommended to consume all streams in parallel to not back-pressure the job.
Caution: Closing the iterator from the collector cancels the job! It is recommended to close all iterators once you are no longer interested in any of the collected streams.
This method is functionally equivalent to collectAsync()
.
This method is meant to support use-cases where the application of a sink is done via a
Consumer<DataStream<T>>
, where it wouldn't be possible (or inconvenient) to return an
iterator.
collector
- a collector that can be used to retrieve the elements@PublicEvolving public PartitionWindowedStream<T> fullWindowPartition()
@Internal public Transformation<T> getTransformation()
Transformation
that represents the operation that logically creates this
DataStream
.Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.