Class DataStream<T>
- java.lang.Object
- org.apache.flink.streaming.api.datastream.DataStream<T>
- Type Parameters:
T
- The type of the elements in this stream.
- Direct Known Subclasses:
CachedDataStream, KeyedStream, SideOutputDataStream, SingleOutputStreamOperator
@Public public class DataStream<T> extends Object
A DataStream represents a stream of elements of the same type. A DataStream can be transformed into another DataStream by applying a transformation, for example:
- map(MapFunction)
- filter(FilterFunction)
Nested Class Summary
Columns: Modifier and Type, Class, Description
static class DataStream.Collector<T>
This class acts as an accessor to elements collected via collectAsync(Collector).
-
Field Summary
Columns: Modifier and Type, Field, Description
protected StreamExecutionEnvironment environment
protected Transformation<T> transformation
-
Constructor Summary
Columns: Constructor, Description
DataStream(StreamExecutionEnvironment environment, Transformation<T> transformation)
Creates a new DataStream in the given execution environment with partitioning set to forward by default.
-
Method Summary
Columns: Modifier and Type, Method, Description
DataStreamSink<T> addSink(SinkFunction<T> sinkFunction)
Adds the given sink to this DataStream.
SingleOutputStreamOperator<T> assignTimestampsAndWatermarks(WatermarkStrategy<T> watermarkStrategy)
Assigns timestamps to the elements in the data stream and generates watermarks to signal event time progress.
DataStream<T> broadcast()
Sets the partitioning of the DataStream so that the output elements are broadcasted to every parallel instance of the next operation.
BroadcastStream<T> broadcast(MapStateDescriptor<?,?>... broadcastStateDescriptors)
Sets the partitioning of the DataStream so that the output elements are broadcasted to every parallel instance of the next operation.
protected <F> F clean(F f)
Invokes the ClosureCleaner on the given function if closure cleaning is enabled in the ExecutionConfig.
<T2> CoGroupedStreams<T,T2> coGroup(DataStream<T2> otherStream)
Creates a co-group operation.
CloseableIterator<T> collectAsync()
Sets up the collection of the elements in this DataStream, and returns an iterator over the collected elements that can be used to retrieve elements once the job execution has started.
void collectAsync(DataStream.Collector<T> collector)
Sets up the collection of the elements in this DataStream, which can be retrieved later via the given DataStream.Collector.
<R> BroadcastConnectedStream<T,R> connect(BroadcastStream<R> broadcastStream)
Creates a new BroadcastConnectedStream by connecting the current DataStream or KeyedStream with a BroadcastStream.
<R> ConnectedStreams<T,R> connect(DataStream<R> dataStream)
Creates a new ConnectedStreams by connecting DataStream outputs of (possibly) different types with each other.
AllWindowedStream<T,GlobalWindow> countWindowAll(long size)
Windows this DataStream into tumbling count windows.
AllWindowedStream<T,GlobalWindow> countWindowAll(long size, long slide)
Windows this DataStream into sliding count windows.
protected <R> SingleOutputStreamOperator<R> doTransform(String operatorName, TypeInformation<R> outTypeInfo, StreamOperatorFactory<R> operatorFactory)
CloseableIterator<T> executeAndCollect()
Triggers the distributed execution of the streaming dataflow and returns an iterator over the elements of the given DataStream.
List<T> executeAndCollect(int limit)
Triggers the distributed execution of the streaming dataflow and returns up to limit elements of the given DataStream as a list.
CloseableIterator<T> executeAndCollect(String jobExecutionName)
Triggers the distributed execution of the streaming dataflow and returns an iterator over the elements of the given DataStream.
List<T> executeAndCollect(String jobExecutionName, int limit)
Triggers the distributed execution of the streaming dataflow and returns up to limit elements of the given DataStream as a list.
SingleOutputStreamOperator<T> filter(FilterFunction<T> filter)
Applies a Filter transformation on a DataStream.
<R> SingleOutputStreamOperator<R> flatMap(FlatMapFunction<T,R> flatMapper)
Applies a FlatMap transformation on a DataStream.
<R> SingleOutputStreamOperator<R> flatMap(FlatMapFunction<T,R> flatMapper, TypeInformation<R> outputType)
Applies a FlatMap transformation on a DataStream.
DataStream<T> forward()
Sets the partitioning of the DataStream so that the output elements are forwarded to the local subtask of the next operation.
PartitionWindowedStream<T> fullWindowPartition()
Collects records from each partition into a separate full window.
ExecutionConfig getExecutionConfig()
StreamExecutionEnvironment getExecutionEnvironment()
Returns the StreamExecutionEnvironment that was used to create this DataStream.
int getId()
Returns the ID of the DataStream in the current StreamExecutionEnvironment.
ResourceSpec getMinResources()
Gets the minimum resources for this operator.
int getParallelism()
Gets the parallelism for this operator.
ResourceSpec getPreferredResources()
Gets the preferred resources for this operator.
Transformation<T> getTransformation()
Returns the Transformation that represents the operation that logically creates this DataStream.
TypeInformation<T> getType()
Gets the type of the stream.
DataStream<T> global()
Sets the partitioning of the DataStream so that the output values all go to the first instance of the next processing operator.
<T2> JoinedStreams<T,T2> join(DataStream<T2> otherStream)
Creates a join operation.
protected KeyedStream<T,Tuple> keyBy(Keys<T> keys)
<K> KeyedStream<T,K> keyBy(KeySelector<T,K> key)
Creates a new KeyedStream that uses the provided key for partitioning its operator states.
<K> KeyedStream<T,K> keyBy(KeySelector<T,K> key, TypeInformation<K> keyType)
Creates a new KeyedStream that uses the provided key with explicit type information for partitioning its operator states.
<R> SingleOutputStreamOperator<R> map(MapFunction<T,R> mapper)
Applies a Map transformation on a DataStream.
<R> SingleOutputStreamOperator<R> map(MapFunction<T,R> mapper, TypeInformation<R> outputType)
Applies a Map transformation on a DataStream.
<K> DataStream<T> partitionCustom(Partitioner<K> partitioner, KeySelector<T,K> keySelector)
Partitions a DataStream on the key returned by the selector, using a custom partitioner.
DataStreamSink<T> print()
Writes a DataStream to the standard output stream (stdout).
DataStreamSink<T> print(String sinkIdentifier)
Writes a DataStream to the standard output stream (stdout).
DataStreamSink<T> printToErr()
Writes a DataStream to the standard error stream (stderr).
DataStreamSink<T> printToErr(String sinkIdentifier)
Writes a DataStream to the standard error stream (stderr).
<R> SingleOutputStreamOperator<R> process(ProcessFunction<T,R> processFunction)
Applies the given ProcessFunction on the input stream, thereby creating a transformed output stream.
<R> SingleOutputStreamOperator<R> process(ProcessFunction<T,R> processFunction, TypeInformation<R> outputType)
Applies the given ProcessFunction on the input stream, thereby creating a transformed output stream.
<R extends Tuple> SingleOutputStreamOperator<R> project(int... fieldIndexes)
Initiates a Project transformation on a Tuple DataStream. Note: Only Tuple DataStreams can be projected.
DataStream<T> rebalance()
Sets the partitioning of the DataStream so that the output elements are distributed evenly to instances of the next operation in a round-robin fashion.
DataStream<T> rescale()
Sets the partitioning of the DataStream so that the output elements are distributed evenly to a subset of instances of the next operation in a round-robin fashion.
protected DataStream<T> setConnectionType(StreamPartitioner<T> partitioner)
Internal function for setting the partitioner for the DataStream.
DataStream<T> shuffle()
Sets the partitioning of the DataStream so that the output elements are shuffled uniformly randomly to the next operation.
DataStreamSink<T> sinkTo(Sink<T> sink)
Adds the given Sink to this DataStream.
DataStreamSink<T> sinkTo(Sink<T> sink, CustomSinkOperatorUidHashes customSinkOperatorUidHashes)
Adds the given Sink to this DataStream.
<R> SingleOutputStreamOperator<R> transform(String operatorName, TypeInformation<R> outTypeInfo, OneInputStreamOperator<T,R> operator)
Method for passing user defined operators along with the type information that will transform the DataStream.
<R> SingleOutputStreamOperator<R> transform(String operatorName, TypeInformation<R> outTypeInfo, OneInputStreamOperatorFactory<T,R> operatorFactory)
Method for passing user defined operators created by the given factory along with the type information that will transform the DataStream.
DataStream<T> union(DataStream<T>... streams)
Creates a new DataStream by merging DataStream outputs of the same type with each other.
<W extends Window> AllWindowedStream<T,W> windowAll(WindowAssigner<? super T,W> assigner)
Windows this data stream to an AllWindowedStream, which evaluates windows over a non key grouped stream.
DataStreamSink<T> writeToSocket(String hostName, int port, SerializationSchema<T> schema)
Writes the DataStream to a socket as a byte array.
DataStreamSink<T> writeUsingOutputFormat(OutputFormat<T> format)
Deprecated. Please use the StreamingFileSink explicitly using the addSink(SinkFunction) method.
-
-
-
Field Detail
-
environment
protected final StreamExecutionEnvironment environment
-
transformation
protected final Transformation<T> transformation
-
-
Constructor Detail
-
DataStream
public DataStream(StreamExecutionEnvironment environment, Transformation<T> transformation)
Creates a new DataStream in the given execution environment with partitioning set to forward by default.
- Parameters:
environment
- The StreamExecutionEnvironment
-
-
Method Detail
-
getId
@Internal public int getId()
Returns the ID of the DataStream in the current StreamExecutionEnvironment.
- Returns:
- ID of the DataStream
-
getParallelism
public int getParallelism()
Gets the parallelism for this operator.
- Returns:
- The parallelism set for this operator.
-
getMinResources
@PublicEvolving public ResourceSpec getMinResources()
Gets the minimum resources for this operator.
- Returns:
- The minimum resources set for this operator.
-
getPreferredResources
@PublicEvolving public ResourceSpec getPreferredResources()
Gets the preferred resources for this operator.
- Returns:
- The preferred resources set for this operator.
-
getType
public TypeInformation<T> getType()
Gets the type of the stream.
- Returns:
- The type of the DataStream.
-
clean
protected <F> F clean(F f)
Invokes the ClosureCleaner on the given function if closure cleaning is enabled in the ExecutionConfig.
- Returns:
- The cleaned Function
-
getExecutionEnvironment
public StreamExecutionEnvironment getExecutionEnvironment()
Returns the StreamExecutionEnvironment that was used to create this DataStream.
- Returns:
- The Execution Environment
-
getExecutionConfig
public ExecutionConfig getExecutionConfig()
-
union
@SafeVarargs public final DataStream<T> union(DataStream<T>... streams)
Creates a new DataStream by merging DataStream outputs of the same type with each other. The DataStreams merged using this operator will be transformed simultaneously.
- Parameters:
streams
- The DataStreams to union output with.
- Returns:
- The DataStream.
-
connect
public <R> ConnectedStreams<T,R> connect(DataStream<R> dataStream)
Creates a new ConnectedStreams by connecting DataStream outputs of (possibly) different types with each other. The DataStreams connected using this operator can be used with CoFunctions to apply joint transformations.
- Parameters:
dataStream
- The DataStream with which this stream will be connected.
- Returns:
- The ConnectedStreams.
-
connect
@PublicEvolving public <R> BroadcastConnectedStream<T,R> connect(BroadcastStream<R> broadcastStream)
Creates a new BroadcastConnectedStream by connecting the current DataStream or KeyedStream with a BroadcastStream.
The latter can be created using the broadcast(MapStateDescriptor[]) method.
The resulting stream can be further processed using the BroadcastConnectedStream.process(MyFunction) method, where MyFunction can be either a KeyedBroadcastProcessFunction or a BroadcastProcessFunction, depending on whether the current stream is a KeyedStream or not.
- Parameters:
broadcastStream
- The broadcast stream with the broadcast state to be connected with this stream.
- Returns:
- The BroadcastConnectedStream.
-
keyBy
public <K> KeyedStream<T,K> keyBy(KeySelector<T,K> key)
Creates a new KeyedStream that uses the provided key for partitioning its operator states.
- Parameters:
key
- The KeySelector to be used for extracting the key for partitioning.
- Returns:
- The DataStream with partitioned state (i.e. KeyedStream).
-
keyBy
public <K> KeyedStream<T,K> keyBy(KeySelector<T,K> key, TypeInformation<K> keyType)
Creates a new KeyedStream that uses the provided key with explicit type information for partitioning its operator states.
- Parameters:
key
- The KeySelector to be used for extracting the key for partitioning.
keyType
- The type information describing the key type.
- Returns:
- The DataStream with partitioned state (i.e. KeyedStream).
-
keyBy
protected KeyedStream<T,Tuple> keyBy(Keys<T> keys)
-
partitionCustom
public <K> DataStream<T> partitionCustom(Partitioner<K> partitioner, KeySelector<T,K> keySelector)
Partitions a DataStream on the key returned by the selector, using a custom partitioner. This method takes the key selector to get the key to partition on, and a partitioner that accepts the key type.
Note: This method works only on single field keys, i.e. the selector cannot return tuples of fields.
- Parameters:
partitioner
- The partitioner to assign partitions to keys.
keySelector
- The KeySelector with which the DataStream is partitioned.
- Returns:
- The partitioned DataStream.
- See Also:
KeySelector
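The partitionCustom contract can be illustrated with a minimal plain-Java sketch that does not use Flink: a selector extracts a single-field key from each element, and a partitioner maps that key to a partition index. The names KeyPartitioner, route, and byLength are hypothetical and exist only for this illustration; they stand in for the roles that Partitioner and KeySelector play in the method above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class PartitionCustomSketch {

    /** Hypothetical stand-in for a partitioner: maps a key to a partition index. */
    interface KeyPartitioner<K> {
        int partition(K key, int numPartitions);
    }

    /** Routes each element to the partition chosen for its extracted key. */
    static <T, K> List<List<T>> route(
            List<T> elements,
            Function<T, K> keySelector,
            KeyPartitioner<K> partitioner,
            int numPartitions) {
        List<List<T>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (T element : elements) {
            K key = keySelector.apply(element); // a single-field key, not a tuple
            int target = partitioner.partition(key, numPartitions);
            partitions.get(target).add(element);
        }
        return partitions;
    }

    /** Example: the key is the word length, the partition is length modulo numPartitions. */
    static List<List<String>> byLength(List<String> words, int numPartitions) {
        return route(words, String::length, (len, n) -> len % n, numPartitions);
    }

    public static void main(String[] args) {
        System.out.println(byLength(List.of("a", "bb", "ccc", "dd"), 2));
    }
}
```

The partitioner only ever sees the key, never the whole element, which is why the key type K links the two arguments.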
-
broadcast
public DataStream<T> broadcast()
Sets the partitioning of the DataStream so that the output elements are broadcasted to every parallel instance of the next operation.
- Returns:
- The DataStream with broadcast partitioning set.
-
broadcast
@PublicEvolving public BroadcastStream<T> broadcast(MapStateDescriptor<?,?>... broadcastStateDescriptors)
Sets the partitioning of the DataStream so that the output elements are broadcasted to every parallel instance of the next operation. In addition, it implicitly creates as many broadcast states as the specified descriptors, which can be used to store the elements of the stream.
- Parameters:
broadcastStateDescriptors
- the descriptors of the broadcast states to create.
- Returns:
- A BroadcastStream which can be used in connect(BroadcastStream) to create a BroadcastConnectedStream for further processing of the elements.
-
shuffle
@PublicEvolving public DataStream<T> shuffle()
Sets the partitioning of the DataStream so that the output elements are shuffled uniformly randomly to the next operation.
- Returns:
- The DataStream with shuffle partitioning set.
-
forward
public DataStream<T> forward()
Sets the partitioning of the DataStream so that the output elements are forwarded to the local subtask of the next operation.
- Returns:
- The DataStream with forward partitioning set.
-
rebalance
public DataStream<T> rebalance()
Sets the partitioning of the DataStream so that the output elements are distributed evenly to instances of the next operation in a round-robin fashion.
- Returns:
- The DataStream with rebalance partitioning set.
-
rescale
@PublicEvolving public DataStream<T> rescale()
Sets the partitioning of the DataStream so that the output elements are distributed evenly to a subset of instances of the next operation in a round-robin fashion.
The subset of downstream operations to which the upstream operation sends elements depends on the degree of parallelism of both the upstream and downstream operation. For example, if the upstream operation has parallelism 2 and the downstream operation has parallelism 4, then one upstream operation would distribute elements to two downstream operations while the other upstream operation would distribute to the other two downstream operations. If, on the other hand, the downstream operation has parallelism 2 while the upstream operation has parallelism 4, then two upstream operations will distribute to one downstream operation while the other two upstream operations will distribute to the other downstream operation.
In cases where the different parallelisms are not multiples of each other, one or several downstream operations will have a differing number of inputs from upstream operations.
- Returns:
- The DataStream with rescale partitioning set.
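The two parallelism examples above can be reproduced with a small plain-Java model. This is only a sketch of one assignment that is consistent with the description; the class RescaleSketch and the method targets are hypothetical, and Flink's actual wiring may differ, in particular when the parallelisms are not multiples of each other.

```java
import java.util.ArrayList;
import java.util.List;

final class RescaleSketch {

    /**
     * Returns the downstream subtask indexes that the given upstream subtask
     * sends to, under a simple proportional assignment (an assumption of this sketch).
     */
    static List<Integer> targets(int upstreamIndex, int upstreamParallelism, int downstreamParallelism) {
        List<Integer> result = new ArrayList<>();
        if (downstreamParallelism >= upstreamParallelism) {
            // Fan-out: each upstream subtask owns a contiguous group of downstream subtasks.
            for (int d = 0; d < downstreamParallelism; d++) {
                if ((long) d * upstreamParallelism / downstreamParallelism == upstreamIndex) {
                    result.add(d);
                }
            }
        } else {
            // Fan-in: several upstream subtasks share a single downstream subtask.
            result.add((int) ((long) upstreamIndex * downstreamParallelism / upstreamParallelism));
        }
        return result;
    }

    public static void main(String[] args) {
        for (int u = 0; u < 2; u++) {
            System.out.println("2 -> 4, upstream " + u + " sends to " + targets(u, 2, 4));
        }
        for (int u = 0; u < 4; u++) {
            System.out.println("4 -> 2, upstream " + u + " sends to " + targets(u, 4, 2));
        }
    }
}
```

Within its target group, an upstream subtask would then cycle through the listed downstream subtasks round-robin; the sketch only models which subtasks are connected.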
-
global
@PublicEvolving public DataStream<T> global()
Sets the partitioning of the DataStream so that the output values all go to the first instance of the next processing operator. Use this setting with care since it might cause a serious performance bottleneck in the application.
- Returns:
- The DataStream with global partitioning set.
-
map
public <R> SingleOutputStreamOperator<R> map(MapFunction<T,R> mapper)
Applies a Map transformation on a DataStream. The transformation calls a MapFunction for each element of the DataStream. Each MapFunction call returns exactly one element. The user can also extend RichMapFunction to gain access to other features provided by the RichFunction interface.
- Type Parameters:
R
- output type
- Parameters:
mapper
- The MapFunction that is called for each element of the DataStream.
- Returns:
- The transformed DataStream.
-
map
public <R> SingleOutputStreamOperator<R> map(MapFunction<T,R> mapper, TypeInformation<R> outputType)
Applies a Map transformation on a DataStream. The transformation calls a MapFunction for each element of the DataStream. Each MapFunction call returns exactly one element. The user can also extend RichMapFunction to gain access to other features provided by the RichFunction interface.
- Type Parameters:
R
- output type
- Parameters:
mapper
- The MapFunction that is called for each element of the DataStream.
outputType
- TypeInformation for the result type of the function.
- Returns:
- The transformed DataStream.
-
flatMap
public <R> SingleOutputStreamOperator<R> flatMap(FlatMapFunction<T,R> flatMapper)
Applies a FlatMap transformation on a DataStream. The transformation calls a FlatMapFunction for each element of the DataStream. Each FlatMapFunction call can return any number of elements including none. The user can also extend RichFlatMapFunction to gain access to other features provided by the RichFunction interface.
- Type Parameters:
R
- output type
- Parameters:
flatMapper
- The FlatMapFunction that is called for each element of the DataStream.
- Returns:
- The transformed DataStream.
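The "any number of elements including none" contract can be sketched in plain Java without Flink. In this model the function receives each input value together with a callback and may invoke the callback zero or more times. FlatMapSketch, flatMap, and splitWords are hypothetical names for this illustration, and the Consumer callback merely plays the role of an output collector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

final class FlatMapSketch {

    /** Calls the function once per input element; the function emits 0..n results. */
    static <T, R> List<R> flatMap(List<T> input, BiConsumer<T, Consumer<R>> flatMapper) {
        List<R> output = new ArrayList<>();
        for (T value : input) {
            flatMapper.accept(value, output::add);
        }
        return output;
    }

    /** Example: split lines into words; an empty line emits nothing. */
    static List<String> splitWords(List<String> lines) {
        return flatMap(lines, (line, out) -> {
            for (String word : line.split(" ")) {
                if (!word.isEmpty()) {
                    out.accept(word);
                }
            }
        });
    }

    public static void main(String[] args) {
        System.out.println(splitWords(List.of("to be", "", "or not")));
    }
}
```

A map, by contrast, would correspond to a function that emits exactly once per input, and a filter to one that emits the input itself at most once.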
-
flatMap
public <R> SingleOutputStreamOperator<R> flatMap(FlatMapFunction<T,R> flatMapper, TypeInformation<R> outputType)
Applies a FlatMap transformation on a DataStream. The transformation calls a FlatMapFunction for each element of the DataStream. Each FlatMapFunction call can return any number of elements including none. The user can also extend RichFlatMapFunction to gain access to other features provided by the RichFunction interface.
- Type Parameters:
R
- output type
- Parameters:
flatMapper
- The FlatMapFunction that is called for each element of the DataStream.
outputType
- TypeInformation for the result type of the function.
- Returns:
- The transformed DataStream.
-
process
@PublicEvolving public <R> SingleOutputStreamOperator<R> process(ProcessFunction<T,R> processFunction)
Applies the given ProcessFunction on the input stream, thereby creating a transformed output stream.
The function will be called for every element in the input stream and can produce zero or more output elements.
- Type Parameters:
R
- The type of elements emitted by the ProcessFunction.
- Parameters:
processFunction
- The ProcessFunction that is called for each element in the stream.
- Returns:
- The transformed DataStream.
-
process
@Internal public <R> SingleOutputStreamOperator<R> process(ProcessFunction<T,R> processFunction, TypeInformation<R> outputType)
Applies the given ProcessFunction on the input stream, thereby creating a transformed output stream.
The function will be called for every element in the input stream and can produce zero or more output elements.
- Type Parameters:
R
- The type of elements emitted by the ProcessFunction.
- Parameters:
processFunction
- The ProcessFunction that is called for each element in the stream.
outputType
- TypeInformation for the result type of the function.
- Returns:
- The transformed DataStream.
-
filter
public SingleOutputStreamOperator<T> filter(FilterFunction<T> filter)
Applies a Filter transformation on a DataStream. The transformation calls a FilterFunction for each element of the DataStream and retains only those elements for which the function returns true. Elements for which the function returns false are filtered out. The user can also extend RichFilterFunction to gain access to other features provided by the RichFunction interface.
- Parameters:
filter
- The FilterFunction that is called for each element of the DataStream.
- Returns:
- The filtered DataStream.
-
project
@PublicEvolving public <R extends Tuple> SingleOutputStreamOperator<R> project(int... fieldIndexes)
Initiates a Project transformation on a Tuple DataStream.
Note: Only Tuple DataStreams can be projected.
The transformation projects each Tuple of the DataStream onto a (sub)set of fields.
- Parameters:
fieldIndexes
- The field indexes of the input tuples that are retained. The order of fields in the output tuple corresponds to the order of field indexes.
- Returns:
- The projected DataStream
- See Also:
Tuple, DataStream
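The effect of fieldIndexes on field selection and ordering can be shown with a plain-Java sketch that models a tuple as a list of fields. ProjectSketch and its project method are hypothetical helpers for this illustration and do not use Flink's Tuple classes.

```java
import java.util.ArrayList;
import java.util.List;

final class ProjectSketch {

    /**
     * Keeps only the fields at the given indexes. The output order follows the
     * order of fieldIndexes, not the order of fields in the input tuple.
     */
    static List<Object> project(List<?> tuple, int... fieldIndexes) {
        List<Object> projected = new ArrayList<>(fieldIndexes.length);
        for (int index : fieldIndexes) {
            projected.add(tuple.get(index));
        }
        return projected;
    }

    public static void main(String[] args) {
        // Input fields: (0) id, (1) count, (2) score. Keep score first, then id.
        System.out.println(project(List.of("id-1", 42, 3.5), 2, 0));
    }
}
```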
-
coGroup
public <T2> CoGroupedStreams<T,T2> coGroup(DataStream<T2> otherStream)
Creates a co-group operation. See CoGroupedStreams for an example of how the keys and window can be specified.
-
join
public <T2> JoinedStreams<T,T2> join(DataStream<T2> otherStream)
Creates a join operation. See JoinedStreams for an example of how the keys and window can be specified.
-
countWindowAll
public AllWindowedStream<T,GlobalWindow> countWindowAll(long size)
Windows this DataStream into tumbling count windows.
Note: This operation is inherently non-parallel since all elements have to pass through the same operator instance.
- Parameters:
size
- The size of the windows in number of elements.
-
countWindowAll
public AllWindowedStream<T,GlobalWindow> countWindowAll(long size, long slide)
Windows this DataStream into sliding count windows.
Note: This operation is inherently non-parallel since all elements have to pass through the same operator instance.
- Parameters:
size
- The size of the windows in number of elements.
slide
- The slide interval in number of elements.
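The difference between tumbling and sliding count windows can be modeled in plain Java. The sketch below assumes that a window fires after every slide elements and then contains at most the last size elements, with the tumbling case being size equal to slide; that firing rule is an assumption of this model and is not stated on this page. Trailing elements that do not complete a slide are never emitted here. CountWindowSketch and countWindows are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class CountWindowSketch {

    /** Returns the windows that would fire for the given input under this model. */
    static <T> List<List<T>> countWindows(List<T> elements, int size, int slide) {
        if (size < 1 || slide < 1) {
            throw new IllegalArgumentException("size and slide must be positive");
        }
        List<List<T>> fired = new ArrayList<>();
        // Fire once for every 'slide' elements seen so far.
        for (int seen = slide; seen <= elements.size(); seen += slide) {
            int from = Math.max(0, seen - size); // keep at most the last 'size' elements
            fired.add(new ArrayList<>(elements.subList(from, seen)));
        }
        return fired;
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 4, 5, 6);
        System.out.println("tumbling(3):  " + countWindows(input, 3, 3));
        System.out.println("sliding(4,2): " + countWindows(input, 4, 2));
    }
}
```

Because every element passes through this single loop, the model also reflects why the operation is described as inherently non-parallel.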
-
windowAll
@PublicEvolving public <W extends Window> AllWindowedStream<T,W> windowAll(WindowAssigner<? super T,W> assigner)
Windows this data stream to an AllWindowedStream, which evaluates windows over a non key grouped stream. Elements are put into windows by a WindowAssigner. The grouping of elements is done by window.
A Trigger can be defined to specify when windows are evaluated. However, WindowAssigners have a default Trigger that is used if a Trigger is not specified.
Note: This operation is inherently non-parallel since all elements have to pass through the same operator instance.
- Parameters:
assigner
- The WindowAssigner that assigns elements to windows.
- Returns:
- The trigger windows data stream.
-
assignTimestampsAndWatermarks
public SingleOutputStreamOperator<T> assignTimestampsAndWatermarks(WatermarkStrategy<T> watermarkStrategy)
Assigns timestamps to the elements in the data stream and generates watermarks to signal event time progress. The given WatermarkStrategy is used to create a TimestampAssigner and WatermarkGenerator.
For each element in the data stream, the TimestampAssigner.extractTimestamp(Object, long) method is called to assign an event timestamp.
For each event in the data stream, the WatermarkGenerator.onEvent(Object, long, WatermarkOutput) method will be called.
Periodically (defined by the ExecutionConfig.getAutoWatermarkInterval()), the WatermarkGenerator.onPeriodicEmit(WatermarkOutput) method will be called.
Common watermark generation patterns can be found as static methods in the WatermarkStrategy class.
- Parameters:
watermarkStrategy
- The strategy to generate watermarks based on event timestamps.
- Returns:
- The stream after the transformation, with assigned timestamps and watermarks.
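The interplay of the per-event and periodic callbacks described above can be sketched with a plain-Java model of one common pattern, a generator that tolerates a bounded amount of out-of-orderness. The class names, the simplified method signatures (no WatermarkOutput), and the choice of pattern are assumptions made for this illustration; this is not Flink's implementation.

```java
final class WatermarkSketch {

    /** Hypothetical generator: the watermark trails the highest timestamp seen. */
    static final class BoundedLatenessGenerator {
        private final long maxOutOfOrderness;
        private long maxTimestamp;

        BoundedLatenessGenerator(long maxOutOfOrderness) {
            this.maxOutOfOrderness = maxOutOfOrderness;
            // Start low enough that the subtraction in onPeriodicEmit cannot underflow.
            this.maxTimestamp = Long.MIN_VALUE + maxOutOfOrderness + 1;
        }

        /** Called for every event: remember the highest event timestamp so far. */
        void onEvent(long eventTimestamp) {
            maxTimestamp = Math.max(maxTimestamp, eventTimestamp);
        }

        /** Called periodically: report how far event time has progressed. */
        long onPeriodicEmit() {
            return maxTimestamp - maxOutOfOrderness - 1;
        }
    }

    /** Feeds all timestamps to a fresh generator and returns the next periodic watermark. */
    static long watermarkAfter(long maxOutOfOrderness, long... eventTimestamps) {
        BoundedLatenessGenerator generator = new BoundedLatenessGenerator(maxOutOfOrderness);
        for (long timestamp : eventTimestamps) {
            generator.onEvent(timestamp);
        }
        return generator.onPeriodicEmit();
    }

    public static void main(String[] args) {
        System.out.println(watermarkAfter(5L, 100L, 120L, 110L));
    }
}
```

The late event with timestamp 110 does not move the watermark backwards, because only the maximum timestamp is tracked.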
-
print
@PublicEvolving public DataStreamSink<T> print()
Writes a DataStream to the standard output stream (stdout).
For each element of the DataStream the result of Object.toString() is written.
NOTE: This will print to stdout on the machine where the code is executed, i.e. the Flink worker.
- Returns:
- The closed DataStream.
-
printToErr
@PublicEvolving public DataStreamSink<T> printToErr()
Writes a DataStream to the standard error stream (stderr).
For each element of the DataStream the result of Object.toString() is written.
NOTE: This will print to stderr on the machine where the code is executed, i.e. the Flink worker.
- Returns:
- The closed DataStream.
-
print
@PublicEvolving public DataStreamSink<T> print(String sinkIdentifier)
Writes a DataStream to the standard output stream (stdout).
For each element of the DataStream the result of Object.toString() is written.
NOTE: This will print to stdout on the machine where the code is executed, i.e. the Flink worker.
- Parameters:
sinkIdentifier
- The string to prefix the output with.
- Returns:
- The closed DataStream.
-
printToErr
@PublicEvolving public DataStreamSink<T> printToErr(String sinkIdentifier)
Writes a DataStream to the standard error stream (stderr).
For each element of the DataStream the result of Object.toString() is written.
NOTE: This will print to stderr on the machine where the code is executed, i.e. the Flink worker.
- Parameters:
sinkIdentifier
- The string to prefix the output with.
- Returns:
- The closed DataStream.
-
writeToSocket
@PublicEvolving public DataStreamSink<T> writeToSocket(String hostName, int port, SerializationSchema<T> schema)
Writes the DataStream to a socket as a byte array. The format of the output is specified by a SerializationSchema.
- Parameters:
hostName
- host of the socket
port
- port of the socket
schema
- schema for serialization
- Returns:
- the closed DataStream
-
writeUsingOutputFormat
@Deprecated @PublicEvolving public DataStreamSink<T> writeUsingOutputFormat(OutputFormat<T> format)
Deprecated. Please use the StreamingFileSink explicitly using the addSink(SinkFunction) method.
Writes the DataStream into an output described by an OutputFormat.
The output does not participate in Flink's checkpointing!
For writing to a file system periodically, the use of the StreamingFileSink is recommended.
- Parameters:
format
- The output format
- Returns:
- The closed DataStream
-
transform
@PublicEvolving public <R> SingleOutputStreamOperator<R> transform(String operatorName, TypeInformation<R> outTypeInfo, OneInputStreamOperator<T,R> operator)
Method for passing user defined operators along with the type information that will transform the DataStream.
- Type Parameters:
R
- type of the return stream
- Parameters:
operatorName
- name of the operator, for logging purposes
outTypeInfo
- the output type of the operator
operator
- the object containing the transformation logic
- Returns:
- the data stream constructed
- See Also:
transform(String, TypeInformation, OneInputStreamOperatorFactory)
-
transform
@PublicEvolving public <R> SingleOutputStreamOperator<R> transform(String operatorName, TypeInformation<R> outTypeInfo, OneInputStreamOperatorFactory<T,R> operatorFactory)
Method for passing user defined operators created by the given factory along with the type information that will transform the DataStream.
This method uses operator factories and should only be used when custom factories are needed.
- Type Parameters:
R
- type of the return stream
- Parameters:
operatorName
- name of the operator, for logging purposes
outTypeInfo
- the output type of the operator
operatorFactory
- the factory for the operator.
- Returns:
- the data stream constructed.
-
doTransform
protected <R> SingleOutputStreamOperator<R> doTransform(String operatorName, TypeInformation<R> outTypeInfo, StreamOperatorFactory<R> operatorFactory)
-
setConnectionType
protected DataStream<T> setConnectionType(StreamPartitioner<T> partitioner)
Internal function for setting the partitioner for the DataStream.
- Parameters:
partitioner
- Partitioner to set.
- Returns:
- The modified DataStream.
-
addSink
public DataStreamSink<T> addSink(SinkFunction<T> sinkFunction)
Adds the given sink to this DataStream. Only streams with sinks added will be executed once the StreamExecutionEnvironment.execute() method is called.
- Parameters:
sinkFunction
- The object containing the sink's invoke function.
- Returns:
- The closed DataStream.
-
sinkTo
@PublicEvolving public DataStreamSink<T> sinkTo(Sink<T> sink)
Adds the given Sink to this DataStream. Only streams with sinks added will be executed once the StreamExecutionEnvironment.execute() method is called.
- Parameters:
sink
- The user defined sink.
- Returns:
- The closed DataStream.
-
sinkTo
@PublicEvolving public DataStreamSink<T> sinkTo(Sink<T> sink, CustomSinkOperatorUidHashes customSinkOperatorUidHashes)
Adds the given Sink to this DataStream. Only streams with sinks added will be executed once the StreamExecutionEnvironment.execute() method is called.
This method is intended to be used only to recover a snapshot where no uids have been set before taking the snapshot.
- Parameters:
customSinkOperatorUidHashes
- operator hashes to support state binding
sink
- The user defined sink.
- Returns:
- The closed DataStream.
-
executeAndCollect
public CloseableIterator<T> executeAndCollect() throws Exception
Triggers the distributed execution of the streaming dataflow and returns an iterator over the elements of the given DataStream.
The DataStream application is executed in the regular distributed manner on the target environment, and the events from the stream are polled back to this application process and thread through Flink's REST API.
IMPORTANT: The returned iterator must be closed to free all cluster resources.
- Throws:
Exception
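Since the returned iterator must be closed, a try-with-resources block is a natural way to consume it. The plain-Java sketch below shows that pattern with hypothetical stand-in types (ClosingIterator, ListBackedIterator) instead of Flink's classes, on the assumption that the real iterator is both an Iterator and an AutoCloseable.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class CollectSketch {

    /** Hypothetical stand-in for an iterator that holds resources until closed. */
    interface ClosingIterator<T> extends Iterator<T>, AutoCloseable {
        @Override
        void close(); // no checked exception in this sketch
    }

    /** Test double backed by an in-memory list; records whether it was closed. */
    static final class ListBackedIterator<T> implements ClosingIterator<T> {
        private final Iterator<T> delegate;
        boolean closed;

        ListBackedIterator(List<T> source) {
            this.delegate = source.iterator();
        }

        @Override
        public boolean hasNext() {
            return !closed && delegate.hasNext();
        }

        @Override
        public T next() {
            return delegate.next();
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Reads all results and guarantees close() runs, even if iteration throws. */
    static <T> List<T> drain(ClosingIterator<T> results) {
        List<T> collected = new ArrayList<>();
        try (ClosingIterator<T> it = results) {
            while (it.hasNext()) {
                collected.add(it.next());
            }
        }
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(drain(new ListBackedIterator<>(List.of(1, 2, 3))));
    }
}
```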
-
executeAndCollect
public CloseableIterator<T> executeAndCollect(String jobExecutionName) throws Exception
Triggers the distributed execution of the streaming dataflow and returns an iterator over the elements of the given DataStream.
The DataStream application is executed in the regular distributed manner on the target environment, and the events from the stream are polled back to this application process and thread through Flink's REST API.
IMPORTANT: The returned iterator must be closed to free all cluster resources.
- Throws:
Exception
-
executeAndCollect
public List<T> executeAndCollect(int limit) throws Exception
Triggers the distributed execution of the streaming dataflow and returns up to limit elements of the given DataStream as a list.
The DataStream application is executed in the regular distributed manner on the target environment, and the events from the stream are polled back to this application process and thread through Flink's REST API.
- Throws:
Exception
-
executeAndCollect
public List<T> executeAndCollect(String jobExecutionName, int limit) throws Exception
Triggers the distributed execution of the streaming dataflow and returns up to limit elements of the given DataStream as a list.
The DataStream application is executed in the regular distributed manner on the target environment, and the events from the stream are polled back to this application process and thread through Flink's REST API.
- Throws:
Exception
-
collectAsync
@Experimental public CloseableIterator<T> collectAsync()
Sets up the collection of the elements in this DataStream, and returns an iterator over the collected elements that can be used to retrieve elements once the job execution has started.
Caution: When multiple streams are being collected it is recommended to consume all streams in parallel to not back-pressure the job.
Caution: Closing the returned iterator cancels the job! It is recommended to close all iterators once you are no longer interested in any of the collected streams.
This method is functionally equivalent to collectAsync(Collector).
- Returns:
- iterator over the contained elements
-
collectAsync
@Experimental public void collectAsync(DataStream.Collector<T> collector)
Sets up the collection of the elements in this DataStream, which can be retrieved later via the given DataStream.Collector.
Caution: When multiple streams are being collected it is recommended to consume all streams in parallel to not back-pressure the job.
Caution: Closing the iterator from the collector cancels the job! It is recommended to close all iterators once you are no longer interested in any of the collected streams.
This method is functionally equivalent to collectAsync().
This method is meant to support use cases where the sink is applied via a Consumer<DataStream<T>>, where returning an iterator would be impossible or inconvenient.
- Parameters:
collector
- a collector that can be used to retrieve the elements
-
fullWindowPartition
@PublicEvolving public PartitionWindowedStream<T> fullWindowPartition()
Collects records from each partition into a separate full window. The window emission will be triggered at the end of inputs. For this non-keyed data stream (each record has no key), a partition contains all records of a subtask.
- Returns:
- The full windowed data stream on partition.
-
getTransformation
@Internal public Transformation<T> getTransformation()
Returns the Transformation that represents the operation that logically creates this DataStream.
- Returns:
- The Transformation
-
-