DataStream#

DataStream#

A DataStream represents a stream of elements of the same type.

DataStream.get_name()

Gets the name of the current data stream.

DataStream.name(name)

Sets the name of the current data stream.

DataStream.uid(uid)

Sets an ID for this operator.

DataStream.set_uid_hash(uid_hash)

Sets an user provided hash for this operator.

DataStream.set_parallelism(parallelism)

Sets the parallelism for this operator.

DataStream.set_max_parallelism(max_parallelism)

Sets the maximum parallelism of this operator.

DataStream.get_type()

Gets the type of the stream.

DataStream.get_execution_environment()

Returns the StreamExecutionEnvironment that was used to create this DataStream.

DataStream.force_non_parallel()

Sets the parallelism and maximum parallelism of this operator to one.

DataStream.set_buffer_timeout(timeout_millis)

Sets the buffering timeout for data produced by this operation.

DataStream.start_new_chain()

Starts a new task chain beginning at this operator.

DataStream.disable_chaining()

Turns off chaining for this operator so thread co-location will not be used as an optimization.

DataStream.slot_sharing_group(slot_sharing_group)

Sets the slot sharing group of this operation.

DataStream.set_description(description)

Sets the description for this operator.

DataStream.map(func[, output_type])

Applies a Map transformation on a DataStream.

DataStream.flat_map(func[, output_type])

Applies a FlatMap transformation on a DataStream.

DataStream.key_by(key_selector[, key_type])

Creates a new KeyedStream that uses the provided key for partitioning its operator states.

DataStream.filter(func)

Applies a Filter transformation on a DataStream.

DataStream.window_all(window_assigner)

Windows this data stream to a AllWindowedStream, which evaluates windows over a non key grouped stream.

DataStream.union(*streams)

Creates a new DataStream by merging DataStream outputs of the same type with each other.

DataStream.connect()

If ds is a DataStream, creates a new ConnectedStreams by connecting DataStream outputs of (possible) different types with each other.

DataStream.shuffle()

Sets the partitioning of the DataStream so that the output elements are shuffled uniformly randomly to the next operation.

DataStream.project(*field_indexes)

Initiates a Project transformation on a Tuple DataStream.

DataStream.rescale()

Sets the partitioning of the DataStream so that the output elements are distributed evenly to a subset of instances of the next operation in a round-robin fashion.

DataStream.rebalance()

Sets the partitioning of the DataStream so that the output elements are distributed evenly to instances of the next operation in a round-robin fashion.

DataStream.forward()

Sets the partitioning of the DataStream so that the output elements are forwarded to the local sub-task of the next operation.

DataStream.broadcast()

Sets the partitioning of the DataStream so that the output elements are broadcasted to every parallel instance of the next operation.

DataStream.process(func[, output_type])

Applies the given ProcessFunction on the input stream, thereby creating a transformed output stream.

DataStream.assign_timestamps_and_watermarks(...)

Assigns timestamps to the elements in the data stream and generates watermarks to signal event time progress.

DataStream.partition_custom(partitioner, ...)

Partitions a DataStream on the key returned by the selector, using a custom partitioner.

DataStream.add_sink(sink_func)

Adds the given sink to this DataStream.

DataStream.sink_to(sink)

Adds the given sink to this DataStream.

DataStream.execute_and_collect([...])

Triggers the distributed execution of the streaming dataflow and returns an iterator over the elements of the given DataStream.

DataStream.print([sink_identifier])

Writes a DataStream to the standard output stream (stdout).

DataStream.get_side_output(output_tag)

Gets the DataStream that contains the elements that are emitted from an operation into the side output with the given OutputTag.

DataStream.cache()

Cache the intermediate result of the transformation.

DataStreamSink#

A Stream Sink. This is used for emitting elements from a streaming topology.

DataStreamSink.name(name)

Sets the name of this sink.

DataStreamSink.uid(uid)

Sets an ID for this operator.

DataStreamSink.set_uid_hash(uid_hash)

Sets an user provided hash for this operator.

DataStreamSink.set_parallelism(parallelism)

Sets the parallelism for this operator.

DataStreamSink.set_description(description)

Sets the description for this sink.

DataStreamSink.disable_chaining()

Turns off chaining for this operator so thread co-location will not be used as an optimization.

DataStreamSink.slot_sharing_group(...)

Sets the slot sharing group of this operation.

KeyedStream#

A Stream Sink. This is used for emitting elements from a streaming topology.

KeyedStream.map(func[, output_type])

Applies a Map transformation on a KeyedStream.

KeyedStream.flat_map(func[, output_type])

Applies a FlatMap transformation on a KeyedStream.

KeyedStream.reduce(func)

Applies a reduce transformation on the grouped data stream grouped on by the given key position.

KeyedStream.filter(func)

Applies a Filter transformation on a DataStream.

KeyedStream.sum([position_to_sum])

Applies an aggregation that gives a rolling sum of the data stream at the given position grouped by the given key.

KeyedStream.min([position_to_min])

Applies an aggregation that gives the current minimum of the data stream at the given position by the given key.

KeyedStream.max([position_to_max])

Applies an aggregation that gives the current maximize of the data stream at the given position by the given key.

KeyedStream.min_by([position_to_min_by])

Applies an aggregation that gives the current element with the minimum value at the given position by the given key.

KeyedStream.max_by([position_to_max_by])

Applies an aggregation that gives the current element with the maximize value at the given position by the given key.

KeyedStream.add_sink(sink_func)

Adds the given sink to this DataStream.

KeyedStream.key_by(key_selector[, key_type])

Creates a new KeyedStream that uses the provided key for partitioning its operator states.

KeyedStream.process(func[, output_type])

Applies the given ProcessFunction on the input stream, thereby creating a transformed output stream.

KeyedStream.window(window_assigner)

Windows this data stream to a WindowedStream, which evaluates windows over a key grouped stream.

KeyedStream.count_window(size[, slide])

Windows this KeyedStream into tumbling or sliding count windows.

KeyedStream.union(*streams)

Creates a new DataStream by merging DataStream outputs of the same type with each other.

KeyedStream.connect()

If ds is a DataStream, creates a new ConnectedStreams by connecting DataStream outputs of (possible) different types with each other.

KeyedStream.partition_custom(partitioner, ...)

Partitions a DataStream on the key returned by the selector, using a custom partitioner.

KeyedStream.print([sink_identifier])

Writes a DataStream to the standard output stream (stdout).

CachedDataStream#

CachedDataStream represents a DataStream whose intermediate result will be cached at the first time when it is computed. And the cached intermediate result can be used in later job that using the same CachedDataStream to avoid re-computing the intermediate result.

CachedDataStream.get_type()

Gets the type of the stream.

CachedDataStream.get_execution_environment()

Returns the StreamExecutionEnvironment that was used to create this DataStream.

CachedDataStream.set_description(description)

Sets the description for this operator.

CachedDataStream.map(func[, output_type])

Applies a Map transformation on a DataStream.

CachedDataStream.flat_map(func[, output_type])

Applies a FlatMap transformation on a DataStream.

CachedDataStream.key_by(key_selector[, key_type])

Creates a new KeyedStream that uses the provided key for partitioning its operator states.

CachedDataStream.filter(func)

Applies a Filter transformation on a DataStream.

CachedDataStream.window_all(window_assigner)

Windows this data stream to a AllWindowedStream, which evaluates windows over a non key grouped stream.

CachedDataStream.union(*streams)

Creates a new DataStream by merging DataStream outputs of the same type with each other.

CachedDataStream.connect(ds)

If ds is a DataStream, creates a new ConnectedStreams by connecting DataStream outputs of (possible) different types with each other.

CachedDataStream.shuffle()

Sets the partitioning of the DataStream so that the output elements are shuffled uniformly randomly to the next operation.

CachedDataStream.project(*field_indexes)

Initiates a Project transformation on a Tuple DataStream.

CachedDataStream.rescale()

Sets the partitioning of the DataStream so that the output elements are distributed evenly to a subset of instances of the next operation in a round-robin fashion.

CachedDataStream.rebalance()

Sets the partitioning of the DataStream so that the output elements are distributed evenly to instances of the next operation in a round-robin fashion.

CachedDataStream.forward()

Sets the partitioning of the DataStream so that the output elements are forwarded to the local sub-task of the next operation.

CachedDataStream.broadcast([...])

Sets the partitioning of the DataStream so that the output elements are broadcasted to every parallel instance of the next operation.

CachedDataStream.process(func[, output_type])

Applies the given ProcessFunction on the input stream, thereby creating a transformed output stream.

CachedDataStream.assign_timestamps_and_watermarks(...)

Assigns timestamps to the elements in the data stream and generates watermarks to signal event time progress.

CachedDataStream.partition_custom(...)

Partitions a DataStream on the key returned by the selector, using a custom partitioner.

CachedDataStream.add_sink(sink_func)

Adds the given sink to this DataStream.

CachedDataStream.sink_to(sink)

Adds the given sink to this DataStream.

CachedDataStream.execute_and_collect([...])

Triggers the distributed execution of the streaming dataflow and returns an iterator over the elements of the given DataStream.

CachedDataStream.print([sink_identifier])

Writes a DataStream to the standard output stream (stdout).

CachedDataStream.get_side_output(output_tag)

Gets the DataStream that contains the elements that are emitted from an operation into the side output with the given OutputTag.

CachedDataStream.cache()

Cache the intermediate result of the transformation.

CachedDataStream.invalidate()

Invalidate the cache intermediate result of this DataStream to release the physical resources.

WindowedStream#

A WindowedStream represents a data stream where elements are grouped by key, and for each key, the stream of elements is split into windows based on a WindowAssigner. Window emission is triggered based on a Trigger.

The windows are conceptually evaluated for each key individually, meaning windows can trigger at different points for each key.

Note that the WindowedStream is purely an API construct, during runtime the WindowedStream will be collapsed together with the KeyedStream and the operation over the window into one single operation.

WindowedStream.get_execution_environment()

WindowedStream.get_input_type()

WindowedStream.trigger(trigger)

Sets the Trigger that should be used to trigger window emission.

WindowedStream.allowed_lateness(time_ms)

Sets the time by which elements are allowed to be late.

WindowedStream.side_output_late_data(output_tag)

Send late arriving data to the side output identified by the given OutputTag.

WindowedStream.reduce(reduce_function[, ...])

Applies a reduce function to the window.

WindowedStream.aggregate(aggregate_function)

Applies the given window function to each window.

WindowedStream.apply(window_function[, ...])

Applies the given window function to each window.

WindowedStream.process(process_window_function)

Applies the given window function to each window.

AllWindowedStream#

A AllWindowedStream represents a data stream where the stream of elements is split into windows based on a WindowAssigner. Window emission is triggered based on a Trigger.

If an Evictor is specified it will be used to evict elements from the window after evaluation was triggered by the Trigger but before the actual evaluation of the window. When using an evictor, window performance will degrade significantly, since pre-aggregation of window results cannot be used.

Note that the AllWindowedStream is purely an API construct, during runtime the AllWindowedStream will be collapsed together with the operation over the window into one single operation.

AllWindowedStream.get_execution_environment()

AllWindowedStream.get_input_type()

AllWindowedStream.trigger(trigger)

Sets the Trigger that should be used to trigger window emission.

AllWindowedStream.allowed_lateness(time_ms)

Sets the time by which elements are allowed to be late.

AllWindowedStream.side_output_late_data(...)

Send late arriving data to the side output identified by the given OutputTag.

AllWindowedStream.reduce(reduce_function[, ...])

Applies the given window function to each window.

AllWindowedStream.aggregate(aggregate_function)

Applies the given window function to each window.

AllWindowedStream.apply(window_function[, ...])

Applies the given window function to each window.

AllWindowedStream.process(...[, output_type])

Applies the given window function to each window.

ConnectedStreams#

ConnectedStreams represent two connected streams of (possibly) different data types. Connected streams are useful for cases where operations on one stream directly affect the operations on the other stream, usually via shared state between the streams.

An example for the use of connected streams would be to apply rules that change over time onto another stream. One of the connected streams has the rules, the other stream the elements to apply the rules to. The operation on the connected stream maintains the current set of rules in the state. It may receive either a rule update and update the state or a data element and apply the rules in the state to the element.

The connected stream can be conceptually viewed as a union stream of an Either type, that holds either the first stream’s type or the second stream’s type.

ConnectedStreams.key_by(key_selector1, ...)

KeyBy operation for connected data stream.

ConnectedStreams.map(func[, output_type])

Applies a CoMap transformation on a ConnectedStreams and maps the output to a common type.

ConnectedStreams.flat_map(func[, output_type])

Applies a CoFlatMap transformation on a ConnectedStreams and maps the output to a common type.

ConnectedStreams.process(func[, output_type])

BroadcastStream#

BroadcastStream(input_stream, ...)

A BroadcastStream is a stream with state.BroadcastState (s).

BroadcastConnectedStream#

A BroadcastConnectedStream represents the result of connecting a keyed or non-keyed stream, with a BroadcastStream with BroadcastState (s). As in the case of ConnectedStreams these streams are useful for cases where operations on one stream directly affect the operations on the other stream, usually via shared state between the streams.

An example for the use of such connected streams would be to apply rules that change over time onto another, possibly keyed stream. The stream with the broadcast state has the rules, and will store them in the broadcast state, while the other stream will contain the elements to apply the rules to. By broadcasting the rules, these will be available in all parallel instances, and can be applied to all partitions of the other stream.

BroadcastConnectedStream.process()

Assumes as inputs a BroadcastStream and a DataStream or KeyedStream and applies the given BroadcastProcessFunction or KeyedBroadcastProcessFunction on them, thereby creating a transformed output stream.