.. ################################################################################
   Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements. See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership. The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License. You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
   ################################################################################

==========
DataStream
==========

DataStream
----------

A DataStream represents a stream of elements of the same type.

.. currentmodule:: pyflink.datastream.data_stream

.. autosummary::
    :toctree: api/

    DataStream.get_name
    DataStream.name
    DataStream.uid
    DataStream.set_uid_hash
    DataStream.set_parallelism
    DataStream.set_max_parallelism
    DataStream.get_type
    DataStream.get_execution_environment
    DataStream.force_non_parallel
    DataStream.set_buffer_timeout
    DataStream.start_new_chain
    DataStream.disable_chaining
    DataStream.slot_sharing_group
    DataStream.set_description
    DataStream.map
    DataStream.flat_map
    DataStream.key_by
    DataStream.filter
    DataStream.window_all
    DataStream.union
    DataStream.connect
    DataStream.shuffle
    DataStream.project
    DataStream.rescale
    DataStream.rebalance
    DataStream.forward
    DataStream.broadcast
    DataStream.process
    DataStream.assign_timestamps_and_watermarks
    DataStream.partition_custom
    DataStream.add_sink
    DataStream.sink_to
    DataStream.execute_and_collect
    DataStream.print
    DataStream.get_side_output
    DataStream.cache


DataStreamSink
--------------

A Stream Sink.
This is used for emitting elements from a streaming topology.

.. currentmodule:: pyflink.datastream.data_stream

.. autosummary::
    :toctree: api/

    DataStreamSink.name
    DataStreamSink.uid
    DataStreamSink.set_uid_hash
    DataStreamSink.set_parallelism
    DataStreamSink.set_description
    DataStreamSink.disable_chaining
    DataStreamSink.slot_sharing_group


KeyedStream
-----------

A KeyedStream represents a DataStream on which operator state is partitioned by key using a
provided KeySelector. Typical operations supported by a DataStream are also possible on a
KeyedStream, with the exception of partitioning methods such as shuffle, forward and key_by.
Reduce-style operations, such as reduce and sum, work on elements that have the same key.

.. currentmodule:: pyflink.datastream.data_stream

.. autosummary::
    :toctree: api/

    KeyedStream.map
    KeyedStream.flat_map
    KeyedStream.reduce
    KeyedStream.filter
    KeyedStream.sum
    KeyedStream.min
    KeyedStream.max
    KeyedStream.min_by
    KeyedStream.max_by
    KeyedStream.add_sink
    KeyedStream.key_by
    KeyedStream.process
    KeyedStream.window
    KeyedStream.count_window
    KeyedStream.union
    KeyedStream.connect
    KeyedStream.partition_custom
    KeyedStream.print


CachedDataStream
----------------

A CachedDataStream represents a DataStream whose intermediate result is cached the first time
it is computed. Later jobs that use the same CachedDataStream can reuse the cached result
instead of recomputing it.

.. currentmodule:: pyflink.datastream.data_stream

.. autosummary::
    :toctree: api/

    CachedDataStream.get_type
    CachedDataStream.get_execution_environment
    CachedDataStream.set_description
    CachedDataStream.map
    CachedDataStream.flat_map
    CachedDataStream.key_by
    CachedDataStream.filter
    CachedDataStream.window_all
    CachedDataStream.union
    CachedDataStream.connect
    CachedDataStream.shuffle
    CachedDataStream.project
    CachedDataStream.rescale
    CachedDataStream.rebalance
    CachedDataStream.forward
    CachedDataStream.broadcast
    CachedDataStream.process
    CachedDataStream.assign_timestamps_and_watermarks
    CachedDataStream.partition_custom
    CachedDataStream.add_sink
    CachedDataStream.sink_to
    CachedDataStream.execute_and_collect
    CachedDataStream.print
    CachedDataStream.get_side_output
    CachedDataStream.cache
    CachedDataStream.invalidate


WindowedStream
--------------

A WindowedStream represents a data stream where elements are grouped by key, and for each
key, the stream of elements is split into windows based on a WindowAssigner. Window emission
is triggered based on a Trigger.

The windows are conceptually evaluated for each key individually, meaning windows can trigger
at different points for each key.

Note that the WindowedStream is purely an API construct. At runtime, the WindowedStream is
collapsed together with the KeyedStream and the operation over the window into a single
operation.

.. currentmodule:: pyflink.datastream.data_stream

.. autosummary::
    :toctree: api/

    WindowedStream.get_execution_environment
    WindowedStream.get_input_type
    WindowedStream.trigger
    WindowedStream.allowed_lateness
    WindowedStream.side_output_late_data
    WindowedStream.reduce
    WindowedStream.aggregate
    WindowedStream.apply
    WindowedStream.process


AllWindowedStream
-----------------

An AllWindowedStream represents a data stream where the stream of elements is split into
windows based on a WindowAssigner. Window emission is triggered based on a Trigger.
If an Evictor is specified, it is used to evict elements from the window after evaluation has
been triggered by the Trigger but before the window is actually evaluated. When using an
evictor, window performance will degrade significantly, since pre-aggregation of window
results cannot be used.

Note that the AllWindowedStream is purely an API construct. At runtime, the AllWindowedStream
is collapsed together with the operation over the window into a single operation.

.. currentmodule:: pyflink.datastream.data_stream

.. autosummary::
    :toctree: api/

    AllWindowedStream.get_execution_environment
    AllWindowedStream.get_input_type
    AllWindowedStream.trigger
    AllWindowedStream.allowed_lateness
    AllWindowedStream.side_output_late_data
    AllWindowedStream.reduce
    AllWindowedStream.aggregate
    AllWindowedStream.apply
    AllWindowedStream.process


ConnectedStreams
----------------

ConnectedStreams represent two connected streams of (possibly) different data types.
Connected streams are useful for cases where operations on one stream directly affect the
operations on the other stream, usually via shared state between the streams.

An example of the use of connected streams is applying rules that change over time to
another stream. One of the connected streams carries the rules, the other carries the
elements to apply the rules to. The operation on the connected stream maintains the current
set of rules in state. It may receive either a rule update, in which case it updates the
state, or a data element, in which case it applies the rules in the state to the element.

The connected stream can be conceptually viewed as a union stream of an Either type that
holds either the first stream's type or the second stream's type.

.. currentmodule:: pyflink.datastream.data_stream

.. autosummary::
    :toctree: api/

    ConnectedStreams.key_by
    ConnectedStreams.map
    ConnectedStreams.flat_map
    ConnectedStreams.process


BroadcastStream
---------------

.. currentmodule:: pyflink.datastream.data_stream

.. autosummary::
    :toctree: api/

    BroadcastStream


BroadcastConnectedStream
------------------------

A BroadcastConnectedStream represents the result of connecting a keyed or non-keyed stream
with a :class:`BroadcastStream` that carries one or more :class:`~state.BroadcastState`
instances. As in the case of :class:`ConnectedStreams`, these streams are useful for cases
where operations on one stream directly affect the operations on the other stream, usually
via shared state between the streams.

An example of the use of such connected streams is applying rules that change over time to
another, possibly keyed, stream. The stream with the broadcast state carries the rules and
stores them in the broadcast state, while the other stream contains the elements to apply
the rules to. Because the rules are broadcast, they are available in all parallel instances
and can be applied to all partitions of the other stream.

.. currentmodule:: pyflink.datastream.data_stream

.. autosummary::
    :toctree: api/

    BroadcastConnectedStream.process
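
The rule-broadcast pattern described above can be illustrated with a small plain-Python model.
This is only a conceptual sketch and does not use any PyFlink API: the names
``ParallelInstance``, ``pick_instance`` and ``run`` are hypothetical, rules are assumed to be
``(name, threshold)`` pairs, and elements are assumed to be ``(key, value)`` pairs. It shows
that every parallel instance receives every rule, while each element is routed to exactly one
instance.

```python
class ParallelInstance:
    """Hypothetical stand-in for one parallel operator instance (not a PyFlink class)."""

    def __init__(self):
        # Local copy of the broadcast state: rule name -> threshold.
        self.rules = {}

    def process_broadcast_element(self, rule):
        # A rule update arrives on the broadcast side: store it in local state.
        name, threshold = rule
        self.rules[name] = threshold

    def process_element(self, element):
        # A data element arrives on the other side: apply every stored rule to it.
        key, value = element
        return [
            (key, value, name)
            for name, threshold in sorted(self.rules.items())
            if value > threshold
        ]


def pick_instance(key, parallelism):
    # Deterministic toy partitioner (an assumption, not Flink's key-group logic).
    return sum(map(ord, key)) % parallelism


def run(rules, elements, parallelism=2):
    instances = [ParallelInstance() for _ in range(parallelism)]
    # Broadcasting: every rule is delivered to every parallel instance.
    for rule in rules:
        for instance in instances:
            instance.process_broadcast_element(rule)
    # Partitioning: each element is delivered to exactly one instance.
    matches = []
    for element in elements:
        instance = instances[pick_instance(element[0], parallelism)]
        matches.extend(instance.process_element(element))
    return matches


if __name__ == "__main__":
    print(run([("high", 100), ("medium", 50)], [("a", 120), ("b", 60), ("a", 10)]))
```

Because every instance holds the same rules, the result does not depend on which instance an
element is routed to. The sketch delivers all rules before any element; in a real streaming
job the two inputs are interleaved, so an element may be processed before a later rule update
arrives.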