@PublicEvolving public class PythonDataStream<D extends DataStream<org.python.core.PyObject>> extends Object
PythonDataStream
is a thin wrapper layer over DataStream
, which represents a
stream of elements of the same type. A PythonDataStream
can be transformed into
another PythonDataStream
by applying various transformation functions, such as
map(org.apache.flink.api.common.functions.MapFunction<org.python.core.PyObject, org.python.core.PyObject>)
split(org.apache.flink.streaming.api.collector.selector.OutputSelector<org.python.core.PyObject>)
A thin wrapper layer means that the functionality itself is performed by the
DataStream
, however instead of working directly with the streaming data sets,
this layer handles Python wrappers (e.g. PythonDataStream
) to comply with the
Python standard coding styles.
Constructor and Description |
---|
PythonDataStream(D stream) |
protected final D extends DataStream<org.python.core.PyObject> stream
public PythonDataStream(D stream)
@SafeVarargs public final PythonDataStream union(PythonDataStream... streams)
DataStream.union(DataStream[])
.streams
- The Python DataStreams to union output with.PythonDataStream
.public PythonSplitStream split(OutputSelector<org.python.core.PyObject> output_selector) throws IOException
DataStream.split(OutputSelector)
.output_selector
- The user defined OutputSelector
for directing the tuples.PythonSplitStream
IOException
public PythonSingleOutputStreamOperator filter(FilterFunction<org.python.core.PyObject> filter) throws IOException
DataStream.filter(FilterFunction)
.filter
- The FilterFunction that is called for each element of the DataStream.PythonDataStream
.IOException
public PythonDataStream<SingleOutputStreamOperator<org.python.core.PyObject>> map(MapFunction<org.python.core.PyObject,org.python.core.PyObject> mapper) throws IOException
DataStream.map(MapFunction)
.mapper
- The MapFunction that is called for each element of the
DataStream.PythonDataStream
.IOException
public PythonDataStream<SingleOutputStreamOperator<org.python.core.PyObject>> flat_map(FlatMapFunction<org.python.core.PyObject,Object> flat_mapper) throws IOException
DataStream.flatMap(FlatMapFunction)
.flat_mapper
- The FlatMapFunction that is called for each element of the
DataStreamPythonDataStream
.IOException
public PythonKeyedStream key_by(KeySelector<org.python.core.PyObject,PyKey> selector) throws IOException
DataStream.keyBy(KeySelector)
.selector
- The KeySelector to be used for extracting the key for partitioningPythonDataStream
with partitioned state (i.e. PythonKeyedStream
)IOException
@PublicEvolving public void output()
DataStream.print()
.@PublicEvolving public void write_as_text(String path)
DataStream.writeAsText(java.lang.String)
.path
- The path pointing to the location the text file is written to.@PublicEvolving public void write_as_text(String path, FileSystem.WriteMode mode)
DataStream#writeAsText(java.lang.String, WriteMode)
.path
- The path pointing to the location the text file is written tomode
- Controls the behavior for existing files. Options are
NO_OVERWRITE and OVERWRITE.@PublicEvolving public void write_to_socket(String host, Integer port, SerializationSchema<org.python.core.PyObject> schema) throws IOException
DataStream.writeToSocket(String, int, org.apache.flink.api.common.serialization.SerializationSchema)
.host
- host of the socketport
- port of the socketschema
- schema for serializationIOException
@PublicEvolving public void add_sink(SinkFunction<org.python.core.PyObject> sink_func) throws IOException
DataStream.addSink(SinkFunction)
.sink_func
- The object containing the sink's invoke function.IOException
@PublicEvolving public PythonIterativeStream iterate()
DataStream.iterate()
.
Initiates an iterative part of the program that feeds back data streams.
The iterative part needs to be closed by calling
PythonIterativeStream.close_with(PythonDataStream)
. The transformation of
this IterativeStream will be the iteration head. The data stream
given to the PythonIterativeStream.close_with(PythonDataStream)
method is
the data stream that will be fed back and used as the input for the
iteration head.
A common usage pattern for streaming iterations is to use output
splitting to send a part of the closing data stream to the head. Refer to
split(OutputSelector)
for more information.
The iteration edge will be partitioned the same way as the first input of
the iteration head unless it is changed in the
PythonIterativeStream.close_with(PythonDataStream)
call.
By default a PythonDataStream with iteration will never terminate, but the user can use the maxWaitTime parameter to set a max waiting time for the iteration head. If no data received in the set time, the stream terminates.
@PublicEvolving public PythonIterativeStream iterate(Long max_wait_time_ms)
DataStream.iterate(long)
.
Initiates an iterative part of the program that feeds back data streams.
The iterative part needs to be closed by calling
PythonIterativeStream.close_with(PythonDataStream)
. The transformation of
this IterativeStream will be the iteration head. The data stream
given to the PythonIterativeStream.close_with(PythonDataStream)
method is
the data stream that will be fed back and used as the input for the
iteration head.
A common usage pattern for streaming iterations is to use output
splitting to send a part of the closing data stream to the head. Refer to
split(OutputSelector)
for more information.
The iteration edge will be partitioned the same way as the first input of
the iteration head unless it is changed in the
PythonIterativeStream.close_with(PythonDataStream)
call.
By default a PythonDataStream with iteration will never terminate, but the user can use the maxWaitTime parameter to set a max waiting time for the iteration head. If no data received in the set time, the stream terminates.
max_wait_time_ms
- Number of milliseconds to wait between inputs before shutting
downCopyright © 2014–2019 The Apache Software Foundation. All rights reserved.