T
- The type of the elements in this stream.@Public public class SingleOutputStreamOperator<T> extends DataStream<T>
SingleOutputStreamOperator
represents a user defined transformation applied on a DataStream
with one predefined output type.DataStream.Collector<T>
Modifier and Type | Field and Description |
---|---|
protected boolean |
nonParallel
Indicate this is a non-parallel operator and cannot set a non-1 degree of parallelism.
|
environment, transformation
Modifier | Constructor and Description |
---|---|
protected |
SingleOutputStreamOperator(StreamExecutionEnvironment environment,
Transformation<T> transformation) |
Modifier and Type | Method and Description |
---|---|
CachedDataStream<T> |
cache()
Cache the intermediate result of the transformation.
|
SingleOutputStreamOperator<T> |
disableChaining()
Turns off chaining for this operator so thread co-location will not be used as an
optimization.
|
SingleOutputStreamOperator<T> |
forceNonParallel()
Sets the parallelism and maximum parallelism of this operator to one.
|
String |
getName()
Gets the name of the current data stream.
|
<X> SideOutputDataStream<X> |
getSideOutput(OutputTag<X> sideOutputTag)
Gets the
DataStream that contains the elements that are emitted from an operation
into the side output with the given OutputTag . |
SingleOutputStreamOperator<T> |
name(String name)
Sets the name of the current data stream.
|
SingleOutputStreamOperator<T> |
returns(Class<T> typeClass)
Adds a type information hint about the return type of this operator.
|
SingleOutputStreamOperator<T> |
returns(TypeHint<T> typeHint)
Adds a type information hint about the return type of this operator.
|
SingleOutputStreamOperator<T> |
returns(TypeInformation<T> typeInfo)
Adds a type information hint about the return type of this operator.
|
SingleOutputStreamOperator<T> |
setBufferTimeout(long timeoutMillis)
Sets the buffering timeout for data produced by this operation.
|
SingleOutputStreamOperator<T> |
setDescription(String description)
Sets the description for this operation.
|
SingleOutputStreamOperator<T> |
setMaxParallelism(int maxParallelism)
Sets the maximum parallelism of this operator.
|
SingleOutputStreamOperator<T> |
setParallelism(int parallelism)
Sets the parallelism for this operator.
|
SingleOutputStreamOperator<T> |
setUidHash(String uidHash)
Sets an user provided hash for this operator.
|
SingleOutputStreamOperator<T> |
slotSharingGroup(SlotSharingGroup slotSharingGroup)
Sets the slot sharing group of this operation.
|
SingleOutputStreamOperator<T> |
slotSharingGroup(String slotSharingGroup)
Sets the slot sharing group of this operation.
|
SingleOutputStreamOperator<T> |
startNewChain()
Starts a new task chain beginning at this operator.
|
SingleOutputStreamOperator<T> |
uid(String uid)
Sets an ID for this operator.
|
addSink, assignTimestampsAndWatermarks, assignTimestampsAndWatermarks, assignTimestampsAndWatermarks, broadcast, broadcast, clean, coGroup, collectAsync, collectAsync, connect, connect, countWindowAll, countWindowAll, doTransform, executeAndCollect, executeAndCollect, executeAndCollect, executeAndCollect, filter, flatMap, flatMap, forward, getExecutionConfig, getExecutionEnvironment, getId, getMinResources, getParallelism, getPreferredResources, getTransformation, getType, global, iterate, iterate, join, keyBy, keyBy, keyBy, keyBy, map, map, partitionCustom, partitionCustom, partitionCustom, print, print, printToErr, printToErr, process, process, project, rebalance, rescale, setConnectionType, shuffle, sinkTo, sinkTo, sinkTo, sinkTo, timeWindowAll, timeWindowAll, transform, transform, union, windowAll, writeAsCsv, writeAsCsv, writeAsCsv, writeAsText, writeAsText, writeToSocket, writeUsingOutputFormat
protected boolean nonParallel
protected SingleOutputStreamOperator(StreamExecutionEnvironment environment, Transformation<T> transformation)
public String getName()
public SingleOutputStreamOperator<T> name(String name)
@PublicEvolving public SingleOutputStreamOperator<T> uid(String uid)
The specified ID is used to assign the same operator ID across job submissions (for example when starting a job from a savepoint).
Important: this ID needs to be unique per transformation and job. Otherwise, job submission will fail.
uid
- The unique user-specified ID of this transformation.@PublicEvolving public SingleOutputStreamOperator<T> setUidHash(String uidHash)
The user provided hash is an alternative to the generated hashes, that is considered when identifying an operator through the default hash mechanics fails (e.g. because of changes between Flink versions).
Important: this should be used as a workaround or for trouble shooting. The provided hash needs to be unique per transformation and job. Otherwise, job submission will fail. Furthermore, you cannot assign user-specified hash to intermediate nodes in an operator chain and trying so will let your job fail.
A use case for this is in migration between Flink versions or changing the jobs in a way that changes the automatically generated hashes. In this case, providing the previous hashes directly through this method (e.g. obtained from old logs) can help to reestablish a lost mapping from states to their target operator.
uidHash
- The user provided hash for this operator. This will become the JobVertexID,
which is shown in the logs and web ui.public SingleOutputStreamOperator<T> setParallelism(int parallelism)
parallelism
- The parallelism for this operator.@PublicEvolving public SingleOutputStreamOperator<T> setMaxParallelism(int maxParallelism)
The maximum parallelism specifies the upper bound for dynamic scaling. It also defines the number of key groups used for partitioned state.
maxParallelism
- Maximum parallelism@PublicEvolving public SingleOutputStreamOperator<T> forceNonParallel()
public SingleOutputStreamOperator<T> setBufferTimeout(long timeoutMillis)
Lower timeouts lead to lower tail latencies, but may affect throughput. Timeouts of 1 ms still sustain high throughput, even for jobs with high parallelism.
A value of '-1' means that the default buffer timeout should be used. A value of '0' indicates that no buffering should happen, and all records/events should be immediately sent through the network, without additional buffering.
timeoutMillis
- The maximum time between two output flushes.@PublicEvolving public SingleOutputStreamOperator<T> disableChaining()
Chaining can be turned off for the whole job by StreamExecutionEnvironment.disableOperatorChaining()
however it is not advised for
performance considerations.
@PublicEvolving public SingleOutputStreamOperator<T> startNewChain()
public SingleOutputStreamOperator<T> returns(Class<T> typeClass)
Classes can be used as type hints for non-generic types (classes without generic
parameters), but not for generic types like for example Tuples. For those generic types,
please use the returns(TypeHint)
method.
typeClass
- The class of the returned data type.public SingleOutputStreamOperator<T> returns(TypeHint<T> typeHint)
Use this method the following way:
DataStream<Tuple2<String, Double>> result =
stream.flatMap(new FunctionWithNonInferrableReturnType())
.returns(new TypeHint<Tuple2<String, Double>>(){});
typeHint
- The type hint for the returned data type.public SingleOutputStreamOperator<T> returns(TypeInformation<T> typeInfo)
In most cases, the methods returns(Class)
and returns(TypeHint)
are
preferable.
typeInfo
- type information as a return type hint@PublicEvolving public SingleOutputStreamOperator<T> slotSharingGroup(String slotSharingGroup)
Operations inherit the slot sharing group of input operations if all input operations are in the same slot sharing group and no slot sharing group was explicitly specified.
Initially an operation is in the default slot sharing group. An operation can be put into
the default group explicitly by setting the slot sharing group to "default"
.
slotSharingGroup
- The slot sharing group name.@PublicEvolving public SingleOutputStreamOperator<T> slotSharingGroup(SlotSharingGroup slotSharingGroup)
Operations inherit the slot sharing group of input operations if all input operations are in the same slot sharing group and no slot sharing group was explicitly specified.
Initially an operation is in the default slot sharing group. An operation can be put into
the default group explicitly by setting the slot sharing group with name "default"
.
slotSharingGroup
- Which contains name and its resource spec.public <X> SideOutputDataStream<X> getSideOutput(OutputTag<X> sideOutputTag)
DataStream
that contains the elements that are emitted from an operation
into the side output with the given OutputTag
.@PublicEvolving public SingleOutputStreamOperator<T> setDescription(String description)
Description is used in json plan and web ui, but not in logging and metrics where only name is available. Description is expected to provide detailed information about the sink, while name is expected to be more simple, providing summary information only, so that we can have more user-friendly logging messages and metric tags without losing useful messages for debugging.
description
- The description for this operation.@PublicEvolving public CachedDataStream<T> cache()
CachedDataStream.invalidate()
called or the StreamExecutionEnvironment
close.Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.