Class SingleOutputStreamOperator<T>
- java.lang.Object
-
- org.apache.flink.streaming.api.datastream.DataStream<T>
-
- org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator<T>
-
- Type Parameters:
T
- The type of the elements in this stream.
- Direct Known Subclasses:
DataStreamSource
@Public public class SingleOutputStreamOperator<T> extends DataStream<T>
SingleOutputStreamOperator represents a user-defined transformation applied on a DataStream with one predefined output type.
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.flink.streaming.api.datastream.DataStream
DataStream.Collector<T>
-
-
Field Summary
Fields
| Modifier and Type | Field | Description |
|---|---|---|
| protected boolean | nonParallel | Indicates that this is a non-parallel operator whose parallelism cannot be set to a value other than 1. |
-
Fields inherited from class org.apache.flink.streaming.api.datastream.DataStream
environment, transformation
-
-
Constructor Summary
Constructors
| Modifier | Constructor | Description |
|---|---|---|
| protected | SingleOutputStreamOperator(StreamExecutionEnvironment environment, Transformation<T> transformation) | |
-
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| CachedDataStream<T> | cache() | Caches the intermediate result of the transformation. |
| SingleOutputStreamOperator<T> | disableChaining() | Turns off chaining for this operator so thread co-location will not be used as an optimization. |
| SingleOutputStreamOperator<T> | forceNonParallel() | Sets the parallelism and maximum parallelism of this operator to one. |
| String | getName() | Gets the name of the current data stream. |
| <X> SideOutputDataStream<X> | getSideOutput(OutputTag<X> sideOutputTag) | Gets the DataStream that contains the elements that are emitted from an operation into the side output with the given OutputTag. |
| SingleOutputStreamOperator<T> | name(String name) | Sets the name of the current data stream. |
| SingleOutputStreamOperator<T> | returns(Class<T> typeClass) | Adds a type information hint about the return type of this operator. |
| SingleOutputStreamOperator<T> | returns(TypeHint<T> typeHint) | Adds a type information hint about the return type of this operator. |
| SingleOutputStreamOperator<T> | returns(TypeInformation<T> typeInfo) | Adds a type information hint about the return type of this operator. |
| SingleOutputStreamOperator<T> | setBufferTimeout(long timeoutMillis) | Sets the buffering timeout for data produced by this operation. |
| SingleOutputStreamOperator<T> | setDescription(String description) | Sets the description for this operation. |
| SingleOutputStreamOperator<T> | setMaxParallelism(int maxParallelism) | Sets the maximum parallelism of this operator. |
| SingleOutputStreamOperator<T> | setParallelism(int parallelism) | Sets the parallelism for this operator. |
| SingleOutputStreamOperator<T> | setUidHash(String uidHash) | Sets a user-provided hash for this operator. |
| SingleOutputStreamOperator<T> | slotSharingGroup(String slotSharingGroup) | Sets the slot sharing group of this operation. |
| SingleOutputStreamOperator<T> | slotSharingGroup(SlotSharingGroup slotSharingGroup) | Sets the slot sharing group of this operation. |
| SingleOutputStreamOperator<T> | startNewChain() | Starts a new task chain beginning at this operator. |
| SingleOutputStreamOperator<T> | uid(String uid) | Sets an ID for this operator. |
-
Methods inherited from class org.apache.flink.streaming.api.datastream.DataStream
addSink, assignTimestampsAndWatermarks, broadcast, broadcast, clean, coGroup, collectAsync, collectAsync, connect, connect, countWindowAll, countWindowAll, doTransform, executeAndCollect, executeAndCollect, executeAndCollect, executeAndCollect, filter, flatMap, flatMap, forward, fullWindowPartition, getExecutionConfig, getExecutionEnvironment, getId, getMinResources, getParallelism, getPreferredResources, getTransformation, getType, global, join, keyBy, keyBy, keyBy, map, map, partitionCustom, print, print, printToErr, printToErr, process, process, project, rebalance, rescale, setConnectionType, shuffle, sinkTo, sinkTo, transform, transform, union, windowAll, writeToSocket, writeUsingOutputFormat
-
-
-
-
Constructor Detail
-
SingleOutputStreamOperator
protected SingleOutputStreamOperator(StreamExecutionEnvironment environment, Transformation<T> transformation)
-
-
Method Detail
-
getName
public String getName()
Gets the name of the current data stream. This name is used by the visualization and logging during runtime.
- Returns:
- Name of the stream.
-
name
public SingleOutputStreamOperator<T> name(String name)
Sets the name of the current data stream. This name is used by the visualization and logging during runtime.
- Returns:
- The named operator.
-
uid
@PublicEvolving public SingleOutputStreamOperator<T> uid(String uid)
Sets an ID for this operator.
The specified ID is used to assign the same operator ID across job submissions (for example when starting a job from a savepoint).
Important: this ID needs to be unique per transformation and job. Otherwise, job submission will fail.
- Parameters:
uid
- The unique user-specified ID of this transformation.
- Returns:
- The operator with the specified ID.
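The following is a toy model, not Flink code, of why an explicit ID keeps an operator addressable across job submissions. It assumes, purely for illustration, that an automatically assigned ID depends on the operator's position in the pipeline; Flink's real generated IDs are hashes derived from the job graph. The class `UidSketch`, its nested `Op` type, and the `auto-<index>` naming are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class UidSketch {
    /** Hypothetical operator description: a label plus an optional explicit uid (null if unset). */
    static final class Op {
        final String label;
        final String uid;

        Op(String label, String uid) {
            this.label = label;
            this.uid = uid;
        }
    }

    /** Returns label -> operator ID and rejects duplicate IDs, mirroring the uniqueness rule above. */
    static Map<String, String> assignIds(List<Op> pipeline) {
        Map<String, String> ids = new LinkedHashMap<>();
        for (int i = 0; i < pipeline.size(); i++) {
            Op op = pipeline.get(i);
            // Assumption: without an explicit uid, the ID is derived from the position.
            String id = (op.uid != null) ? op.uid : "auto-" + i;
            if (ids.containsValue(id)) {
                throw new IllegalStateException("Duplicate operator ID: " + id);
            }
            ids.put(op.label, id);
        }
        return ids;
    }
}
```

In this model, inserting a new step in front of an operator that has no uid shifts its automatically assigned ID, so anything stored under the old ID is no longer found. An operator with an explicit uid keeps the same ID in both versions of the pipeline.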
-
setUidHash
@PublicEvolving public SingleOutputStreamOperator<T> setUidHash(String uidHash)
Sets a user-provided hash for this operator. This hash is used as is to create the JobVertexID.
The user-provided hash is an alternative to the generated hashes and is considered when identifying an operator through the default hash mechanism fails (e.g. because of changes between Flink versions).
Important: this should be used as a workaround or for troubleshooting. The provided hash needs to be unique per transformation and job. Otherwise, job submission will fail. Furthermore, you cannot assign a user-specified hash to intermediate nodes in an operator chain; attempting to do so will cause your job to fail.
A use case for this is migration between Flink versions, or changing a job in a way that changes the automatically generated hashes. In this case, providing the previous hashes directly through this method (e.g. obtained from old logs) can help to reestablish a lost mapping from states to their target operator.
- Parameters:
uidHash
- The user-provided hash for this operator. This will become the JobVertexID, which is shown in the logs and web UI.
- Returns:
- The operator with the user-provided hash.
-
setParallelism
public SingleOutputStreamOperator<T> setParallelism(int parallelism)
Sets the parallelism for this operator.
- Parameters:
parallelism
- The parallelism for this operator.
- Returns:
- The operator with set parallelism.
-
setMaxParallelism
@PublicEvolving public SingleOutputStreamOperator<T> setMaxParallelism(int maxParallelism)
Sets the maximum parallelism of this operator.
The maximum parallelism specifies the upper bound for dynamic scaling. It also defines the number of key groups used for partitioned state.
- Parameters:
maxParallelism
- Maximum parallelism
- Returns:
- The operator with set maximum parallelism
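As a rough illustration of how the maximum parallelism relates to key groups, the sketch below maps a key to a key group and a key group to a parallel subtask. It is a simplified model: the class `KeyGroupSketch` and both method names are hypothetical, and a plain `Math.floorMod` on `hashCode()` stands in for whatever hash function the runtime actually applies.

```java
class KeyGroupSketch {
    /**
     * Maps a key to a key group in [0, maxParallelism).
     * Assumption: floorMod on hashCode() is a stand-in for the runtime's real hash.
     */
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    /** Maps a key group to the index of the parallel subtask that owns it. */
    static int subtaskFor(int keyGroup, int parallelism, int maxParallelism) {
        // Contiguous ranges of key groups are assigned to each subtask.
        return keyGroup * parallelism / maxParallelism;
    }
}
```

The point of the model is that the key group of a key depends only on the maximum parallelism, not on the current parallelism. Rescaling therefore moves whole key groups between subtasks without re-hashing individual keys, which is also why the parallelism cannot usefully exceed the number of key groups.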
-
forceNonParallel
@PublicEvolving public SingleOutputStreamOperator<T> forceNonParallel()
Sets the parallelism and maximum parallelism of this operator to one, and marks the operator so that its parallelism cannot be set to a value other than 1.
- Returns:
- The operator with a parallelism of one.
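The effect of the non-parallel marker can be pictured with a small stand-in class. This is a sketch under stated assumptions, not the class documented here: `NonParallelSketch` is hypothetical, the initial values are arbitrary, and the choice of `IllegalArgumentException` for a rejected parallelism is an assumption.

```java
class NonParallelSketch {
    private int parallelism = 4;        // arbitrary starting value for the sketch
    private int maxParallelism = 128;   // arbitrary starting value for the sketch
    private boolean nonParallel = false;

    /** Pins both settings to one and records that the operator is non-parallel. */
    NonParallelSketch forceNonParallel() {
        this.parallelism = 1;
        this.maxParallelism = 1;
        this.nonParallel = true;
        return this;
    }

    /** Rejects any value other than 1 once the operator has been marked non-parallel. */
    NonParallelSketch setParallelism(int parallelism) {
        if (nonParallel && parallelism != 1) {
            // Assumption: the exception type is illustrative only.
            throw new IllegalArgumentException("A non-parallel operator must have parallelism 1");
        }
        this.parallelism = parallelism;
        return this;
    }

    int getParallelism() {
        return parallelism;
    }

    int getMaxParallelism() {
        return maxParallelism;
    }
}
```

Each setter returns the same object, which mirrors the fluent style of the methods on this page, where every call returns the operator so that settings can be chained.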
-
setBufferTimeout
public SingleOutputStreamOperator<T> setBufferTimeout(long timeoutMillis)
Sets the buffering timeout for data produced by this operation. The timeout defines how long data may linger in a partially full buffer before being sent over the network.
Lower timeouts lead to lower tail latencies, but may reduce throughput. Timeouts of 1 ms still sustain high throughput, even for jobs with high parallelism.
A value of '-1' means that the default buffer timeout should be used. A value of '0' indicates that no buffering should happen, and all records/events should be immediately sent through the network, without additional buffering.
- Parameters:
timeoutMillis
- The maximum time between two output flushes.
- Returns:
- The operator with buffer timeout set.
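The trade-off described above can be sketched with a toy output buffer that is flushed either when it is full or when the timeout has elapsed since the last flush. This is not how the network stack is implemented; `BufferTimeoutSketch`, its record-count capacity, and the explicit `tick` method are hypothetical simplifications, and the sketch assumes that a value of -1 has already been resolved to a concrete default by the caller.

```java
import java.util.ArrayList;
import java.util.List;

class BufferTimeoutSketch {
    private final int capacity;
    private final long timeoutMillis;
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> sent = new ArrayList<>();
    private long lastFlush;

    BufferTimeoutSketch(int capacity, long timeoutMillis, long now) {
        if (timeoutMillis < 0) {
            // Assumption: -1 ("use the default") is resolved before this point.
            throw new IllegalArgumentException("timeout must be resolved to a value >= 0");
        }
        this.capacity = capacity;
        this.timeoutMillis = timeoutMillis;
        this.lastFlush = now;
    }

    /** Buffers a record; flushes at once if buffering is disabled (0) or the buffer is full. */
    void emit(String record, long now) {
        buffer.add(record);
        if (timeoutMillis == 0 || buffer.size() >= capacity) {
            flush(now);
        }
    }

    /** Called periodically; flushes a partially full buffer once the timeout has elapsed. */
    void tick(long now) {
        if (timeoutMillis > 0 && !buffer.isEmpty() && now - lastFlush >= timeoutMillis) {
            flush(now);
        }
    }

    private void flush(long now) {
        sent.add(new ArrayList<>(buffer));
        buffer.clear();
        lastFlush = now;
    }

    /** Batches that were "sent over the network", in order. */
    List<List<String>> sent() {
        return sent;
    }
}
```

With a timeout of 0 every record becomes its own batch, which minimizes latency at the cost of many small sends. With a larger timeout, records that arrive slowly wait until the timeout expires, while records that arrive quickly are sent as soon as the buffer fills.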
-
disableChaining
@PublicEvolving public SingleOutputStreamOperator<T> disableChaining()
Turns off chaining for this operator so thread co-location will not be used as an optimization.
Chaining can be turned off for the whole job with StreamExecutionEnvironment.disableOperatorChaining(); however, this is not advised for performance reasons.
- Returns:
- The operator with chaining disabled
-
startNewChain
@PublicEvolving public SingleOutputStreamOperator<T> startNewChain()
Starts a new task chain beginning at this operator. This operator will not be chained (thread co-located for increased performance) to any previous tasks even if possible.
- Returns:
- The operator with chaining set.
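The difference between disableChaining() and startNewChain() can be sketched for a purely linear pipeline. The model below is a simplification: `ChainingSketch` and `buildChains` are hypothetical, the `Strategy` values are loosely modelled on Flink's chaining strategies (with `NEVER` standing for disableChaining() and `HEAD` for startNewChain()), and other conditions that the runtime checks before chaining, such as matching parallelism, are ignored.

```java
import java.util.ArrayList;
import java.util.List;

class ChainingSketch {
    /** ALWAYS: chain when possible. HEAD: start a new chain here. NEVER: stay unchained. */
    enum Strategy { ALWAYS, HEAD, NEVER }

    /** Groups a linear pipeline into chains according to each operator's strategy. */
    static List<List<String>> buildChains(String[] names, Strategy[] strategies) {
        List<List<String>> chains = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            boolean startsNewChain = i == 0
                    || strategies[i] != Strategy.ALWAYS       // HEAD and NEVER refuse to join the predecessor
                    || strategies[i - 1] == Strategy.NEVER;   // NEVER also refuses to take successors
            if (startsNewChain) {
                chains.add(new ArrayList<>());
            }
            chains.get(chains.size() - 1).add(names[i]);
        }
        return chains;
    }
}
```

For the pipeline source, map, filter, sink, marking filter as `HEAD` yields the chains [source, map] and [filter, sink], whereas marking it as `NEVER` yields [source, map], [filter], and [sink]. In other words, startNewChain() only cuts the link to the previous operator, while disableChaining() isolates the operator on both sides.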
-
returns
public SingleOutputStreamOperator<T> returns(Class<T> typeClass)
Adds a type information hint about the return type of this operator. This method can be used in cases where Flink cannot determine automatically what the produced type of a function is. That can be the case if the function uses generic type variables in the return type that cannot be inferred from the input type.
Classes can be used as type hints for non-generic types (classes without generic parameters), but not for generic types such as Tuples. For those generic types, please use the returns(TypeHint) method.
- Parameters:
typeClass
- The class of the returned data type.
- Returns:
- This operator with the type information corresponding to the given type class.
-
returns
public SingleOutputStreamOperator<T> returns(TypeHint<T> typeHint)
Adds a type information hint about the return type of this operator. This method can be used in cases where Flink cannot determine automatically what the produced type of a function is. That can be the case if the function uses generic type variables in the return type that cannot be inferred from the input type.
Use this method the following way:
DataStream<Tuple2<String, Double>> result =
    stream.flatMap(new FunctionWithNonInferrableReturnType())
          .returns(new TypeHint<Tuple2<String, Double>>(){});
- Parameters:
typeHint
- The type hint for the returned data type.
- Returns:
- This operator with the type information corresponding to the given type hint.
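The reason a Class object is not enough for generic types, and the reason the type hint is created with a trailing `{}`, can be shown with plain Java. The sketch below demonstrates type erasure and the general technique of capturing a type argument through an anonymous subclass. `TypeTokenSketch` is a hypothetical stand-in and is not Flink's TypeHint class, whose implementation may differ in detail.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

/** Hypothetical stand-in for a type hint; instantiate it as an anonymous subclass. */
abstract class TypeTokenSketch<T> {
    private final Type captured;

    protected TypeTokenSketch() {
        // An anonymous subclass records its generic superclass, including the
        // concrete type argument, in its class metadata. That survives erasure.
        ParameterizedType superType = (ParameterizedType) getClass().getGenericSuperclass();
        this.captured = superType.getActualTypeArguments()[0];
    }

    /** The full generic type that was written between the angle brackets. */
    Type captured() {
        return captured;
    }

    /** Erasure: differently parameterized instances of a generic class share one runtime class. */
    static boolean sameRuntimeClass(Object a, Object b) {
        return a.getClass() == b.getClass();
    }
}
```

For example, an `ArrayList<String>` and an `ArrayList<Integer>` have the same runtime class, so a class literal cannot distinguish them. By contrast, `new TypeTokenSketch<java.util.List<String>>() {}.captured()` yields a parameterized type that still names both `List` and `String`.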
-
returns
public SingleOutputStreamOperator<T> returns(TypeInformation<T> typeInfo)
Adds a type information hint about the return type of this operator. This method can be used in cases where Flink cannot determine automatically what the produced type of a function is. That can be the case if the function uses generic type variables in the return type that cannot be inferred from the input type.
In most cases, the methods returns(Class) and returns(TypeHint) are preferable.
- Parameters:
typeInfo
- Type information as a return type hint.
- Returns:
- This operator with a given return type hint.
-
slotSharingGroup
@PublicEvolving public SingleOutputStreamOperator<T> slotSharingGroup(String slotSharingGroup)
Sets the slot sharing group of this operation. Parallel instances of operations that are in the same slot sharing group will be co-located in the same TaskManager slot, if possible.
Operations inherit the slot sharing group of input operations if all input operations are in the same slot sharing group and no slot sharing group was explicitly specified.
Initially an operation is in the default slot sharing group. An operation can be put into the default group explicitly by setting the slot sharing group to "default".
- Parameters:
slotSharingGroup
- The slot sharing group name.
-
slotSharingGroup
@PublicEvolving public SingleOutputStreamOperator<T> slotSharingGroup(SlotSharingGroup slotSharingGroup)
Sets the slot sharing group of this operation. Parallel instances of operations that are in the same slot sharing group will be co-located in the same TaskManager slot, if possible.
Operations inherit the slot sharing group of input operations if all input operations are in the same slot sharing group and no slot sharing group was explicitly specified.
Initially an operation is in the default slot sharing group. An operation can be put into the default group explicitly by setting a slot sharing group with the name "default".
- Parameters:
slotSharingGroup
- The slot sharing group, which contains the name and its resource spec.
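The inheritance rule shared by both slotSharingGroup overloads can be written as a small pure function. This is a sketch of the rule as described above, not the runtime's code: `SlotSharingSketch` and `resolve` are hypothetical names, groups are represented by name only, and falling back to "default" when the inputs disagree is an assumption based on the statement that operations start in the default group.

```java
import java.util.List;

class SlotSharingSketch {
    static final String DEFAULT_GROUP = "default";

    /**
     * Resolves the effective slot sharing group of an operation.
     *
     * @param explicitGroup the group set on the operation, or null if none was specified
     * @param inputGroups   the already resolved groups of all input operations
     */
    static String resolve(String explicitGroup, List<String> inputGroups) {
        if (explicitGroup != null) {
            return explicitGroup;              // an explicit setting always wins
        }
        String inherited = null;
        for (String group : inputGroups) {
            if (inherited == null) {
                inherited = group;             // first input seen
            } else if (!inherited.equals(group)) {
                return DEFAULT_GROUP;          // inputs disagree: nothing to inherit (assumption)
            }
        }
        // No inputs at all (for example a source) also means the default group.
        return inherited == null ? DEFAULT_GROUP : inherited;
    }
}
```

Under this model an operation with two inputs in group "a" ends up in "a", an operation with inputs in "a" and "b" ends up in "default", and an explicit group overrides whatever the inputs use.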
-
getSideOutput
public <X> SideOutputDataStream<X> getSideOutput(OutputTag<X> sideOutputTag)
Gets the DataStream that contains the elements that are emitted from an operation into the side output with the given OutputTag.
-
setDescription
@PublicEvolving public SingleOutputStreamOperator<T> setDescription(String description)
Sets the description for this operation.
The description is used in the JSON plan and web UI, but not in logging and metrics, where only the name is available. The description is expected to provide detailed information about the operation, while the name is expected to be simpler and provide summary information only, so that logging messages and metric tags stay user-friendly without losing information that is useful for debugging.
- Parameters:
description
- The description for this operation.
- Returns:
- The operation with new description.
-
cache
@PublicEvolving public CachedDataStream<T> cache()
Caches the intermediate result of the transformation. Only bounded streams are supported, and currently only block mode is supported. The cache is generated lazily the first time the intermediate result is computed. The cache is cleared when CachedDataStream.invalidate() is called or the StreamExecutionEnvironment is closed.
- Returns:
- A CachedDataStream that can be used in later jobs to reuse the cached intermediate result.
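The lazy behaviour described above (compute on first use, reuse afterwards, recompute after invalidation) follows a familiar memoization pattern. The sketch below shows that pattern in isolation; `LazyCacheSketch` is a hypothetical in-memory stand-in and says nothing about where or how the runtime actually stores cached results.

```java
import java.util.function.Supplier;

class LazyCacheSketch<T> {
    private final Supplier<T> computation;
    private T cached;
    private boolean present = false;

    LazyCacheSketch(Supplier<T> computation) {
        this.computation = computation;
    }

    /** Computes the result on first use and returns the stored result afterwards. */
    T get() {
        if (!present) {
            cached = computation.get();
            present = true;
        }
        return cached;
    }

    /** Drops the stored result so that the next get() recomputes it. */
    void invalidate() {
        cached = null;
        present = false;
    }
}
```

Creating the object does not run the computation. The first `get()` runs it once, later calls reuse the stored value, and a call to `invalidate()` causes the next `get()` to run the computation again.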
-
-