Class SingleOutputStreamOperator<T>

  • Type Parameters:
    T - The type of the elements in this stream.
    Direct Known Subclasses:
    DataStreamSource

    @Public
    public class SingleOutputStreamOperator<T>
    extends DataStream<T>
    SingleOutputStreamOperator represents a user defined transformation applied on a DataStream with one predefined output type.
    • Field Detail

      • nonParallel

        protected boolean nonParallel
        Indicate this is a non-parallel operator and cannot set a non-1 degree of parallelism. *
    • Method Detail

      • getName

        public String getName()
        Gets the name of the current data stream. This name is used by the visualization and logging during runtime.
        Returns:
        Name of the stream.
      • name

        public SingleOutputStreamOperator<T> name​(String name)
        Sets the name of the current data stream. This name is used by the visualization and logging during runtime.
        Returns:
        The named operator.
      • uid

        @PublicEvolving
        public SingleOutputStreamOperator<T> uid​(String uid)
        Sets an ID for this operator.

        The specified ID is used to assign the same operator ID across job submissions (for example when starting a job from a savepoint).

        Important: this ID needs to be unique per transformation and job. Otherwise, job submission will fail.

        Parameters:
        uid - The unique user-specified ID of this transformation.
        Returns:
        The operator with the specified ID.
      • setUidHash

        @PublicEvolving
        public SingleOutputStreamOperator<T> setUidHash​(String uidHash)
        Sets an user provided hash for this operator. This will be used AS IS the create the JobVertexID.

        The user provided hash is an alternative to the generated hashes, that is considered when identifying an operator through the default hash mechanics fails (e.g. because of changes between Flink versions).

        Important: this should be used as a workaround or for trouble shooting. The provided hash needs to be unique per transformation and job. Otherwise, job submission will fail. Furthermore, you cannot assign user-specified hash to intermediate nodes in an operator chain and trying so will let your job fail.

        A use case for this is in migration between Flink versions or changing the jobs in a way that changes the automatically generated hashes. In this case, providing the previous hashes directly through this method (e.g. obtained from old logs) can help to reestablish a lost mapping from states to their target operator.

        Parameters:
        uidHash - The user provided hash for this operator. This will become the JobVertexID, which is shown in the logs and web ui.
        Returns:
        The operator with the user provided hash.
      • setParallelism

        public SingleOutputStreamOperator<T> setParallelism​(int parallelism)
        Sets the parallelism for this operator.
        Parameters:
        parallelism - The parallelism for this operator.
        Returns:
        The operator with set parallelism.
      • setMaxParallelism

        @PublicEvolving
        public SingleOutputStreamOperator<T> setMaxParallelism​(int maxParallelism)
        Sets the maximum parallelism of this operator.

        The maximum parallelism specifies the upper bound for dynamic scaling. It also defines the number of key groups used for partitioned state.

        Parameters:
        maxParallelism - Maximum parallelism
        Returns:
        The operator with set maximum parallelism
      • forceNonParallel

        @PublicEvolving
        public SingleOutputStreamOperator<T> forceNonParallel()
        Sets the parallelism and maximum parallelism of this operator to one. And mark this operator cannot set a non-1 degree of parallelism.
        Returns:
        The operator with only one parallelism.
      • setBufferTimeout

        public SingleOutputStreamOperator<T> setBufferTimeout​(long timeoutMillis)
        Sets the buffering timeout for data produced by this operation. The timeout defines how long data may linger in a partially full buffer before being sent over the network.

        Lower timeouts lead to lower tail latencies, but may affect throughput. Timeouts of 1 ms still sustain high throughput, even for jobs with high parallelism.

        A value of '-1' means that the default buffer timeout should be used. A value of '0' indicates that no buffering should happen, and all records/events should be immediately sent through the network, without additional buffering.

        Parameters:
        timeoutMillis - The maximum time between two output flushes.
        Returns:
        The operator with buffer timeout set.
      • startNewChain

        @PublicEvolving
        public SingleOutputStreamOperator<T> startNewChain()
        Starts a new task chain beginning at this operator. This operator will not be chained (thread co-located for increased performance) to any previous tasks even if possible.
        Returns:
        The operator with chaining set.
      • returns

        public SingleOutputStreamOperator<T> returns​(Class<T> typeClass)
        Adds a type information hint about the return type of this operator. This method can be used in cases where Flink cannot determine automatically what the produced type of a function is. That can be the case if the function uses generic type variables in the return type that cannot be inferred from the input type.

        Classes can be used as type hints for non-generic types (classes without generic parameters), but not for generic types like for example Tuples. For those generic types, please use the returns(TypeHint) method.

        Parameters:
        typeClass - The class of the returned data type.
        Returns:
        This operator with the type information corresponding to the given type class.
      • returns

        public SingleOutputStreamOperator<T> returns​(TypeHint<T> typeHint)
        Adds a type information hint about the return type of this operator. This method can be used in cases where Flink cannot determine automatically what the produced type of a function is. That can be the case if the function uses generic type variables in the return type that cannot be inferred from the input type.

        Use this method the following way:

        
         DataStream<Tuple2<String, Double>> result =
             stream.flatMap(new FunctionWithNonInferrableReturnType())
                   .returns(new TypeHint<Tuple2<String, Double>>(){});
         
        Parameters:
        typeHint - The type hint for the returned data type.
        Returns:
        This operator with the type information corresponding to the given type hint.
      • returns

        public SingleOutputStreamOperator<T> returns​(TypeInformation<T> typeInfo)
        Adds a type information hint about the return type of this operator. This method can be used in cases where Flink cannot determine automatically what the produced type of a function is. That can be the case if the function uses generic type variables in the return type that cannot be inferred from the input type.

        In most cases, the methods returns(Class) and returns(TypeHint) are preferable.

        Parameters:
        typeInfo - type information as a return type hint
        Returns:
        This operator with a given return type hint.
      • slotSharingGroup

        @PublicEvolving
        public SingleOutputStreamOperator<T> slotSharingGroup​(String slotSharingGroup)
        Sets the slot sharing group of this operation. Parallel instances of operations that are in the same slot sharing group will be co-located in the same TaskManager slot, if possible.

        Operations inherit the slot sharing group of input operations if all input operations are in the same slot sharing group and no slot sharing group was explicitly specified.

        Initially an operation is in the default slot sharing group. An operation can be put into the default group explicitly by setting the slot sharing group to "default".

        Parameters:
        slotSharingGroup - The slot sharing group name.
      • slotSharingGroup

        @PublicEvolving
        public SingleOutputStreamOperator<T> slotSharingGroup​(SlotSharingGroup slotSharingGroup)
        Sets the slot sharing group of this operation. Parallel instances of operations that are in the same slot sharing group will be co-located in the same TaskManager slot, if possible.

        Operations inherit the slot sharing group of input operations if all input operations are in the same slot sharing group and no slot sharing group was explicitly specified.

        Initially an operation is in the default slot sharing group. An operation can be put into the default group explicitly by setting the slot sharing group with name "default".

        Parameters:
        slotSharingGroup - Which contains name and its resource spec.
      • setDescription

        @PublicEvolving
        public SingleOutputStreamOperator<T> setDescription​(String description)
        Sets the description for this operation.

        Description is used in json plan and web ui, but not in logging and metrics where only name is available. Description is expected to provide detailed information about the sink, while name is expected to be more simple, providing summary information only, so that we can have more user-friendly logging messages and metric tags without losing useful messages for debugging.

        Parameters:
        description - The description for this operation.
        Returns:
        The operation with new description.
      • cache

        @PublicEvolving
        public CachedDataStream<T> cache()
        Cache the intermediate result of the transformation. Only support bounded streams and currently only block mode is supported. The cache is generated lazily at the first time the intermediate result is computed. The cache will be clear when CachedDataStream.invalidate() called or the StreamExecutionEnvironment close.
        Returns:
        CachedDataStream that can use in later job to reuse the cached intermediate result.