Class SourceOutputWithWatermarks<T>

  • Type Parameters:
    T - The type of emitted records.
    All Implemented Interfaces:
    WatermarkOutput, SourceOutput<T>

    @Internal
    public class SourceOutputWithWatermarks<T>
    extends Object
    implements SourceOutput<T>
    Implementation of the SourceOutput. The records emitted to this output are pushed into a given PushingAsyncDataInput.DataOutput. The watermarks are pushed into the same output, or into a separate WatermarkOutput, if one is provided.

    Periodic Watermarks

    This output does not implement automatic periodic watermark emission. The method emitPeriodicWatermark() needs to be called periodically.

    Note on Performance Considerations

    The methods SourceOutput.collect(Object) and SourceOutput.collect(Object, long) are highly performance-critical (part of the hot loop). To make the code as JIT friendly as possible, we want to have only a single implementation of these two methods, across all classes. That way, the JIT compiler can de-virtualize (and inline) them better.

    Currently, we have one implementation of these methods for the case where we don't need watermarks (see class NoOpTimestampsAndWatermarks) and one for the case where we do (this class). When the JVM is dedicated to a single job (or type of job) only one of these classes will be loaded. In mixed job setups, we still have a bimorphic method (rather than a poly/-/mega-morphic method).

    • Method Detail

      • collect

        public final void collect​(T record)
        Description copied from interface: SourceOutput
        Emit a record without a timestamp.

        Use this method if the source system does not have a notion of records with timestamps.

        The events later pass through a TimestampAssigner, which attaches a timestamp to the event based on the event's contents. For example a file source with JSON records would not have a generic timestamp from the file reading and JSON parsing process, and thus use this method to produce initially a record without a timestamp. The TimestampAssigner in the next step would be used to extract timestamp from a field of the JSON object.

        Specified by:
        collect in interface SourceOutput<T>
        Parameters:
        record - the record to emit.
      • collect

        public final void collect​(T record,
                                  long timestamp)
        Description copied from interface: SourceOutput
        Emit a record with a timestamp.

        Use this method if the source system has timestamps attached to records. Typical examples would be Logs, PubSubs, or Message Queues, like Kafka or Kinesis, which store a timestamp with each event.

        The events typically still pass through a TimestampAssigner, which may decide to either use this source-provided timestamp, or replace it with a timestamp stored within the event (for example if the event was a JSON object one could configure aTimestampAssigner that extracts one of the object's fields and uses that as a timestamp).

        Specified by:
        collect in interface SourceOutput<T>
        Parameters:
        record - the record to emit.
        timestamp - the timestamp of the record.
      • emitWatermark

        public final void emitWatermark​(Watermark watermark)
        Description copied from interface: WatermarkOutput
        Emits the given watermark.

        Emitting a watermark also implicitly marks the stream as active, ending previously marked idleness.

        Specified by:
        emitWatermark in interface WatermarkOutput
      • markIdle

        public final void markIdle()
        Description copied from interface: WatermarkOutput
        Marks this output as idle, meaning that downstream operations do not wait for watermarks from this output.

        An output becomes active again as soon as the next watermark is emitted or WatermarkOutput.markActive() is explicitly called.

        Specified by:
        markIdle in interface WatermarkOutput
      • markActive

        public void markActive()
        Description copied from interface: WatermarkOutput
        Marks this output as active, meaning that downstream operations should wait for watermarks from this output.
        Specified by:
        markActive in interface WatermarkOutput
      • emitPeriodicWatermark

        public final void emitPeriodicWatermark()