Class HybridSourceReader<T>

  • All Implemented Interfaces:
    AutoCloseable, CheckpointListener, SourceReader<T,HybridSourceSplit>

    public class HybridSourceReader<T>
    extends Object
    implements SourceReader<T,HybridSourceSplit>
    Hybrid source reader that delegates to the actual source reader.

    This reader processes splits from a sequence of sources as determined by the enumerator. The current source is provided with SwitchSourceEvent, so the reader does not require upfront knowledge of the number and order of sources. At any given time, only one underlying reader is active.

    When the underlying reader has consumed all input for a source, HybridSourceReader sends SourceReaderFinishedEvent to the coordinator.

    This reader does not make assumptions about the order in which sources are activated. When recovering from a checkpoint, it may start processing splits for a previous source, which is indicated via SwitchSourceEvent.
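    The delegation described above can be sketched in plain Java. This is a simplified illustration under stated assumptions, not Flink's implementation: Status, SimpleReader, ListReader and DelegatingReader are hypothetical stand-ins (Status mirrors the MORE_AVAILABLE, NOTHING_AVAILABLE and END_OF_INPUT values of InputStatus), switchTo(int) plays the role of handling a SwitchSourceEvent, and the finishedEvents list stands in for sending SourceReaderFinishedEvent to the coordinator.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.IntFunction;

// Hypothetical stand-in for the three InputStatus values used here.
enum Status { MORE_AVAILABLE, NOTHING_AVAILABLE, END_OF_INPUT }

// Hypothetical, simplified reader interface (not Flink's SourceReader).
interface SimpleReader<T> {
    Status pollNext(Consumer<T> output);
}

// Hypothetical underlying reader backed by an in-memory list.
class ListReader<T> implements SimpleReader<T> {
    private final Deque<T> records;

    ListReader(List<T> records) {
        this.records = new ArrayDeque<>(records);
    }

    @Override
    public Status pollNext(Consumer<T> output) {
        if (records.isEmpty()) {
            return Status.END_OF_INPUT;
        }
        output.accept(records.poll()); // one record per call
        return Status.MORE_AVAILABLE;
    }
}

// Delegates to the reader of the currently active source.
class DelegatingReader<T> implements SimpleReader<T> {
    private final IntFunction<SimpleReader<T>> readerForSource;
    // Stands in for SourceReaderFinishedEvent messages sent to the coordinator.
    private final List<Integer> finishedEvents = new ArrayList<>();
    private SimpleReader<T> current;
    private int currentIndex = -1;

    DelegatingReader(IntFunction<SimpleReader<T>> readerForSource) {
        this.readerForSource = readerForSource;
    }

    // Plays the role of handling a SwitchSourceEvent. Indices may arrive in any
    // order, e.g. an earlier source after recovery from a checkpoint.
    void switchTo(int sourceIndex) {
        currentIndex = sourceIndex;
        current = readerForSource.apply(sourceIndex);
    }

    @Override
    public Status pollNext(Consumer<T> output) {
        if (current == null) {
            return Status.NOTHING_AVAILABLE; // no active source: wait for the next switch
        }
        Status status = current.pollNext(output);
        if (status == Status.END_OF_INPUT) {
            // Report the finished source rather than ending the whole stream.
            finishedEvents.add(currentIndex);
            current = null;
            return Status.NOTHING_AVAILABLE;
        }
        return status;
    }

    List<Integer> finishedEvents() {
        return finishedEvents;
    }
}
```

    For example, with two list-backed sources, calling switchTo(1) and then switchTo(0), and polling after each switch until pollNext stops returning MORE_AVAILABLE, yields the records of source 1 followed by those of source 0, mirroring recovery into an earlier source. The sketch never returns END_OF_INPUT itself; a complete reader also needs some way to learn that the last source has finished, which is omitted here.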

    • Constructor Detail

    • Method Detail

      • pollNext

        public InputStatus pollNext(ReaderOutput output)
                             throws Exception
        Description copied from interface: SourceReader
        Poll the next available record into the ReaderOutput.

        The implementation must make sure this method is non-blocking.

        Although the implementation can emit multiple records into the given ReaderOutput, it is recommended not to do so. Instead, emit one record into the ReaderOutput and return InputStatus.MORE_AVAILABLE to let the caller thread know there are more records available.

        Specified by:
        pollNext in interface SourceReader<T,HybridSourceSplit>
        Returns:
        The InputStatus of the SourceReader after the method invocation.
        Throws:
        Exception
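        A minimal sketch of the non-blocking, one-record-per-call pattern, assuming a hypothetical QueueBackedReader whose records are handed over by a separate fetcher thread through a ConcurrentLinkedQueue. The class, its enqueue and markFinished methods, and the Status enum (a stand-in for InputStatus) are illustrative and not part of Flink's API. Queue.poll() returns null instead of waiting, which keeps pollNext non-blocking.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

// Hypothetical stand-in for the three InputStatus values used here.
enum Status { MORE_AVAILABLE, NOTHING_AVAILABLE, END_OF_INPUT }

// Hypothetical reader fed by a separate fetcher thread (not Flink API).
class QueueBackedReader<T> {
    private final Queue<T> handover = new ConcurrentLinkedQueue<>();
    private volatile boolean fetcherFinished = false;

    // Fetcher side: hand over one record.
    void enqueue(T record) {
        handover.add(record);
    }

    // Fetcher side: called after the last record has been enqueued.
    void markFinished() {
        fetcherFinished = true;
    }

    // Reader side: never blocks and emits at most one record per call.
    Status pollNext(Consumer<T> output) {
        // Read the flag before polling: every record is enqueued before the flag
        // is set, so "finished and empty" really means no more input.
        boolean finished = fetcherFinished;
        T record = handover.poll(); // returns null instead of waiting
        if (record == null) {
            return finished ? Status.END_OF_INPUT : Status.NOTHING_AVAILABLE;
        }
        output.accept(record);
        // MORE_AVAILABLE tells the caller to poll again right away; if the
        // fetcher is done, the next call reports END_OF_INPUT.
        return (handover.isEmpty() && !finished) ? Status.NOTHING_AVAILABLE : Status.MORE_AVAILABLE;
    }
}
```

        In a real reader, returning NOTHING_AVAILABLE would be paired with an isAvailable() future that completes when the fetcher enqueues more data; that part is omitted from this sketch.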
      • notifyCheckpointAborted

        public void notifyCheckpointAborted(long checkpointId)
                                     throws Exception
        Description copied from interface: CheckpointListener
        This method is called as a notification once a distributed checkpoint has been aborted.

        Important: The fact that a checkpoint has been aborted does NOT mean that the data and artifacts produced between the previous checkpoint and the aborted checkpoint are to be discarded. The expected behavior is as if this checkpoint was never triggered in the first place, and the next successful checkpoint simply covers a longer time span. See the "Checkpoint Subsuming Contract" in the class-level JavaDocs for details.

        These notifications are "best effort", meaning they can sometimes be skipped.

        This method is very rarely necessary to implement. The "best effort" guarantee, together with the fact that this method should not result in discarding any data (per the "Checkpoint Subsuming Contract"), means it is mainly useful for earlier cleanups of auxiliary resources. One example is to proactively clear a local per-checkpoint state cache upon checkpoint failure.

        Specified by:
        notifyCheckpointAborted in interface CheckpointListener
        Parameters:
        checkpointId - The ID of the checkpoint that has been aborted.
        Throws:
        Exception - This method can propagate exceptions, which leads to a failure/recovery for the task or job.
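        The early-cleanup use case mentioned above can be sketched as follows, assuming a hypothetical CheckpointScopedCache that keeps auxiliary data per checkpoint ID; the class and its methods are illustrative, not Flink API. Only the auxiliary entry for the aborted checkpoint is dropped and no records are discarded, in line with the Checkpoint Subsuming Contract.

```java
import java.util.TreeMap;

// Hypothetical per-checkpoint cache of auxiliary data (not part of Flink's API).
class CheckpointScopedCache {
    // Keyed by checkpoint ID; a TreeMap keeps IDs sorted so older entries can be dropped at once.
    private final TreeMap<Long, String> pending = new TreeMap<>();

    void onSnapshot(long checkpointId, String auxiliaryData) {
        pending.put(checkpointId, auxiliaryData);
    }

    // A completed checkpoint subsumes all earlier ones, so their entries can go too.
    void notifyCheckpointComplete(long checkpointId) {
        pending.headMap(checkpointId, true).clear();
    }

    // Best-effort early cleanup: frees the aborted checkpoint's entry sooner,
    // but correctness does not depend on this call ever arriving.
    void notifyCheckpointAborted(long checkpointId) {
        pending.remove(checkpointId);
    }

    int size() {
        return pending.size();
    }
}
```

        Because abort notifications may be skipped, the completion handler also clears all older entries, so a missed notifyCheckpointAborted call only delays the cleanup.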
      • isAvailable

        public CompletableFuture<Void> isAvailable()
        Description copied from interface: SourceReader
        Returns a future that signals that data is available from the reader.

        Once the future completes, the runtime will keep calling the SourceReader.pollNext(ReaderOutput) method until that method returns a status other than InputStatus.MORE_AVAILABLE. After that, the runtime will again call this method to obtain the next future. Once that completes, it will again call SourceReader.pollNext(ReaderOutput) and so on.

        The contract is the following: If the reader has data available, then all futures previously returned by this method must eventually complete. Otherwise the source might stall indefinitely.

        It is not a problem to have occasional "false positives", meaning to complete a future even if no data is available. However, one should not use an "always complete" future in cases where no data is available, because that will result in busy waiting loops calling pollNext(...) even though no data is available.

        Specified by:
        isAvailable in interface SourceReader<T,HybridSourceSplit>
        Returns:
        a future that will be completed once there is a record available to poll.
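        The availability contract can be illustrated with a hypothetical FutureSignalingReader (not Flink API). It completes its outstanding future when a record arrives and replaces the future only after the buffer is drained and the old future has already fired, so every future handed out while no data was buffered completes on the next record. AvailabilityLoop.drainAvailable is a single-threaded simplification of the runtime loop described above: it checks isDone() rather than registering a callback on the future. Status is again a stand-in for InputStatus.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical stand-in for the three InputStatus values.
enum Status { MORE_AVAILABLE, NOTHING_AVAILABLE, END_OF_INPUT }

// Hypothetical reader illustrating the isAvailable()/pollNext() handshake (not Flink API).
class FutureSignalingReader<T> {
    private final Queue<T> buffer = new ArrayDeque<>();
    private CompletableFuture<Void> available = new CompletableFuture<>();

    // Producer side: buffer the record and complete the current future.
    synchronized void add(T record) {
        buffer.add(record);
        available.complete(null);
    }

    // While no data is buffered this future is still pending, so callers wait instead of spinning.
    synchronized CompletableFuture<Void> isAvailable() {
        return available;
    }

    // Emits at most one record without blocking.
    synchronized Status pollNext(Consumer<T> output) {
        T record = buffer.poll();
        if (record != null) {
            output.accept(record);
        }
        if (!buffer.isEmpty()) {
            return Status.MORE_AVAILABLE;
        }
        // Drained: replace the future only if it has already fired, so every future
        // returned while the buffer was empty still completes on the next add().
        if (available.isDone()) {
            available = new CompletableFuture<>();
        }
        return Status.NOTHING_AVAILABLE;
    }
}

// Single-threaded simplification of the runtime's polling loop.
final class AvailabilityLoop {
    private AvailabilityLoop() {}

    static <T> void drainAvailable(FutureSignalingReader<T> reader, Consumer<T> output) {
        // Poll while the future signals data, and keep polling while pollNext reports more.
        while (reader.isAvailable().isDone()) {
            while (reader.pollNext(output) == Status.MORE_AVAILABLE) {
                // keep polling
            }
        }
    }
}
```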