Class SingleThreadMultiplexSourceReaderBase<E,​T,​SplitT extends SourceSplit,​SplitStateT>

  • Type Parameters:
    E - The type of the records (the raw type that typically contains checkpointing information).
    T - The final type of the records emitted by the source.
    SplitT - The type of the splits processed by the source.
    SplitStateT - The type of the mutable state per split.
    All Implemented Interfaces:
    AutoCloseable, CheckpointListener, SourceReader<T,​SplitT>
    Direct Known Subclasses:
    FileSourceReader

    @PublicEvolving
    public abstract class SingleThreadMultiplexSourceReaderBase<E,​T,​SplitT extends SourceSplit,​SplitStateT>
    extends SourceReaderBase<E,​T,​SplitT,​SplitStateT>
    A base for SourceReaders that read splits with one thread using one SplitReader. The splits can be read either one after the other (like in a file source) or concurrently by changing the subscription in the split reader (like in the Kafka Source).

    To implement a source reader based on this class, implementors need to supply the following:

    • A SplitReader, which connects to the source and reads/polls data. The split reader gets notified whenever there is a new split. The split reader would read files, contain a Kafka or other source client, etc.
    • A RecordEmitter that takes a record from the Split Reader and updates the checkpointing state and converts it into the final form. For example for Kafka, the Record Emitter takes a ConsumerRecord, puts the offset information into state, transforms the records with the deserializers into the final type, and emits the record.
    • The class must override the methods to convert back and forth between the immutable splits (SplitT) and the mutable split state representation (SplitStateT).
    • Finally, the reader must decide what to do when it starts (SourceReaderBase.start()) or when a split is finished (SourceReaderBase.onSplitFinished(java.util.Map)).