T- The type of the record emitted by this source reader.
SplitT- The type of the source splits.
@Public public interface SourceReader<T,SplitT extends SourceSplit> extends AutoCloseable, CheckpointListener
Implementations can provide the following metrics:
|Modifier and Type||Method and Description|
Adds a list of splits for this reader to read.
Handle a custom source event sent by the
Returns a future that signals that data is available from the reader.
We have an empty default implementation here because most source readers do not have to implement the method.
This method is called when the reader is notified that it will not receive any further splits.
Poll the next available record into the
Checkpoint on the state of the source.
Start the reader.
InputStatus pollNext(ReaderOutput<T> output) throws Exception
The implementation must make sure this method is non-blocking.
Although the implementation can emit multiple records into the given SourceOutput, it is
recommended not doing so. Instead, emit one record into the SourceOutput and return a
InputStatus.MORE_AVAILABLE to let the caller thread know there are more records available.
List<SplitT> snapshotState(long checkpointId)
Once the future completes, the runtime will keep calling the
pollNext(ReaderOutput) method until that methods returns a status other than
InputStatus.MORE_AVAILABLE. After that the, the runtime will again call this method to
obtain the next future. Once that completes, it will again call
pollNext(ReaderOutput) and so on.
The contract is the following: If the reader has data available, then all futures previously returned by this method must eventually complete. Otherwise the source might stall indefinitely.
It is not a problem to have occasional "false positives", meaning to complete a future
even if no data is available. However, one should not use an "always complete" future in
cases no data is available, because that will result in busy waiting loops calling
pollNext(...) even though no data is available.
void addSplits(List<SplitT> splits)
splits- The splits assigned by the split enumerator.
It is triggered when the enumerator calls
SplitEnumeratorContext.signalNoMoreSplits(int) with the reader's parallel subtask.
default void handleSourceEvents(SourceEvent sourceEvent)
SplitEnumerator. This method is called when the enumerator sends an event via
This method has a default implementation that does nothing, because most sources do not require any custom events.
sourceEvent- the event sent by the
default void notifyCheckpointComplete(long checkpointId) throws Exception
checkpointId- The ID of the checkpoint that has been completed.
Exception- This method can propagate exceptions, which leads to a failure/recovery for the task. Note that this will NOT lead to the checkpoint being revoked.
Copyright © 2014–2023 The Apache Software Foundation. All rights reserved.