T
- The type of the record emitted by this source reader.SplitT
- The type of the source splits.@Public public interface SourceReader<T,SplitT extends SourceSplit> extends AutoCloseable, CheckpointListener
SplitEnumerator
.
For most non-trivial source reader, it is recommended to use SourceReaderBase
which provides
an efficient hand-over protocol to avoid blocking I/O inside the task thread and supports various
split-threading models.
Implementations can provide the following metrics:
OperatorIOMetricGroup.getNumRecordsInCounter()
(highly recommended)
OperatorIOMetricGroup.getNumBytesInCounter()
(recommended)
SourceReaderMetricGroup.getNumRecordsInErrorsCounter()
(recommended)
SourceReaderMetricGroup.setPendingRecordsGauge(Gauge)
SourceReaderMetricGroup.setPendingBytesGauge(Gauge)
Modifier and Type | Method and Description |
---|---|
void |
addSplits(List<SplitT> splits)
Adds a list of splits for this reader to read.
|
default void |
handleSourceEvents(SourceEvent sourceEvent)
Handle a custom source event sent by the
SplitEnumerator . |
CompletableFuture<Void> |
isAvailable()
Returns a future that signals that data is available from the reader.
|
default void |
notifyCheckpointComplete(long checkpointId)
We have an empty default implementation here because most source readers do not have to
implement the method.
|
void |
notifyNoMoreSplits()
This method is called when the reader is notified that it will not receive any further
splits.
|
default void |
pauseOrResumeSplits(Collection<String> splitsToPause,
Collection<String> splitsToResume)
Pauses or resumes reading of individual source splits.
|
InputStatus |
pollNext(ReaderOutput<T> output)
Poll the next available record into the
ReaderOutput . |
List<SplitT> |
snapshotState(long checkpointId)
Checkpoint on the state of the source.
|
void |
start()
Start the reader.
|
close
notifyCheckpointAborted
void start()
InputStatus pollNext(ReaderOutput<T> output) throws Exception
ReaderOutput
.
The implementation must make sure this method is non-blocking.
Although the implementation can emit multiple records into the given ReaderOutput, it is
recommended not doing so. Instead, emit one record into the ReaderOutput and return a InputStatus.MORE_AVAILABLE
to let the caller thread know there are more records available.
Exception
List<SplitT> snapshotState(long checkpointId)
CompletableFuture<Void> isAvailable()
Once the future completes, the runtime will keep calling the pollNext(ReaderOutput)
method until that method returns a status other than InputStatus.MORE_AVAILABLE
. After that, the runtime will again call this method to obtain
the next future. Once that completes, it will again call pollNext(ReaderOutput)
and
so on.
The contract is the following: If the reader has data available, then all futures previously returned by this method must eventually complete. Otherwise the source might stall indefinitely.
It is not a problem to have occasional "false positives", meaning to complete a future
even if no data is available. However, one should not use an "always complete" future in
cases no data is available, because that will result in busy waiting loops calling pollNext(...)
even though no data is available.
void addSplits(List<SplitT> splits)
SplitEnumeratorContext.assignSplit(SourceSplit, int)
or SplitEnumeratorContext.assignSplits(SplitsAssignment)
.splits
- The splits assigned by the split enumerator.void notifyNoMoreSplits()
It is triggered when the enumerator calls SplitEnumeratorContext.signalNoMoreSplits(int)
with the reader's parallel subtask.
default void handleSourceEvents(SourceEvent sourceEvent)
SplitEnumerator
. This method is called when
the enumerator sends an event via SplitEnumeratorContext.sendEventToSourceReader(int,
SourceEvent)
.
This method has a default implementation that does nothing, because most sources do not require any custom events.
sourceEvent
- the event sent by the SplitEnumerator
.default void notifyCheckpointComplete(long checkpointId) throws Exception
notifyCheckpointComplete
in interface CheckpointListener
checkpointId
- The ID of the checkpoint that has been completed.Exception
- This method can propagate exceptions, which leads to a failure/recovery for
the task. Note that this will NOT lead to the checkpoint being revoked.CheckpointListener.notifyCheckpointComplete(long)
@PublicEvolving default void pauseOrResumeSplits(Collection<String> splitsToPause, Collection<String> splitsToResume)
Note that no other methods can be called in parallel, so updating subscriptions can be done atomically. This method is simply providing connectors with more expressive APIs the opportunity to update all subscriptions at once.
This is currently used to align the watermarks of splits, if watermark alignment is used and the source reads from more than one split.
The default implementation throws an UnsupportedOperationException
where the
default implementation will be removed in future releases. To be compatible with future
releases, it is recommended to implement this method and override the default implementation.
splitsToPause
- the splits to pausesplitsToResume
- the splits to resumeCopyright © 2014–2024 The Apache Software Foundation. All rights reserved.