Interface SourceReader<T,SplitT extends SourceSplit>
-
- Type Parameters:
T
- The type of the record emitted by this source reader.SplitT
- The type of the source splits.
- All Superinterfaces:
AutoCloseable
,CheckpointListener
- All Known Subinterfaces:
ExternallyInducedSourceReader<T,SplitT>
- All Known Implementing Classes:
DoubleEmittingSourceReaderWithCheckpointsInBetween
,FileSourceReader
,FromElementsSourceReader
,GeneratingIteratorSourceReader
,HybridSourceReader
,IteratorSourceReader
,IteratorSourceReaderBase
,RateLimitedSourceReader
,SingleThreadMultiplexSourceReaderBase
,SourceReaderBase
@Public public interface SourceReader<T,SplitT extends SourceSplit> extends AutoCloseable, CheckpointListener
The interface for a source reader which is responsible for reading the records from the source splits assigned bySplitEnumerator
.For most non-trivial source reader, it is recommended to use
SourceReaderBase
which provides an efficient hand-over protocol to avoid blocking I/O inside the task thread and supports various split-threading models.Implementations can provide the following metrics:
OperatorIOMetricGroup.getNumRecordsInCounter()
(highly recommended)OperatorIOMetricGroup.getNumBytesInCounter()
(recommended)SourceReaderMetricGroup.getNumRecordsInErrorsCounter()
(recommended)SourceReaderMetricGroup.setPendingRecordsGauge(Gauge)
SourceReaderMetricGroup.setPendingBytesGauge(Gauge)
-
-
Method Summary
All Methods Instance Methods Abstract Methods Default Methods Modifier and Type Method Description void
addSplits(List<SplitT> splits)
Adds a list of splits for this reader to read.default void
handleSourceEvents(SourceEvent sourceEvent)
Handle a custom source event sent by theSplitEnumerator
.CompletableFuture<Void>
isAvailable()
Returns a future that signals that data is available from the reader.default void
notifyCheckpointComplete(long checkpointId)
We have an empty default implementation here because most source readers do not have to implement the method.void
notifyNoMoreSplits()
This method is called when the reader is notified that it will not receive any further splits.default void
pauseOrResumeSplits(Collection<String> splitsToPause, Collection<String> splitsToResume)
Pauses or resumes reading of individual source splits.InputStatus
pollNext(ReaderOutput<T> output)
Poll the next available record into theReaderOutput
.List<SplitT>
snapshotState(long checkpointId)
Checkpoint on the state of the source.void
start()
Start the reader.-
Methods inherited from interface java.lang.AutoCloseable
close
-
Methods inherited from interface org.apache.flink.api.common.state.CheckpointListener
notifyCheckpointAborted
-
-
-
-
Method Detail
-
start
void start()
Start the reader.
-
pollNext
InputStatus pollNext(ReaderOutput<T> output) throws Exception
Poll the next available record into theReaderOutput
.The implementation must make sure this method is non-blocking.
Although the implementation can emit multiple records into the given ReaderOutput, it is recommended not doing so. Instead, emit one record into the ReaderOutput and return a
InputStatus.MORE_AVAILABLE
to let the caller thread know there are more records available.- Returns:
- The InputStatus of the SourceReader after the method invocation.
- Throws:
Exception
-
snapshotState
List<SplitT> snapshotState(long checkpointId)
Checkpoint on the state of the source.- Returns:
- the state of the source.
-
isAvailable
CompletableFuture<Void> isAvailable()
Returns a future that signals that data is available from the reader.Once the future completes, the runtime will keep calling the
pollNext(ReaderOutput)
method until that method returns a status other thanInputStatus.MORE_AVAILABLE
. After that, the runtime will again call this method to obtain the next future. Once that completes, it will again callpollNext(ReaderOutput)
and so on.The contract is the following: If the reader has data available, then all futures previously returned by this method must eventually complete. Otherwise the source might stall indefinitely.
It is not a problem to have occasional "false positives", meaning to complete a future even if no data is available. However, one should not use an "always complete" future in cases no data is available, because that will result in busy waiting loops calling
pollNext(...)
even though no data is available.- Returns:
- a future that will be completed once there is a record available to poll.
-
addSplits
void addSplits(List<SplitT> splits)
Adds a list of splits for this reader to read. This method is called when the enumerator assigns a split viaSplitEnumeratorContext.assignSplit(SourceSplit, int)
orSplitEnumeratorContext.assignSplits(SplitsAssignment)
.- Parameters:
splits
- The splits assigned by the split enumerator.
-
notifyNoMoreSplits
void notifyNoMoreSplits()
This method is called when the reader is notified that it will not receive any further splits.It is triggered when the enumerator calls
SplitEnumeratorContext.signalNoMoreSplits(int)
with the reader's parallel subtask.
-
handleSourceEvents
default void handleSourceEvents(SourceEvent sourceEvent)
Handle a custom source event sent by theSplitEnumerator
. This method is called when the enumerator sends an event viaSplitEnumeratorContext.sendEventToSourceReader(int, SourceEvent)
.This method has a default implementation that does nothing, because most sources do not require any custom events.
- Parameters:
sourceEvent
- the event sent by theSplitEnumerator
.
-
notifyCheckpointComplete
default void notifyCheckpointComplete(long checkpointId) throws Exception
We have an empty default implementation here because most source readers do not have to implement the method.- Specified by:
notifyCheckpointComplete
in interfaceCheckpointListener
- Parameters:
checkpointId
- The ID of the checkpoint that has been completed.- Throws:
Exception
- This method can propagate exceptions, which leads to a failure/recovery for the task. Note that this will NOT lead to the checkpoint being revoked.- See Also:
CheckpointListener.notifyCheckpointComplete(long)
-
pauseOrResumeSplits
@PublicEvolving default void pauseOrResumeSplits(Collection<String> splitsToPause, Collection<String> splitsToResume)
Pauses or resumes reading of individual source splits.Note that no other methods can be called in parallel, so updating subscriptions can be done atomically. This method is simply providing connectors with more expressive APIs the opportunity to update all subscriptions at once.
This is currently used to align the watermarks of splits, if watermark alignment is used and the source reads from more than one split.
The default implementation throws an
UnsupportedOperationException
where the default implementation will be removed in future releases. To be compatible with future releases, it is recommended to implement this method and override the default implementation.- Parameters:
splitsToPause
- the splits to pausesplitsToResume
- the splits to resume
-
-