public class HybridSourceReader<T> extends Object implements SourceReader<T,HybridSourceSplit>
This reader processes splits from a sequence of sources as determined by the enumerator. The current source is provided with SwitchSourceEvent, and the reader does not require upfront knowledge of the number and order of sources. At a given point in time one underlying reader is active.
When the underlying reader has consumed all input for a source, HybridSourceReader sends SourceReaderFinishedEvent to the coordinator.
This reader does not make assumptions about the order in which sources are activated. When recovering from a checkpoint it may start processing splits for a previous source, which is indicated via SwitchSourceEvent.
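As a rough illustration of the switching protocol described above, the following self-contained sketch models a reader that has one active underlying reader at a time. It does not use Flink's classes: `SwitchingReaderModel`, `onSwitchSource`, `pollOnce`, and the string-based events are hypothetical stand-ins. Handing the records over together with the switch is a simplification made for this example; the real reader receives its splits separately via addSplits.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical model of the switching protocol; not Flink's implementation. */
class SwitchingReaderModel {
    /** Events this model reports to its (imaginary) coordinator. */
    final List<String> sentToCoordinator = new ArrayList<>();
    /** Records emitted downstream, in order. */
    final List<String> emitted = new ArrayList<>();

    // One queue of pending records per source index; stands in for the underlying readers.
    private final Map<Integer, Deque<String>> readers = new HashMap<>();
    private int activeSource = -1; // -1 means no underlying reader is active

    /** Stand-in for receiving SwitchSourceEvent: activates the reader for the given source. */
    void onSwitchSource(int sourceIndex, List<String> records) {
        readers.put(sourceIndex, new ArrayDeque<>(records));
        activeSource = sourceIndex;
    }

    /** Emits at most one record; reports completion once the active source is drained. */
    boolean pollOnce() {
        Deque<String> active = readers.get(activeSource);
        if (active == null) {
            return false; // nothing active, wait for the next switch
        }
        if (!active.isEmpty()) {
            emitted.add(active.poll());
            return true;
        }
        // Stand-in for sending SourceReaderFinishedEvent to the coordinator.
        sentToCoordinator.add("finished:" + activeSource);
        readers.remove(activeSource);
        activeSource = -1;
        return false;
    }
}

class SwitchingReaderDemo {
    public static void main(String[] args) {
        SwitchingReaderModel reader = new SwitchingReaderModel();
        reader.onSwitchSource(0, List.of("a", "b"));
        while (reader.pollOnce()) { }
        reader.onSwitchSource(1, List.of("c"));
        while (reader.pollOnce()) { }
        System.out.println(reader.emitted);           // [a, b, c]
        System.out.println(reader.sentToCoordinator); // [finished:0, finished:1]
    }
}
```

The model never needs to know how many sources exist or in which order they arrive; it only reacts to the index carried by each switch, which mirrors the "no upfront knowledge" property stated above.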
Constructor and Description |
---|
HybridSourceReader(SourceReaderContext readerContext) |
Modifier and Type | Method and Description |
---|---|
void | addSplits(List<HybridSourceSplit> splits) Adds a list of splits for this reader to read. |
void | close() |
void | handleSourceEvents(SourceEvent sourceEvent) Handle a custom source event sent by the SplitEnumerator. |
CompletableFuture<Void> | isAvailable() Returns a future that signals that data is available from the reader. |
void | notifyCheckpointAborted(long checkpointId) This method is called as a notification once a distributed checkpoint has been aborted. |
void | notifyCheckpointComplete(long checkpointId) We have an empty default implementation here because most source readers do not have to implement the method. |
void | notifyNoMoreSplits() This method is called when the reader is notified that it will not receive any further splits. |
InputStatus | pollNext(ReaderOutput output) Poll the next available record into the SourceOutput. |
List<HybridSourceSplit> | snapshotState(long checkpointId) Checkpoint on the state of the source. |
void | start() Start the reader. |
public HybridSourceReader(SourceReaderContext readerContext)
public void start()
Description copied from interface: SourceReader
Start the reader.
Specified by: start in interface SourceReader<T,HybridSourceSplit>
public InputStatus pollNext(ReaderOutput output) throws Exception
Description copied from interface: SourceReader
Poll the next available record into the SourceOutput.
The implementation must make sure this method is non-blocking.
Although the implementation can emit multiple records into the given SourceOutput, it is recommended not to do so. Instead, emit one record into the SourceOutput and return an InputStatus.MORE_AVAILABLE to let the caller thread know there are more records available.
Specified by: pollNext in interface SourceReader<T,HybridSourceSplit>
Throws: Exception
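The recommendation to emit a single record per call can be sketched without any Flink dependency as follows. `PollStatus` is a hypothetical stand-in for Flink's InputStatus (only three of its values are modelled), and `OneRecordPerPollReader`, `enqueue`, and `markNoMoreInput` are names invented for this example. The sketch assumes records are already buffered in memory, so that polling never has to wait.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical stand-in for Flink's InputStatus; only the values used here. */
enum PollStatus { MORE_AVAILABLE, NOTHING_AVAILABLE, END_OF_INPUT }

/** Simplified reader model: emits at most one record per pollNext call and never blocks. */
class OneRecordPerPollReader {
    private final Queue<String> buffered = new ArrayDeque<>();
    private boolean noMoreInput;

    void enqueue(String record) {
        buffered.add(record);
    }

    void markNoMoreInput() {
        noMoreInput = true;
    }

    /** The List plays the role of the SourceOutput in this model. */
    PollStatus pollNext(List<String> output) {
        String next = buffered.poll(); // non-blocking: returns null when the buffer is empty
        if (next != null) {
            output.add(next); // exactly one record per call
        }
        if (!buffered.isEmpty()) {
            return PollStatus.MORE_AVAILABLE; // tell the caller to poll again right away
        }
        return noMoreInput ? PollStatus.END_OF_INPUT : PollStatus.NOTHING_AVAILABLE;
    }
}

class OneRecordPerPollDemo {
    public static void main(String[] args) {
        OneRecordPerPollReader reader = new OneRecordPerPollReader();
        reader.enqueue("a");
        reader.enqueue("b");
        List<String> output = new ArrayList<>();
        PollStatus status;
        do {
            status = reader.pollNext(output);
        } while (status == PollStatus.MORE_AVAILABLE);
        System.out.println(output + " " + status); // [a, b] NOTHING_AVAILABLE
    }
}
```

Returning after every record gives the caller a chance to interleave other work between records, which is the motivation the text above gives for not draining the buffer in one call.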
public List<HybridSourceSplit> snapshotState(long checkpointId)
Description copied from interface: SourceReader
Checkpoint on the state of the source.
Specified by: snapshotState in interface SourceReader<T,HybridSourceSplit>
public void notifyCheckpointComplete(long checkpointId) throws Exception
Description copied from interface: SourceReader
We have an empty default implementation here because most source readers do not have to implement the method.
Specified by: notifyCheckpointComplete in interface CheckpointListener
Specified by: notifyCheckpointComplete in interface SourceReader<T,HybridSourceSplit>
Parameters: checkpointId - The ID of the checkpoint that has been completed.
Throws: Exception - This method can propagate exceptions, which leads to a failure/recovery for the task. Note that this will NOT lead to the checkpoint being revoked.
See Also: CheckpointListener.notifyCheckpointComplete(long)
public void notifyCheckpointAborted(long checkpointId) throws Exception
Description copied from interface: CheckpointListener
This method is called as a notification once a distributed checkpoint has been aborted.
Important: The fact that a checkpoint has been aborted does NOT mean that the data and artifacts produced between the previous checkpoint and the aborted checkpoint are to be discarded. The expected behavior is as if this checkpoint was never triggered in the first place, and the next successful checkpoint simply covers a longer time span. See the "Checkpoint Subsuming Contract" in the class-level JavaDocs for details.
These notifications are "best effort", meaning they can sometimes be skipped.
This method is very rarely necessary to implement. The "best effort" guarantee, together with the fact that this method should not result in discarding any data (per the "Checkpoint Subsuming Contract") means it is mainly useful for earlier cleanups of auxiliary resources. One example is to pro-actively clear a local per-checkpoint state cache upon checkpoint failure.
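The cache-clearing use case mentioned above might look roughly like the sketch below. It is independent of Flink: `PerCheckpointCache`, `register`, and the idea of tracking temporary file names per checkpoint are assumptions made for illustration. Only auxiliary state is dropped; no produced data is discarded, and an abort notification for an unknown checkpoint is a harmless no-op, in line with the "best effort" nature of the notifications.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical per-checkpoint cache of auxiliary resources; not part of Flink. */
class PerCheckpointCache {
    private final Map<Long, List<String>> tempFilesByCheckpoint = new TreeMap<>();

    /** Remembers an auxiliary resource created while preparing the given checkpoint. */
    void register(long checkpointId, String tempFile) {
        tempFilesByCheckpoint.computeIfAbsent(checkpointId, id -> new ArrayList<>()).add(tempFile);
    }

    /** Mirrors notifyCheckpointAborted: drops only the auxiliary entries of that checkpoint. */
    void notifyCheckpointAborted(long checkpointId) {
        tempFilesByCheckpoint.remove(checkpointId); // no-op if the id is unknown
    }

    boolean hasEntriesFor(long checkpointId) {
        return tempFilesByCheckpoint.containsKey(checkpointId);
    }

    int cachedCheckpoints() {
        return tempFilesByCheckpoint.size();
    }
}

class PerCheckpointCacheDemo {
    public static void main(String[] args) {
        PerCheckpointCache cache = new PerCheckpointCache();
        cache.register(1L, "tmp-1");
        cache.register(2L, "tmp-2");
        cache.notifyCheckpointAborted(2L);
        System.out.println(cache.hasEntriesFor(1L)); // true
        System.out.println(cache.hasEntriesFor(2L)); // false
    }
}
```

Because the notification can be skipped, a real implementation would still need another cleanup path (for example when a later checkpoint completes); this sketch only shows the early-cleanup shortcut.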
Specified by: notifyCheckpointAborted in interface CheckpointListener
Parameters: checkpointId - The ID of the checkpoint that has been aborted.
Throws: Exception - This method can propagate exceptions, which leads to a failure/recovery for the task or job.
public CompletableFuture<Void> isAvailable()
Description copied from interface: SourceReader
Returns a future that signals that data is available from the reader.
Once the future completes, the runtime will keep calling the SourceReader.pollNext(ReaderOutput) method until that method returns a status other than InputStatus.MORE_AVAILABLE. After that, the runtime will again call this method to obtain the next future. Once that completes, it will again call SourceReader.pollNext(ReaderOutput) and so on.
The contract is the following: If the reader has data available, then all futures previously returned by this method must eventually complete. Otherwise the source might stall indefinitely.
It is not a problem to have occasional "false positives", meaning to complete a future even if no data is available. However, one should not use an "always complete" future in cases where no data is available, because that will result in busy waiting loops calling pollNext(...) even though no data is available.
Specified by: isAvailable in interface SourceReader<T,HybridSourceSplit>
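One way to satisfy the availability contract above is to keep a single pending future and complete it whenever data arrives. The sketch below uses only `java.util.concurrent.CompletableFuture`; `AvailabilityModel`, `onData`, and `poll` are hypothetical names, and the in-memory queue is an assumption standing in for whatever the reader actually buffers.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

/** Simplified model of the isAvailable() contract; not Flink's implementation. */
class AvailabilityModel {
    private final Queue<String> buffered = new ArrayDeque<>();
    // At most one incomplete future is outstanding at any time.
    private CompletableFuture<Void> availability = new CompletableFuture<>();

    /** Called by a (hypothetical) fetcher when a record arrives. */
    synchronized void onData(String record) {
        buffered.add(record);
        // Every future handed out earlier is either this one or already complete,
        // so completing it fulfils "all previously returned futures must complete".
        availability.complete(null);
    }

    synchronized CompletableFuture<Void> isAvailable() {
        if (!buffered.isEmpty()) {
            return CompletableFuture.completedFuture(null); // data is ready now
        }
        if (availability.isDone()) {
            // Buffer is empty: hand out a fresh pending future instead of an
            // "always complete" one, which would cause busy waiting.
            availability = new CompletableFuture<>();
        }
        return availability;
    }

    synchronized String poll() {
        return buffered.poll();
    }
}

class AvailabilityDemo {
    public static void main(String[] args) {
        AvailabilityModel reader = new AvailabilityModel();
        CompletableFuture<Void> pending = reader.isAvailable();
        System.out.println(pending.isDone()); // false
        reader.onData("x");
        System.out.println(pending.isDone()); // true
    }
}
```

After the record is polled and the buffer is empty again, the next call to `isAvailable()` returns a new, incomplete future, so a caller waiting on it is not woken up needlessly.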
public void addSplits(List<HybridSourceSplit> splits)
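Description copied from interface: SourceReader
Adds a list of splits for this reader to read. This method is called when the enumerator assigns a split via SplitEnumeratorContext.assignSplit(SourceSplit, int) or SplitEnumeratorContext.assignSplits(SplitsAssignment).
Specified by: addSplits in interface SourceReader<T,HybridSourceSplit>
Parameters: splits - The splits assigned by the split enumerator.
public void notifyNoMoreSplits()
Description copied from interface: SourceReader
This method is called when the reader is notified that it will not receive any further splits.
It is triggered when the enumerator calls SplitEnumeratorContext.signalNoMoreSplits(int) with the reader's parallel subtask.
Specified by: notifyNoMoreSplits in interface SourceReader<T,HybridSourceSplit>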
public void handleSourceEvents(SourceEvent sourceEvent)
Description copied from interface: SourceReader
Handle a custom source event sent by the SplitEnumerator. This method is called when the enumerator sends an event via SplitEnumeratorContext.sendEventToSourceReader(int, SourceEvent).
This method has a default implementation that does nothing, because most sources do not require any custom events.
Specified by: handleSourceEvents in interface SourceReader<T,HybridSourceSplit>
Parameters: sourceEvent - the event sent by the SplitEnumerator.
public void close() throws Exception
Specified by: close in interface AutoCloseable
Throws: Exception
Copyright © 2014–2023 The Apache Software Foundation. All rights reserved.