@Internal public class DynamicFileSplitEnumerator<SplitT extends FileSourceSplit> extends Object implements SplitEnumerator<SplitT,PendingSplitsCheckpoint<SplitT>>, SupportsHandleExecutionAttemptSourceEvent
This enumerator handles DynamicFilteringEvent
and filter out the desired input splits
with the support of the DynamicFileEnumerator
.
If the enumerator receives the first split request before any dynamic filtering data is received, it will enumerate all splits. If a DynamicFilterEvent is received during the fully enumerating, the remaining splits will be filtered accordingly.
Constructor and Description |
---|
DynamicFileSplitEnumerator(SplitEnumeratorContext<SplitT> context,
DynamicFileEnumerator.Provider fileEnumeratorFactory,
FileSplitAssigner.Provider splitAssignerFactory) |
Modifier and Type | Method and Description |
---|---|
void |
addReader(int subtaskId)
Add a new source reader with the given subtask ID.
|
void |
addSplitsBack(List<SplitT> splits,
int subtaskId)
Add splits back to the split enumerator.
|
void |
close()
Called to close the enumerator, in case it holds on to any resources, like threads or network
connections.
|
void |
handleSourceEvent(int subtaskId,
int attemptNumber,
SourceEvent sourceEvent)
Handles a custom source event from the source reader.
|
void |
handleSourceEvent(int subtaskId,
SourceEvent sourceEvent)
Handles a custom source event from the source reader.
|
void |
handleSplitRequest(int subtask,
String hostname)
Handles the request for a split.
|
PendingSplitsCheckpoint<SplitT> |
snapshotState(long checkpointId)
Creates a snapshot of the state of this split enumerator, to be stored in a checkpoint.
|
void |
start()
Start the split enumerator.
|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
notifyCheckpointComplete
notifyCheckpointAborted
public DynamicFileSplitEnumerator(SplitEnumeratorContext<SplitT> context, DynamicFileEnumerator.Provider fileEnumeratorFactory, FileSplitAssigner.Provider splitAssignerFactory)
public void start()
SplitEnumerator
The default behavior does nothing.
start
in interface SplitEnumerator<SplitT extends FileSourceSplit,PendingSplitsCheckpoint<SplitT extends FileSourceSplit>>
public void close() throws IOException
SplitEnumerator
close
in interface AutoCloseable
close
in interface SplitEnumerator<SplitT extends FileSourceSplit,PendingSplitsCheckpoint<SplitT extends FileSourceSplit>>
IOException
public void addReader(int subtaskId)
SplitEnumerator
addReader
in interface SplitEnumerator<SplitT extends FileSourceSplit,PendingSplitsCheckpoint<SplitT extends FileSourceSplit>>
subtaskId
- the subtask ID of the new source reader.public void handleSplitRequest(int subtask, @Nullable String hostname)
SplitEnumerator
SourceReaderContext.sendSplitRequest()
method.handleSplitRequest
in interface SplitEnumerator<SplitT extends FileSourceSplit,PendingSplitsCheckpoint<SplitT extends FileSourceSplit>>
subtask
- the subtask id of the source reader who sent the source event.hostname
- Optional, the hostname where the requesting task is running. This
can be used to make split assignments locality-aware.public void handleSourceEvent(int subtaskId, SourceEvent sourceEvent)
SplitEnumerator
This method has a default implementation that does nothing, because it is only required to
be implemented by some sources, which have a custom event protocol between reader and
enumerator. The common events for reader registration and split requests are not dispatched
to this method, but rather invoke the SplitEnumerator.addReader(int)
and SplitEnumerator.handleSplitRequest(int, String)
methods.
handleSourceEvent
in interface SplitEnumerator<SplitT extends FileSourceSplit,PendingSplitsCheckpoint<SplitT extends FileSourceSplit>>
subtaskId
- the subtask id of the source reader who sent the source event.sourceEvent
- the source event from the source reader.public void addSplitsBack(List<SplitT> splits, int subtaskId)
SplitEnumerator
SourceReader
fails and there are splits assigned to it after the last successful checkpoint.addSplitsBack
in interface SplitEnumerator<SplitT extends FileSourceSplit,PendingSplitsCheckpoint<SplitT extends FileSourceSplit>>
splits
- The splits to add back to the enumerator for reassignment.subtaskId
- The id of the subtask to which the returned splits belong.public PendingSplitsCheckpoint<SplitT> snapshotState(long checkpointId)
SplitEnumerator
The snapshot should contain the latest state of the enumerator: It should assume that all
operations that happened before the snapshot have successfully completed. For example all
splits assigned to readers via SplitEnumeratorContext.assignSplit(SourceSplit, int)
and SplitEnumeratorContext.assignSplits(SplitsAssignment)
) don't need to be included
in the snapshot anymore.
This method takes the ID of the checkpoint for which the state is snapshotted. Most implementations should be able to ignore this parameter, because for the contents of the snapshot, it doesn't matter for which checkpoint it gets created. This parameter can be interesting for source connectors with external systems where those systems are themselves aware of checkpoints; for example in cases where the enumerator notifies that system about a specific checkpoint being triggered.
snapshotState
in interface SplitEnumerator<SplitT extends FileSourceSplit,PendingSplitsCheckpoint<SplitT extends FileSourceSplit>>
checkpointId
- The ID of the checkpoint for which the snapshot is created.public void handleSourceEvent(int subtaskId, int attemptNumber, SourceEvent sourceEvent)
SupportsHandleExecutionAttemptSourceEvent
SplitEnumerator.handleSourceEvent(int, SourceEvent)
but is aware of the subtask execution
attempt who sent this event.handleSourceEvent
in interface SupportsHandleExecutionAttemptSourceEvent
subtaskId
- the subtask id of the source reader who sent the source event.attemptNumber
- the attempt number of the source reader who sent the source event.sourceEvent
- the source event from the source reader.Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.