Class StaticFileSplitEnumerator
- java.lang.Object
- org.apache.flink.connector.file.src.impl.StaticFileSplitEnumerator
- All Implemented Interfaces:
AutoCloseable, CheckpointListener, SplitEnumerator<FileSourceSplit,PendingSplitsCheckpoint<FileSourceSplit>>, SupportsBatchSnapshot
@Internal public class StaticFileSplitEnumerator extends Object implements SplitEnumerator<FileSourceSplit,PendingSplitsCheckpoint<FileSourceSplit>>, SupportsBatchSnapshot
A SplitEnumerator implementation for bounded / batch FileSource input.
This enumerator takes all files that are present in the configured input directories and assigns them to the readers. Once all files are processed, the source is finished.
The implementation of this class is rather thin. The actual logic for creating the set of FileSourceSplits to process and the logic for deciding which reader gets which split live in FileEnumerator and FileSplitAssigner, respectively.
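The delegation described above can be illustrated with a minimal, dependency-free sketch. It does not use the Flink classes: QueueAssigner and Enumerator are hypothetical stand-ins, splits are plain file names, and the "assign" and "no more splits" actions are recorded as strings instead of being sent through a SplitEnumeratorContext.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

class ThinEnumeratorSketch {

    /** Hypothetical stand-in for a split assigner; splits are plain file names here. */
    static final class QueueAssigner {
        private final Deque<String> pending = new ArrayDeque<>();

        QueueAssigner(Collection<String> splits) {
            pending.addAll(splits);
        }

        /** Returns the next unassigned split, or empty when none are left. */
        Optional<String> getNext() {
            return Optional.ofNullable(pending.pollFirst());
        }
    }

    /** Thin enumerator: it owns no assignment logic and only forwards requests. */
    static final class Enumerator {
        private final QueueAssigner assigner;
        private final List<String> actions = new ArrayList<>();

        Enumerator(QueueAssigner assigner) {
            this.assigner = assigner;
        }

        void handleSplitRequest(int subtask) {
            Optional<String> next = assigner.getNext();
            if (next.isPresent()) {
                actions.add("assign " + next.get() + " to reader " + subtask);
            } else {
                // Bounded input: once the pool is empty, the reader can finish.
                actions.add("no more splits for reader " + subtask);
            }
        }

        List<String> actions() {
            return actions;
        }
    }

    /** Runs three split requests against two files and returns what the enumerator did. */
    static List<String> demo() {
        Enumerator enumerator = new Enumerator(new QueueAssigner(List.of("a.txt", "b.txt")));
        enumerator.handleSplitRequest(0);
        enumerator.handleSplitRequest(1);
        enumerator.handleSplitRequest(0);
        return enumerator.actions();
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The point of the design is that the enumerator stays the same whichever assignment strategy is plugged in; only the assigner changes.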
Constructor Summary
StaticFileSplitEnumerator(SplitEnumeratorContext<FileSourceSplit> context, FileSplitAssigner splitAssigner)
-
Method Summary
void addReader(int subtaskId)
Add a new source reader with the given subtask ID.
void addSplitsBack(List<FileSourceSplit> splits, int subtaskId)
Add splits back to the split enumerator.
void close()
Called to close the enumerator, in case it holds on to any resources, like threads or network connections.
void handleSourceEvent(int subtaskId, SourceEvent sourceEvent)
Handles a custom source event from the source reader.
void handleSplitRequest(int subtask, String hostname)
Handles the request for a split.
PendingSplitsCheckpoint<FileSourceSplit> snapshotState(long checkpointId)
Creates a snapshot of the state of this split enumerator, to be stored in a checkpoint.
void start()
Start the split enumerator.
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.flink.api.common.state.CheckpointListener
notifyCheckpointAborted
-
Methods inherited from interface org.apache.flink.api.connector.source.SplitEnumerator
notifyCheckpointComplete
Constructor Detail
-
StaticFileSplitEnumerator
public StaticFileSplitEnumerator(SplitEnumeratorContext<FileSourceSplit> context, FileSplitAssigner splitAssigner)
-
-
Method Detail
-
start
public void start()
Description copied from interface: SplitEnumerator
Start the split enumerator. The default behavior does nothing.
- Specified by: start in interface SplitEnumerator<FileSourceSplit,PendingSplitsCheckpoint<FileSourceSplit>>
-
close
public void close() throws IOException
Description copied from interface: SplitEnumerator
Called to close the enumerator, in case it holds on to any resources, like threads or network connections.
- Specified by: close in interface AutoCloseable
- Specified by: close in interface SplitEnumerator<FileSourceSplit,PendingSplitsCheckpoint<FileSourceSplit>>
- Throws: IOException
-
addReader
public void addReader(int subtaskId)
Description copied from interface: SplitEnumerator
Add a new source reader with the given subtask ID.
- Specified by: addReader in interface SplitEnumerator<FileSourceSplit,PendingSplitsCheckpoint<FileSourceSplit>>
- Parameters:
subtaskId - the subtask ID of the new source reader.
-
handleSplitRequest
public void handleSplitRequest(int subtask, @Nullable String hostname)
Description copied from interface: SplitEnumerator
Handles the request for a split. This method is called when the reader with the given subtask id calls the SourceReaderContext.sendSplitRequest() method.
- Specified by: handleSplitRequest in interface SplitEnumerator<FileSourceSplit,PendingSplitsCheckpoint<FileSourceSplit>>
- Parameters:
subtask - the subtask id of the source reader who sent the source event.
hostname - Optional, the hostname where the requesting task is running. This can be used to make split assignments locality-aware.
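As an illustration of how the optional hostname can make assignment locality-aware, here is a small sketch. It is not the Flink implementation: the Split class, its hosts field, and the "prefer local, otherwise oldest pending" rule are assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;

class LocalityAwareSketch {

    /** Hypothetical split: a file path plus the hosts that store its data. */
    static final class Split {
        final String path;
        final List<String> hosts;

        Split(String path, String... hosts) {
            this.path = path;
            this.hosts = Arrays.asList(hosts);
        }
    }

    private final List<Split> pending = new ArrayList<>();

    LocalityAwareSketch(List<Split> splits) {
        pending.addAll(splits);
    }

    /** Prefers a split stored on the requesting host; falls back to any pending split. */
    Optional<Split> getNext(String hostname) {
        if (hostname != null) {
            Iterator<Split> it = pending.iterator();
            while (it.hasNext()) {
                Split candidate = it.next();
                if (candidate.hosts.contains(hostname)) {
                    it.remove();
                    return Optional.of(candidate);
                }
            }
        }
        // No hostname given, or no local split left: hand out the oldest pending split.
        if (pending.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(pending.remove(0));
    }

    /** Returns the paths handed out for three requests, the last one without a hostname. */
    static List<String> demo() {
        LocalityAwareSketch assigner = new LocalityAwareSketch(List.of(
                new Split("part-0", "host-a"),
                new Split("part-1", "host-b"),
                new Split("part-2", "host-a")));
        List<String> assigned = new ArrayList<>();
        assigned.add(assigner.getNext("host-b").map(s -> s.path).orElse("none"));
        assigned.add(assigner.getNext("host-c").map(s -> s.path).orElse("none"));
        assigned.add(assigner.getNext(null).map(s -> s.path).orElse("none"));
        return assigned;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the request from host-b receives the split stored on host-b, while requests from an unknown host or without a hostname simply receive the next pending split.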
-
handleSourceEvent
public void handleSourceEvent(int subtaskId, SourceEvent sourceEvent)
Description copied from interface: SplitEnumerator
Handles a custom source event from the source reader.
This method has a default implementation that does nothing, because only sources with a custom event protocol between reader and enumerator need to implement it. The common events for reader registration and split requests are not dispatched to this method; they invoke the SplitEnumerator.addReader(int) and SplitEnumerator.handleSplitRequest(int, String) methods instead.
- Specified by: handleSourceEvent in interface SplitEnumerator<FileSourceSplit,PendingSplitsCheckpoint<FileSourceSplit>>
- Parameters:
subtaskId - the subtask id of the source reader who sent the source event.
sourceEvent - the source event from the source reader.
-
addSplitsBack
public void addSplitsBack(List<FileSourceSplit> splits, int subtaskId)
Description copied from interface: SplitEnumerator
Add splits back to the split enumerator. This will only happen when a SourceReader fails and there are splits assigned to it after the last successful checkpoint.
- Specified by: addSplitsBack in interface SplitEnumerator<FileSourceSplit,PendingSplitsCheckpoint<FileSourceSplit>>
- Parameters:
splits - The splits to add back to the enumerator for reassignment.
subtaskId - The id of the subtask to which the returned splits belong.
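The effect of returning splits can be sketched without Flink as follows. This is an assumption-laden simplification: splits are file names, the pending pool is a plain queue, and returned splits are appended to the end of that queue so that any reader can pick them up later.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class AddSplitsBackSketch {

    private final Deque<String> pending = new ArrayDeque<>();

    AddSplitsBackSketch(List<String> splits) {
        pending.addAll(splits);
    }

    /** Hands out the next pending split, or null when none are left. */
    String nextSplit() {
        return pending.pollFirst();
    }

    /** Returned splits rejoin the pending pool so another reader can pick them up. */
    void addSplitsBack(List<String> splits, int subtaskId) {
        pending.addAll(splits);
    }

    /** Reader 0 takes a split and fails; reader 1 then drains the pool. */
    static List<String> demo() {
        AddSplitsBackSketch enumerator = new AddSplitsBackSketch(List.of("a.txt", "b.txt"));

        List<String> ownedByReader0 = new ArrayList<>();
        ownedByReader0.add(enumerator.nextSplit());

        // Reader 0 fails before the next checkpoint, so its split is handed back.
        enumerator.addSplitsBack(ownedByReader0, 0);

        List<String> ownedByReader1 = new ArrayList<>();
        String split;
        while ((split = enumerator.nextSplit()) != null) {
            ownedByReader1.add(split);
        }
        return ownedByReader1;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

No split is lost in this sketch: the file that reader 0 held when it failed is processed by reader 1 after the remaining pending file.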
-
snapshotState
public PendingSplitsCheckpoint<FileSourceSplit> snapshotState(long checkpointId)
Description copied from interface: SplitEnumerator
Creates a snapshot of the state of this split enumerator, to be stored in a checkpoint.
The snapshot should contain the latest state of the enumerator: it should assume that all operations that happened before the snapshot have successfully completed. For example, splits already assigned to readers (via SplitEnumeratorContext.assignSplit(SourceSplit, int) and SplitEnumeratorContext.assignSplits(SplitsAssignment)) don't need to be included in the snapshot anymore.
This method takes the ID of the checkpoint for which the state is snapshotted. Most implementations should be able to ignore this parameter, because for the contents of the snapshot, it doesn't matter for which checkpoint it gets created. This parameter can be interesting for source connectors with external systems where those systems are themselves aware of checkpoints; for example in cases where the enumerator notifies that system about a specific checkpoint being triggered.
- Specified by: snapshotState in interface SplitEnumerator<FileSourceSplit,PendingSplitsCheckpoint<FileSourceSplit>>
- Parameters:
checkpointId - The ID of the checkpoint for which the snapshot is created.
- Returns: an object containing the state of the split enumerator.
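The rule that already assigned splits are left out of the snapshot can be shown with a short sketch. It uses a plain list of file names as the snapshot rather than PendingSplitsCheckpoint, and the class and method names other than snapshotState are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class SnapshotSketch {

    private final Deque<String> pending = new ArrayDeque<>();

    SnapshotSketch(List<String> splits) {
        pending.addAll(splits);
    }

    /** Hands out the next pending split, or null when none are left. */
    String assignNext() {
        return pending.pollFirst();
    }

    /** Copies the pending splits; the checkpoint ID does not affect the contents. */
    List<String> snapshotState(long checkpointId) {
        return new ArrayList<>(pending);
    }

    /** Takes a snapshot after one assignment and restores a new enumerator from it. */
    static List<String> demo() {
        SnapshotSketch enumerator = new SnapshotSketch(List.of("a.txt", "b.txt", "c.txt"));
        enumerator.assignNext(); // a.txt is assigned, so it is not part of the snapshot

        List<String> snapshot = enumerator.snapshotState(1L);

        // An assignment made after the snapshot does not change the stored copy.
        enumerator.assignNext();

        SnapshotSketch restored = new SnapshotSketch(snapshot);
        return restored.snapshotState(2L);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the snapshot is a copy, later assignments do not alter it, and an enumerator restored from it starts with exactly the splits that were unassigned at snapshot time.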