protected abstract static class AbstractFileSource.AbstractFileSourceBuilder<T,SplitT extends FileSourceSplit,SELF extends AbstractFileSource.AbstractFileSourceBuilder<T,SplitT,SELF>> extends Object
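Subclasses extend this builder and pass their own type as the SELF parameter, using the following pattern:
public class SubBuilder<T> extends AbstractFileSourceBuilder<T, FileSourceSplit, SubBuilder<T>> {
...
}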
That way, all return values from the builder methods defined here are typed to the sub-class type and support fluent chaining.
We don't make the publicly visible builder generic with a SELF type, because it leads to generic signatures that can look complicated and confusing.
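The SELF-type pattern described above can be illustrated without any Flink dependency. The following is a minimal sketch, not the Flink implementation: `BaseBuilder`, `SubBuilder`, `withLabel`, and the `String` result of `build()` are hypothetical stand-ins chosen for illustration. It shows that a method declared on the base builder returns the subclass type, so subclass-only methods can still be chained after it.

```java
import java.time.Duration;

public class SelfTypedBuilderSketch {

    /** Hypothetical base builder; SELF is bound to the concrete subclass type. */
    public abstract static class BaseBuilder<T, SELF extends BaseBuilder<T, SELF>> {
        protected Duration discoveryInterval;

        /** Declared on the base class, but typed to return the subclass. */
        public SELF monitorContinuously(Duration discoveryInterval) {
            this.discoveryInterval = discoveryInterval;
            return self();
        }

        // The unchecked cast is safe as long as every subclass passes itself as SELF.
        @SuppressWarnings("unchecked")
        private SELF self() {
            return (SELF) this;
        }

        public abstract String build();
    }

    /** Hypothetical public builder: it binds SELF, so callers never see that type parameter. */
    public static final class SubBuilder<T> extends BaseBuilder<T, SubBuilder<T>> {
        private String label = "unnamed";

        public SubBuilder<T> withLabel(String label) {
            this.label = label;
            return this;
        }

        @Override
        public String build() {
            return label + " every " + discoveryInterval;
        }
    }

    public static String demo() {
        return new SubBuilder<String>()
                .monitorContinuously(Duration.ofSeconds(10)) // returns SubBuilder<String>
                .withLabel("logs")                           // subclass-only method still chains
                .build();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Without the SELF parameter, `monitorContinuously` would return the base type and the call to `withLabel` would not compile.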
Modifier and Type | Field and Description |
---|---|
protected ContinuousEnumerationSettings | continuousSourceSettings |
protected FileEnumerator.Provider | fileEnumerator |
protected Path[] | inputPaths |
protected BulkFormat<T,SplitT> | readerFormat |
protected FileSplitAssigner.Provider | splitAssigner |
Modifier | Constructor and Description |
---|---|
protected | AbstractFileSourceBuilder(Path[] inputPaths, BulkFormat<T,SplitT> readerFormat, FileEnumerator.Provider defaultFileEnumerator, FileSplitAssigner.Provider defaultSplitAssigner) |
Modifier and Type | Method and Description |
---|---|
abstract AbstractFileSource<T,SplitT> | build() Creates the file source with the settings applied to this builder. |
SELF | monitorContinuously(Duration discoveryInterval) Sets this source to streaming ("continuous monitoring") mode. |
SELF | processStaticFileSet() Sets this source to bounded (batch) mode. |
SELF | setFileEnumerator(FileEnumerator.Provider fileEnumerator) Configures the FileEnumerator for the source. |
SELF | setSplitAssigner(FileSplitAssigner.Provider splitAssigner) Configures the FileSplitAssigner for the source. |
protected final Path[] inputPaths
protected final BulkFormat<T,SplitT> readerFormat
protected FileEnumerator.Provider fileEnumerator
protected FileSplitAssigner.Provider splitAssigner
@Nullable protected ContinuousEnumerationSettings continuousSourceSettings
protected AbstractFileSourceBuilder(Path[] inputPaths, BulkFormat<T,SplitT> readerFormat, FileEnumerator.Provider defaultFileEnumerator, FileSplitAssigner.Provider defaultSplitAssigner)
public abstract AbstractFileSource<T,SplitT> build()
Creates the file source with the settings applied to this builder.
public SELF monitorContinuously(Duration discoveryInterval)
Sets this source to streaming ("continuous monitoring") mode.
This makes the source a "continuous streaming" source that keeps running, monitors for new files, and reads those files once the monitoring discovers them.
The source checks for new files at the interval given by discoveryInterval. Shorter intervals mean that files are discovered more quickly, but also imply more frequent listing or directory traversal of the file system / object store.
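The trade-off around the discovery interval can be made concrete with a small, Flink-independent sketch. This is an illustration under stated assumptions, not how Flink's enumerator is implemented: the class `DiscoverySketch` and its method `discoverNewFiles` are hypothetical, and the sketch lists a single local directory with `java.nio.file` rather than a distributed file system. Each pass performs one full listing and reports only the files it has not seen before, so a shorter interval finds files sooner at the cost of more listings.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DiscoverySketch {
    private final Path directory;
    private final Set<Path> alreadySeen = new HashSet<>();

    public DiscoverySketch(Path directory) {
        this.directory = directory;
    }

    /** One discovery pass: lists the directory and returns only files not seen in earlier passes. */
    public List<Path> discoverNewFiles() throws IOException {
        List<Path> newFiles = new ArrayList<>();
        try (DirectoryStream<Path> listing = Files.newDirectoryStream(directory)) {
            for (Path file : listing) {
                // Set.add returns false for paths that an earlier pass already recorded.
                if (Files.isRegularFile(file) && alreadySeen.add(file)) {
                    newFiles.add(file);
                }
            }
        }
        return newFiles;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("discovery-sketch");
        DiscoverySketch monitor = new DiscoverySketch(dir);
        Duration discoveryInterval = Duration.ofMillis(200); // illustrative value only

        Files.createFile(dir.resolve("a.txt"));
        for (int pass = 0; pass < 3; pass++) {
            System.out.println("pass " + pass + ": " + monitor.discoverNewFiles().size() + " new file(s)");
            if (pass == 0) {
                // A file that arrives between passes is picked up by the next listing.
                Files.createFile(dir.resolve("b.txt"));
            }
            Thread.sleep(discoveryInterval.toMillis());
        }
    }
}
```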
public SELF processStaticFileSet()
Sets this source to bounded (batch) mode.
In this mode, the source processes the files that are under the given paths when the application is started. Once all files are processed, the source will finish.
This setting is also the default behavior. This method is mainly here to "switch back" to bounded (batch) mode, or to make it explicit in the source construction.
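The "switch back" behavior can be sketched with a toy builder. This is a simplified model and not the Flink source: `SourceModeSketch` and `isBounded` are hypothetical, a plain nullable `Duration` stands in for the nullable `continuousSourceSettings` field, and the positive-interval check is an assumption of the sketch. The point is that bounded mode is the state in which no continuous settings are present, so the last mode call on the builder wins.

```java
import java.time.Duration;

public class SourceModeSketch {
    // Stand-in for the nullable continuous settings: null means bounded (batch) mode.
    private Duration discoveryInterval;

    public SourceModeSketch monitorContinuously(Duration discoveryInterval) {
        // Assumption of this sketch: the interval must be present and strictly positive.
        if (discoveryInterval == null || discoveryInterval.isZero() || discoveryInterval.isNegative()) {
            throw new IllegalArgumentException("discoveryInterval must be > 0");
        }
        this.discoveryInterval = discoveryInterval;
        return this;
    }

    public SourceModeSketch processStaticFileSet() {
        // Clearing the settings switches back to the default, bounded mode.
        this.discoveryInterval = null;
        return this;
    }

    public boolean isBounded() {
        return discoveryInterval == null;
    }

    public static void main(String[] args) {
        SourceModeSketch builder = new SourceModeSketch();
        System.out.println("default bounded: " + builder.isBounded());

        builder.monitorContinuously(Duration.ofSeconds(30));
        System.out.println("after monitorContinuously bounded: " + builder.isBounded());

        builder.processStaticFileSet();
        System.out.println("after processStaticFileSet bounded: " + builder.isBounded());
    }
}
```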
public SELF setFileEnumerator(FileEnumerator.Provider fileEnumerator)
Configures the FileEnumerator for the source. The File Enumerator is responsible for selecting, from the input paths, the set of files that should be processed (and which to filter out). Furthermore, the File Enumerator may split the files further into sub-regions, to enable parallelization beyond the number of files.
public SELF setSplitAssigner(FileSplitAssigner.Provider splitAssigner)
Configures the FileSplitAssigner for the source. The File Split Assigner determines which parallel reader instance gets which FileSourceSplit, and in which order these splits are assigned.
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.