Class AbstractFileSource.AbstractFileSourceBuilder<T,​SplitT extends FileSourceSplit,​SELF extends AbstractFileSource.AbstractFileSourceBuilder<T,​SplitT,​SELF>>

  • Direct Known Subclasses:
    FileSource.FileSourceBuilder
    Enclosing class:
    AbstractFileSource<T,​SplitT extends FileSourceSplit>

    protected abstract static class AbstractFileSource.AbstractFileSourceBuilder<T,​SplitT extends FileSourceSplit,​SELF extends AbstractFileSource.AbstractFileSourceBuilder<T,​SplitT,​SELF>>
    extends Object
    The generic base builder. This builder carries a SELF type to make it convenient to extend this for subclasses, using the following pattern.
    
     public class SubBuilder<T> extends AbstractFileSourceBuilder<T, SubBuilder<T>> {
         ...
     }
     

    That way, all return values from builder method defined here are typed to the sub-class type and support fluent chaining.

    We don't make the publicly visible builder generic with a SELF type, because it leads to generic signatures that can look complicated and confusing.

    • Method Detail

      • build

        public abstract AbstractFileSource<T,​SplitT> build()
        Creates the file source with the settings applied to this builder.
      • monitorContinuously

        public SELF monitorContinuously​(Duration discoveryInterval)
        Sets this source to streaming ("continuous monitoring") mode.

        This makes the source a "continuous streaming" source that keeps running, monitoring for new files, and reads these files when they appear and are discovered by the monitoring.

        The interval in which the source checks for new files is the discoveryInterval. Shorter intervals mean that files are discovered more quickly, but also imply more frequent listing or directory traversal of the file system / object store.

      • processStaticFileSet

        public SELF processStaticFileSet()
        Sets this source to bounded (batch) mode.

        In this mode, the source processes the files that are under the given paths when the application is started. Once all files are processed, the source will finish.

        This setting is also the default behavior. This method is mainly here to "switch back" to bounded (batch) mode, or to make it explicit in the source construction.

      • setFileEnumerator

        public SELF setFileEnumerator​(FileEnumerator.Provider fileEnumerator)
        Configures the FileEnumerator for the source. The File Enumerator is responsible for selecting from the input path the set of files that should be processed (and which to filter out). Furthermore, the File Enumerator may split the files further into sub-regions, to enable parallelization beyond the number of files.