@PublicEvolving public class NonSplittingRecursiveEnumerator extends Object implements FileEnumerator
This FileEnumerator enumerates all files under the given paths recursively. Each file becomes one split; this enumerator does not split files into smaller "block" units.
The default instantiation of this enumerator filters files with the common hidden file prefixes '.' and '_'. A custom file filter can be specified.
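The hidden-file rule described above can be sketched with plain JDK types. This is an illustration under stated assumptions, not Flink's implementation: it uses `java.nio.file.Path` as a stand-in for Flink's own `Path` class, the class name `HiddenFileFilterSketch` and its fields are hypothetical, and it assumes that a predicate result of `true` means "keep this path".

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Predicate;

// Hypothetical sketch: java.nio.file.Path stands in for Flink's Path type.
class HiddenFileFilterSketch {

    // Assumed semantics: true keeps the path, false filters it out.
    // A path is treated as hidden when its last name component starts with '.' or '_'.
    static final Predicate<Path> DEFAULT_FILTER = path -> {
        Path name = path.getFileName();
        if (name == null || name.toString().isEmpty()) {
            return true; // nothing to inspect (e.g. a root path), so keep it
        }
        char first = name.toString().charAt(0);
        return first != '.' && first != '_';
    };

    // A custom filter can be composed from the default one: here, visible CSV files only.
    static final Predicate<Path> VISIBLE_CSV_ONLY =
            DEFAULT_FILTER.and(path -> path.toString().endsWith(".csv"));

    public static void main(String[] args) {
        System.out.println(DEFAULT_FILTER.test(Paths.get("data/part-0000.csv")));
        System.out.println(DEFAULT_FILTER.test(Paths.get("data/_SUCCESS")));
        System.out.println(VISIBLE_CSV_ONLY.test(Paths.get("data/notes.txt")));
    }
}
```

A predicate of this shape is what the `NonSplittingRecursiveEnumerator(Predicate<Path> fileFilter)` constructor accepts, with Flink's `Path` in place of the JDK type used here.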
Nested classes/interfaces inherited from interface FileEnumerator: FileEnumerator.Provider
Modifier and Type | Field and Description |
---|---|
protected Predicate<Path> | fileFilter: The filter predicate to filter out unwanted files. |
Constructor and Description |
---|
NonSplittingRecursiveEnumerator(): Creates a NonSplittingRecursiveEnumerator that enumerates all files except hidden files. |
NonSplittingRecursiveEnumerator(Predicate<Path> fileFilter): Creates a NonSplittingRecursiveEnumerator that uses the given predicate as a filter for file paths. |
Modifier and Type | Method and Description |
---|---|
protected void | addSplitsForPath(FileStatus fileStatus, FileSystem fs, ArrayList<FileSourceSplit> target) |
protected void | convertToSourceSplits(FileStatus file, FileSystem fs, List<FileSourceSplit> target) |
Collection<FileSourceSplit> | enumerateSplits(Path[] paths, int minDesiredSplits): Generates all file splits for the relevant files under the given paths. |
protected String | getNextId() |
public NonSplittingRecursiveEnumerator()
public Collection<FileSourceSplit> enumerateSplits(Path[] paths, int minDesiredSplits) throws IOException
Description copied from interface: FileEnumerator
Generates all file splits for the relevant files under the given paths. The minDesiredSplits is an optional hint indicating how many splits would be necessary to exploit parallelism properly.
Specified by: enumerateSplits in interface FileEnumerator
Throws: IOException
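The recursive, one-split-per-file behavior can be sketched with JDK file APIs. This is a simplified illustration and not Flink's code: `RecursiveEnumerationSketch` and `WholeFileSplit` are hypothetical names, `java.nio.file` stands in for Flink's `FileSystem` and `FileStatus`, the id scheme is invented, and the sketch assumes the filter is applied to directories as well as files and that `minDesiredSplits` is ignored because files are never subdivided.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of recursive enumeration that never splits a file.
class RecursiveEnumerationSketch {

    // Hypothetical stand-in for a split: one whole file, from offset 0 to its length.
    static final class WholeFileSplit {
        final String id;
        final Path file;
        final long offset;
        final long length;

        WholeFileSplit(String id, Path file, long offset, long length) {
            this.id = id;
            this.file = file;
            this.offset = offset;
            this.length = length;
        }
    }

    private final Predicate<Path> fileFilter; // true keeps the path (assumption)
    private int nextId = 0;                   // invented id scheme for the sketch

    RecursiveEnumerationSketch(Predicate<Path> fileFilter) {
        this.fileFilter = fileFilter;
    }

    // minDesiredSplits is only a hint; this sketch ignores it.
    List<WholeFileSplit> enumerateSplits(Path[] paths, int minDesiredSplits) throws IOException {
        List<WholeFileSplit> target = new ArrayList<>();
        for (Path path : paths) {
            addSplitsForPath(path, target);
        }
        return target;
    }

    private void addSplitsForPath(Path path, List<WholeFileSplit> target) throws IOException {
        if (!fileFilter.test(path)) {
            return; // a filtered directory is skipped together with its contents
        }
        if (Files.isDirectory(path)) {
            try (DirectoryStream<Path> children = Files.newDirectoryStream(path)) {
                for (Path child : children) {
                    addSplitsForPath(child, target); // recurse into subdirectories
                }
            }
        } else {
            // One split covers the entire file.
            target.add(new WholeFileSplit(String.valueOf(nextId++), path, 0L, Files.size(path)));
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("enum-sketch");
        Files.write(root.resolve("a.txt"), new byte[] {1, 2, 3});
        RecursiveEnumerationSketch enumerator = new RecursiveEnumerationSketch(p -> true);
        for (WholeFileSplit split : enumerator.enumerateSplits(new Path[] {root}, 4)) {
            System.out.println(split.id + " " + split.file + " " + split.length);
        }
    }
}
```

Because each file maps to exactly one split in this sketch, the number of splits can be lower than the `minDesiredSplits` hint when there are few files.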
protected void addSplitsForPath(FileStatus fileStatus, FileSystem fs, ArrayList<FileSourceSplit> target) throws IOException
Throws: IOException
protected void convertToSourceSplits(FileStatus file, FileSystem fs, List<FileSourceSplit> target) throws IOException
Throws: IOException
protected final String getNextId()
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.