Class FileInputFormat<OT>

    • Field Detail

      • INFLATER_INPUT_STREAM_FACTORIES

        protected static final Map<String,InflaterInputStreamFactory<?>> INFLATER_INPUT_STREAM_FACTORIES
        A mapping of file extensions to DEFLATE-based decompression algorithms. Files compressed this way cannot be split.
      • READ_WHOLE_SPLIT_FLAG

        protected static final long READ_WHOLE_SPLIT_FLAG
        The value (-1L) that splitLength is set to when the whole split should be read.
        See Also:
        Constant Field Values
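        As a rough illustration of how a reader might interpret this flag, the sketch below treats a split length of -1 as "read to the end of the file". The class and the bytesToRead helper are hypothetical and not part of the documented API.

```java
class WholeSplitSketch {

    // Mirrors the documented constant value; defined locally for illustration.
    static final long READ_WHOLE_SPLIT_FLAG = -1L;

    // Number of bytes a reader would consume for the given split (hypothetical helper).
    static long bytesToRead(long splitStart, long splitLength, long fileLength) {
        long remainingInFile = fileLength - splitStart;
        if (splitLength == READ_WHOLE_SPLIT_FLAG) {
            // Flag set: read everything from the split start to the end of the file.
            return remainingInFile;
        }
        // Otherwise read the split, but never past the end of the file.
        return Math.min(splitLength, remainingInFile);
    }
}
```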
      • stream

        protected transient FSDataInputStream stream
        The input stream reading from the input file.
      • splitStart

        protected transient long splitStart
        The start of the split that this parallel instance must consume.
      • splitLength

        protected transient long splitLength
        The length of the split that this parallel instance must consume.
      • currentSplit

        protected transient FileInputSplit currentSplit
        The current split that this parallel instance must consume.
      • minSplitSize

        protected long minSplitSize
        The minimal split size, set by the configure() method.
      • numSplits

        protected int numSplits
        The desired number of splits, as set by the configure() method.
      • openTimeout

        protected long openTimeout
        Stream opening timeout.
      • unsplittable

        protected boolean unsplittable
        Some file input formats are not splittable on a block level (for example, DEFLATE-compressed files). For such files, the FileInputFormat can only read whole files.
      • enumerateNestedFiles

        protected boolean enumerateNestedFiles
        The flag to specify whether recursive traversal of the input directory structure is enabled.
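        The effect of such a flag can be sketched with plain java.nio.file calls. This is an illustration under the assumption that subdirectories are skipped when the flag is off; the class and method names are hypothetical and do not use the Flink file system API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NestedEnumerationSketch {

    // Collects the regular files under dir; subdirectories are visited only when the flag is set.
    static List<Path> listInputFiles(Path dir, boolean enumerateNestedFiles) {
        List<Path> children;
        try (Stream<Path> stream = Files.list(dir)) {
            children = stream.sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<Path> files = new ArrayList<>();
        for (Path child : children) {
            if (Files.isDirectory(child)) {
                if (enumerateNestedFiles) {
                    files.addAll(listInputFiles(child, true));
                }
            } else {
                files.add(child);
            }
        }
        return files;
    }

    // Builds a small temporary tree for experimentation: a.txt and nested/b.txt.
    static Path createSampleTree() {
        try {
            Path root = Files.createTempDirectory("nested-enumeration-sketch");
            Files.createFile(root.resolve("a.txt"));
            Path nested = Files.createDirectory(root.resolve("nested"));
            Files.createFile(nested.resolve("b.txt"));
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

        With the sample tree, listInputFiles(root, false) yields only a.txt, while listInputFiles(root, true) also yields nested/b.txt.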
    • Constructor Detail

      • FileInputFormat

        public FileInputFormat()
      • FileInputFormat

        protected FileInputFormat(Path filePath)
    • Method Detail

      • registerInflaterInputStreamFactory

        public static void registerInflaterInputStreamFactory(String fileExtension,
                                                              InflaterInputStreamFactory<?> factory)
        Registers a decompression algorithm, provided through an InflaterInputStreamFactory, for a file extension so that matching files are decompressed transparently.
        Parameters:
        fileExtension - of the compressed files
        factory - to create an InflaterInputStream that handles the decompression format
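        The idea of an extension-keyed registry can be sketched in plain Java as follows. StreamFactory is a hypothetical stand-in for InflaterInputStreamFactory, and register and readDecoded are illustrative names; the real class may organize this differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.zip.GZIPOutputStream;

class DecompressionRegistrySketch {

    /** Hypothetical stand-in for InflaterInputStreamFactory. */
    interface StreamFactory {
        InputStream create(InputStream in) throws IOException;
    }

    private static final Map<String, StreamFactory> FACTORIES = new ConcurrentHashMap<>();

    // Associates a file extension (without the dot) with a decompressing stream factory.
    static void register(String fileExtension, StreamFactory factory) {
        FACTORIES.put(fileExtension, factory);
    }

    // Reads the content as UTF-8 text, decompressing it if the extension is registered.
    static String readDecoded(String fileName, byte[] content) {
        int dot = fileName.lastIndexOf('.');
        StreamFactory factory = dot < 0 ? null : FACTORIES.get(fileName.substring(dot + 1));
        try (InputStream raw = new ByteArrayInputStream(content);
             InputStream in = factory == null ? raw : factory.create(raw)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            for (int n = in.read(buffer); n != -1; n = in.read(buffer)) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Test helper: gzip-compresses a string so the registry has something to decode.
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

        After a call such as register("gz", java.util.zip.GZIPInputStream::new), a file named part-0.gz is decoded through the gzip stream, while a file with an unregistered extension is read unchanged.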
      • extractFileExtension

        protected static String extractFileExtension(String fileName)
        Returns the extension of a file name (not of a full path).
        Returns:
        the extension of the file name or null if there is no extension.
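        A minimal sketch of the documented contract, assuming the extension is everything after the last period in the name. The enclosing class is hypothetical.

```java
import java.util.Objects;

class FileExtensionSketch {

    // Returns the text after the last '.', or null if the name contains no '.'.
    static String extractFileExtension(String fileName) {
        Objects.requireNonNull(fileName, "fileName");
        int lastPeriod = fileName.lastIndexOf('.');
        return lastPeriod < 0 ? null : fileName.substring(lastPeriod + 1);
    }
}
```

        Under this assumption, "data.csv.gz" yields "gz" and "README" yields null. Passing a full path instead of a file name can give misleading results when a directory name contains a period.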
      • getFilePaths

        public Path[] getFilePaths()
        Returns the paths of all files to be read by the FileInputFormat.
        Returns:
        The list of all paths to read.
      • setFilePath

        public void setFilePath(String filePath)
      • setFilePath

        public void setFilePath(Path filePath)
        Sets a single path of a file to be read.
        Parameters:
        filePath - The path of the file to read.
      • setFilePaths

        public void setFilePaths(String... filePaths)
        Sets multiple paths of files to be read.
        Parameters:
        filePaths - The paths of the files to read.
      • setFilePaths

        public void setFilePaths(Path... filePaths)
        Sets multiple paths of files to be read.
        Parameters:
        filePaths - The paths of the files to read.
      • getMinSplitSize

        public long getMinSplitSize()
      • setMinSplitSize

        public void setMinSplitSize(long minSplitSize)
      • getNumSplits

        public int getNumSplits()
      • setNumSplits

        public void setNumSplits(int numSplits)
      • getOpenTimeout

        public long getOpenTimeout()
      • setOpenTimeout

        public void setOpenTimeout(long openTimeout)
      • setNestedFileEnumeration

        public void setNestedFileEnumeration(boolean enable)
      • getNestedFileEnumeration

        public boolean getNestedFileEnumeration()
      • getSplitStart

        public long getSplitStart()
        Gets the start of the current split.
        Returns:
        The start of the split.
      • getSplitLength

        public long getSplitLength()
        Gets the length or remaining length of the current split.
        Returns:
        The length or remaining length of the current split.
      • setFilesFilter

        public void setFilesFilter(FilePathFilter filesFilter)
      • getInputSplitAssigner

        public LocatableInputSplitAssigner getInputSplitAssigner(FileInputSplit[] splits)
        Description copied from interface: InputSplitSource
        Returns the assigner for the input splits. The assigner determines which parallel instance of the input format receives which input split.
        Returns:
        The input split assigner.
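        What "locatable" assignment means can be sketched as follows: a requesting host is handed a split stored on that host if one is left, and any remaining split otherwise. This is a simplified illustration with hypothetical class names; the actual LocatableInputSplitAssigner may use a more refined policy for choosing remote splits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

class LocalityAssignerSketch {

    /** Hypothetical split descriptor: an id plus the hosts that store the data. */
    static final class Split {
        final int id;
        final Set<String> hosts;

        Split(int id, String... hosts) {
            this.id = id;
            this.hosts = new HashSet<>(Arrays.asList(hosts));
        }
    }

    private final List<Split> unassigned;

    LocalityAssignerSketch(List<Split> splits) {
        this.unassigned = new ArrayList<>(splits);
    }

    // Returns a split local to the requesting host if possible, else any split, else null.
    synchronized Split getNextSplit(String requestingHost) {
        for (Iterator<Split> it = unassigned.iterator(); it.hasNext();) {
            Split candidate = it.next();
            if (candidate.hosts.contains(requestingHost)) {
                it.remove();
                return candidate;
            }
        }
        // No local split left: fall back to the first remaining one.
        return unassigned.isEmpty() ? null : unassigned.remove(0);
    }
}
```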
      • createInputSplits

        public FileInputSplit[] createInputSplits(int minNumSplits)
                                           throws IOException
        Computes the input splits for the file. By default, one file block is one split. If more splits are requested than blocks are available, then a split may be a fraction of a block and splits may cross block boundaries.
        Parameters:
        minNumSplits - The minimum desired number of file splits.
        Returns:
        The computed file splits.
        Throws:
        IOException
        See Also:
        InputFormat.createInputSplits(int)
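        The relationship between blocks, the minimum split size, and the requested number of splits can be sketched for a single file as below. This is a simplified model with hypothetical names: it ignores host locality, multiple input files, and any special handling of the last split that the real method may apply.

```java
import java.util.ArrayList;
import java.util.List;

class SplitComputationSketch {

    // Returns {start, length} pairs covering a file of the given length.
    static List<long[]> computeSplits(
            long fileLength, long blockSize, long minSplitSize, int minNumSplits) {
        if (fileLength < 0 || blockSize <= 0 || minNumSplits < 1) {
            throw new IllegalArgumentException("invalid split parameters");
        }
        // Largest split size that still produces at least minNumSplits splits (rounded up).
        long maxSplitSize = (fileLength + minNumSplits - 1) / minNumSplits;
        // Default to one block per split; shrink if more splits are requested,
        // but never go below the minimum split size.
        long splitSize = Math.max(Math.max(1L, minSplitSize), Math.min(maxSplitSize, blockSize));

        List<long[]> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            splits.add(new long[] {start, Math.min(splitSize, fileLength - start)});
        }
        return splits;
    }
}
```

        For a 300-byte file with 128-byte blocks, requesting one split gives three block-sized splits (the last one 44 bytes long). Requesting six splits gives six 50-byte splits, and the split starting at offset 100 crosses the block boundary at offset 128.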
      • testForUnsplittable

        protected boolean testForUnsplittable(FileStatus pathFile)
      • acceptFile

        public boolean acceptFile(FileStatus fileStatus)
        A simple hook to filter files and directories from the input. The method may be overridden. Hadoop's FileInputFormat has a similar mechanism and applies the same filters by default.
        Parameters:
        fileStatus - The file status to check.
        Returns:
        true if the given file or directory is accepted, false otherwise.
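        Hadoop's default hidden-file filter rejects names that start with an underscore or a period. A sketch of such a hook, assuming the default here behaves the same way and operating on a plain file name rather than a FileStatus, could look like this (the class name is hypothetical):

```java
class AcceptFileSketch {

    // Accepts a file or directory name unless it looks hidden or like job metadata.
    static boolean acceptFile(String fileName) {
        return !fileName.startsWith("_") && !fileName.startsWith(".");
    }
}
```

        Under this rule, part-00000 is accepted, while _SUCCESS and .part-00000.crc are filtered out. A subclass overriding the hook could add its own conditions on top.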
      • open

        public void open(FileInputSplit fileSplit)
                  throws IOException
        Opens an input stream to the file defined in the input format. The stream is positioned at the beginning of the given split.

        The stream is actually opened in an asynchronous thread to make sure any interruptions to the thread working on the input format do not reach the file system.

        Parameters:
        fileSplit - The split to be opened.
        Throws:
        IOException - Thrown, if the split could not be opened due to an I/O problem.
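        The pattern of opening a stream on a separate thread and waiting for it with a timeout can be sketched with the standard java.util.concurrent classes. The names below are hypothetical, and the sketch assumes that a timeout of 0 means waiting indefinitely; the actual implementation uses its own opener thread rather than an executor.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class AsyncOpenSketch {

    // Runs the opener on its own daemon thread so that interrupting the caller
    // does not interrupt the thread that talks to the file system.
    static InputStream openWithTimeout(Callable<InputStream> opener, long timeoutMillis)
            throws IOException {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "split-opener");
            thread.setDaemon(true);
            return thread;
        });
        Future<InputStream> pending = executor.submit(opener);
        try {
            return timeoutMillis > 0
                    ? pending.get(timeoutMillis, TimeUnit.MILLISECONDS)
                    : pending.get();
        } catch (TimeoutException e) {
            // cancel(false) does not interrupt the opener. A production version would
            // also close the stream if the opener completes after the timeout.
            pending.cancel(false);
            throw new IOException("timed out after " + timeoutMillis + " ms", e);
        } catch (ExecutionException e) {
            throw new IOException("could not open split", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while waiting for the split to open", e);
        } finally {
            executor.shutdown();
        }
    }

    // Usage example: simulates an opener that takes delayMillis to return a stream.
    static String tryOpen(long delayMillis, long timeoutMillis) {
        try (InputStream in = openWithTimeout(() -> {
            Thread.sleep(delayMillis);
            return new ByteArrayInputStream(new byte[0]);
        }, timeoutMillis)) {
            return "opened";
        } catch (IOException e) {
            return e.getMessage();
        }
    }
}
```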
      • close

        public void close()
                   throws IOException
        Closes the file input stream of the input format.
        Throws:
        IOException - Thrown, if the input could not be closed properly.