Class DelimitedInputFormat<OT>

    • Field Detail

      • currBuffer

        protected transient byte[] currBuffer
      • currOffset

        protected transient int currOffset
      • currLen

        protected transient int currLen
      • RECORD_DELIMITER

        protected static final String RECORD_DELIMITER
        The configuration key to set the record delimiter.
        See Also:
        Constant Field Values
    • Constructor Detail

      • DelimitedInputFormat

        public DelimitedInputFormat()
      • DelimitedInputFormat

        protected DelimitedInputFormat(Path filePath,
                                       Configuration configuration)
    • Method Detail

      • loadConfigParameters

        protected static void loadConfigParameters(Configuration parameters)
      • getCharset

        @PublicEvolving
        public Charset getCharset()
        Gets the character set used for the row delimiter. Subclasses also use this charset to interpret field delimiters and comment strings, and to configure FieldParsers.
        Returns:
        the charset
      • setCharset

        @PublicEvolving
        public void setCharset(String charset)
        Sets the name of the character set used for the row delimiter. Subclasses also use this charset to interpret field delimiters and comment strings, and to configure FieldParsers.

        These fields are interpreted at the time they are set, so changing the charset afterwards may cause unexpected results.

        Parameters:
        charset - name of the charset
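
        The note about changing the charset after other fields have been set can be illustrated without Flink. The sketch below uses only the JDK and a hypothetical helper class, DelimiterEncodingSketch, that is not part of this API. It assumes a textual delimiter is turned into bytes with whatever charset is active at that moment, so the same delimiter string can yield different byte sequences under different charsets.

```java
import java.nio.charset.Charset;

/** Hypothetical helper, not part of Flink: shows that a delimiter's bytes depend on the charset. */
final class DelimiterEncodingSketch {

    private DelimiterEncodingSketch() {}

    /** Encodes a textual delimiter into bytes using the named charset. */
    static byte[] encodeDelimiter(String delimiter, String charsetName) {
        // Charset.forName throws an unchecked exception for unknown or illegal names.
        Charset charset = Charset.forName(charsetName);
        return delimiter.getBytes(charset);
    }
}
```

        Under these assumptions, a newline delimiter occupies one byte in UTF-8 and two bytes in UTF-16LE, which is why a delimiter encoded under one charset may no longer match the data once the charset is switched.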
      • getDelimiter

        public byte[] getDelimiter()
      • setDelimiter

        public void setDelimiter(byte[] delimiter)
      • setDelimiter

        public void setDelimiter(char delimiter)
      • setDelimiter

        public void setDelimiter(String delimiter)
      • getLineLengthLimit

        public int getLineLengthLimit()
      • setLineLengthLimit

        public void setLineLengthLimit(int lineLengthLimit)
      • getBufferSize

        public int getBufferSize()
      • setBufferSize

        public void setBufferSize(int bufferSize)
      • getNumLineSamples

        public int getNumLineSamples()
      • setNumLineSamples

        public void setNumLineSamples(int numLineSamples)
      • readRecord

        public abstract OT readRecord(OT reuse,
                                      byte[] bytes,
                                      int offset,
                                      int numBytes)
                               throws IOException
        Parses the given byte array, which represents a serialized record. Returns a valid record or throws an IOException.
        Parameters:
        reuse - An optionally reusable object.
        bytes - Binary data of serialized records.
        offset - The offset at which to start reading the record data.
        numBytes - The number of bytes that can be read starting at the offset position.
        Returns:
        The record that was read, if it was successfully deserialized.
        Throws:
        IOException - if the record could not be read.
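
        As a rough illustration of the (bytes, offset, numBytes) contract, the sketch below decodes one record as a String. StringRecordSketch is a hypothetical stand-in that deliberately does not extend the real class, so it compiles with the JDK alone. The bounds check and the choice to ignore reuse are assumptions made for this example.

```java
import java.io.IOException;
import java.nio.charset.Charset;

/** Hypothetical stand-in for a subclass; it does not extend the real Flink class. */
final class StringRecordSketch {

    private final Charset charset;

    StringRecordSketch(Charset charset) {
        this.charset = charset;
    }

    /** Mirrors the readRecord(reuse, bytes, offset, numBytes) signature for String records. */
    String readRecord(String reuse, byte[] bytes, int offset, int numBytes) throws IOException {
        // Reject ranges that fall outside the buffer instead of returning a corrupt record.
        if (offset < 0 || numBytes < 0 || numBytes > bytes.length - offset) {
            throw new IOException("Record range is outside the buffer");
        }
        // Strings are immutable, so the reuse object cannot be filled in and is ignored here.
        return new String(bytes, offset, numBytes, charset);
    }
}
```

        Only the numBytes bytes starting at offset are decoded. The rest of the buffer may hold other records and is left untouched.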
      • reachedEnd

        public boolean reachedEnd()
        Checks whether the current split is at its end.
        Specified by:
        reachedEnd in interface InputFormat<OT,FileInputSplit>
        Returns:
        True, if the split is at its end, false otherwise.
      • nextRecord

        public OT nextRecord(OT record)
                      throws IOException
        Description copied from interface: InputFormat
        Reads the next record from the input.

        When this method is called, the input format is guaranteed to be opened.

        Specified by:
        nextRecord in interface InputFormat<OT,FileInputSplit>
        Parameters:
        record - Object that may be reused.
        Returns:
        Read record.
        Throws:
        IOException - Thrown, if an I/O error occurred.
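
        The interplay of reachedEnd and nextRecord can be sketched with an in-memory stand-in. InMemoryDelimitedReader below is hypothetical: it scans a byte array for a single-byte delimiter and assumes UTF-8 text, whereas the real class reads file splits and supports multi-byte delimiters. The loop in readAll shows one plausible way a caller might drive such a reader.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in; it is not how the real class is implemented. */
final class InMemoryDelimitedReader {

    private final byte[] data;
    private final byte delimiter;
    private int position;

    InMemoryDelimitedReader(byte[] data, byte delimiter) {
        this.data = data;
        this.delimiter = delimiter;
    }

    /** True once every byte of the input has been consumed. */
    boolean reachedEnd() {
        return position >= data.length;
    }

    /** Returns the next record, or null if no data is left. The reuse argument is ignored. */
    String nextRecord(String reuse) {
        if (reachedEnd()) {
            return null;
        }
        int start = position;
        int end = start;
        while (end < data.length && data[end] != delimiter) {
            end++;
        }
        // Skip past the delimiter if one was found; otherwise stop at the end of the data.
        position = (end < data.length) ? end + 1 : end;
        return new String(data, start, end - start, StandardCharsets.UTF_8);
    }

    /** Drives the reader until reachedEnd() reports true and collects every record. */
    static List<String> readAll(byte[] data, byte delimiter) {
        InMemoryDelimitedReader reader = new InMemoryDelimitedReader(data, delimiter);
        List<String> records = new ArrayList<>();
        while (!reader.reachedEnd()) {
            String record = reader.nextRecord(null);
            if (record != null) {
                records.add(record);
            }
        }
        return records;
    }
}
```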
      • reopen

        @PublicEvolving
        public void reopen(FileInputSplit split,
                           Long state)
                    throws IOException
        Description copied from interface: CheckpointableInputFormat
        Restores the state of a parallel instance reading from an InputFormat. This is necessary when recovering from a task failure. When this method is called, the input format is guaranteed to be configured.

        NOTE: The caller has to make sure that the provided split is the one to which the state belongs.

        Specified by:
        reopen in interface CheckpointableInputFormat<FileInputSplit,Long>
        Parameters:
        split - The split to be opened.
        state - The state from which to start. This can contain the offset, but also other data, depending on the input format.
        Throws:
        IOException
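
        The idea of resuming from a Long state can be sketched as follows, assuming the state is simply the byte offset of the next unread record. ResumableReaderSketch and its currentOffset method are hypothetical names for this example. The sketch works on an in-memory byte array with a single-byte delimiter and takes no split argument, unlike the real method.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in showing offset-based resume; not the real implementation. */
final class ResumableReaderSketch {

    private final byte[] data;
    private final byte delimiter;
    private int position;

    ResumableReaderSketch(byte[] data, byte delimiter) {
        this.data = data;
        this.delimiter = delimiter;
    }

    /** Hypothetical snapshot: the offset of the next unread record. */
    Long currentOffset() {
        return (long) position;
    }

    /** Mimics reopen by restoring the read position from previously saved state. */
    void reopen(Long state) {
        if (state == null || state < 0 || state > data.length) {
            throw new IllegalArgumentException("State does not belong to this input");
        }
        position = state.intValue();
    }

    /** Returns the next record, or null once the input is exhausted. */
    String nextRecord() {
        if (position >= data.length) {
            return null;
        }
        int start = position;
        int end = start;
        while (end < data.length && data[end] != delimiter) {
            end++;
        }
        // Move past the delimiter when present so the next call starts on a record boundary.
        position = (end < data.length) ? end + 1 : end;
        return new String(data, start, end - start, StandardCharsets.UTF_8);
    }
}
```

        A fresh instance that is reopened with a saved offset continues with the first record after that offset. This is why the caller must pass the state together with the input it was taken from.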
      • initializeSplit

        protected void initializeSplit(FileInputSplit split,
                                       @Nullable
                                       Long state)
                                throws IOException
        Initialization method that is called after opening or reopening an input split.
        Parameters:
        split - Split that was opened or reopened
        state - Checkpointed state if the split was reopened
        Throws:
        IOException