Class AvroInputFormat<E>

    • Constructor Detail

      • AvroInputFormat

        public AvroInputFormat​(Path filePath,
                               Class<E> type)
    • Method Detail

      • setReuseAvroValue

        public void setReuseAvroValue​(boolean reuseAvroValue)
        Sets the flag whether to reuse the Avro value instance for all records. By default, the input format reuses the Avro value.
        Parameters:
        reuseAvroValue - True, if the input format should reuse the Avro value instance, false otherwise.
      • setUnsplittable

        public void setUnsplittable​(boolean unsplittable)
        If set, the InputFormat will only read entire files.
      • open

        public void open​(FileInputSplit split)
                  throws IOException
        Description copied from class: FileInputFormat
        Opens an input stream to the file defined in the input format. The stream is positioned at the beginning of the given split.

        The stream is actually opened in an asynchronous thread to make sure any interruptions to the thread working on the input format do not reach the file system.

        Specified by:
        open in interface InputFormat<E,​FileInputSplit>
        Overrides:
        open in class FileInputFormat<E>
        Parameters:
        split - The split to be opened.
        Throws:
        IOException - Thrown, if the spit could not be opened due to an I/O problem.
      • reachedEnd

        public boolean reachedEnd()
                           throws IOException
        Description copied from interface: InputFormat
        Method used to check if the end of the input is reached.

        When this method is called, the input format it guaranteed to be opened.

        Specified by:
        reachedEnd in interface InputFormat<E,​FileInputSplit>
        Returns:
        True if the end is reached, otherwise false.
        Throws:
        IOException - Thrown, if an I/O error occurred.
      • getRecordsReadFromBlock

        public long getRecordsReadFromBlock()
      • nextRecord

        public E nextRecord​(E reuseValue)
                     throws IOException
        Description copied from interface: InputFormat
        Reads the next record from the input.

        When this method is called, the input format it guaranteed to be opened.

        Specified by:
        nextRecord in interface InputFormat<E,​FileInputSplit>
        Parameters:
        reuseValue - Object that may be reused.
        Returns:
        Read record.
        Throws:
        IOException - Thrown, if an I/O error occurred.
      • reopen

        public void reopen​(FileInputSplit split,
                           Tuple2<Long,​Long> state)
                    throws IOException
        Description copied from interface: CheckpointableInputFormat
        Restores the state of a parallel instance reading from an InputFormat. This is necessary when recovering from a task failure. When this method is called, the input format it guaranteed to be configured.

        NOTE: The caller has to make sure that the provided split is the one to whom the state belongs.

        Specified by:
        reopen in interface CheckpointableInputFormat<FileInputSplit,​Tuple2<Long,​Long>>
        Parameters:
        split - The split to be opened.
        state - The state from which to start from. This can contain the offset, but also other data, depending on the input format.
        Throws:
        IOException