Interface InputFormat<OT, T extends InputSplit>

    • Method Detail

      • configure

        void configure(Configuration parameters)
        Configures this input format. Since input formats are instantiated generically, and hence without parameters, this method is where an input format sets its basic fields based on configuration values.

        This method is always called first on a newly instantiated input format.

        Parameters:
        parameters - The configuration with all parameters (note: not the Flink config but the TaskConfig).
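
        The following is a minimal sketch of how an implementation might set its fields in configure. To stay self-contained it uses a plain Map as a stand-in for Configuration; the class name ConfigureSketchFormat and the keys "sketch.path" and "sketch.delimiter" are hypothetical and not part of Flink.

```java
import java.util.Map;

// Hypothetical format. A Map<String, String> stands in for Configuration
// so that the sketch compiles without Flink on the classpath.
class ConfigureSketchFormat {
    // Hypothetical keys, chosen for this sketch only.
    static final String PATH_KEY = "sketch.path";
    static final String DELIMITER_KEY = "sketch.delimiter";

    private String path;
    private String delimiter;

    // No-argument constructor: the format is instantiated without parameters.
    ConfigureSketchFormat() {}

    // Mirrors configure(Configuration): sets basic fields from configuration values.
    void configure(Map<String, String> parameters) {
        this.path = parameters.get(PATH_KEY);
        if (this.path == null) {
            throw new IllegalArgumentException("Missing required parameter: " + PATH_KEY);
        }
        // Optional value with a default of a single newline.
        this.delimiter = parameters.getOrDefault(DELIMITER_KEY, "\n");
    }

    String getPath() { return path; }

    String getDelimiter() { return delimiter; }
}
```

        Failing fast on a missing required value is a design choice of this sketch; an implementation could equally fall back to a default.
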
      • getStatistics

        BaseStatistics getStatistics(BaseStatistics cachedStatistics)
                              throws IOException
        Gets the basic statistics from the input described by this format. If the input format does not know how to create those statistics, it may return null. The caller may pass in a cached version of the statistics; the input format may examine it and return it directly instead of spending effort to re-gather the statistics.

        When this method is called, the input format is guaranteed to be configured.

        Parameters:
        cachedStatistics - The statistics that were cached. May be null.
        Returns:
        The base statistics for the input, or null, if not available.
        Throws:
        IOException
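
        The caching behavior described above could be sketched as follows. SketchStatistics is a hypothetical stand-in for BaseStatistics, and the rule "reuse the cache if the input has not been modified since" is an assumption of this sketch, not something the interface prescribes.

```java
import java.io.IOException;

// Hypothetical stand-in for BaseStatistics; holds only what the sketch needs.
class SketchStatistics {
    final long totalInputSize;
    final long inputModificationTime;

    SketchStatistics(long totalInputSize, long inputModificationTime) {
        this.totalInputSize = totalInputSize;
        this.inputModificationTime = inputModificationTime;
    }
}

// Hypothetical format whose input size and modification time are fixed up front.
class StatisticsSketchFormat {
    private final long currentSize;
    private final long lastModified;
    int gatherCount = 0; // counts re-gathers, for illustration only

    StatisticsSketchFormat(long currentSize, long lastModified) {
        this.currentSize = currentSize;
        this.lastModified = lastModified;
    }

    SketchStatistics getStatistics(SketchStatistics cachedStatistics) throws IOException {
        // Return the cached statistics directly if they are at least as recent as the input.
        if (cachedStatistics != null
                && cachedStatistics.inputModificationTime >= lastModified) {
            return cachedStatistics;
        }
        // Otherwise gather them again (a real format would inspect the input here).
        gatherCount++;
        return new SketchStatistics(currentSize, lastModified);
    }
}
```
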
      • createInputSplits

        T[] createInputSplits(int minNumSplits)
                       throws IOException
        Description copied from interface: InputSplitSource
        Computes the input splits. The given minimum number of splits is a hint as to how many splits are desired.
        Specified by:
        createInputSplits in interface InputSplitSource<T extends InputSplit>
        Parameters:
        minNumSplits - The minimum number of input splits, as a hint.
        Returns:
        An array of input splits.
        Throws:
        IOException
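
        As an illustration of how the hint might be honored, the sketch below divides a known number of records into contiguous ranges. RangeSplit and SplitSketch are hypothetical names, and the extra totalRecords parameter exists only to keep the example self-contained.

```java
// Hypothetical split describing a half-open range [start, end) of record indices.
class RangeSplit {
    final int splitNumber;
    final long start;
    final long end;

    RangeSplit(int splitNumber, long start, long end) {
        this.splitNumber = splitNumber;
        this.start = start;
        this.end = end;
    }
}

class SplitSketch {
    // Produces exactly max(1, minNumSplits) ranges that together cover [0, totalRecords).
    static RangeSplit[] createInputSplits(long totalRecords, int minNumSplits) {
        int numSplits = Math.max(1, minNumSplits);
        RangeSplit[] splits = new RangeSplit[numSplits];
        long base = totalRecords / numSplits;
        long remainder = totalRecords % numSplits;
        long start = 0;
        for (int i = 0; i < numSplits; i++) {
            // The first `remainder` splits take one extra record each.
            long length = base + (i < remainder ? 1 : 0);
            splits[i] = new RangeSplit(i, start, start + length);
            start += length;
        }
        return splits;
    }
}
```

        With 10 records and a hint of 3, this sketch yields the ranges [0, 4), [4, 7) and [7, 10). If there are fewer records than requested splits, some ranges are empty.
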
      • getInputSplitAssigner

        InputSplitAssigner getInputSplitAssigner(T[] inputSplits)
        Description copied from interface: InputSplitSource
        Returns the assigner for the input splits. The assigner determines which parallel instance of the input format gets which input split.
        Specified by:
        getInputSplitAssigner in interface InputSplitSource<T extends InputSplit>
        Returns:
        The input split assigner.
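
        One simple assignment strategy is to hand each split to whichever parallel instance asks next. The sketch below shows that idea with a hypothetical QueueSplitAssigner class; it does not implement the InputSplitAssigner interface, and returning null once no splits remain is a convention of this sketch.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical assigner: first come, first served, with no locality awareness.
class QueueSplitAssigner<S> {
    // Thread-safe queue, since parallel instances may request splits concurrently.
    private final ConcurrentLinkedQueue<S> unassigned;

    QueueSplitAssigner(S[] inputSplits) {
        this.unassigned = new ConcurrentLinkedQueue<>(Arrays.asList(inputSplits));
    }

    // Returns the next unassigned split, or null when none remain.
    // taskId identifies the requesting parallel instance; this sketch ignores it.
    S getNextInputSplit(int taskId) {
        return unassigned.poll();
    }
}
```
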
      • open

        void open(T split)
           throws IOException
        Opens a parallel instance of the input format to work on a split.

        When this method is called, the input format is guaranteed to be configured.

        Parameters:
        split - The split to be opened.
        Throws:
        IOException - Thrown, if the split could not be opened due to an I/O problem.
      • reachedEnd

        boolean reachedEnd()
                    throws IOException
        Method used to check if the end of the input is reached.

        When this method is called, the input format is guaranteed to be opened.

        Returns:
        True if the end is reached, otherwise false.
        Throws:
        IOException - Thrown, if an I/O error occurred.
      • nextRecord

        OT nextRecord(OT reuse)
               throws IOException
        Reads the next record from the input.

        When this method is called, the input format is guaranteed to be opened.

        Parameters:
        reuse - Object that may be reused.
        Returns:
        Read record.
        Throws:
        IOException - Thrown, if an I/O error occurred.
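
        The reuse parameter lets an implementation fill an existing object instead of allocating one per record. A minimal sketch of that pattern follows, using a hypothetical in-memory format (LinesSketchFormat) whose split is simply a String[] and whose record type is StringBuilder.

```java
import java.io.IOException;

// Hypothetical in-memory format: each split is an array of lines.
class LinesSketchFormat {
    private String[] lines;
    private int position;

    void open(String[] split) throws IOException {
        this.lines = split;
        this.position = 0;
    }

    boolean reachedEnd() throws IOException {
        return position >= lines.length;
    }

    // Overwrites the caller-provided object with the next line and returns it.
    StringBuilder nextRecord(StringBuilder reuse) throws IOException {
        reuse.setLength(0);
        reuse.append(lines[position++]);
        return reuse;
    }
}
```

        Because the same object is overwritten on each call, a caller of this sketch has to copy any data it wants to keep before requesting the next record.
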
      • close

        void close()
            throws IOException
        Method that marks the end of the life-cycle of an input split. Should be used to close channels and streams and release resources. After this method returns without an error, the input is assumed to be correctly read.

        When this method is called, the input format is guaranteed to be opened.

        Throws:
        IOException - Thrown, if the input could not be closed properly.
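
        Taken together, the per-split methods suggest a read loop of the form open, then reachedEnd and nextRecord repeatedly, then close. The sketch below shows one way a caller might drive it. SplitReaderSketch is a hypothetical, trimmed-down mirror of the four per-split methods, and closing in a finally block is a design choice of this sketch rather than documented runtime behavior.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical mirror of the per-split methods documented above.
interface SplitReaderSketch<OT, T> {
    void open(T split) throws IOException;
    boolean reachedEnd() throws IOException;
    OT nextRecord(OT reuse) throws IOException;
    void close() throws IOException;
}

class ReadLoopSketch {
    // Reads every record of one split and closes the format even if reading fails.
    static <OT, T> List<String> readSplit(
            SplitReaderSketch<OT, T> format, T split, OT reuse) throws IOException {
        List<String> out = new ArrayList<>();
        format.open(split);
        try {
            while (!format.reachedEnd()) {
                OT record = format.nextRecord(reuse);
                // Copy the content out: the record object may be overwritten by the next call.
                out.add(String.valueOf(record));
            }
        } finally {
            // Releases channels, streams, and other resources held for this split.
            format.close();
        }
        return out;
    }
}
```
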