Class SimpleStreamFormat<T>

    • Constructor Detail

      • SimpleStreamFormat

        public SimpleStreamFormat()
    • Method Detail

      • isSplittable

        public final boolean isSplittable()
        This format is always not splittable.
        Specified by:
        isSplittable in interface StreamFormat<T>
      • createReader

        public final StreamFormat.Reader<T> createReader​(Configuration config,
                                                         FSDataInputStream stream,
                                                         long fileLen,
                                                         long splitEnd)
                                                  throws IOException
        Description copied from interface: StreamFormat
        Creates a new reader to read in this format. This method is called when a fresh reader is created for a split that was assigned from the enumerator. This method may also be called on recovery from a checkpoint, if the reader never stored an offset in the checkpoint (see StreamFormat.restoreReader(Configuration, FSDataInputStream, long, long, long) for details.

        If the format is splittable, then the stream is positioned to the beginning of the file split, otherwise it will be at position zero.

        The fileLen is the length of the entire file, while splitEnd is the offset of the first byte after the split end boundary (exclusive end boundary). For non-splittable formats, both values are identical.

        Specified by:
        createReader in interface StreamFormat<T>
        Throws:
        IOException
      • restoreReader

        public final StreamFormat.Reader<T> restoreReader​(Configuration config,
                                                          FSDataInputStream stream,
                                                          long restoredOffset,
                                                          long fileLen,
                                                          long splitEnd)
                                                   throws IOException
        Description copied from interface: StreamFormat
        Restores a reader from a checkpointed position. This method is called when the reader is recovered from a checkpoint and the reader has previously stored an offset into the checkpoint, by returning from the StreamFormat.Reader.getCheckpointedPosition() a value with non-negative offset. That value is supplied as the restoredOffset.

        If the format is splittable, then the stream is positioned to the beginning of the file split, otherwise it will be at position zero. The stream is NOT positioned to the checkpointed offset, because the format is free to interpret this offset in a different way than the byte offset in the file (for example as a record index).

        If the reader never produced a CheckpointedPosition with a non-negative offset before, then this method is not called, and the reader is created in the same way as a fresh reader via the method StreamFormat.createReader(Configuration, FSDataInputStream, long, long) and the appropriate number of records are read and discarded, to position to reader to the checkpointed position.

        Having a different method for restoring readers to a checkpointed position allows readers to seek to the start position differently in that case, compared to when the reader is created from a split offset generated at the enumerator. In the latter case, the offsets are commonly "approximate", because the enumerator typically generates splits based only on metadata. Readers then have to skip some bytes while searching for the next position to start from (based on a delimiter, sync marker, block offset, etc.). In contrast, checkpointed offsets are often precise, because they were recorded as the reader when through the data stream. Starting a reader from a checkpointed offset may hence not require and search for the next delimiter/block/marker.

        The fileLen is the length of the entire file, while splitEnd is the offset of the first byte after the split end boundary (exclusive end boundary). For non-splittable formats, both values are identical.

        Specified by:
        restoreReader in interface StreamFormat<T>
        Throws:
        IOException