Class SimpleStreamFormat<T>
- java.lang.Object
-
- org.apache.flink.connector.file.src.reader.SimpleStreamFormat<T>
-
- Type Parameters:
T
- The type of records created by this format reader.
- All Implemented Interfaces:
Serializable
,ResultTypeQueryable<T>
,StreamFormat<T>
- Direct Known Subclasses:
CsvReaderFormat
,TextLineInputFormat
@PublicEvolving public abstract class SimpleStreamFormat<T> extends Object implements StreamFormat<T>
A simple version of theStreamFormat
, for formats that are not splittable.This format makes no difference between creating readers from scratch (new file) or from a checkpoint. Because of that, if the reader actively checkpoints its position (via the
StreamFormat.Reader.getCheckpointedPosition()
method) then the checkpointed offset must be a byte offset in the file from which the stream can be resumed as if it were the beginning of the file.For all other details, please check the docs of
StreamFormat
.- See Also:
- Serialized Form
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface org.apache.flink.connector.file.src.reader.StreamFormat
StreamFormat.Reader<T>
-
-
Field Summary
-
Fields inherited from interface org.apache.flink.connector.file.src.reader.StreamFormat
FETCH_IO_SIZE
-
-
Constructor Summary
Constructors Constructor Description SimpleStreamFormat()
-
Method Summary
All Methods Instance Methods Abstract Methods Concrete Methods Modifier and Type Method Description abstract StreamFormat.Reader<T>
createReader(Configuration config, FSDataInputStream stream)
Creates a new reader.StreamFormat.Reader<T>
createReader(Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
Creates a new reader to read in this format.abstract TypeInformation<T>
getProducedType()
Gets the type produced by this format.boolean
isSplittable()
This format is always not splittable.StreamFormat.Reader<T>
restoreReader(Configuration config, FSDataInputStream stream, long restoredOffset, long fileLen, long splitEnd)
Restores a reader from a checkpointed position.
-
-
-
Method Detail
-
createReader
public abstract StreamFormat.Reader<T> createReader(Configuration config, FSDataInputStream stream) throws IOException
Creates a new reader. This method is called both for the creation of new reader (from the beginning of a file) and for restoring checkpointed readers.If the reader previously checkpointed an offset, then the input stream will be positioned to that particular offset. Readers checkpoint an offset by returning a value from the method
StreamFormat.Reader.getCheckpointedPosition()
method with an offset other thanCheckpointedPosition.NO_OFFSET
).- Throws:
IOException
-
getProducedType
public abstract TypeInformation<T> getProducedType()
Gets the type produced by this format. This type will be the type produced by the file source as a whole.- Specified by:
getProducedType
in interfaceResultTypeQueryable<T>
- Specified by:
getProducedType
in interfaceStreamFormat<T>
- Returns:
- The data type produced by this function or input format.
-
isSplittable
public final boolean isSplittable()
This format is always not splittable.- Specified by:
isSplittable
in interfaceStreamFormat<T>
-
createReader
public final StreamFormat.Reader<T> createReader(Configuration config, FSDataInputStream stream, long fileLen, long splitEnd) throws IOException
Description copied from interface:StreamFormat
Creates a new reader to read in this format. This method is called when a fresh reader is created for a split that was assigned from the enumerator. This method may also be called on recovery from a checkpoint, if the reader never stored an offset in the checkpoint (seeStreamFormat.restoreReader(Configuration, FSDataInputStream, long, long, long)
for details.If the format is
splittable
, then thestream
is positioned to the beginning of the file split, otherwise it will be at position zero.The
fileLen
is the length of the entire file, whilesplitEnd
is the offset of the first byte after the split end boundary (exclusive end boundary). For non-splittable formats, both values are identical.- Specified by:
createReader
in interfaceStreamFormat<T>
- Throws:
IOException
-
restoreReader
public final StreamFormat.Reader<T> restoreReader(Configuration config, FSDataInputStream stream, long restoredOffset, long fileLen, long splitEnd) throws IOException
Description copied from interface:StreamFormat
Restores a reader from a checkpointed position. This method is called when the reader is recovered from a checkpoint and the reader has previously stored an offset into the checkpoint, by returning from theStreamFormat.Reader.getCheckpointedPosition()
a value with non-negativeoffset
. That value is supplied as therestoredOffset
.If the format is
splittable
, then thestream
is positioned to the beginning of the file split, otherwise it will be at position zero. The stream is NOT positioned to the checkpointed offset, because the format is free to interpret this offset in a different way than the byte offset in the file (for example as a record index).If the reader never produced a
CheckpointedPosition
with a non-negative offset before, then this method is not called, and the reader is created in the same way as a fresh reader via the methodStreamFormat.createReader(Configuration, FSDataInputStream, long, long)
and the appropriate number of records are read and discarded, to position to reader to the checkpointed position.Having a different method for restoring readers to a checkpointed position allows readers to seek to the start position differently in that case, compared to when the reader is created from a split offset generated at the enumerator. In the latter case, the offsets are commonly "approximate", because the enumerator typically generates splits based only on metadata. Readers then have to skip some bytes while searching for the next position to start from (based on a delimiter, sync marker, block offset, etc.). In contrast, checkpointed offsets are often precise, because they were recorded as the reader when through the data stream. Starting a reader from a checkpointed offset may hence not require and search for the next delimiter/block/marker.
The
fileLen
is the length of the entire file, whilesplitEnd
is the offset of the first byte after the split end boundary (exclusive end boundary). For non-splittable formats, both values are identical.- Specified by:
restoreReader
in interfaceStreamFormat<T>
- Throws:
IOException
-
-