@PublicEvolving public interface BulkFormat<T,SplitT extends FileSourceSplit> extends Serializable, ResultTypeQueryable<T>
BulkFormatreads and decodes batches of records at a time. Examples of bulk formats are formats like ORC or Parquet.
'BulkFormat' class acts mainly as a configuration holder and factory for the
reader. The actual reading is done by the
BulkFormat.Reader, which is created in the
createReader(Configuration, FileSourceSplit) method. If a bulk reader is
created based on a checkpoint during checkpointed streaming execution, then the reader is
re-created in the
restoreReader(Configuration, FileSourceSplit) method.
File splitting means dividing a file into multiple regions that can be read independently.
Whether a format supports splitting is indicated via the
Splitting has the potential to increase parallelism and performance, but poses additional constraints on the format readers: Readers need to be able to find a consistent starting point within the file near the offset where the split starts, (like the next record delimiter, or a block start or a sync marker). This is not necessarily possible for all formats, which is why splitting is optional.
The bulk reader returns an iterator per batch that it reads. The iterator produces records together with a position. That position is stored in the checkpointed state atomically with the processing of the record. That means it must be the position from where the reading can be resumed AFTER the record was processed; the position hence points effectively to the record AFTER the current record.
The simplest way to return this position information is to store no offset and simply store an incrementing count of records to skip after recovery. Given the above contract, the fist record would be returned with a records-to-skip count of one, the second one with a record count of two, etc.
Formats that have the ability to efficiently seek to a record (or to every n-th record) can use the position field to seek to a record directly and avoid having to read and discard many records on recovery.
Note on this design: Why do we not make the position point to the current record and always skip one record after recovery (the just processed record)? We need to be able to support formats where skipping records (even one) is not an option. For example formats that execute (pushed down) filters may want to avoid a skip-record-count all together, so that they don't skip the wrong records when the filter gets updated around a checkpoint/savepoint.
Like many other API classes in Flink, the outer class is serializable to support sending instances to distributed workers for parallel execution. This is purely short-term serialization for RPC and no instance of this will be long-term persisted in a serialized form.
Internally in the file source, the readers pass batches of records from the reading threads (that perform the typically blocking I/O operations) to the async mailbox threads that do the streaming and batch data processing. Passing records in batches (rather than one-at-a-time) much reduce the thread-to-thread handover overhead.
BulkFormat, one batch (as returned by
is handed over as one.
|Modifier and Type||Interface and Description|
The actual reader that reads the batches of records.
An iterator over records with their position in the file.
|Modifier and Type||Method and Description|
Creates a new reader that reads from the
Gets the type produced by this format.
Checks whether this format is splittable.
Creates a new reader that reads from
BulkFormat.Reader<T> createReader(Configuration config, SplitT split) throws IOException
split's pathstarting at the
FileSourceSplit.offset()split's offset} and reads
lengthbytes after the offset.
BulkFormat.Reader<T> restoreReader(Configuration config, SplitT split) throws IOException
offsetand reads until
lengthbytes after the offset. A number of
recordsToSkiprecords should be read and discarded after the offset. This is typically part of restoring a reader to a checkpointed position.
top-level JavaDocs (section "Splitting") for details.
Copyright © 2014–2023 The Apache Software Foundation. All rights reserved.