pyflink.datastream.connectors.file_system.BulkFormat
- class BulkFormat(j_bulk_format)
The BulkFormat reads and decodes records in batches rather than one at a time. Examples of bulk formats are ORC and Parquet.
Internally, in the file source, the readers pass batches of records from the reading threads (which perform the typically blocking I/O operations) to the async mailbox threads that do the streaming and batch data processing. Passing records in batches (rather than one at a time) greatly reduces the thread-to-thread handover overhead.
For the BulkFormat, each batch is handed over as a single unit.
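The effect of batching on handover cost can be illustrated with a small, self-contained sketch. It uses only the Python standard library and is not Flink's implementation: the names `read_batches` and `run_handover` are hypothetical, and a `queue.Queue` stands in for the handover between a reading thread and a processing thread.

```python
import queue
import threading


def read_batches(records, batch_size):
    """Yield lists of up to batch_size records (hypothetical stand-in for a bulk reader)."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def run_handover(records, batch_size):
    """Pass records from a reader thread to the calling thread in batches.

    Returns a tuple (received_records, number_of_handovers).
    """
    handover = queue.Queue()
    sentinel = object()  # marks the end of the input

    def reader():
        for batch in read_batches(records, batch_size):
            handover.put(batch)  # one queue operation per batch, not per record
        handover.put(sentinel)

    thread = threading.Thread(target=reader)
    thread.start()

    received = []
    handovers = 0
    while True:
        item = handover.get()
        if item is sentinel:
            break
        handovers += 1
        received.extend(item)

    thread.join()
    return received, handovers
```

With ten records, a batch size of 4 needs three handovers, whereas a batch size of 1 needs ten; the records arrive in the same order either way.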
New in version 1.16.0.