pyflink.datastream.formats.parquet.ParquetBulkWriters

class ParquetBulkWriters
Convenient builder to create a BulkWriterFactory that writes records with a predefined schema into Parquet files in a batch fashion.

Example:

>>> row_type = DataTypes.ROW([
...     DataTypes.FIELD('string', DataTypes.STRING()),
...     DataTypes.FIELD('int_array', DataTypes.ARRAY(DataTypes.INT()))
... ])
>>> sink = FileSink.for_bulk_format(
...     OUTPUT_DIR, ParquetBulkWriters.for_row_type(
...         row_type,
...         hadoop_config=Configuration(),
...         utc_timestamp=True,
...     )
... ).build()
>>> ds.sink_to(sink)
New in version 1.16.0.
Methods

for_row_type(row_type[, hadoop_config, ...])
    Create a BulkWriterFactory that writes records with a predefined schema into Parquet files in a batch fashion.
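A minimal end-to-end sketch of how this builder is typically used, assuming a local PyFlink (1.16+) setup; the output directory `/tmp/parquet-out`, the job name, and the sample rows are illustrative, not part of the API:

```python
from pyflink.common import Configuration, Row
from pyflink.common.typeinfo import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors.file_system import FileSink
from pyflink.datastream.formats.parquet import ParquetBulkWriters
from pyflink.table.types import DataTypes

env = StreamExecutionEnvironment.get_execution_environment()

# Row schema shared by the stream elements and the Parquet writer.
row_type = DataTypes.ROW([
    DataTypes.FIELD('string', DataTypes.STRING()),
    DataTypes.FIELD('int_array', DataTypes.ARRAY(DataTypes.INT())),
])

# Illustrative in-memory source; a real job would read from a connector.
ds = env.from_collection(
    [Row('a', [1, 2]), Row('b', [3])],
    type_info=Types.ROW_NAMED(
        ['string', 'int_array'],
        [Types.STRING(), Types.LIST(Types.INT())]),
)

# utc_timestamp=True stores any TIMESTAMP fields in UTC rather than
# the local time zone.
sink = FileSink.for_bulk_format(
    '/tmp/parquet-out',  # illustrative output path
    ParquetBulkWriters.for_row_type(
        row_type,
        hadoop_config=Configuration(),
        utc_timestamp=True,
    ),
).build()

ds.sink_to(sink)
env.execute('write_parquet')  # hypothetical job name
```

Because this is a bulk (batch-encoded) format, the sink rolls files on checkpoints, so checkpointing should be enabled when running this in streaming mode.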