pyflink.datastream.formats.parquet.ParquetBulkWriters

class ParquetBulkWriters
Convenient builder to create a BulkWriterFactory that writes records with a predefined schema into Parquet files in a batch fashion.

Example:

>>> row_type = DataTypes.ROW([
...     DataTypes.FIELD('string', DataTypes.STRING()),
...     DataTypes.FIELD('int_array', DataTypes.ARRAY(DataTypes.INT()))
... ])
>>> sink = FileSink.for_bulk_format(
...     OUTPUT_DIR, ParquetBulkWriters.for_row_type(
...         row_type,
...         hadoop_config=Configuration(),
...         utc_timestamp=True,
...     )
... ).build()
>>> ds.sink_to(sink)
New in version 1.16.0.
Methods

for_row_type(row_type[, hadoop_config, ...])
    Create a BulkWriterFactory that writes records with a predefined schema into Parquet files in a batch fashion.
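A minimal end-to-end sketch of how this builder is typically used, assuming a local PyFlink (1.16+) setup; the output directory `/tmp/parquet-out`, the job name, and the sample rows are illustrative, not part of the API:

```python
from pyflink.common import Configuration, Row
from pyflink.common.typeinfo import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors.file_system import FileSink
from pyflink.datastream.formats.parquet import ParquetBulkWriters
from pyflink.table.types import DataTypes

env = StreamExecutionEnvironment.get_execution_environment()

# Row schema shared by the stream elements and the Parquet writer.
row_type = DataTypes.ROW([
    DataTypes.FIELD('string', DataTypes.STRING()),
    DataTypes.FIELD('int_array', DataTypes.ARRAY(DataTypes.INT())),
])

# Illustrative in-memory source; a real job would read from a connector.
ds = env.from_collection(
    [Row('a', [1, 2]), Row('b', [3])],
    type_info=Types.ROW_NAMED(
        ['string', 'int_array'],
        [Types.STRING(), Types.LIST(Types.INT())]),
)

# utc_timestamp=True stores any TIMESTAMP fields in UTC rather than
# the local time zone.
sink = FileSink.for_bulk_format(
    '/tmp/parquet-out',  # illustrative output path
    ParquetBulkWriters.for_row_type(
        row_type,
        hadoop_config=Configuration(),
        utc_timestamp=True,
    ),
).build()

ds.sink_to(sink)
env.execute('write_parquet')  # hypothetical job name
```

Because this is a bulk (batch-encoded) format, the sink rolls files on checkpoints, so checkpointing should be enabled when running this in streaming mode.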