pyflink.datastream.formats.parquet.ParquetColumnarRowInputFormat#
- class ParquetColumnarRowInputFormat(row_type: RowType, hadoop_config: Optional[pyflink.common.configuration.Configuration] = None, batch_size: int = 2048, is_utc_timestamp: bool = False, is_case_sensitive: bool = True)[source]#
A ParquetVectorizedInputFormat that provides a RowData iterator. It uses ColumnarRowData to expose a row view over each column batch. Only primitive column types are supported; composite types such as arrays and maps are not.
Example:
>>> row_type = DataTypes.ROW([
...     DataTypes.FIELD('a', DataTypes.INT()),
...     DataTypes.FIELD('b', DataTypes.STRING()),
... ])
>>> source = FileSource.for_bulk_file_format(ParquetColumnarRowInputFormat(
...     row_type=row_type,
...     hadoop_config=Configuration(),
...     batch_size=2048,
...     is_utc_timestamp=False,
...     is_case_sensitive=True,
... ), PARQUET_FILE_PATH).build()
>>> ds = env.from_source(source, WatermarkStrategy.no_watermarks(), "parquet-source")
New in version 1.16.0.
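The "row view over a column batch" idea mentioned above can be sketched in plain Python. This is a conceptual analogy only, not the PyFlink implementation: `ColumnBatch`, `RowView`, and `iter_rows` are hypothetical names invented for illustration.

```python
class ColumnBatch:
    """A batch that stores each field as a contiguous list of primitive values."""
    def __init__(self, columns):
        # columns: dict mapping field name -> list of primitive values
        self.columns = columns
        self.num_rows = len(next(iter(columns.values()))) if columns else 0

class RowView:
    """A lightweight view of a single row inside a ColumnBatch (no copying)."""
    def __init__(self, batch):
        self._batch = batch
        self._row_id = 0
    def set_row(self, row_id):
        # Re-point the view at another row instead of materializing a new object.
        self._row_id = row_id
        return self
    def get(self, field):
        return self._batch.columns[field][self._row_id]

def iter_rows(batch):
    # Hand out rows through one reused view object, mirroring how a
    # vectorized reader iterates a columnar batch row by row.
    view = RowView(batch)
    for i in range(batch.num_rows):
        yield view.set_row(i)

batch = ColumnBatch({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
rows = [(r.get('a'), r.get('b')) for r in iter_rows(batch)]
# rows == [(1, 'x'), (2, 'y'), (3, 'z')]
```

Because the view is reused, each yielded row is only valid until the next iteration step, which is why the values are extracted inside the loop above.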