Package org.apache.flink.formats.parquet
Class ParquetColumnarRowInputFormat<SplitT extends FileSourceSplit>
- java.lang.Object
-
- org.apache.flink.formats.parquet.ParquetVectorizedInputFormat<RowData,SplitT>
-
- org.apache.flink.formats.parquet.ParquetColumnarRowInputFormat<SplitT>
-
- All Implemented Interfaces:
Serializable
,ResultTypeQueryable<RowData>
,BulkFormat<RowData,SplitT>
,FileBasedStatisticsReportableInputFormat
public class ParquetColumnarRowInputFormat<SplitT extends FileSourceSplit> extends ParquetVectorizedInputFormat<RowData,SplitT> implements FileBasedStatisticsReportableInputFormat
AParquetVectorizedInputFormat
to provideRowData
iterator. UsingColumnarRowData
to provide a row view of column batch.- See Also:
- Serialized Form
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.flink.formats.parquet.ParquetVectorizedInputFormat
ParquetVectorizedInputFormat.ParquetReaderBatch<T>
-
Nested classes/interfaces inherited from interface org.apache.flink.connector.file.src.reader.BulkFormat
BulkFormat.Reader<T>, BulkFormat.RecordIterator<T>
-
-
Field Summary
-
Fields inherited from class org.apache.flink.formats.parquet.ParquetVectorizedInputFormat
hadoopConfig, isUtcTimestamp
-
-
Constructor Summary
Constructors Constructor Description ParquetColumnarRowInputFormat(org.apache.hadoop.conf.Configuration hadoopConfig, RowType projectedType, TypeInformation<RowData> producedTypeInfo, int batchSize, boolean isUtcTimestamp, boolean isCaseSensitive)
Constructor to create parquet format without extra fields.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description static <SplitT extends FileSourceSplit>
ParquetColumnarRowInputFormat<SplitT>createPartitionedFormat(org.apache.hadoop.conf.Configuration hadoopConfig, RowType producedRowType, TypeInformation<RowData> producedTypeInfo, List<String> partitionKeys, PartitionFieldExtractor<SplitT> extractor, int batchSize, boolean isUtcTimestamp, boolean isCaseSensitive)
Create a partitionedParquetColumnarRowInputFormat
, the partition columns can be generated byPath
.protected ParquetVectorizedInputFormat.ParquetReaderBatch<RowData>
createReaderBatch(WritableColumnVector[] writableVectors, VectorizedColumnBatch columnarBatch, Pool.Recycler<ParquetVectorizedInputFormat.ParquetReaderBatch<RowData>> recycler)
TypeInformation<RowData>
getProducedType()
Gets the type produced by this format.protected int
numBatchesToCirculate(Configuration config)
TableStats
reportStatistics(List<Path> files, DataType producedDataType)
Returns the estimated statistics of this input format.-
Methods inherited from class org.apache.flink.formats.parquet.ParquetVectorizedInputFormat
createReader, isSplittable, restoreReader
-
-
-
-
Constructor Detail
-
ParquetColumnarRowInputFormat
public ParquetColumnarRowInputFormat(org.apache.hadoop.conf.Configuration hadoopConfig, RowType projectedType, TypeInformation<RowData> producedTypeInfo, int batchSize, boolean isUtcTimestamp, boolean isCaseSensitive)
Constructor to create parquet format without extra fields.
-
-
Method Detail
-
numBatchesToCirculate
protected int numBatchesToCirculate(Configuration config)
- Overrides:
numBatchesToCirculate
in classParquetVectorizedInputFormat<RowData,SplitT extends FileSourceSplit>
-
createReaderBatch
protected ParquetVectorizedInputFormat.ParquetReaderBatch<RowData> createReaderBatch(WritableColumnVector[] writableVectors, VectorizedColumnBatch columnarBatch, Pool.Recycler<ParquetVectorizedInputFormat.ParquetReaderBatch<RowData>> recycler)
- Specified by:
createReaderBatch
in classParquetVectorizedInputFormat<RowData,SplitT extends FileSourceSplit>
- Parameters:
writableVectors
- vectors to be writecolumnarBatch
- vectors to be readrecycler
- batch recycler
-
getProducedType
public TypeInformation<RowData> getProducedType()
Description copied from interface:BulkFormat
Gets the type produced by this format. This type will be the type produced by the file source as a whole.- Specified by:
getProducedType
in interfaceBulkFormat<RowData,SplitT extends FileSourceSplit>
- Specified by:
getProducedType
in interfaceResultTypeQueryable<SplitT extends FileSourceSplit>
- Returns:
- The data type produced by this function or input format.
-
reportStatistics
public TableStats reportStatistics(List<Path> files, DataType producedDataType)
Description copied from interface:FileBasedStatisticsReportableInputFormat
Returns the estimated statistics of this input format.- Specified by:
reportStatistics
in interfaceFileBasedStatisticsReportableInputFormat
- Parameters:
files
- The files to be estimated.producedDataType
- the final output type of the format.
-
createPartitionedFormat
public static <SplitT extends FileSourceSplit> ParquetColumnarRowInputFormat<SplitT> createPartitionedFormat(org.apache.hadoop.conf.Configuration hadoopConfig, RowType producedRowType, TypeInformation<RowData> producedTypeInfo, List<String> partitionKeys, PartitionFieldExtractor<SplitT> extractor, int batchSize, boolean isUtcTimestamp, boolean isCaseSensitive)
Create a partitionedParquetColumnarRowInputFormat
, the partition columns can be generated byPath
.
-
-