Package org.apache.flink.orc
Class OrcColumnarRowInputFormat<BatchT,SplitT extends FileSourceSplit>
- java.lang.Object
  - org.apache.flink.orc.AbstractOrcFileInputFormat<RowData,BatchT,SplitT>
    - org.apache.flink.orc.OrcColumnarRowInputFormat<BatchT,SplitT>
- All Implemented Interfaces:
Serializable, ResultTypeQueryable<RowData>, BulkFormat<RowData,SplitT>, FileBasedStatisticsReportableInputFormat
public class OrcColumnarRowInputFormat<BatchT,SplitT extends FileSourceSplit> extends AbstractOrcFileInputFormat<RowData,BatchT,SplitT> implements FileBasedStatisticsReportableInputFormat
An ORC reader that produces a stream of ColumnarRowData records.
This class can add extra fields through ColumnBatchFactory, for example partition fields extracted from the file path. The type returned by getProducedType() may therefore differ from the ORC file schema, and the types of the extra fields must be included in it.
- See Also:
- Serialized Form
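The class description says that partition fields can be extracted from the path. The following is a minimal sketch of that idea, assuming a Hive-style directory layout with `key=value` segments. The class `PartitionPathSketch` and its method are hypothetical names used only for illustration; they are not part of Flink and do not describe how ColumnBatchFactory or PartitionFieldExtractor are implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Flink: parses Hive-style partition directories. */
final class PartitionPathSketch {

    /** Collects the key=value directory segments of a path, in path order. */
    static Map<String, String> extractPartitionValues(String path) {
        Map<String, String> values = new LinkedHashMap<>();
        String[] segments = path.split("/");
        // The last segment is the file name, so it is never treated as a partition directory.
        for (int i = 0; i < segments.length - 1; i++) {
            int eq = segments[i].indexOf('=');
            if (eq > 0) {
                values.put(segments[i].substring(0, eq), segments[i].substring(eq + 1));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Assumed example path; the layout is an assumption, not taken from the Flink docs.
        System.out.println(
                extractPartitionValues("/warehouse/orders/dt=2024-01-01/region=eu/part-0.orc"));
    }
}
```

Values obtained this way are constant for every row of a split, which is why they can be added as extra fields without being stored in the ORC file itself.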
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.flink.orc.AbstractOrcFileInputFormat
AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT>, AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT>
-
Nested classes/interfaces inherited from interface org.apache.flink.connector.file.src.reader.BulkFormat
BulkFormat.Reader<T>, BulkFormat.RecordIterator<T>
-
-
Field Summary
-
Fields inherited from class org.apache.flink.orc.AbstractOrcFileInputFormat
batchSize, conjunctPredicates, hadoopConfigWrapper, schema, selectedFields, shim
-
-
Constructor Summary
Constructors
OrcColumnarRowInputFormat(OrcShim<BatchT> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.orc.TypeDescription schema, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize, ColumnBatchFactory<BatchT,SplitT> batchFactory, TypeInformation<RowData> producedTypeInfo)
-
Method Summary
Modifier and Type, Method, Description:

static <SplitT extends FileSourceSplit> OrcColumnarRowInputFormat<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,SplitT>
createPartitionedFormat(OrcShim<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch> shim, org.apache.hadoop.conf.Configuration hadoopConfig, RowType tableType, List<String> partitionKeys, PartitionFieldExtractor<SplitT> extractor, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize, Function<RowType,TypeInformation<RowData>> rowTypeInfoFactory)
Creates a partitioned OrcColumnarRowInputFormat; the partition columns can be generated from the split.

AbstractOrcFileInputFormat.OrcReaderBatch<RowData,BatchT>
createReaderBatch(SplitT split, OrcVectorizedBatchWrapper<BatchT> orcBatch, Pool.Recycler<AbstractOrcFileInputFormat.OrcReaderBatch<RowData,BatchT>> recycler, int batchSize)
Creates the AbstractOrcFileInputFormat.OrcReaderBatch structure, which holds the batch data (column vectors, row arrays, ...) and converts the batch from the ORC representation to the result format.

TypeInformation<RowData>
getProducedType()
Gets the type produced by this format.

TableStats
reportStatistics(List<Path> files, DataType producedDataType)
Returns the estimated statistics of this input format.
-
Methods inherited from class org.apache.flink.orc.AbstractOrcFileInputFormat
createReader, isSplittable, restoreReader
-
-
-
-
Constructor Detail
-
OrcColumnarRowInputFormat
public OrcColumnarRowInputFormat(OrcShim<BatchT> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.orc.TypeDescription schema, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize, ColumnBatchFactory<BatchT,SplitT> batchFactory, TypeInformation<RowData> producedTypeInfo)
-
-
Method Detail
-
createReaderBatch
public AbstractOrcFileInputFormat.OrcReaderBatch<RowData,BatchT> createReaderBatch(SplitT split, OrcVectorizedBatchWrapper<BatchT> orcBatch, Pool.Recycler<AbstractOrcFileInputFormat.OrcReaderBatch<RowData,BatchT>> recycler, int batchSize)
Description copied from class: AbstractOrcFileInputFormat
Creates the AbstractOrcFileInputFormat.OrcReaderBatch structure, which holds the batch data (column vectors, row arrays, ...) and converts the batch from the ORC representation to the result format.
- Specified by:
createReaderBatch in class AbstractOrcFileInputFormat<RowData,BatchT,SplitT extends FileSourceSplit>
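The recycler parameter suggests that reader batches are meant to be reused rather than reallocated for every read. The following is a minimal sketch of that general pattern using only the Java standard library. `RecyclableBatchSketch`, `Batch`, and `BatchPool` are hypothetical names, the `long[]` buffer stands in for real column vectors, and none of this is Flink's actual Pool or OrcReaderBatch implementation.

```java
import java.util.ArrayDeque;
import java.util.function.Consumer;

/** Hypothetical illustration of a recyclable batch holder; not Flink's implementation. */
final class RecyclableBatchSketch {

    /** Holds a reusable buffer plus the callback that returns it to its pool. */
    static final class Batch {
        final long[] values;
        private final Consumer<Batch> recycler;

        Batch(int batchSize, Consumer<Batch> recycler) {
            this.values = new long[batchSize];
            this.recycler = recycler;
        }

        /** Called by the consumer once every row of the batch has been read. */
        void recycle() {
            recycler.accept(this);
        }
    }

    /** Single-threaded pool: hands out an idle batch, or allocates a new one if none is idle. */
    static final class BatchPool {
        private final ArrayDeque<Batch> idle = new ArrayDeque<>();
        private final int batchSize;
        int allocations = 0;

        BatchPool(int batchSize) {
            this.batchSize = batchSize;
        }

        Batch acquire() {
            Batch batch = idle.poll();
            if (batch == null) {
                allocations++;
                // The recycler simply puts the batch back on the idle queue.
                batch = new Batch(batchSize, idle::add);
            }
            return batch;
        }
    }
}
```

Passing the recycler into the batch at construction time lets whoever finishes reading the batch return it without knowing which pool it came from.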
-
getProducedType
public TypeInformation<RowData> getProducedType()
Description copied from class: AbstractOrcFileInputFormat
Gets the type produced by this format.
- Specified by:
getProducedType in interface BulkFormat<RowData,SplitT extends FileSourceSplit>
- Specified by:
getProducedType in interface ResultTypeQueryable<RowData>
- Specified by:
getProducedType in class AbstractOrcFileInputFormat<RowData,BatchT,SplitT extends FileSourceSplit>
- Returns:
- The data type produced by this function or input format.
-
reportStatistics
public TableStats reportStatistics(List<Path> files, DataType producedDataType)
Description copied from interface: FileBasedStatisticsReportableInputFormat
Returns the estimated statistics of this input format.
- Specified by:
reportStatistics in interface FileBasedStatisticsReportableInputFormat
- Parameters:
files - The files to be estimated.
producedDataType - The final output type of the format.
-
createPartitionedFormat
public static <SplitT extends FileSourceSplit> OrcColumnarRowInputFormat<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,SplitT> createPartitionedFormat(OrcShim<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch> shim, org.apache.hadoop.conf.Configuration hadoopConfig, RowType tableType, List<String> partitionKeys, PartitionFieldExtractor<SplitT> extractor, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize, Function<RowType,TypeInformation<RowData>> rowTypeInfoFactory)
Creates a partitioned OrcColumnarRowInputFormat; the partition columns can be generated from the split.
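Because partition columns come from the split and not from the file, a partitioned format has to decide, for each selected column, whether to read it from ORC or fill it from the split. The following is a minimal sketch of that bookkeeping. It assumes that the table's field list includes the partition columns and that selectedFields holds indices into that list; both are assumptions for illustration. `PartitionedProjectionSketch`, `mapToFileColumns`, and the `-1` marker are hypothetical and not part of Flink.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Flink: separates file columns from partition columns. */
final class PartitionedProjectionSketch {

    /**
     * For each selected table column, returns its index among the columns stored in the file,
     * or -1 when the column is a partition column that must be filled from the split.
     */
    static int[] mapToFileColumns(
            List<String> tableFields, List<String> partitionKeys, int[] selectedFields) {
        // Columns physically stored in the file: table columns minus partition keys.
        List<String> fileFields = new ArrayList<>(tableFields);
        fileFields.removeAll(partitionKeys);

        int[] mapping = new int[selectedFields.length];
        for (int i = 0; i < selectedFields.length; i++) {
            String name = tableFields.get(selectedFields[i]);
            // indexOf returns -1 for partition columns, since they were removed above.
            mapping[i] = fileFields.indexOf(name);
        }
        return mapping;
    }

    public static void main(String[] args) {
        int[] mapping = mapToFileColumns(
                Arrays.asList("id", "amount", "dt", "region"),
                Arrays.asList("dt"),
                new int[] {3, 2, 0});
        System.out.println(Arrays.toString(mapping));
    }
}
```

The output order follows selectedFields, so the produced rows keep the column order the caller asked for even when partition columns are interleaved with file columns.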
-
-