public class OrcColumnarRowInputFormat<BatchT,SplitT extends FileSourceSplit> extends AbstractOrcFileInputFormat<RowData,BatchT,SplitT> implements FileBasedStatisticsReportableInputFormat
An ORC reader that produces a stream of ColumnarRowData records. This class can add extra fields through ColumnBatchFactory, for example partition fields, which can be extracted from the file path. Therefore, the getProducedType() may be different and the types of the extra fields need to be added.
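For orientation, the sketch below shows one way such a format could be plugged into a FileSource to read RowData records. The path, the class and method names of the sketch itself, and the assumption that a fully configured format instance already exists (for example via createPartitionedFormat, see below) are illustrative only, not part of this class's contract.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.core.fs.Path;
import org.apache.flink.orc.OrcColumnarRowInputFormat;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;

public class OrcReadSketch {
    // The format is assumed to be built elsewhere (e.g. via createPartitionedFormat below).
    static DataStream<RowData> readOrc(
            StreamExecutionEnvironment env,
            OrcColumnarRowInputFormat<?, FileSourceSplit> format) {
        FileSource<RowData> source = FileSource
                .forBulkFileFormat(format, new Path("/data/warehouse/my_table")) // made-up path
                .build();
        // Bounded batch-style read; no event-time watermarks needed.
        return env.fromSource(source, WatermarkStrategy.noWatermarks(), "orc-source");
    }
}
```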
Nested classes/interfaces inherited from class AbstractOrcFileInputFormat: AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT>, AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT>
Nested classes/interfaces inherited from interface BulkFormat: BulkFormat.Reader<T>, BulkFormat.RecordIterator<T>
Fields inherited from class AbstractOrcFileInputFormat: batchSize, conjunctPredicates, hadoopConfigWrapper, schema, selectedFields, shim
| Constructor and Description |
| --- |
| OrcColumnarRowInputFormat(OrcShim<BatchT> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.orc.TypeDescription schema, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize, ColumnBatchFactory<BatchT,SplitT> batchFactory, TypeInformation<RowData> producedTypeInfo) |
| Modifier and Type | Method and Description |
| --- | --- |
| static <SplitT extends FileSourceSplit> OrcColumnarRowInputFormat<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,SplitT> | createPartitionedFormat(OrcShim<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch> shim, org.apache.hadoop.conf.Configuration hadoopConfig, RowType tableType, List<String> partitionKeys, PartitionFieldExtractor<SplitT> extractor, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize, Function<RowType,TypeInformation<RowData>> rowTypeInfoFactory) — Creates a partitioned OrcColumnarRowInputFormat; the partition columns can be generated from the split. |
| AbstractOrcFileInputFormat.OrcReaderBatch<RowData,BatchT> | createReaderBatch(SplitT split, OrcVectorizedBatchWrapper<BatchT> orcBatch, Pool.Recycler<AbstractOrcFileInputFormat.OrcReaderBatch<RowData,BatchT>> recycler, int batchSize) — Creates the AbstractOrcFileInputFormat.OrcReaderBatch structure, which is responsible for holding the data structures that hold the batch data (column vectors, row arrays, ...) and the batch conversion from the ORC representation to the result format. |
| TypeInformation<RowData> | getProducedType() — Gets the type produced by this format. |
| TableStats | reportStatistics(List<Path> files, DataType producedDataType) — Returns the estimated statistics of this input format. |
Methods inherited from class AbstractOrcFileInputFormat: createReader, isSplittable, restoreReader
public OrcColumnarRowInputFormat(OrcShim<BatchT> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.orc.TypeDescription schema, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize, ColumnBatchFactory<BatchT,SplitT> batchFactory, TypeInformation<RowData> producedTypeInfo)
public AbstractOrcFileInputFormat.OrcReaderBatch<RowData,BatchT> createReaderBatch(SplitT split, OrcVectorizedBatchWrapper<BatchT> orcBatch, Pool.Recycler<AbstractOrcFileInputFormat.OrcReaderBatch<RowData,BatchT>> recycler, int batchSize)
Description copied from class: AbstractOrcFileInputFormat
Creates the AbstractOrcFileInputFormat.OrcReaderBatch structure, which is responsible for holding the data structures that hold the batch data (column vectors, row arrays, ...) and the batch conversion from the ORC representation to the result format.
Specified by: createReaderBatch in class AbstractOrcFileInputFormat<RowData,BatchT,SplitT extends FileSourceSplit>
public TypeInformation<RowData> getProducedType()
Description copied from class: AbstractOrcFileInputFormat
Gets the type produced by this format.
Specified by: getProducedType in interface ResultTypeQueryable<RowData>
Specified by: getProducedType in interface BulkFormat<RowData,SplitT extends FileSourceSplit>
Specified by: getProducedType in class AbstractOrcFileInputFormat<RowData,BatchT,SplitT extends FileSourceSplit>
public TableStats reportStatistics(List<Path> files, DataType producedDataType)
Description copied from interface: FileBasedStatisticsReportableInputFormat
Returns the estimated statistics of this input format.
Specified by: reportStatistics in interface FileBasedStatisticsReportableInputFormat
Parameters:
files - The files to be estimated.
producedDataType - the final output type of the format.
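As a hedged illustration of this method, the snippet below estimates statistics for a fixed set of ORC files. The file path, field names, and the OrcStatsSketch wrapper are made up for the example; only the reportStatistics signature comes from this class.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.core.fs.Path;
import org.apache.flink.orc.OrcColumnarRowInputFormat;
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.plan.stats.TableStats;
import org.apache.flink.table.types.DataType;

public class OrcStatsSketch {
    // Hypothetical helper: estimate statistics for a fixed set of ORC files.
    static TableStats estimate(OrcColumnarRowInputFormat<?, FileSourceSplit> format) {
        List<Path> files = Arrays.asList(
                new Path("/data/warehouse/my_table/dt=2024-01-01/part-0.orc")); // made-up path
        // The final output type of the format, including the partition column.
        DataType producedDataType = DataTypes.ROW(
                DataTypes.FIELD("id", DataTypes.BIGINT()),
                DataTypes.FIELD("name", DataTypes.STRING()),
                DataTypes.FIELD("dt", DataTypes.STRING()));
        return format.reportStatistics(files, producedDataType);
    }
}
```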
public static <SplitT extends FileSourceSplit> OrcColumnarRowInputFormat<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,SplitT> createPartitionedFormat(OrcShim<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch> shim, org.apache.hadoop.conf.Configuration hadoopConfig, RowType tableType, List<String> partitionKeys, PartitionFieldExtractor<SplitT> extractor, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize, Function<RowType,TypeInformation<RowData>> rowTypeInfoFactory)
Creates a partitioned OrcColumnarRowInputFormat; the partition columns can be generated from the split.
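A minimal sketch of calling createPartitionedFormat for a table with one partition column. The schema, the default partition value, and the PartitionFieldExtractor.forFileSystem helper (whose package and availability depend on the Flink version) are assumptions; InternalTypeInfo::of is used here as one possible rowTypeInfoFactory.

```java
import java.util.Collections;

import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.connector.file.table.PartitionFieldExtractor;
import org.apache.flink.orc.OrcColumnarRowInputFormat;
import org.apache.flink.orc.shim.OrcShim;
import org.apache.flink.table.runtime.typeutils.InternalTypeInfo;
import org.apache.flink.table.types.logical.BigIntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

public class PartitionedOrcFormatSketch {
    static OrcColumnarRowInputFormat<VectorizedRowBatch, FileSourceSplit> build() {
        // Full table schema: two physical columns plus the "dt" partition column (made up).
        RowType tableType = RowType.of(
                new LogicalType[] {
                        new BigIntType(),
                        new VarCharType(VarCharType.MAX_LENGTH),
                        new VarCharType(VarCharType.MAX_LENGTH)},
                new String[] {"id", "name", "dt"});

        return OrcColumnarRowInputFormat.createPartitionedFormat(
                OrcShim.defaultShim(),                       // default ORC shim
                new Configuration(),                         // Hadoop configuration
                tableType,
                Collections.singletonList("dt"),             // partition keys
                PartitionFieldExtractor.forFileSystem("__DEFAULT_PARTITION__"), // assumed helper
                new int[] {0, 1, 2},                         // select all columns, incl. "dt"
                Collections.emptyList(),                     // no pushed-down ORC predicates
                2048,                                        // rows per vectorized batch
                InternalTypeInfo::of);                       // produced type factory
    }
}
```

The resulting format can then be handed to a FileSource as shown in the sketch near the top of this page; the extra "dt" field is filled per split rather than read from the ORC files.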