public class OrcColumnarRowFileInputFormat<BatchT,SplitT extends FileSourceSplit>
extends AbstractOrcFileInputFormat<RowData,BatchT,SplitT>

An ORC file input format that produces ColumnarRowData records.

This class can add extra fields through the ColumnBatchFactory, for example partition
fields, which can be extracted from the split path. Therefore, the getProducedType()
may be different, and the types of the extra fields need to be added.
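As a sketch of the produced-type behavior described above (field names, lengths, and layout are hypothetical, not part of this API): if the ORC file holds `id` and `name` columns and the split path contributes a `dt` partition field, the projected output type passed to the constructor, and hence getProducedType(), covers all three fields.

```java
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;

class ProducedTypeSketch {
    // Hypothetical projected output type: two data columns read from the
    // ORC file plus one partition column ("dt") extracted from the path.
    static RowType projectedOutputType() {
        return RowType.of(
                new LogicalType[] {new IntType(), new VarCharType(100), new VarCharType(10)},
                new String[] {"id", "name", "dt"});
    }
}
```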
Nested classes inherited from class AbstractOrcFileInputFormat:
AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT>, AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT>

Nested classes inherited from interface BulkFormat:
BulkFormat.Reader<T>, BulkFormat.RecordIterator<T>

Fields inherited from class AbstractOrcFileInputFormat:
batchSize, conjunctPredicates, hadoopConfigWrapper, schema, selectedFields, shim
Constructor and Description
---
OrcColumnarRowFileInputFormat(OrcShim<BatchT> shim,
                              Configuration hadoopConfig,
                              org.apache.orc.TypeDescription schema,
                              int[] selectedFields,
                              List<OrcFilters.Predicate> conjunctPredicates,
                              int batchSize,
                              ColumnBatchFactory<BatchT,SplitT> batchFactory,
                              RowType projectedOutputType)
Modifier and Type | Method and Description
---|---
static <SplitT extends FileSourceSplit> OrcColumnarRowFileInputFormat<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,SplitT> | createPartitionedFormat(OrcShim<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch> shim, Configuration hadoopConfig, RowType tableType, List<String> partitionKeys, PartitionFieldExtractor<SplitT> extractor, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize) Creates a partitioned OrcColumnarRowFileInputFormat; the partition columns can be generated from the split.
AbstractOrcFileInputFormat.OrcReaderBatch<RowData,BatchT> | createReaderBatch(SplitT split, OrcVectorizedBatchWrapper<BatchT> orcBatch, Pool.Recycler<AbstractOrcFileInputFormat.OrcReaderBatch<RowData,BatchT>> recycler, int batchSize) Creates the AbstractOrcFileInputFormat.OrcReaderBatch structure, which is responsible for holding the data structures that hold the batch data (column vectors, row arrays, ...) and the batch conversion from the ORC representation to the result format.
TypeInformation<RowData> | getProducedType() Gets the type produced by this format.
Methods inherited from class AbstractOrcFileInputFormat:
createReader, isSplittable, restoreReader
public OrcColumnarRowFileInputFormat(OrcShim<BatchT> shim, Configuration hadoopConfig, org.apache.orc.TypeDescription schema, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize, ColumnBatchFactory<BatchT,SplitT> batchFactory, RowType projectedOutputType)
public AbstractOrcFileInputFormat.OrcReaderBatch<RowData,BatchT> createReaderBatch(SplitT split, OrcVectorizedBatchWrapper<BatchT> orcBatch, Pool.Recycler<AbstractOrcFileInputFormat.OrcReaderBatch<RowData,BatchT>> recycler, int batchSize)
Description copied from class: AbstractOrcFileInputFormat
Creates the AbstractOrcFileInputFormat.OrcReaderBatch structure, which is responsible for holding the data structures that hold the batch data (column vectors, row arrays, ...) and the batch conversion from the ORC representation to the result format.
Specified by: createReaderBatch in class AbstractOrcFileInputFormat<RowData,BatchT,SplitT extends FileSourceSplit>
public TypeInformation<RowData> getProducedType()
Description copied from class: AbstractOrcFileInputFormat
Gets the type produced by this format.
Specified by: getProducedType in interface ResultTypeQueryable<RowData>
Specified by: getProducedType in interface BulkFormat<RowData,SplitT extends FileSourceSplit>
Overrides: getProducedType in class AbstractOrcFileInputFormat<RowData,BatchT,SplitT extends FileSourceSplit>
public static <SplitT extends FileSourceSplit> OrcColumnarRowFileInputFormat<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,SplitT> createPartitionedFormat(OrcShim<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch> shim, Configuration hadoopConfig, RowType tableType, List<String> partitionKeys, PartitionFieldExtractor<SplitT> extractor, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize)
Creates a partitioned OrcColumnarRowFileInputFormat; the partition columns can be generated from the split.

Copyright © 2014–2023 The Apache Software Foundation. All rights reserved.
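A minimal usage sketch of createPartitionedFormat, under stated assumptions: the table schema, partition value, selected fields, and batch size below are hypothetical; OrcShim.defaultShim() is assumed to provide the default Hive VectorizedRowBatch shim, and the lambda stands in for a PartitionFieldExtractor (the actual functional signature may differ).

```java
import java.util.Collections;

import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.orc.shim.OrcShim;
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

class PartitionedFormatSketch {
    static OrcColumnarRowFileInputFormat<VectorizedRowBatch, FileSourceSplit> build() {
        // Full table type: data columns plus the "dt" partition column,
        // which is not stored in the ORC files themselves.
        RowType tableType = RowType.of(
                new LogicalType[] {new IntType(), new VarCharType(100), new VarCharType(10)},
                new String[] {"id", "name", "dt"});

        return OrcColumnarRowFileInputFormat.createPartitionedFormat(
                OrcShim.defaultShim(),                         // assumed default ORC shim
                new Configuration(),                           // Hadoop configuration
                tableType,
                Collections.singletonList("dt"),               // partition keys
                (split, fieldName, fieldType) -> "2023-01-01", // hypothetical extractor
                new int[] {0, 1, 2},                           // select all three fields
                Collections.emptyList(),                       // no predicate push-down
                2048);                                         // batch size
    }
}
```

The factory derives the partition column values per split via the extractor, so the returned format produces rows matching the full table type even though the files contain only the data columns.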