Package org.apache.flink.orc
Class AbstractOrcFileInputFormat<T,BatchT,SplitT extends FileSourceSplit>
- java.lang.Object
  - org.apache.flink.orc.AbstractOrcFileInputFormat<T,BatchT,SplitT>
- Type Parameters:
  T - The type of records produced by this reader format.
- All Implemented Interfaces:
  Serializable, ResultTypeQueryable<T>, BulkFormat<T,SplitT>
- Direct Known Subclasses:
  OrcColumnarRowInputFormat
public abstract class AbstractOrcFileInputFormat<T,BatchT,SplitT extends FileSourceSplit> extends Object implements BulkFormat<T,SplitT>
The base for ORC readers for the FileSource. Implements the reader initialization, vectorized reading, and pooling of column vector objects.

Subclasses implement the conversion to the specific result record(s) that they return by extending AbstractOrcFileInputFormat.OrcReaderBatch.

- See Also:
  Serialized Form
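To make the extension contract concrete, the following is a minimal sketch of a subclass. It assumes the default shim's batch type is the Hive VectorizedRowBatch (as used by OrcColumnarRowInputFormat), and the import paths follow the Flink 1.15-era package layout and may differ slightly across versions. MyOrcInputFormat and the record type MyRecord are hypothetical names introduced only for illustration; OrcColumnarRowInputFormat is the reference implementation shipped with Flink.

  import java.util.List;

  import org.apache.flink.api.common.typeinfo.TypeInformation;
  import org.apache.flink.connector.file.src.FileSourceSplit;
  import org.apache.flink.connector.file.src.util.Pool;
  import org.apache.flink.orc.AbstractOrcFileInputFormat;
  import org.apache.flink.orc.OrcFilters;
  import org.apache.flink.orc.shim.OrcShim;
  import org.apache.flink.orc.vector.OrcVectorizedBatchWrapper;
  import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
  import org.apache.orc.TypeDescription;

  // Hypothetical subclass; MyRecord stands for the application's record class (not defined here).
  public class MyOrcInputFormat
          extends AbstractOrcFileInputFormat<MyRecord, VectorizedRowBatch, FileSourceSplit> {

      public MyOrcInputFormat(
              org.apache.hadoop.conf.Configuration hadoopConfig,
              TypeDescription schema,
              int[] selectedFields,
              List<OrcFilters.Predicate> conjunctPredicates,
              int batchSize) {
          // OrcShim.defaultShim() targets the latest supported ORC version, as the
          // constructor documentation below recommends.
          super(OrcShim.defaultShim(), hadoopConfig, schema, selectedFields, conjunctPredicates, batchSize);
      }

      @Override
      public OrcReaderBatch<MyRecord, VectorizedRowBatch> createReaderBatch(
              FileSourceSplit split,
              OrcVectorizedBatchWrapper<VectorizedRowBatch> orcBatch,
              Pool.Recycler<OrcReaderBatch<MyRecord, VectorizedRowBatch>> recycler,
              int batchSize) {
          // Build and return a concrete OrcReaderBatch that converts the ORC column
          // vectors in 'orcBatch' into MyRecord instances (conversion not shown here).
          throw new UnsupportedOperationException("conversion batch omitted in this sketch");
      }

      @Override
      public TypeInformation<MyRecord> getProducedType() {
          return TypeInformation.of(MyRecord.class);
      }
  }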
Nested Class Summary
- protected static class AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT>
  The OrcReaderBatch class holds the data structures containing the batch data (column vectors, row arrays, ...) and performs the batch conversion from the ORC representation to the result format.
- protected static class AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT>
  A vectorized ORC reader.
- Nested classes/interfaces inherited from interface org.apache.flink.connector.file.src.reader.BulkFormat:
  BulkFormat.Reader<T>, BulkFormat.RecordIterator<T>
Field Summary
- protected int batchSize
- protected List<OrcFilters.Predicate> conjunctPredicates
- protected SerializableHadoopConfigWrapper hadoopConfigWrapper
- protected org.apache.orc.TypeDescription schema
- protected int[] selectedFields
- protected OrcShim<BatchT> shim
Constructor Summary
- protected AbstractOrcFileInputFormat(OrcShim<BatchT> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.orc.TypeDescription schema, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize)
Method Summary
- AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT> createReader(Configuration config, SplitT split)
  Creates a new reader that reads from the split's path, starting at the split's offset, and reads length bytes after the offset.
- abstract AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT> createReaderBatch(SplitT split, OrcVectorizedBatchWrapper<BatchT> orcBatch, Pool.Recycler<AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT>> recycler, int batchSize)
  Creates the AbstractOrcFileInputFormat.OrcReaderBatch structure, which holds the data structures for the batch data (column vectors, row arrays, ...) and performs the batch conversion from the ORC representation to the result format.
- abstract TypeInformation<T> getProducedType()
  Gets the type produced by this format.
- boolean isSplittable()
  Checks whether this format is splittable.
- AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT> restoreReader(Configuration config, SplitT split)
  Creates a new reader that reads from split.path(), starting at offset, and reads until length bytes after the offset.
Field Detail
- hadoopConfigWrapper
  protected final SerializableHadoopConfigWrapper hadoopConfigWrapper
- schema
  protected final org.apache.orc.TypeDescription schema
- selectedFields
  protected final int[] selectedFields
- conjunctPredicates
  protected final List<OrcFilters.Predicate> conjunctPredicates
- batchSize
  protected final int batchSize
Constructor Detail
- AbstractOrcFileInputFormat
  protected AbstractOrcFileInputFormat(OrcShim<BatchT> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.orc.TypeDescription schema, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize)
  Parameters:
    shim - the shim for the various supported ORC versions. If you use the latest ORC version, use OrcShim.defaultShim() directly.
    hadoopConfig - the Hadoop configuration for the ORC reader.
    schema - the full schema of the ORC file.
    selectedFields - the indices of the fields to read from the ORC schema.
    conjunctPredicates - the filter predicates that can be evaluated.
    batchSize - the number of rows read per vectorized batch.
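As a hedged illustration of how these parameters are typically assembled, the sketch below builds the hypothetical MyOrcInputFormat subclass from the class overview above; the schema string, the selected field indices, and the batch size are made-up example values.

  import java.util.Collections;

  import org.apache.orc.TypeDescription;

  final class MyOrcFormatFactory {

      static MyOrcInputFormat createFormat() {
          // Hadoop configuration used to open the ORC file (e.g. filesystem settings).
          org.apache.hadoop.conf.Configuration hadoopConf = new org.apache.hadoop.conf.Configuration();

          // Full ORC schema of the file; the indices below project it down to 'id' and 'score'.
          TypeDescription schema =
                  TypeDescription.fromString("struct<id:bigint,name:string,score:double>");
          int[] selectedFields = {0, 2};

          return new MyOrcInputFormat(
                  hadoopConf,
                  schema,
                  selectedFields,
                  Collections.emptyList(), // no filter predicates pushed down
                  2048);                   // rows per vectorized batch
      }
  }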
Method Detail
- createReader
  public AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT> createReader(Configuration config, SplitT split) throws IOException
  Description copied from interface: BulkFormat
  Creates a new reader that reads from the split's path, starting at the split's offset, and reads length bytes after the offset.
  Specified by:
    createReader in interface BulkFormat<T,SplitT>
  Throws:
    IOException
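To show how the returned reader is typically consumed, here is a hedged sketch of a read loop over one split, using the BulkFormat.Reader and BulkFormat.RecordIterator contracts. MyOrcInputFormat and MyRecord are the hypothetical names from the class overview, and the split is assumed to come from the file source's split enumerator.

  import java.io.IOException;

  import org.apache.flink.configuration.Configuration;
  import org.apache.flink.connector.file.src.FileSourceSplit;
  import org.apache.flink.connector.file.src.reader.BulkFormat;
  import org.apache.flink.connector.file.src.util.RecordAndPosition;

  final class OrcReadLoop {

      static void readSplit(MyOrcInputFormat format, FileSourceSplit split) throws IOException {
          BulkFormat.Reader<MyRecord> reader = format.createReader(new Configuration(), split);
          try {
              BulkFormat.RecordIterator<MyRecord> batch;
              // readBatch() returns null once the split is exhausted.
              while ((batch = reader.readBatch()) != null) {
                  RecordAndPosition<MyRecord> record;
                  while ((record = batch.next()) != null) {
                      process(record.getRecord());
                  }
                  // Return the batch (and its pooled column vectors) for reuse.
                  batch.releaseBatch();
              }
          } finally {
              reader.close();
          }
      }

      private static void process(MyRecord record) {
          // Application-specific handling of the converted record.
      }
  }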
- restoreReader
  public AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT> restoreReader(Configuration config, SplitT split) throws IOException
  Description copied from interface: BulkFormat
  Creates a new reader that reads from split.path(), starting at offset, and reads until length bytes after the offset. A number of recordsToSkip records should be read and discarded after the offset. This is typically part of restoring a reader to a checkpointed position.
  Specified by:
    restoreReader in interface BulkFormat<T,SplitT>
  Throws:
    IOException
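Which of the two factory methods to call usually depends on whether the split carries a checkpointed reader position. The sketch below assumes FileSourceSplit exposes that position via getReaderPosition() (as in recent Flink releases) and reuses the hypothetical names from the sketches above.

  import java.io.IOException;

  import org.apache.flink.configuration.Configuration;
  import org.apache.flink.connector.file.src.FileSourceSplit;
  import org.apache.flink.connector.file.src.reader.BulkFormat;

  final class ReaderDispatch {

      static BulkFormat.Reader<MyRecord> open(
              MyOrcInputFormat format, Configuration config, FileSourceSplit split)
              throws IOException {
          // A split restored from a checkpoint carries the position (offset plus records
          // to skip) at which reading must resume; freshly enumerated splits do not.
          return split.getReaderPosition().isPresent()
                  ? format.restoreReader(config, split)
                  : format.createReader(config, split);
      }
  }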
- isSplittable
  public boolean isSplittable()
  Description copied from interface: BulkFormat
  Checks whether this format is splittable. Splittable formats allow Flink to create multiple splits per file, so that Flink can read multiple regions of the file concurrently. See the top-level JavaDocs of BulkFormat (section "Splitting") for details.
  Specified by:
    isSplittable in interface BulkFormat<T,SplitT>
- createReaderBatch
  public abstract AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT> createReaderBatch(SplitT split, OrcVectorizedBatchWrapper<BatchT> orcBatch, Pool.Recycler<AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT>> recycler, int batchSize)
  Creates the AbstractOrcFileInputFormat.OrcReaderBatch structure, which holds the data structures for the batch data (column vectors, row arrays, ...) and performs the batch conversion from the ORC representation to the result format.
- getProducedType
  public abstract TypeInformation<T> getProducedType()
  Gets the type produced by this format.
  Specified by:
    getProducedType in interface BulkFormat<T,SplitT>
  Specified by:
    getProducedType in interface ResultTypeQueryable<T>
  Returns:
    The data type produced by this function or input format.
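As a brief, hedged illustration: an implementation typically returns a TypeInformation matching its record type, either via the Class-based factory or via a TypeHint when the produced type is generic. MyRecord is the hypothetical record type used in the sketches above.

  import java.util.Map;

  import org.apache.flink.api.common.typeinfo.TypeHint;
  import org.apache.flink.api.common.typeinfo.TypeInformation;

  final class ProducedTypeExamples {

      // Non-generic record classes can use the Class-based factory.
      static TypeInformation<MyRecord> simple() {
          return TypeInformation.of(MyRecord.class);
      }

      // Generic produced types need a TypeHint so the full type arguments are preserved.
      static TypeInformation<Map<String, Long>> generic() {
          return TypeInformation.of(new TypeHint<Map<String, Long>>() {});
      }
  }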