Package org.apache.flink.orc
Class AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT>
- java.lang.Object
  - org.apache.flink.orc.AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT>
- Type Parameters:
  T - The type of the records returned by the reader.
- All Implemented Interfaces:
  Closeable, AutoCloseable, BulkFormat.Reader<T>
- Enclosing class:
  AbstractOrcFileInputFormat<T,BatchT,SplitT extends FileSourceSplit>
protected static final class AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT> extends Object implements BulkFormat.Reader<T>
A vectorized ORC reader. This reader reads one ORC batch at a time and converts it to one or more records to be returned. An ORC row-wise reader would convert the batch into a set of rows, while a reader for a vectorized query processor might return the whole batch as one record.
The conversion of the VectorizedRowBatch happens in the specific AbstractOrcFileInputFormat.OrcReaderBatch implementation.
The reader tracks its current position using ORC's row numbers. Each record in a batch is addressed by the starting row number of the batch, plus the number of records to be skipped before it.
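The addressing scheme described above can be illustrated with a small, self-contained sketch. The classes `RecordAddress` and `RecordAddressDemo` are hypothetical and not part of Flink or ORC; the sketch also assumes that the first record of a batch has a skip count of zero, which the text does not state explicitly.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical value class (not part of Flink): addresses one record inside a batch. */
final class RecordAddress {
    final long batchStartRow; // ORC row number of the first row in the batch
    final long recordsToSkip; // records of that batch that precede this one (assumed 0-based)

    RecordAddress(long batchStartRow, long recordsToSkip) {
        if (batchStartRow < 0 || recordsToSkip < 0) {
            throw new IllegalArgumentException("row number and skip count must be non-negative");
        }
        this.batchStartRow = batchStartRow;
        this.recordsToSkip = recordsToSkip;
    }

    @Override
    public String toString() {
        return "(startRow=" + batchStartRow + ", skip=" + recordsToSkip + ")";
    }
}

final class RecordAddressDemo {
    /** Builds the address of every record in a batch that yields recordCount records. */
    static List<RecordAddress> addressesFor(long batchStartRow, int recordCount) {
        List<RecordAddress> addresses = new ArrayList<>(recordCount);
        for (int i = 0; i < recordCount; i++) {
            // Same start row for the whole batch; only the skip count changes.
            addresses.add(new RecordAddress(batchStartRow, i));
        }
        return addresses;
    }

    public static void main(String[] args) {
        // A batch starting at ORC row 2048 that is converted into three records.
        System.out.println(addressesFor(2048L, 3));
    }
}
```

Because every record of a batch shares the same start row, a position of this shape stays valid whether the batch is emitted as many rows or as a single record.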
Constructor Summary
- protected OrcVectorizedReader(OrcShim<BatchT> shim, org.apache.orc.RecordReader orcReader, Pool<AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT>> pool)
Method Summary
- void close(): Closes the reader and should release all resources.
- BulkFormat.RecordIterator<T> readBatch(): Reads one batch.
- void seek(CheckpointedPosition position): The argument of RecordReader.seekToRow(long) must come from RecordReader.getRowNumber().
Method Detail
-
readBatch
@Nullable public BulkFormat.RecordIterator<T> readBatch() throws IOException
Description copied from interface: BulkFormat.Reader
Reads one batch. The method should return null when reaching the end of the input. The returned batch will be handed over to the processing threads as one.
The returned iterator object and any contained objects may be held onto by the file source for some time, so they should not be immediately reused by the reader.
To implement reuse and to save object allocation, consider using a Pool and recycling objects into the Pool in the BulkFormat.RecordIterator.releaseBatch() method.
- Specified by:
  readBatch in interface BulkFormat.Reader<T>
- Throws:
  IOException
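The recycling pattern recommended above might look roughly like the following sketch. `SimplePool`, `ReusableBatch`, and `PoolDemo` are hypothetical stand-ins written for this illustration; they are not Flink's Pool or BulkFormat.RecordIterator, and only the method name `releaseBatch()` is borrowed from the text.

```java
import java.util.concurrent.ArrayBlockingQueue;

/** Hypothetical fixed-size pool of reusable objects (not Flink's Pool class). */
final class SimplePool<T> {
    private final ArrayBlockingQueue<T> free;

    SimplePool(int capacity) {
        this.free = new ArrayBlockingQueue<>(capacity);
    }

    /** Puts an object (back) into the pool. */
    void add(T item) {
        if (!free.offer(item)) {
            throw new IllegalStateException("pool is full; object released twice?");
        }
    }

    /** Blocks until an object is available, so the reader cannot outrun the consumer. */
    T take() throws InterruptedException {
        return free.take();
    }

    /** Returns an object, or null if none is currently free. */
    T tryTake() {
        return free.poll();
    }

    int available() {
        return free.size();
    }
}

/** Hypothetical reusable batch that returns itself to its pool when released. */
final class ReusableBatch {
    final long[] values;
    int size;
    private final SimplePool<ReusableBatch> owner;

    ReusableBatch(int capacity, SimplePool<ReusableBatch> owner) {
        this.values = new long[capacity];
        this.owner = owner;
    }

    /** Called by the consumer once it no longer needs the batch contents. */
    void releaseBatch() {
        size = 0;
        owner.add(this);
    }
}

final class PoolDemo {
    /** Creates a pool pre-filled with numBatches batches of the given capacity. */
    static SimplePool<ReusableBatch> newPool(int numBatches, int batchCapacity) {
        SimplePool<ReusableBatch> pool = new SimplePool<>(numBatches);
        for (int i = 0; i < numBatches; i++) {
            pool.add(new ReusableBatch(batchCapacity, pool));
        }
        return pool;
    }

    public static void main(String[] args) throws InterruptedException {
        SimplePool<ReusableBatch> pool = newPool(2, 4);
        ReusableBatch batch = pool.take(); // the reader fills this object
        batch.values[0] = 42L;
        batch.size = 1;
        batch.releaseBatch();              // the consumer hands it back for reuse
        System.out.println("free batches: " + pool.available());
    }
}
```

The pool size bounds how many batches can be in flight at once: a reader that has handed out every batch must wait in `take()` until the consumer releases one, which is what keeps a held batch from being overwritten.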
-
close
public void close() throws IOException
Description copied from interface: BulkFormat.Reader
Closes the reader and should release all resources.
- Specified by:
  close in interface AutoCloseable
- Specified by:
  close in interface BulkFormat.Reader<T>
- Specified by:
  close in interface Closeable
- Throws:
  IOException
-
seek
public void seek(CheckpointedPosition position) throws IOException
The argument of RecordReader.seekToRow(long) must come from RecordReader.getRowNumber(). The internal implementation of ORC is confusing here: it has special behavior when dealing with predicates, so the row number should be taken from the reader rather than computed separately.
- Throws:
  IOException
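A minimal sketch of the rule above, using an in-memory stand-in instead of a real ORC reader. `FakeRowReader` and `SeekDemo` are hypothetical; only the method names `getRowNumber()` and `seekToRow(long)` mirror org.apache.orc.RecordReader. The sketch assumes one record per row and does not model predicate behavior.

```java
/** Hypothetical in-memory reader exposing the two ORC RecordReader methods named above. */
final class FakeRowReader {
    private final long totalRows;
    private final int batchSize;
    private long nextRow;

    FakeRowReader(long totalRows, int batchSize) {
        this.totalRows = totalRows;
        this.batchSize = batchSize;
    }

    /** Row number of the next row this reader will return. */
    long getRowNumber() {
        return nextRow;
    }

    void seekToRow(long row) {
        if (row < 0 || row > totalRows) {
            throw new IllegalArgumentException("row out of range: " + row);
        }
        nextRow = row;
    }

    /** Reads up to batchSize rows and returns their row numbers (empty at end of input). */
    long[] nextBatch() {
        int n = (int) Math.min(batchSize, totalRows - nextRow);
        long[] rows = new long[n];
        for (int i = 0; i < n; i++) {
            rows[i] = nextRow + i;
        }
        nextRow += n;
        return rows;
    }
}

final class SeekDemo {
    /** Restores to (offset, recordsToSkip) and returns the first row delivered afterwards. */
    static long firstRowAfterRestore(FakeRowReader reader, long offset, int recordsToSkip) {
        reader.seekToRow(offset);          // offset was recorded via getRowNumber()
        long[] batch = reader.nextBatch(); // re-read the batch that was in progress
        if (recordsToSkip >= batch.length) {
            throw new IllegalStateException("skip count exceeds batch size");
        }
        return batch[recordsToSkip];       // drop the records already emitted
    }

    public static void main(String[] args) {
        FakeRowReader reader = new FakeRowReader(10, 4);
        reader.nextBatch();                  // rows 0..3 fully processed
        long offset = reader.getRowNumber(); // taken from the reader before the next batch
        reader.nextBatch();                  // rows 4..7; assume two records were emitted
        int recordsToSkip = 2;

        FakeRowReader restored = new FakeRowReader(10, 4);
        System.out.println(firstRowAfterRestore(restored, offset, recordsToSkip));
    }
}
```

The offset is captured by asking the reader for its row number right before a batch is read, never by counting rows on the side, which is the constraint the method description places on the argument of seekToRow.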