public abstract class ParquetVectorizedInputFormat<T,SplitT extends FileSourceSplit> extends Object implements BulkFormat<T,SplitT>
BulkFormat that reads data from the file into a VectorizedColumnBatch in vectorized mode.
Modifier and Type | Class and Description |
---|---|
protected static class | ParquetVectorizedInputFormat.ParquetReaderBatch<T>: Reader batch that provides writing and reading capabilities. |
Nested classes/interfaces inherited from interface BulkFormat: BulkFormat.Reader<T>, BulkFormat.RecordIterator<T>
Modifier and Type | Field and Description |
---|---|
protected SerializableConfiguration | hadoopConfig |
protected boolean | isUtcTimestamp |
Constructor and Description |
---|
ParquetVectorizedInputFormat(SerializableConfiguration hadoopConfig, RowType projectedType, ColumnBatchFactory<SplitT> batchFactory, int batchSize, boolean isUtcTimestamp, boolean isCaseSensitive) |
Modifier and Type | Method and Description |
---|---|
org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.ParquetReader | createReader(Configuration config, SplitT split): Creates a new reader that reads from the split's path, starting at the split's offset (FileSourceSplit.offset()), and reads length bytes after the offset. |
protected abstract ParquetVectorizedInputFormat.ParquetReaderBatch<T> | createReaderBatch(WritableColumnVector[] writableVectors, VectorizedColumnBatch columnarBatch, Pool.Recycler<ParquetVectorizedInputFormat.ParquetReaderBatch<T>> recycler) |
boolean | isSplittable(): Checks whether this format is splittable. |
protected int | numBatchesToCirculate(Configuration config) |
org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.ParquetReader | restoreReader(Configuration config, SplitT split): Creates a new reader that reads from split.path(), starting at offset, and reads until length bytes after the offset. |
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface BulkFormat: getProducedType
protected final SerializableConfiguration hadoopConfig
protected final boolean isUtcTimestamp
public ParquetVectorizedInputFormat(SerializableConfiguration hadoopConfig, RowType projectedType, ColumnBatchFactory<SplitT> batchFactory, int batchSize, boolean isUtcTimestamp, boolean isCaseSensitive)
public org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.ParquetReader createReader(Configuration config, SplitT split) throws IOException
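Description copied from interface: BulkFormat
Creates a new reader that reads from the split's path, starting at the split's offset (FileSourceSplit.offset()), and reads length bytes after the offset.
Specified by: createReader in interface BulkFormat<T,SplitT extends FileSourceSplit>
Throws: IOException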
protected int numBatchesToCirculate(Configuration config)
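This page does not describe how batches are circulated. As a rough illustration only, the following sketch shows one common way a fixed number of reusable batches can be handed out and returned through a recycler callback. SketchBatch and SketchBatchPool are hypothetical names invented for this example; they are not Flink classes and do not reflect the actual behavior of numBatchesToCirculate or Pool.Recycler.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Hypothetical stand-in for a reusable reader batch; not a Flink class.
final class SketchBatch {
    final int[] values;
    int size;
    private final Consumer<SketchBatch> recycler;

    SketchBatch(int capacity, Consumer<SketchBatch> recycler) {
        this.values = new int[capacity];
        this.recycler = recycler;
    }

    // Resets the batch and hands it back to the pool once the consumer is done.
    void recycle() {
        size = 0;
        recycler.accept(this);
    }
}

// Hypothetical pool that circulates a bounded number of batches.
final class SketchBatchPool {
    private final BlockingQueue<SketchBatch> free;

    SketchBatchPool(int numBatches, int batchSize) {
        free = new ArrayBlockingQueue<>(numBatches);
        for (int i = 0; i < numBatches; i++) {
            // Each batch returns itself to the queue when recycled.
            free.add(new SketchBatch(batchSize, free::add));
        }
    }

    // Returns a free batch, or null when every batch is still in use.
    SketchBatch tryAcquire() {
        return free.poll();
    }

    int available() {
        return free.size();
    }

    public static void main(String[] args) {
        SketchBatchPool pool = new SketchBatchPool(2, 1024);
        SketchBatch first = pool.tryAcquire();
        SketchBatch second = pool.tryAcquire();
        System.out.println(pool.tryAcquire() == null); // prints true: both batches are in use
        first.recycle();
        System.out.println(pool.available()); // prints 1
    }
}
```

Bounding the number of batches keeps memory use predictable: a reader in such a design cannot fill a new batch until a consumer has recycled an old one.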
public org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.ParquetReader restoreReader(Configuration config, SplitT split) throws IOException
Description copied from interface: BulkFormat
Creates a new reader that reads from split.path(), starting at offset, and reads until length bytes after the offset. A number of recordsToSkip records should be read and discarded after the offset. This is typically part of restoring a reader to a checkpointed position.
Specified by: restoreReader in interface BulkFormat<T,SplitT extends FileSourceSplit>
Throws: IOException
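The skip-and-discard behavior described for restoreReader can be sketched without any Flink types. The example below is a minimal illustration, assuming the records after the offset are available as a plain Iterator. RestoreSketch and its restore method are hypothetical names, not part of the Flink API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of restore semantics; not Flink's reader implementation.
final class RestoreSketch {

    // Discards the first recordsToSkip records and returns the rest, mimicking
    // a reader that resumes from a checkpointed position within a split.
    static <T> List<T> restore(Iterator<T> recordsFromOffset, long recordsToSkip) {
        long skipped = 0;
        while (skipped < recordsToSkip && recordsFromOffset.hasNext()) {
            recordsFromOffset.next(); // read and discard
            skipped++;
        }
        List<T> remaining = new ArrayList<>();
        while (recordsFromOffset.hasNext()) {
            remaining.add(recordsFromOffset.next());
        }
        return remaining;
    }

    public static void main(String[] args) {
        List<String> records = List.of("r0", "r1", "r2", "r3");
        System.out.println(restore(records.iterator(), 2)); // prints [r2, r3]
    }
}
```

In this sketch the skipped records are still read, only not emitted, which matches the "read and discarded" wording above.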
public boolean isSplittable()
Description copied from interface: BulkFormat
Checks whether this format is splittable. See top-level JavaDocs (section "Splitting") for details.
Specified by: isSplittable in interface BulkFormat<T,SplitT extends FileSourceSplit>
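To illustrate what splitting means for the offset and length values passed to createReader, here is a generic sketch that divides a file of known length into consecutive byte ranges. It is an assumption-laden illustration, not Flink's split enumeration logic. SplitSketch, Range, and maxSplitLength are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of dividing a file into byte ranges; not Flink code.
final class SplitSketch {

    // Hypothetical (offset, length) pair; it only mirrors the idea of a split.
    static final class Range {
        final long offset;
        final long length;

        Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // Cuts [0, fileLength) into consecutive ranges of at most maxSplitLength bytes.
    static List<Range> split(long fileLength, long maxSplitLength) {
        if (maxSplitLength <= 0) {
            throw new IllegalArgumentException("maxSplitLength must be positive");
        }
        List<Range> ranges = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += maxSplitLength) {
            long length = Math.min(maxSplitLength, fileLength - offset);
            ranges.add(new Range(offset, length));
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (Range r : split(250, 100)) {
            System.out.println("offset=" + r.offset + " length=" + r.length);
        }
    }
}
```

Each range would correspond to one reader covering the bytes from offset up to offset plus length. How a real format maps records to such ranges is format-specific and is not covered by this sketch.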
protected abstract ParquetVectorizedInputFormat.ParquetReaderBatch<T> createReaderBatch(WritableColumnVector[] writableVectors, VectorizedColumnBatch columnarBatch, Pool.Recycler<ParquetVectorizedInputFormat.ParquetReaderBatch<T>> recycler)
Parameters:
writableVectors - vectors to be written
columnarBatch - vectors to be read
recycler - batch recycler
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.