ParquetVectorizedInputFormat (Flink : 1.16-SNAPSHOT API)

java.lang.Object
- org.apache.flink.formats.parquet.ParquetVectorizedInputFormat<T,SplitT>

All Implemented Interfaces:

Serializable, ResultTypeQueryable<T>, BulkFormat<T,SplitT>

Direct Known Subclasses:

ParquetColumnarRowInputFormat
```
public abstract class ParquetVectorizedInputFormat<T,SplitT extends FileSourceSplit>
extends Object
implements BulkFormat<T,SplitT>
```
Parquet BulkFormat that reads data from the file to VectorizedColumnBatch in vectorized mode.
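
To make "vectorized mode" a little more concrete, here is a minimal sketch of a batch that stores rows column by column and is refilled in fixed-size chunks. `ColumnBatchSketch`, its `Batch` class, and `fill` are hypothetical stand-ins written for illustration; they are not Flink's `VectorizedColumnBatch` API, and the two hard-coded columns are an assumption.

```java
public class ColumnBatchSketch {

    /** Hypothetical stand-in for a batch of rows stored column by column. */
    public static final class Batch {
        public final int[] ids;       // column 0
        public final String[] names;  // column 1
        public int numRows;           // number of valid rows currently in the batch

        public Batch(int capacity) {
            ids = new int[capacity];
            names = new String[capacity];
        }
    }

    /** Fills the batch one column at a time and returns the number of rows written. */
    public static int fill(Batch batch, int[] srcIds, String[] srcNames, int from) {
        int n = Math.min(batch.ids.length, srcIds.length - from);
        System.arraycopy(srcIds, from, batch.ids, 0, n);
        System.arraycopy(srcNames, from, batch.names, 0, n);
        batch.numRows = n;
        return n;
    }

    public static void main(String[] args) {
        int[] ids = {1, 2, 3, 4, 5};
        String[] names = {"a", "b", "c", "d", "e"};
        Batch batch = new Batch(2); // batch size of 2 rows (assumed value)

        // The same batch object is reused for every chunk of the input.
        for (int from = 0; from < ids.length; from += batch.numRows) {
            fill(batch, ids, names, from);
            for (int row = 0; row < batch.numRows; row++) {
                System.out.println(batch.ids[row] + " -> " + batch.names[row]);
            }
        }
    }
}
```

The point of the layout is that each column is copied as one contiguous array rather than row by row, and the batch is reused instead of being reallocated per chunk.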

See Also:

Serialized Form

Nested Class Summary

- Nested classes/interfaces inherited from interface org.apache.flink.connector.file.src.reader.BulkFormat
  BulkFormat.Reader<T>, BulkFormat.RecordIterator<T>

Nested Classes
Modifier and Type	Class and Description
`protected static class`	`ParquetVectorizedInputFormat.ParquetReaderBatch<T>` Reader batch that provides writing and reading capabilities.

Field Summary

Fields
Modifier and Type	Field and Description
`protected SerializableConfiguration`	`hadoopConfig`
`protected boolean`	`isUtcTimestamp`

Constructor Summary

Constructors
Constructor and Description
`ParquetVectorizedInputFormat(SerializableConfiguration hadoopConfig, RowType projectedType, ColumnBatchFactory<SplitT> batchFactory, int batchSize, boolean isUtcTimestamp, boolean isCaseSensitive)`

Method Summary

All Methods Instance Methods Abstract Methods Concrete Methods
Modifier and Type	Method and Description
`org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.ParquetReader`	`createReader(Configuration config, SplitT split)` Creates a new reader that reads from the split's path, starting at the split's offset (`FileSourceSplit.offset()`), and reads `length` bytes after the offset.
`protected abstract ParquetVectorizedInputFormat.ParquetReaderBatch<T>`	`createReaderBatch(WritableColumnVector[] writableVectors, VectorizedColumnBatch columnarBatch, Pool.Recycler<ParquetVectorizedInputFormat.ParquetReaderBatch<T>> recycler)`
`boolean`	`isSplittable()` Checks whether this format is splittable.
`protected int`	`numBatchesToCirculate(Configuration config)`
`org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.ParquetReader`	`restoreReader(Configuration config, SplitT split)` Creates a new reader that reads from `split.path()` starting at `offset` and reads until `length` bytes after the offset.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.flink.connector.file.src.reader.BulkFormat
getProducedType

Field Detail

hadoopConfig

protected final SerializableConfiguration hadoopConfig

isUtcTimestamp
```
protected final boolean isUtcTimestamp
```

Constructor Detail

ParquetVectorizedInputFormat

public ParquetVectorizedInputFormat(SerializableConfiguration hadoopConfig,
                                    RowType projectedType,
                                    ColumnBatchFactory<SplitT> batchFactory,
                                    int batchSize,
                                    boolean isUtcTimestamp,
                                    boolean isCaseSensitive)

Method Detail

createReader

public org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.ParquetReader createReader(Configuration config,
                                                                                                SplitT split)
                                                                                         throws IOException

Description copied from interface: BulkFormat

Creates a new reader that reads from the split's path, starting at the split's offset (FileSourceSplit.offset()), and reads length bytes after the offset.

Specified by:

createReader in interface BulkFormat<T,SplitT extends FileSourceSplit>

Throws:

IOException

numBatchesToCirculate

protected int numBatchesToCirculate(Configuration config)

restoreReader

public org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.ParquetReader restoreReader(Configuration config,
                                                                                                 SplitT split)
                                                                                          throws IOException

Description copied from interface: BulkFormat

Creates a new reader that reads from split.path() starting at offset and reads until length bytes after the offset. A number of recordsToSkip records should be read and discarded after the offset. This is typically part of restoring a reader to a checkpointed position.

Specified by:

restoreReader in interface BulkFormat<T,SplitT extends FileSourceSplit>

Throws:

IOException
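
The restore contract described for restoreReader (reopen at the offset, then read and discard a number of records) can be sketched as follows. This is a simplified illustration over an in-memory list, not Flink's implementation: `RestoreSketch`, `Position`, and `restore` are hypothetical names, and the offset is treated as a record index rather than a byte position.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RestoreSketch {

    /** Hypothetical checkpointed position: a restart offset plus records to skip after it. */
    public static final class Position {
        public final int offset;
        public final long recordsToSkip;

        public Position(int offset, long recordsToSkip) {
            this.offset = offset;
            this.recordsToSkip = recordsToSkip;
        }
    }

    /** Opens an iterator at the position's offset, then reads and discards recordsToSkip records. */
    public static <T> Iterator<T> restore(List<T> records, Position pos) {
        Iterator<T> it = records.subList(pos.offset, records.size()).iterator();
        for (long i = 0; i < pos.recordsToSkip && it.hasNext(); i++) {
            it.next(); // discard: this record was already emitted before the checkpoint
        }
        return it;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("r0", "r1", "r2", "r3", "r4");
        Iterator<String> it = restore(records, new Position(1, 2));
        System.out.println(it.next()); // prints r3
    }
}
```

Skipping after a coarse offset lets a reader resume even when it can only seek to certain boundaries and not to an arbitrary record.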

isSplittable
```
public boolean isSplittable()
```
Description copied from interface: BulkFormat

Checks whether this format is splittable. Splittable formats allow Flink to create multiple splits per file, so that Flink can read multiple regions of the file concurrently.
See top-level JavaDocs (section "Splitting") for details.

Specified by:

isSplittable in interface BulkFormat<T,SplitT extends FileSourceSplit>
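
As a rough sketch of what splittability enables, the following code divides a file into (offset, length) regions when the format is splittable and falls back to one split covering the whole file otherwise. `SplitSketch`, `Split`, `createSplits`, and the `maxSplitSize` parameter are hypothetical and only illustrate the idea; Flink's actual split enumeration is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

    /** Hypothetical file region described by its start offset and its length in bytes. */
    public static final class Split {
        public final long offset;
        public final long length;

        public Split(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        @Override
        public String toString() {
            return "[" + offset + ", +" + length + ")";
        }
    }

    /** Returns one split per region of at most maxSplitSize bytes, or a single split if not splittable. */
    public static List<Split> createSplits(long fileLength, long maxSplitSize, boolean splittable) {
        if (fileLength < 0 || maxSplitSize <= 0) {
            throw new IllegalArgumentException("invalid file length or split size");
        }
        List<Split> splits = new ArrayList<>();
        if (!splittable || fileLength <= maxSplitSize) {
            splits.add(new Split(0, fileLength));
            return splits;
        }
        for (long offset = 0; offset < fileLength; offset += maxSplitSize) {
            splits.add(new Split(offset, Math.min(maxSplitSize, fileLength - offset)));
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(createSplits(250, 100, true));
        System.out.println(createSplits(250, 100, false));
    }
}
```

Each resulting region can then be handed to a separate reader, which is what allows several regions of one file to be read concurrently.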

createReaderBatch

protected abstract ParquetVectorizedInputFormat.ParquetReaderBatch<T> createReaderBatch(WritableColumnVector[] writableVectors,
                                                                                        VectorizedColumnBatch columnarBatch,
                                                                                        Pool.Recycler<ParquetVectorizedInputFormat.ParquetReaderBatch<T>> recycler)

Parameters:

writableVectors - vectors to be written
columnarBatch - vectors to be read
recycler - batch recycler
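
The recycler parameter suggests that reader batches are reused rather than allocated per read. A minimal sketch of that pattern, assuming a fixed number of batches circulating through a blocking queue, might look like the code below. `BatchPoolSketch`, its `Pool` and `Batch` classes, and the pool size of 2 are hypothetical stand-ins, not Flink's `Pool` or `ParquetReaderBatch` classes.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

public class BatchPoolSketch {

    /** Hypothetical reusable batch; its recycler hands it back to the owning pool. */
    public static final class Batch {
        public final int[] values;
        public int numRows;
        private final Consumer<Batch> recycler;

        Batch(int capacity, Consumer<Batch> recycler) {
            this.values = new int[capacity];
            this.recycler = recycler;
        }

        /** Called by the consumer once it has finished reading the batch. */
        public void recycle() {
            numRows = 0;
            recycler.accept(this);
        }
    }

    /** Fixed-size pool: pollEntry blocks while every batch is still in use downstream. */
    public static final class Pool {
        private final BlockingQueue<Batch> free;

        public Pool(int numBatches, int batchSize) {
            free = new ArrayBlockingQueue<>(numBatches);
            for (int i = 0; i < numBatches; i++) {
                free.add(new Batch(batchSize, free::add));
            }
        }

        public Batch pollEntry() {
            try {
                return free.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for a batch", e);
            }
        }

        public int available() {
            return free.size();
        }
    }

    public static void main(String[] args) {
        Pool pool = new Pool(2, 4); // 2 circulating batches of 4 rows each (assumed values)
        Batch batch = pool.pollEntry();
        batch.values[0] = 42; // writer side fills the batch
        batch.numRows = 1;
        System.out.println("read " + batch.values[0] + ", free batches: " + pool.available());
        batch.recycle(); // reader side returns the batch for reuse
        System.out.println("free batches after recycle: " + pool.available());
    }
}
```

Bounding the number of batches in circulation caps memory use and applies back-pressure: the writer cannot run ahead of the reader by more than the pool size.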
