public class ParquetPojoInputFormat<E> extends ParquetInputFormat<E>
An implementation of ParquetInputFormat to read POJO records from Parquet files.
Nested classes/interfaces inherited from class FileInputFormat: FileInputFormat.FileBaseStatistics, FileInputFormat.InputSplitOpenThread
Fields inherited from class ParquetInputFormat: PARQUET_SKIP_CORRUPTED_RECORD, PARQUET_SKIP_WRONG_SCHEMA_SPLITS
Fields inherited from class FileInputFormat: currentSplit, ENUMERATE_NESTED_FILES_FLAG, enumerateNestedFiles, filePath, INFLATER_INPUT_STREAM_FACTORIES, minSplitSize, numSplits, openTimeout, READ_WHOLE_SPLIT_FLAG, splitLength, splitStart, stream, unsplittable
Constructor and Description |
---|
ParquetPojoInputFormat(Path filePath, org.apache.parquet.schema.MessageType messageType, PojoTypeInfo<E> pojoTypeInfo) |
Modifier and Type | Method and Description |
---|---|
protected E | convert(Row row): By default, ParquetInputFormat reads each Parquet record as a Row; this method converts that Row to the target type E. |
void | open(FileInputSplit split): Opens an input stream to the file defined in the input format. |
Methods inherited from class ParquetInputFormat: close, configure, getCurrentState, getFieldNames, getFieldTypes, getPredicate, nextRecord, reachedEnd, reopen, selectFields, setFilterPredicate
Methods inherited from class FileInputFormat: acceptFile, createInputSplits, decorateInputStream, extractFileExtension, getFilePath, getFilePaths, getFileStats, getFileStats, getInflaterInputStreamFactory, getInputSplitAssigner, getMinSplitSize, getNestedFileEnumeration, getNumSplits, getOpenTimeout, getSplitLength, getSplitStart, getStatistics, registerInflaterInputStreamFactory, setFilePath, setFilePath, setFilePaths, setFilePaths, setFilesFilter, setMinSplitSize, setNestedFileEnumeration, setNumSplits, setOpenTimeout, supportsMultiPaths, testForUnsplittable, toString
Methods inherited from class RichInputFormat: closeInputFormat, getRuntimeContext, openInputFormat, setRuntimeContext
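The summary above says that records are read as Row and then passed through convert(Row row) to produce the target type E. As a rough illustration of that idea only, the sketch below maps positional values onto a POJO's public fields by name using plain Java reflection. It does not use Flink or Parquet: the Object[] row, the String[] of field names, the Person class, and RowToPojoSketch itself are hypothetical stand-ins, and the actual conversion inside ParquetPojoInputFormat may work differently.

```java
import java.lang.reflect.Field;

// Hypothetical stand-in, not Flink code: a "row" is modeled as an Object[] of
// positional values with a parallel String[] holding the field names.
final class RowToPojoSketch {

    /** Example POJO with a public no-argument constructor and public fields. */
    public static class Person {
        public String name;
        public int age;

        public Person() {}
    }

    /** Copies row[i] into the public field named fieldNames[i] of a new POJO. */
    static <E> E convert(Object[] row, String[] fieldNames, Class<E> pojoClass) {
        if (row.length != fieldNames.length) {
            throw new IllegalArgumentException(
                    "Expected " + fieldNames.length + " values but got " + row.length);
        }
        try {
            E pojo = pojoClass.getDeclaredConstructor().newInstance();
            for (int i = 0; i < fieldNames.length; i++) {
                // getField only finds public fields; Field.set unboxes
                // wrapper values (e.g. Integer) for primitive fields.
                Field field = pojoClass.getField(fieldNames[i]);
                field.set(pojo, row[i]);
            }
            return pojo;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not populate " + pojoClass.getName(), e);
        }
    }

    public static void main(String[] args) {
        Person p = convert(
                new Object[] {"Ada", 36}, new String[] {"name", "age"}, Person.class);
        System.out.println(p.name + " " + p.age); // prints "Ada 36"
    }
}
```

The sketch only handles public fields; a POJO that exposes its state through getters and setters would need the setter methods to be looked up instead.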
public ParquetPojoInputFormat(Path filePath, org.apache.parquet.schema.MessageType messageType, PojoTypeInfo<E> pojoTypeInfo)
public void open(FileInputSplit split) throws IOException
Description copied from class: FileInputFormat
The stream is actually opened in an asynchronous thread to make sure any interruptions to the thread working on the input format do not reach the file system.
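The pattern described here, opening the stream on a separate thread so that the caller only waits for the result, can be sketched with the standard java.util.concurrent classes. This is a minimal illustration under stated assumptions, not the code of FileInputFormat.InputSplitOpenThread: AsyncOpenSketch, openWithTimeout, and the thread name are hypothetical, and an in-memory stream stands in for a file system stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch, not Flink code: the opener runs on its own daemon
// thread, so interrupting the calling thread does not interrupt the open call.
final class AsyncOpenSketch {

    static InputStream openWithTimeout(Callable<InputStream> opener, long timeoutMillis)
            throws IOException {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "input-split-opener");
            thread.setDaemon(true); // never keeps the JVM alive
            return thread;
        });
        Future<InputStream> pending = executor.submit(opener);
        try {
            return pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // cancel(false) does not interrupt the opener thread. A fuller
            // version would also close a stream that arrives after the timeout.
            pending.cancel(false);
            throw new IOException("Opening timed out after " + timeoutMillis + " ms", e);
        } catch (ExecutionException e) {
            throw new IOException("Opening the stream failed", e.getCause());
        } catch (InterruptedException e) {
            // Only the waiting thread was interrupted; restore its flag.
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while waiting for the stream to open", e);
        } finally {
            executor.shutdown(); // lets a running open call finish on its own
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = openWithTimeout(
                () -> new ByteArrayInputStream(new byte[] {42}), 1000);
        System.out.println(in.read()); // prints 42
    }
}
```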
Specified by: open in interface InputFormat<E,FileInputSplit>
Overrides: open in class ParquetInputFormat<E>
Parameters: split - The split to be opened.
Throws: IOException - Thrown if the split could not be opened due to an I/O problem.
protected E convert(Row row)
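Description copied from class: ParquetInputFormat
This ParquetInputFormat reads Parquet records as Row by default.
Specified by: convert in class ParquetInputFormat<E>
Parameters: row - The row read from the Parquet file.
Copyright © 2014–2021 The Apache Software Foundation. All rights reserved.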