public class ParquetAvroInputFormat extends ParquetInputFormat<org.apache.avro.generic.GenericRecord> implements ResultTypeQueryable<org.apache.avro.generic.GenericRecord>
ParquetInputFormat
to read records from Parquet files and convert
them to Avro GenericRecord. To use it the user needs to add flink-avro
optional
dependency to the classpath. Usage:
final ParquetAvroInputFormat inputFormat = new ParquetAvroInputFormat(new Path(filePath), parquetSchema);
DataSource<GenericRecord> source = env.createInput(inputFormat, new GenericRecordAvroTypeInfo(inputFormat.getAvroSchema()));
FileInputFormat.FileBaseStatistics, FileInputFormat.InputSplitOpenThread
PARQUET_SKIP_CORRUPTED_RECORD, PARQUET_SKIP_WRONG_SCHEMA_SPLITS
currentSplit, ENUMERATE_NESTED_FILES_FLAG, enumerateNestedFiles, filePath, INFLATER_INPUT_STREAM_FACTORIES, minSplitSize, numSplits, openTimeout, READ_WHOLE_SPLIT_FLAG, splitLength, splitStart, stream, unsplittable
Constructor and Description |
---|
ParquetAvroInputFormat(Path filePath,
org.apache.parquet.schema.MessageType messageType) |
Modifier and Type | Method and Description |
---|---|
protected org.apache.avro.generic.GenericRecord |
convert(Row row)
This ParquetInputFormat read parquet record as Row by default.
|
org.apache.avro.Schema |
getAvroSchema() |
GenericRecordAvroTypeInfo |
getProducedType()
Gets the data type (as a
TypeInformation ) produced by this function or input format. |
void |
selectFields(String[] fieldNames)
Configures the fields to be read and returned by the ParquetInputFormat.
|
close, configure, getCurrentState, getFieldNames, getFieldTypes, getPredicate, nextRecord, open, reachedEnd, reopen, setFilterPredicate
acceptFile, createInputSplits, decorateInputStream, extractFileExtension, getFilePath, getFilePaths, getFileStats, getFileStats, getInflaterInputStreamFactory, getInputSplitAssigner, getMinSplitSize, getNestedFileEnumeration, getNumSplits, getOpenTimeout, getSplitLength, getSplitStart, getStatistics, registerInflaterInputStreamFactory, setFilePath, setFilePath, setFilePaths, setFilePaths, setFilesFilter, setMinSplitSize, setNestedFileEnumeration, setNumSplits, setOpenTimeout, supportsMultiPaths, testForUnsplittable, toString
closeInputFormat, getRuntimeContext, openInputFormat, setRuntimeContext
public ParquetAvroInputFormat(Path filePath, org.apache.parquet.schema.MessageType messageType)
public void selectFields(String[] fieldNames)
ParquetInputFormat
selectFields
in class ParquetInputFormat<org.apache.avro.generic.GenericRecord>
fieldNames
- Names of all selected fields.protected org.apache.avro.generic.GenericRecord convert(Row row)
ParquetInputFormat
convert
in class ParquetInputFormat<org.apache.avro.generic.GenericRecord>
row
- row read from parquet filepublic GenericRecordAvroTypeInfo getProducedType()
ResultTypeQueryable
TypeInformation
) produced by this function or input format.getProducedType
in interface ResultTypeQueryable<org.apache.avro.generic.GenericRecord>
public org.apache.avro.Schema getAvroSchema()
Copyright © 2014–2022 The Apache Software Foundation. All rights reserved.