E
- the type of the result Avro record. If you specify GenericRecord
then the
result will be returned as a GenericRecord
, so you do not have to know the schema
ahead of time.public class AvroInputFormat<E> extends FileInputFormat<E> implements ResultTypeQueryable<E>, CheckpointableInputFormat<FileInputSplit,Tuple2<Long,Long>>
FileInputFormat
for Avro records.FileInputFormat.FileBaseStatistics, FileInputFormat.InputSplitOpenThread
currentSplit, ENUMERATE_NESTED_FILES_FLAG, enumerateNestedFiles, filePath, INFLATER_INPUT_STREAM_FACTORIES, minSplitSize, numSplits, openTimeout, READ_WHOLE_SPLIT_FLAG, splitLength, splitStart, stream, unsplittable
Constructor and Description |
---|
AvroInputFormat(Path filePath,
Class<E> type) |
Modifier and Type | Method and Description |
---|---|
Tuple2<Long,Long> |
getCurrentState()
Returns the split currently being read, along with its current state.
|
TypeInformation<E> |
getProducedType()
Gets the data type (as a
TypeInformation ) produced by this function or input format. |
long |
getRecordsReadFromBlock() |
E |
nextRecord(E reuseValue)
Reads the next record from the input.
|
void |
open(FileInputSplit split)
Opens an input stream to the file defined in the input format.
|
boolean |
reachedEnd()
Method used to check if the end of the input is reached.
|
void |
reopen(FileInputSplit split,
Tuple2<Long,Long> state)
Restores the state of a parallel instance reading from an
InputFormat . |
void |
setReuseAvroValue(boolean reuseAvroValue)
Sets the flag whether to reuse the Avro value instance for all records.
|
void |
setUnsplittable(boolean unsplittable)
If set, the InputFormat will only read entire files.
|
boolean |
supportsMultiPaths()
Override this method to supports multiple paths.
|
acceptFile, close, configure, createInputSplits, decorateInputStream, extractFileExtension, getFilePath, getFilePaths, getFileStats, getFileStats, getInflaterInputStreamFactory, getInputSplitAssigner, getMinSplitSize, getNestedFileEnumeration, getNumSplits, getOpenTimeout, getSplitLength, getSplitStart, getStatistics, getSupportedCompressionFormats, registerInflaterInputStreamFactory, setFilePath, setFilePath, setFilePaths, setFilePaths, setFilesFilter, setMinSplitSize, setNestedFileEnumeration, setNumSplits, setOpenTimeout, testForUnsplittable, toString
closeInputFormat, getRuntimeContext, openInputFormat, setRuntimeContext
public void setReuseAvroValue(boolean reuseAvroValue)
reuseAvroValue
- True, if the input format should reuse the Avro value instance, false
otherwise.public void setUnsplittable(boolean unsplittable)
public TypeInformation<E> getProducedType()
ResultTypeQueryable
TypeInformation
) produced by this function or input format.getProducedType
in interface ResultTypeQueryable<E>
public void open(FileInputSplit split) throws IOException
FileInputFormat
The stream is actually opened in an asynchronous thread to make sure any interruptions to the thread working on the input format do not reach the file system.
open
in interface InputFormat<E,FileInputSplit>
open
in class FileInputFormat<E>
split
- The split to be opened.IOException
- Thrown, if the spit could not be opened due to an I/O problem.public boolean reachedEnd() throws IOException
InputFormat
When this method is called, the input format it guaranteed to be opened.
reachedEnd
in interface InputFormat<E,FileInputSplit>
IOException
- Thrown, if an I/O error occurred.public long getRecordsReadFromBlock()
public E nextRecord(E reuseValue) throws IOException
InputFormat
When this method is called, the input format it guaranteed to be opened.
nextRecord
in interface InputFormat<E,FileInputSplit>
reuseValue
- Object that may be reused.IOException
- Thrown, if an I/O error occurred.public boolean supportsMultiPaths()
FileInputFormat
supportsMultiPaths
in class FileInputFormat<E>
public Tuple2<Long,Long> getCurrentState() throws IOException
CheckpointableInputFormat
getCurrentState
in interface CheckpointableInputFormat<FileInputSplit,Tuple2<Long,Long>>
IOException
- Thrown if the creation of the state object failed.public void reopen(FileInputSplit split, Tuple2<Long,Long> state) throws IOException
CheckpointableInputFormat
InputFormat
. This is
necessary when recovering from a task failure. When this method is called, the input format
it guaranteed to be configured.
NOTE: The caller has to make sure that the provided split is the one to whom the state belongs.
reopen
in interface CheckpointableInputFormat<FileInputSplit,Tuple2<Long,Long>>
split
- The split to be opened.state
- The state from which to start from. This can contain the offset, but also other
data, depending on the input format.IOException
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.