Package org.apache.flink.formats.avro
Class AvroInputFormat<E>
- java.lang.Object
-
- org.apache.flink.api.common.io.RichInputFormat<OT,FileInputSplit>
-
- org.apache.flink.api.common.io.FileInputFormat<E>
-
- org.apache.flink.formats.avro.AvroInputFormat<E>
-
- Type Parameters:
E
- the type of the result Avro record. If you specifyGenericRecord
then the result will be returned as aGenericRecord
, so you do not have to know the schema ahead of time.
- All Implemented Interfaces:
Serializable
,CheckpointableInputFormat<FileInputSplit,Tuple2<Long,Long>>
,InputFormat<E,FileInputSplit>
,ResultTypeQueryable<E>
,InputSplitSource<FileInputSplit>
public class AvroInputFormat<E> extends FileInputFormat<E> implements ResultTypeQueryable<E>, CheckpointableInputFormat<FileInputSplit,Tuple2<Long,Long>>
Provides aFileInputFormat
for Avro records.- See Also:
- Serialized Form
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.flink.api.common.io.FileInputFormat
FileInputFormat.FileBaseStatistics, FileInputFormat.InputSplitOpenThread
-
-
Field Summary
-
Fields inherited from class org.apache.flink.api.common.io.FileInputFormat
currentSplit, enumerateNestedFiles, INFLATER_INPUT_STREAM_FACTORIES, minSplitSize, numSplits, openTimeout, READ_WHOLE_SPLIT_FLAG, splitLength, splitStart, stream, unsplittable
-
-
Constructor Summary
Constructors Constructor Description AvroInputFormat(Path filePath, Class<E> type)
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description Tuple2<Long,Long>
getCurrentState()
Returns the split currently being read, along with its current state.TypeInformation<E>
getProducedType()
Gets the data type (as aTypeInformation
) produced by this function or input format.long
getRecordsReadFromBlock()
E
nextRecord(E reuseValue)
Reads the next record from the input.void
open(FileInputSplit split)
Opens an input stream to the file defined in the input format.boolean
reachedEnd()
Method used to check if the end of the input is reached.void
reopen(FileInputSplit split, Tuple2<Long,Long> state)
Restores the state of a parallel instance reading from anInputFormat
.void
setReuseAvroValue(boolean reuseAvroValue)
Sets the flag whether to reuse the Avro value instance for all records.void
setUnsplittable(boolean unsplittable)
If set, the InputFormat will only read entire files.-
Methods inherited from class org.apache.flink.api.common.io.FileInputFormat
acceptFile, close, configure, createInputSplits, decorateInputStream, extractFileExtension, getFilePaths, getFileStats, getFileStats, getInflaterInputStreamFactory, getInputSplitAssigner, getMinSplitSize, getNestedFileEnumeration, getNumSplits, getOpenTimeout, getSplitLength, getSplitStart, getStatistics, getSupportedCompressionFormats, registerInflaterInputStreamFactory, setFilePath, setFilePath, setFilePaths, setFilePaths, setFilesFilter, setMinSplitSize, setNestedFileEnumeration, setNumSplits, setOpenTimeout, testForUnsplittable, toString
-
Methods inherited from class org.apache.flink.api.common.io.RichInputFormat
closeInputFormat, getRuntimeContext, openInputFormat, setRuntimeContext
-
-
-
-
Method Detail
-
setReuseAvroValue
public void setReuseAvroValue(boolean reuseAvroValue)
Sets the flag whether to reuse the Avro value instance for all records. By default, the input format reuses the Avro value.- Parameters:
reuseAvroValue
- True, if the input format should reuse the Avro value instance, false otherwise.
-
setUnsplittable
public void setUnsplittable(boolean unsplittable)
If set, the InputFormat will only read entire files.
-
getProducedType
public TypeInformation<E> getProducedType()
Description copied from interface:ResultTypeQueryable
Gets the data type (as aTypeInformation
) produced by this function or input format.- Specified by:
getProducedType
in interfaceResultTypeQueryable<E>
- Returns:
- The data type produced by this function or input format.
-
open
public void open(FileInputSplit split) throws IOException
Description copied from class:FileInputFormat
Opens an input stream to the file defined in the input format. The stream is positioned at the beginning of the given split.The stream is actually opened in an asynchronous thread to make sure any interruptions to the thread working on the input format do not reach the file system.
- Specified by:
open
in interfaceInputFormat<E,FileInputSplit>
- Overrides:
open
in classFileInputFormat<E>
- Parameters:
split
- The split to be opened.- Throws:
IOException
- Thrown, if the spit could not be opened due to an I/O problem.
-
reachedEnd
public boolean reachedEnd() throws IOException
Description copied from interface:InputFormat
Method used to check if the end of the input is reached.When this method is called, the input format it guaranteed to be opened.
- Specified by:
reachedEnd
in interfaceInputFormat<E,FileInputSplit>
- Returns:
- True if the end is reached, otherwise false.
- Throws:
IOException
- Thrown, if an I/O error occurred.
-
getRecordsReadFromBlock
public long getRecordsReadFromBlock()
-
nextRecord
public E nextRecord(E reuseValue) throws IOException
Description copied from interface:InputFormat
Reads the next record from the input.When this method is called, the input format it guaranteed to be opened.
- Specified by:
nextRecord
in interfaceInputFormat<E,FileInputSplit>
- Parameters:
reuseValue
- Object that may be reused.- Returns:
- Read record.
- Throws:
IOException
- Thrown, if an I/O error occurred.
-
getCurrentState
public Tuple2<Long,Long> getCurrentState() throws IOException
Description copied from interface:CheckpointableInputFormat
Returns the split currently being read, along with its current state. This will be used to restore the state of the reading channel when recovering from a task failure. In the case of a simple text file, the state can correspond to the last read offset in the split.- Specified by:
getCurrentState
in interfaceCheckpointableInputFormat<FileInputSplit,Tuple2<Long,Long>>
- Returns:
- The state of the channel.
- Throws:
IOException
- Thrown if the creation of the state object failed.
-
reopen
public void reopen(FileInputSplit split, Tuple2<Long,Long> state) throws IOException
Description copied from interface:CheckpointableInputFormat
Restores the state of a parallel instance reading from anInputFormat
. This is necessary when recovering from a task failure. When this method is called, the input format it guaranteed to be configured.NOTE: The caller has to make sure that the provided split is the one to whom the state belongs.
- Specified by:
reopen
in interfaceCheckpointableInputFormat<FileInputSplit,Tuple2<Long,Long>>
- Parameters:
split
- The split to be opened.state
- The state from which to start from. This can contain the offset, but also other data, depending on the input format.- Throws:
IOException
-
-