@Public public abstract class BinaryInputFormat<T> extends FileInputFormat<T> implements CheckpointableInputFormat<FileInputSplit,Tuple2<Long,Long>>
A block contains a BlockInfo at the end of the block. There, the reader can find some statistics
about the split currently being read that help it parse the contents of the block correctly.
Modifier and Type | Class and Description |
---|---|
protected class | BinaryInputFormat.BlockBasedInput: Reads the content of a block of data. |
Nested classes/interfaces inherited from class FileInputFormat: FileInputFormat.FileBaseStatistics, FileInputFormat.InputSplitOpenThread
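As a rough illustration of the idea that a BlockInfo sits at the end of each block, the following self-contained sketch packs records at the front of a fixed-size block and a small trailer at its tail, then reads the trailer first to learn how many records to parse. It deliberately avoids the Flink API: the class `BlockLayoutSketch`, its single-`long` trailer, and the `int` record type are assumptions made for illustration and do not reproduce the actual BlockInfo layout.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of Flink.
class BlockLayoutSketch {
    // Assumed trailer: one long holding the number of records in the block.
    static final int INFO_SIZE = Long.BYTES;

    /** Packs int records at the front of a fixed-size block and the record count at its end. */
    static byte[] writeBlock(int[] records, int blockSize) {
        if ((long) records.length * Integer.BYTES + INFO_SIZE > blockSize) {
            throw new IllegalArgumentException("records do not fit into one block");
        }
        ByteBuffer block = ByteBuffer.allocate(blockSize); // unused bytes stay zero (padding)
        for (int r : records) {
            block.putInt(r);
        }
        // Absolute write: the trailer always occupies the last INFO_SIZE bytes.
        block.putLong(blockSize - INFO_SIZE, records.length);
        return block.array();
    }

    /** Reads the trailer first, then exactly that many records from the block start. */
    static List<Integer> readBlock(byte[] block) {
        ByteBuffer buf = ByteBuffer.wrap(block);
        long recordCount = buf.getLong(block.length - INFO_SIZE); // absolute read, position unchanged
        List<Integer> out = new ArrayList<>();
        for (long i = 0; i < recordCount; i++) {
            out.add(buf.getInt());
        }
        return out;
    }
}
```

Because the trailer has a fixed size and a fixed position relative to the block end, a reader in this sketch can locate it without scanning the records, which is the property the class description relies on.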
Modifier and Type | Field and Description |
---|---|
static String | BLOCK_SIZE_PARAMETER_KEY: The config parameter which defines the block size. |
static long | NATIVE_BLOCK_SIZE |
Fields inherited from class FileInputFormat: currentSplit, ENUMERATE_NESTED_FILES_FLAG, enumerateNestedFiles, filePath, INFLATER_INPUT_STREAM_FACTORIES, minSplitSize, numSplits, openTimeout, READ_WHOLE_SPLIT_FLAG, splitLength, splitStart, stream, unsplittable
Constructor and Description |
---|
BinaryInputFormat() |
Modifier and Type | Method and Description |
---|---|
void | configure(Configuration parameters): Configures the file input format by reading the file path from the configuration. |
BlockInfo | createBlockInfo() |
FileInputSplit[] | createInputSplits(int minNumSplits): Computes the input splits for the file. |
protected org.apache.flink.api.common.io.BinaryInputFormat.SequentialStatistics | createStatistics(List<FileStatus> files, FileInputFormat.FileBaseStatistics stats): Fill in the statistics. |
protected abstract T | deserialize(T reuse, DataInputView dataInput) |
long | getBlockSize() |
Tuple2<Long,Long> | getCurrentState(): Returns the split currently being read, along with its current state. |
protected List<FileStatus> | getFiles() |
protected FileInputSplit[] | getInputSplits() |
org.apache.flink.api.common.io.BinaryInputFormat.SequentialStatistics | getStatistics(BaseStatistics cachedStats): Obtains basic file statistics containing only file size. |
T | nextRecord(T record): Reads the next record from the input. |
void | open(FileInputSplit split): Opens an input stream to the file defined in the input format. |
boolean | reachedEnd(): Method used to check if the end of the input is reached. |
void | reopen(FileInputSplit split, Tuple2<Long,Long> state): Restores the state of a parallel instance reading from an InputFormat. |
void | setBlockSize(long blockSize) |
Methods inherited from class FileInputFormat: acceptFile, close, decorateInputStream, extractFileExtension, getFilePath, getFilePaths, getFileStats, getFileStats, getInflaterInputStreamFactory, getInputSplitAssigner, getMinSplitSize, getNestedFileEnumeration, getNumSplits, getOpenTimeout, getSplitLength, getSplitStart, registerInflaterInputStreamFactory, setFilePath, setFilePath, setFilePaths, setFilePaths, setFilesFilter, setMinSplitSize, setNestedFileEnumeration, setNumSplits, setOpenTimeout, supportsMultiPaths, testForUnsplittable, toString
Methods inherited from class RichInputFormat: closeInputFormat, getRuntimeContext, openInputFormat, setRuntimeContext
public static final String BLOCK_SIZE_PARAMETER_KEY
public static final long NATIVE_BLOCK_SIZE
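public void configure(Configuration parameters)
Description copied from class: FileInputFormat
Configures the file input format by reading the file path from the configuration.
Specified by: configure in interface InputFormat<T,FileInputSplit>
Overrides: configure in class FileInputFormat<T>
Parameters:
parameters - The configuration with all parameters (note: not the Flink config but the TaskConfig).
See Also: InputFormat.configure(org.apache.flink.configuration.Configuration)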
public void setBlockSize(long blockSize)
public long getBlockSize()
public FileInputSplit[] createInputSplits(int minNumSplits) throws IOException
Description copied from class: FileInputFormat
Computes the input splits for the file.
Specified by: createInputSplits in interface InputFormat<T,FileInputSplit>
Specified by: createInputSplits in interface InputSplitSource<FileInputSplit>
Overrides: createInputSplits in class FileInputFormat<T>
Parameters:
minNumSplits - The minimum desired number of file splits.
Throws:
IOException - Thrown when the creation of the splits failed.
See Also: InputFormat.createInputSplits(int)
protected List<FileStatus> getFiles() throws IOException
Throws:
IOException
public org.apache.flink.api.common.io.BinaryInputFormat.SequentialStatistics getStatistics(BaseStatistics cachedStats)
Description copied from class: FileInputFormat
Obtains basic file statistics containing only file size.
Specified by: getStatistics in interface InputFormat<T,FileInputSplit>
Overrides: getStatistics in class FileInputFormat<T>
Parameters:
cachedStats - The statistics that were cached. May be null.
See Also: InputFormat.getStatistics(org.apache.flink.api.common.io.statistics.BaseStatistics)
protected FileInputSplit[] getInputSplits() throws IOException
Throws:
IOException
public BlockInfo createBlockInfo()
protected org.apache.flink.api.common.io.BinaryInputFormat.SequentialStatistics createStatistics(List<FileStatus> files, FileInputFormat.FileBaseStatistics stats) throws IOException
Fill in the statistics.
Parameters:
files - The files that are associated with this block input format.
stats - The pre-filled statistics.
Throws:
IOException
public void open(FileInputSplit split) throws IOException
Description copied from class: FileInputFormat
Opens an input stream to the file defined in the input format.
The stream is actually opened in an asynchronous thread to make sure any interruptions to the thread working on the input format do not reach the file system.
Specified by: open in interface InputFormat<T,FileInputSplit>
Overrides: open in class FileInputFormat<T>
Parameters:
split - The split to be opened.
Throws:
IOException - Thrown if the split could not be opened due to an I/O problem.
public boolean reachedEnd() throws IOException
Description copied from interface: InputFormat
Method used to check if the end of the input is reached.
When this method is called, the input format is guaranteed to be opened.
Specified by: reachedEnd in interface InputFormat<T,FileInputSplit>
Throws:
IOException - Thrown if an I/O error occurred.
public T nextRecord(T record) throws IOException
Description copied from interface: InputFormat
Reads the next record from the input.
When this method is called, the input format is guaranteed to be opened.
Specified by: nextRecord in interface InputFormat<T,FileInputSplit>
Parameters:
record - Object that may be reused.
Throws:
IOException - Thrown if an I/O error occurred.
protected abstract T deserialize(T reuse, DataInputView dataInput) throws IOException
Throws:
IOException
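Subclasses supply the record decoding through deserialize(T reuse, DataInputView dataInput). A minimal sketch of that reuse pattern follows, under stated assumptions: `Point` is a hypothetical two-field record, `PointDeserializerSketch` and `fromBytes` are illustrative names, and `java.io.DataInput` is used as a stand-in for `DataInputView` so the example compiles without Flink on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record type; not part of Flink.
class Point {
    int x;
    int y;
}

// Hypothetical illustration; not part of Flink.
class PointDeserializerSketch {
    /**
     * Mirrors the shape of deserialize(T reuse, DataInputView dataInput):
     * fills the reusable instance from the input and returns it.
     */
    static Point deserialize(Point reuse, DataInput dataInput) throws IOException {
        reuse.x = dataInput.readInt();
        reuse.y = dataInput.readInt();
        return reuse;
    }

    /** Convenience wrapper that decodes one Point from a byte array. */
    static Point fromBytes(Point reuse, byte[] bytes) {
        try {
            return deserialize(reuse, new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Returning the `reuse` argument instead of allocating a new object per record keeps the sketch in line with the "Object that may be reused" contract of nextRecord(T record).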
@PublicEvolving public Tuple2<Long,Long> getCurrentState() throws IOException
Description copied from interface: CheckpointableInputFormat
Returns the split currently being read, along with its current state.
Specified by: getCurrentState in interface CheckpointableInputFormat<FileInputSplit,Tuple2<Long,Long>>
Throws:
IOException - Thrown if the creation of the state object failed.
@PublicEvolving public void reopen(FileInputSplit split, Tuple2<Long,Long> state) throws IOException
Description copied from interface: CheckpointableInputFormat
Restores the state of a parallel instance reading from an InputFormat.
This is necessary when recovering from a task failure. When this method is called,
the input format is guaranteed to be configured.
NOTE: The caller has to make sure that the provided split is the one to which
the state belongs.
Specified by: reopen in interface CheckpointableInputFormat<FileInputSplit,Tuple2<Long,Long>>
Parameters:
split - The split to be opened.
state - The state from which to start. This can contain the offset,
but also other data, depending on the input format.
Throws:
IOException
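The interplay of getCurrentState() and reopen(split, state) can be sketched without Flink as a reader that snapshots its position and a fresh instance that resumes from that snapshot. Everything here is an assumption for illustration: the class `ResumableReaderSketch`, the in-memory `int[][]` standing in for the blocks of one split, and the meaning given to the two longs (index of the current block, records already read from it), which is not claimed to match what BinaryInputFormat stores in its Tuple2<Long,Long>.

```java
import java.util.NoSuchElementException;

// Hypothetical illustration; not part of Flink.
class ResumableReaderSketch {
    private final int[][] blocks; // each inner array stands in for one block of records
    private int block;            // index of the block being read
    private int readInBlock;      // records already returned from that block

    ResumableReaderSketch(int[][] blocks) {
        this.blocks = blocks;
    }

    boolean reachedEnd() {
        // Skip over exhausted (or empty) blocks before deciding.
        while (block < blocks.length && readInBlock >= blocks[block].length) {
            block++;
            readInBlock = 0;
        }
        return block >= blocks.length;
    }

    int nextRecord() {
        if (reachedEnd()) {
            throw new NoSuchElementException("no more records");
        }
        return blocks[block][readInBlock++];
    }

    /** Snapshot of the read position: {block index, records read in that block}. */
    long[] getCurrentState() {
        return new long[] {block, readInBlock};
    }

    /** Restores a snapshot; the caller must pass state taken from the same data. */
    void reopen(long[] state) {
        block = (int) state[0];
        readInBlock = (int) state[1];
    }
}
```

A recovery in this sketch creates a new reader over the same blocks, calls reopen with the saved state, and continues with nextRecord(), so no record before the snapshot is read twice.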
Copyright © 2014–2020 The Apache Software Foundation. All rights reserved.