OT
- The type of the produced records.T
- The type of input split.@Public public interface InputFormat<OT,T extends InputSplit> extends InputSplitSource<T>, Serializable
The input format handles the following:
The life cycle of an input format is the following:
Configuration
object. Basic fields are read from the configuration, such as a file path, if the format
describes files as input.
IMPORTANT NOTE: Input formats must be written such that an instance can be opened again after it was closed. That is due to the fact that the input format is used for potentially multiple splits. After a split is done, the format's close function is invoked and, if another split is available, the open function is invoked afterwards for the next split.
InputSplit
,
BaseStatistics
Modifier and Type | Method and Description |
---|---|
void |
close()
Method that marks the end of the life-cycle of an input split.
|
void |
configure(Configuration parameters)
Configures this input format.
|
T[] |
createInputSplits(int minNumSplits)
Computes the input splits.
|
InputSplitAssigner |
getInputSplitAssigner(T[] inputSplits)
Returns the assigner for the input splits.
|
BaseStatistics |
getStatistics(BaseStatistics cachedStatistics)
Gets the basic statistics from the input described by this format.
|
OT |
nextRecord(OT reuse)
Reads the next record from the input.
|
void |
open(T split)
Opens a parallel instance of the input format to work on a split.
|
boolean |
reachedEnd()
Method used to check if the end of the input is reached.
|
void configure(Configuration parameters)
This method is always called first on a newly instantiated input format.
parameters
- The configuration with all parameters (note: not the Flink config but the
TaskConfig).BaseStatistics getStatistics(BaseStatistics cachedStatistics) throws IOException
When this method is called, the input format is guaranteed to be configured.
cachedStatistics
- The statistics that were cached. May be null.IOException
T[] createInputSplits(int minNumSplits) throws IOException
InputSplitSource
createInputSplits
in interface InputSplitSource<T extends InputSplit>
minNumSplits
- Number of minimal input splits, as a hint.IOException
InputSplitAssigner getInputSplitAssigner(T[] inputSplits)
InputSplitSource
getInputSplitAssigner
in interface InputSplitSource<T extends InputSplit>
void open(T split) throws IOException
When this method is called, the input format it guaranteed to be configured.
split
- The split to be opened.IOException
- Thrown, if the spit could not be opened due to an I/O problem.boolean reachedEnd() throws IOException
When this method is called, the input format it guaranteed to be opened.
IOException
- Thrown, if an I/O error occurred.OT nextRecord(OT reuse) throws IOException
When this method is called, the input format it guaranteed to be opened.
reuse
- Object that may be reused.IOException
- Thrown, if an I/O error occurred.void close() throws IOException
When this method is called, the input format it guaranteed to be opened.
IOException
- Thrown, if the input could not be closed properly.Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.