OT
- The type of the produced records.T
- The type of input split.@Public public interface InputFormat<OT,T extends InputSplit> extends InputSplitSource<T>, Serializable
The input format handles the following:
The life cycle of an input format is the following:
Configuration
object.
Basic fields are read from the configuration, such as for example a file path, if the format describes
files as input.IMPORTANT NOTE: Input formats must be written such that an instance can be opened again after it was closed. That is due to the fact that the input format is used for potentially multiple splits. After a split is done, the format's close function is invoked and, if another split is available, the open function is invoked afterwards for the next split.
InputSplit
,
BaseStatistics
Modifier and Type | Method and Description |
---|---|
void |
close()
Method that marks the end of the life-cycle of an input split.
|
void |
configure(Configuration parameters)
Configures this input format.
|
T[] |
createInputSplits(int minNumSplits)
Creates the different splits of the input that can be processed in parallel.
|
InputSplitAssigner |
getInputSplitAssigner(T[] inputSplits)
Gets the type of the input splits that are processed by this input format.
|
BaseStatistics |
getStatistics(BaseStatistics cachedStatistics)
Gets the basic statistics from the input described by this format.
|
OT |
nextRecord(OT reuse)
Reads the next record from the input.
|
void |
open(T split)
Opens a parallel instance of the input format to work on a split.
|
boolean |
reachedEnd()
Method used to check if the end of the input is reached.
|
void configure(Configuration parameters)
This method is always called first on a newly instantiated input format.
parameters
- The configuration with all parameters (note: not the Flink config but the TaskConfig).BaseStatistics getStatistics(BaseStatistics cachedStatistics) throws IOException
When this method is called, the input format it guaranteed to be configured.
cachedStatistics
- The statistics that were cached. May be null.IOException
T[] createInputSplits(int minNumSplits) throws IOException
When this method is called, the input format it guaranteed to be configured.
createInputSplits
in interface InputSplitSource<T extends InputSplit>
minNumSplits
- The minimum desired number of splits. If fewer are created, some parallel
instances may remain idle.IOException
- Thrown, when the creation of the splits was erroneous.InputSplitAssigner getInputSplitAssigner(T[] inputSplits)
getInputSplitAssigner
in interface InputSplitSource<T extends InputSplit>
void open(T split) throws IOException
When this method is called, the input format it guaranteed to be configured.
split
- The split to be opened.IOException
- Thrown, if the spit could not be opened due to an I/O problem.boolean reachedEnd() throws IOException
When this method is called, the input format it guaranteed to be opened.
IOException
- Thrown, if an I/O error occurred.OT nextRecord(OT reuse) throws IOException
When this method is called, the input format it guaranteed to be opened.
reuse
- Object that may be reused.IOException
- Thrown, if an I/O error occurred.void close() throws IOException
When this method is called, the input format it guaranteed to be opened.
IOException
- Thrown, if the input could not be closed properly.Copyright © 2014–2019 The Apache Software Foundation. All rights reserved.