Interface InputFormat<OT,T extends InputSplit>
-
- Type Parameters:
OT
- The type of the produced records.T
- The type of input split.
- All Superinterfaces:
InputSplitSource<T>
,Serializable
- All Known Implementing Classes:
AbstractCsvInputFormat
,AvroInputFormat
,BinaryInputFormat
,BroadcastStateInputFormat
,CollectionInputFormat
,CsvInputFormat
,DelimitedInputFormat
,FileInputFormat
,GenericCsvInputFormat
,GenericInputFormat
,HadoopInputFormat
,HadoopInputFormat
,HadoopInputFormatBase
,HadoopInputFormatBase
,HadoopInputFormatCommonBase
,KeyedStateInputFormat
,ListStateInputFormat
,ReplicatingInputFormat
,RichInputFormat
,RowCsvInputFormat
,RowCsvInputFormat
,SerializedInputFormat
,TextInputFormat
,TransactionRowInputFormat
,UnionStateInputFormat
,ValuesInputFormat
@Public public interface InputFormat<OT,T extends InputSplit> extends InputSplitSource<T>, Serializable
The base interface for data sources that produces records.The input format handles the following:
- It describes how the input is split into splits that can be processed in parallel.
- It describes how to read records from the input split.
- It describes how to gather basic statistics from the input.
The life cycle of an input format is the following:
- After being instantiated (parameterless), it is configured with a
Configuration
object. Basic fields are read from the configuration, such as a file path, if the format describes files as input. - Optionally: It is called by the compiler to produce basic statistics about the input.
- It is called to create the input splits.
- Each parallel input task creates an instance, configures it and opens it for a specific split.
- All records are read from the input
- The input format is closed
IMPORTANT NOTE: Input formats must be written such that an instance can be opened again after it was closed. That is due to the fact that the input format is used for potentially multiple splits. After a split is done, the format's close function is invoked and, if another split is available, the open function is invoked afterwards for the next split.
- See Also:
InputSplit
,BaseStatistics
-
-
Method Summary
All Methods Instance Methods Abstract Methods Modifier and Type Method Description void
close()
Method that marks the end of the life-cycle of an input split.void
configure(Configuration parameters)
Configures this input format.T[]
createInputSplits(int minNumSplits)
Computes the input splits.InputSplitAssigner
getInputSplitAssigner(T[] inputSplits)
Returns the assigner for the input splits.BaseStatistics
getStatistics(BaseStatistics cachedStatistics)
Gets the basic statistics from the input described by this format.OT
nextRecord(OT reuse)
Reads the next record from the input.void
open(T split)
Opens a parallel instance of the input format to work on a split.boolean
reachedEnd()
Method used to check if the end of the input is reached.
-
-
-
Method Detail
-
configure
void configure(Configuration parameters)
Configures this input format. Since input formats are instantiated generically and hence parameterless, this method is the place where the input formats set their basic fields based on configuration values.This method is always called first on a newly instantiated input format.
- Parameters:
parameters
- The configuration with all parameters (note: not the Flink config but the TaskConfig).
-
getStatistics
BaseStatistics getStatistics(BaseStatistics cachedStatistics) throws IOException
Gets the basic statistics from the input described by this format. If the input format does not know how to create those statistics, it may return null. This method optionally gets a cached version of the statistics. The input format may examine them and decide whether it directly returns them without spending effort to re-gather the statistics.When this method is called, the input format is guaranteed to be configured.
- Parameters:
cachedStatistics
- The statistics that were cached. May be null.- Returns:
- The base statistics for the input, or null, if not available.
- Throws:
IOException
-
createInputSplits
T[] createInputSplits(int minNumSplits) throws IOException
Description copied from interface:InputSplitSource
Computes the input splits. The given minimum number of splits is a hint as to how many splits are desired.- Specified by:
createInputSplits
in interfaceInputSplitSource<OT>
- Parameters:
minNumSplits
- Number of minimal input splits, as a hint.- Returns:
- An array of input splits.
- Throws:
IOException
-
getInputSplitAssigner
InputSplitAssigner getInputSplitAssigner(T[] inputSplits)
Description copied from interface:InputSplitSource
Returns the assigner for the input splits. Assigner determines which parallel instance of the input format gets which input split.- Specified by:
getInputSplitAssigner
in interfaceInputSplitSource<OT>
- Returns:
- The input split assigner.
-
open
void open(T split) throws IOException
Opens a parallel instance of the input format to work on a split.When this method is called, the input format it guaranteed to be configured.
- Parameters:
split
- The split to be opened.- Throws:
IOException
- Thrown, if the spit could not be opened due to an I/O problem.
-
reachedEnd
boolean reachedEnd() throws IOException
Method used to check if the end of the input is reached.When this method is called, the input format it guaranteed to be opened.
- Returns:
- True if the end is reached, otherwise false.
- Throws:
IOException
- Thrown, if an I/O error occurred.
-
nextRecord
OT nextRecord(OT reuse) throws IOException
Reads the next record from the input.When this method is called, the input format it guaranteed to be opened.
- Parameters:
reuse
- Object that may be reused.- Returns:
- Read record.
- Throws:
IOException
- Thrown, if an I/O error occurred.
-
close
void close() throws IOException
Method that marks the end of the life-cycle of an input split. Should be used to close channels and streams and release resources. After this method returns without an error, the input is assumed to be correctly read.When this method is called, the input format it guaranteed to be opened.
- Throws:
IOException
- Thrown, if the input could not be closed properly.
-
-