Class ReplicatingInputFormat<OT,S extends InputSplit>
- java.lang.Object
-
- org.apache.flink.api.common.io.RichInputFormat<OT,S>
-
- org.apache.flink.api.common.io.ReplicatingInputFormat<OT,S>
-
- Type Parameters:
OT - The output type of the wrapped InputFormat.
S - The InputSplit type of the wrapped InputFormat.
- All Implemented Interfaces:
Serializable, InputFormat<OT,S>, InputSplitSource<S>
@PublicEvolving public final class ReplicatingInputFormat<OT,S extends InputSplit> extends RichInputFormat<OT,S>
A ReplicatingInputFormat replicates any InputFormat to all parallel instances of a DataSource, i.e., the full input of the replicated InputFormat is completely processed by each parallel instance of the DataSource. This is done by assigning all InputSplits generated by the replicated InputFormat to each parallel instance.
Replicated data can only be used as input for an InnerJoinOperatorBase or CrossOperatorBase with the same parallelism as the DataSource. Before being used as an input to a Join or Cross operator, replicated data might be processed in local pipelines by Map-based operators with the same parallelism as the source. Map-based operators are MapOperatorBase, FlatMapOperatorBase, FilterOperatorBase, and MapPartitionOperatorBase.
Replicated DataSources can be used for local join processing (no data shipping) if one input is accessible on all parallel instances of a join and the other input is (randomly) partitioned across all parallel instances.
However, a replicated DataSource is a plan hint that can invalidate a Flink program if not used correctly (see usage instructions above). In such situations, the optimizer is not able to generate a valid execution plan and the program execution will fail.
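The replication rule described above (every parallel instance is assigned all input splits) can be modeled in a few lines of plain Java. This is a stand-alone sketch, not Flink's actual split assigner: the class ReplicatedAssignmentSketch and the method assignReplicated are hypothetical names, and splits are represented as plain strings for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-alone model of replicated split assignment (hypothetical, not Flink code).
final class ReplicatedAssignmentSketch {

    // Returns, for each parallel instance, the list of splits it reads.
    // Replication means every instance receives a copy of the full split list.
    static <S> List<List<S>> assignReplicated(List<S> splits, int parallelism) {
        List<List<S>> perInstance = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            perInstance.add(new ArrayList<>(splits));
        }
        return perInstance;
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("split-0", "split-1", "split-2");
        List<List<String>> assignment = assignReplicated(splits, 2);
        for (int i = 0; i < assignment.size(); i++) {
            System.out.println("instance " + i + " reads " + assignment.get(i));
        }
    }
}
```

In contrast to a regular source, where the splits are divided among the parallel instances, the total amount of data read here grows with the parallelism, which is the price paid for avoiding data shipping in the join.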
-
-
Constructor Summary
ReplicatingInputFormat(InputFormat<OT,S> wrappedIF)
-
Method Summary
void close() - Method that marks the end of the life-cycle of an input split.
void closeInputFormat() - Closes this InputFormat instance.
void configure(Configuration parameters) - Configures this input format.
S[] createInputSplits(int minNumSplits) - Computes the input splits.
InputSplitAssigner getInputSplitAssigner(S[] inputSplits) - Returns the assigner for the input splits.
InputFormat<OT,S> getReplicatedInputFormat()
RuntimeContext getRuntimeContext()
BaseStatistics getStatistics(BaseStatistics cachedStatistics) - Gets the basic statistics from the input described by this format.
OT nextRecord(OT reuse) - Reads the next record from the input.
void open(S split) - Opens a parallel instance of the input format to work on a split.
void openInputFormat() - Opens this InputFormat instance.
boolean reachedEnd() - Method used to check if the end of the input is reached.
void setRuntimeContext(RuntimeContext context)
-
-
-
Constructor Detail
-
ReplicatingInputFormat
public ReplicatingInputFormat(InputFormat<OT,S> wrappedIF)
-
-
Method Detail
-
getReplicatedInputFormat
public InputFormat<OT,S> getReplicatedInputFormat()
-
configure
public void configure(Configuration parameters)
Description copied from interface: InputFormat
Configures this input format. Since input formats are instantiated generically and hence parameterless, this method is the place where the input formats set their basic fields based on configuration values.
This method is always called first on a newly instantiated input format.
- Parameters:
parameters - The configuration with all parameters (note: not the Flink config but the TaskConfig).
-
getStatistics
public BaseStatistics getStatistics(BaseStatistics cachedStatistics) throws IOException
Description copied from interface: InputFormat
Gets the basic statistics from the input described by this format. If the input format does not know how to create those statistics, it may return null. This method optionally gets a cached version of the statistics. The input format may examine them and decide whether it directly returns them without spending effort to re-gather the statistics.
When this method is called, the input format is guaranteed to be configured.
- Parameters:
cachedStatistics - The statistics that were cached. May be null.
- Returns:
The base statistics for the input, or null, if not available.
- Throws:
IOException
-
createInputSplits
public S[] createInputSplits(int minNumSplits) throws IOException
Description copied from interface: InputSplitSource
Computes the input splits. The given minimum number of splits is a hint as to how many splits are desired.
- Parameters:
minNumSplits - Minimum number of input splits, as a hint.
- Returns:
An array of input splits.
- Throws:
IOException
-
getInputSplitAssigner
public InputSplitAssigner getInputSplitAssigner(S[] inputSplits)
Description copied from interface: InputSplitSource
Returns the assigner for the input splits. The assigner determines which parallel instance of the input format gets which input split.
- Returns:
The input split assigner.
-
open
public void open(S split) throws IOException
Description copied from interface: InputFormat
Opens a parallel instance of the input format to work on a split.
When this method is called, the input format is guaranteed to be configured.
- Parameters:
split - The split to be opened.
- Throws:
IOException - Thrown, if the split could not be opened due to an I/O problem.
-
reachedEnd
public boolean reachedEnd() throws IOException
Description copied from interface: InputFormat
Method used to check if the end of the input is reached.
When this method is called, the input format is guaranteed to be opened.
- Returns:
True if the end is reached, otherwise false.
- Throws:
IOException - Thrown, if an I/O error occurred.
-
nextRecord
public OT nextRecord(OT reuse) throws IOException
Description copied from interface: InputFormat
Reads the next record from the input.
When this method is called, the input format is guaranteed to be opened.
- Parameters:
reuse - Object that may be reused.
- Returns:
Read record.
- Throws:
IOException - Thrown, if an I/O error occurred.
-
close
public void close() throws IOException
Description copied from interface: InputFormat
Method that marks the end of the life-cycle of an input split. Should be used to close channels and streams and release resources. After this method returns without an error, the input is assumed to be correctly read.
When this method is called, the input format is guaranteed to be opened.
- Throws:
IOException - Thrown, if the input could not be closed properly.
-
setRuntimeContext
public void setRuntimeContext(RuntimeContext context)
- Overrides:
setRuntimeContext in class RichInputFormat<OT,S extends InputSplit>
-
getRuntimeContext
public RuntimeContext getRuntimeContext()
- Overrides:
getRuntimeContext in class RichInputFormat<OT,S extends InputSplit>
-
openInputFormat
public void openInputFormat() throws IOException
Description copied from class: RichInputFormat
Opens this InputFormat instance. This method is called once per parallel instance. Resources (e.g. database connections, caches) should be allocated in this method.
- Overrides:
openInputFormat in class RichInputFormat<OT,S extends InputSplit>
- Throws:
IOException - in case allocating the resources failed.
- See Also:
InputFormat
-
closeInputFormat
public void closeInputFormat() throws IOException
Description copied from class: RichInputFormat
Closes this InputFormat instance. This method is called once per parallel instance. Resources allocated during RichInputFormat.openInputFormat() should be closed in this method.
- Overrides:
closeInputFormat in class RichInputFormat<OT,S extends InputSplit>
- Throws:
IOException - in case closing the resources failed.
- See Also:
InputFormat
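The call order implied by the method details above (configure first, openInputFormat and closeInputFormat once per parallel instance, and open, reachedEnd, nextRecord, close once per split) can be sketched as a small driver loop. This is a simplified model under stated assumptions, not Flink's runtime code: MiniInputFormat, ListFormat, and readAll are hypothetical names, configure takes no Configuration argument, and the IOException declarations are omitted to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-in for an input format (not the Flink interface).
interface MiniInputFormat<OT, S> {
    void configure();
    void openInputFormat();
    void open(S split);
    boolean reachedEnd();
    OT nextRecord(OT reuse);
    void close();
    void closeInputFormat();
}

final class LifecycleSketch {

    // Toy format whose splits are lists of strings; it records each life-cycle call.
    static final class ListFormat implements MiniInputFormat<String, List<String>> {
        final List<String> calls = new ArrayList<>();
        private Iterator<String> current;

        public void configure() { calls.add("configure"); }
        public void openInputFormat() { calls.add("openInputFormat"); }
        public void open(List<String> split) { calls.add("open"); current = split.iterator(); }
        public boolean reachedEnd() { return !current.hasNext(); }
        public String nextRecord(String reuse) { return current.next(); }
        public void close() { calls.add("close"); current = null; }
        public void closeInputFormat() { calls.add("closeInputFormat"); }
    }

    // Drives one parallel instance over its assigned splits in the documented order.
    static <OT, S> List<OT> readAll(MiniInputFormat<OT, S> format, List<S> splits) {
        List<OT> records = new ArrayList<>();
        format.configure();          // always called first
        format.openInputFormat();    // once per parallel instance
        for (S split : splits) {
            format.open(split);      // once per split
            while (!format.reachedEnd()) {
                records.add(format.nextRecord(null));
            }
            format.close();          // end of this split's life-cycle
        }
        format.closeInputFormat();   // once per parallel instance
        return records;
    }

    public static void main(String[] args) {
        ListFormat format = new ListFormat();
        List<List<String>> splits = Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c"));
        System.out.println(readAll(format, splits));
        System.out.println(format.calls);
    }
}
```

With a replicated source, each parallel instance would run this loop over the complete list of splits rather than over a share of them.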
-
-