OT
- The output type of the wrapped InputFormat.S
- The InputSplit type of the wrapped InputFormat.@PublicEvolving public final class ReplicatingInputFormat<OT,S extends InputSplit> extends RichInputFormat<OT,S>
InputFormat
to all parallel instances of a
DataSource, i.e., the full input of the replicated InputFormat is completely processed by each
parallel instance of the DataSource. This is done by assigning all InputSplit
s generated by the replicated InputFormat to each parallel
instance.
Replicated data can only be used as input for a InnerJoinOperatorBase
or CrossOperatorBase
with the same parallelism as the
DataSource. Before being used as an input to a Join or Cross operator, replicated data might be
processed in local pipelines by Map-based operators with the same parallelism as the source.
Map-based operators are MapOperatorBase
,
FlatMapOperatorBase
, FilterOperatorBase
, and MapPartitionOperatorBase
.
Replicated DataSources can be used for local join processing (no data shipping) if one input is accessible on all parallel instance of a join and the other input is (randomly) partitioned across all parallel instances.
However, a replicated DataSource is a plan hint that can invalidate a Flink program if not used correctly (see usage instructions above). In such situations, the optimizer is not able to generate a valid execution plan and the program execution will fail.
Constructor and Description |
---|
ReplicatingInputFormat(InputFormat<OT,S> wrappedIF) |
Modifier and Type | Method and Description |
---|---|
void |
close()
Method that marks the end of the life-cycle of an input split.
|
void |
closeInputFormat()
Closes this InputFormat instance.
|
void |
configure(Configuration parameters)
Configures this input format.
|
S[] |
createInputSplits(int minNumSplits)
Computes the input splits.
|
InputSplitAssigner |
getInputSplitAssigner(S[] inputSplits)
Returns the assigner for the input splits.
|
InputFormat<OT,S> |
getReplicatedInputFormat() |
RuntimeContext |
getRuntimeContext() |
BaseStatistics |
getStatistics(BaseStatistics cachedStatistics)
Gets the basic statistics from the input described by this format.
|
OT |
nextRecord(OT reuse)
Reads the next record from the input.
|
void |
open(S split)
Opens a parallel instance of the input format to work on a split.
|
void |
openInputFormat()
Opens this InputFormat instance.
|
boolean |
reachedEnd()
Method used to check if the end of the input is reached.
|
void |
setRuntimeContext(RuntimeContext context) |
public ReplicatingInputFormat(InputFormat<OT,S> wrappedIF)
public InputFormat<OT,S> getReplicatedInputFormat()
public void configure(Configuration parameters)
InputFormat
This method is always called first on a newly instantiated input format.
parameters
- The configuration with all parameters (note: not the Flink config but the
TaskConfig).public BaseStatistics getStatistics(BaseStatistics cachedStatistics) throws IOException
InputFormat
When this method is called, the input format is guaranteed to be configured.
cachedStatistics
- The statistics that were cached. May be null.IOException
public S[] createInputSplits(int minNumSplits) throws IOException
InputSplitSource
minNumSplits
- Number of minimal input splits, as a hint.IOException
public InputSplitAssigner getInputSplitAssigner(S[] inputSplits)
InputSplitSource
public void open(S split) throws IOException
InputFormat
When this method is called, the input format it guaranteed to be configured.
split
- The split to be opened.IOException
- Thrown, if the spit could not be opened due to an I/O problem.public boolean reachedEnd() throws IOException
InputFormat
When this method is called, the input format it guaranteed to be opened.
IOException
- Thrown, if an I/O error occurred.public OT nextRecord(OT reuse) throws IOException
InputFormat
When this method is called, the input format it guaranteed to be opened.
reuse
- Object that may be reused.IOException
- Thrown, if an I/O error occurred.public void close() throws IOException
InputFormat
When this method is called, the input format it guaranteed to be opened.
IOException
- Thrown, if the input could not be closed properly.public void setRuntimeContext(RuntimeContext context)
setRuntimeContext
in class RichInputFormat<OT,S extends InputSplit>
public RuntimeContext getRuntimeContext()
getRuntimeContext
in class RichInputFormat<OT,S extends InputSplit>
public void openInputFormat() throws IOException
RichInputFormat
openInputFormat
in class RichInputFormat<OT,S extends InputSplit>
IOException
- in case allocating the resources failed.InputFormat
public void closeInputFormat() throws IOException
RichInputFormat
RichInputFormat.openInputFormat()
should be closed in this method.closeInputFormat
in class RichInputFormat<OT,S extends InputSplit>
IOException
- in case closing the resources failedInputFormat
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.