OT
- The output type of the wrapped InputFormat.S
- The InputSplit type of the wrapped InputFormat.@PublicEvolving public final class ReplicatingInputFormat<OT,S extends InputSplit> extends RichInputFormat<OT,S>
InputFormat
to all parallel instances of a DataSource,
i.e., the full input of the replicated InputFormat is completely processed by each parallel instance of the DataSource.
This is done by assigning all InputSplit
s generated by the
replicated InputFormat to each parallel instance.
Replicated data can only be used as input for a InnerJoinOperatorBase
or
CrossOperatorBase
with the same parallelism as the DataSource.
Before being used as an input to a Join or Cross operator, replicated data might be processed in local pipelines by
Map-based operators with the same parallelism as the source. Map-based operators are
MapOperatorBase
,
FlatMapOperatorBase
,
FilterOperatorBase
, and
MapPartitionOperatorBase
.
Replicated DataSources can be used for local join processing (no data shipping) if one input is accessible on all
parallel instance of a join and the other input is (randomly) partitioned across all parallel instances.
However, a replicated DataSource is a plan hint that can invalidate a Flink program if not used correctly (see
usage instructions above). In such situations, the optimizer is not able to generate a valid execution plan and
the program execution will fail.Constructor and Description |
---|
ReplicatingInputFormat(InputFormat<OT,S> wrappedIF) |
Modifier and Type | Method and Description |
---|---|
void |
close()
Method that marks the end of the life-cycle of an input split.
|
void |
closeInputFormat()
Closes this InputFormat instance.
|
void |
configure(Configuration parameters)
Configures this input format.
|
S[] |
createInputSplits(int minNumSplits)
Creates the different splits of the input that can be processed in parallel.
|
InputSplitAssigner |
getInputSplitAssigner(S[] inputSplits)
Gets the type of the input splits that are processed by this input format.
|
InputFormat<OT,S> |
getReplicatedInputFormat() |
RuntimeContext |
getRuntimeContext() |
BaseStatistics |
getStatistics(BaseStatistics cachedStatistics)
Gets the basic statistics from the input described by this format.
|
OT |
nextRecord(OT reuse)
Reads the next record from the input.
|
void |
open(S split)
Opens a parallel instance of the input format to work on a split.
|
void |
openInputFormat()
Opens this InputFormat instance.
|
boolean |
reachedEnd()
Method used to check if the end of the input is reached.
|
void |
setRuntimeContext(RuntimeContext context) |
public ReplicatingInputFormat(InputFormat<OT,S> wrappedIF)
public InputFormat<OT,S> getReplicatedInputFormat()
public void configure(Configuration parameters)
InputFormat
This method is always called first on a newly instantiated input format.
parameters
- The configuration with all parameters (note: not the Flink config but the TaskConfig).public BaseStatistics getStatistics(BaseStatistics cachedStatistics) throws IOException
InputFormat
When this method is called, the input format it guaranteed to be configured.
cachedStatistics
- The statistics that were cached. May be null.IOException
public S[] createInputSplits(int minNumSplits) throws IOException
InputFormat
When this method is called, the input format it guaranteed to be configured.
minNumSplits
- The minimum desired number of splits. If fewer are created, some parallel
instances may remain idle.IOException
- Thrown, when the creation of the splits was erroneous.public InputSplitAssigner getInputSplitAssigner(S[] inputSplits)
InputFormat
public void open(S split) throws IOException
InputFormat
When this method is called, the input format it guaranteed to be configured.
split
- The split to be opened.IOException
- Thrown, if the spit could not be opened due to an I/O problem.public boolean reachedEnd() throws IOException
InputFormat
When this method is called, the input format it guaranteed to be opened.
IOException
- Thrown, if an I/O error occurred.public OT nextRecord(OT reuse) throws IOException
InputFormat
When this method is called, the input format it guaranteed to be opened.
reuse
- Object that may be reused.IOException
- Thrown, if an I/O error occurred.public void close() throws IOException
InputFormat
When this method is called, the input format it guaranteed to be opened.
IOException
- Thrown, if the input could not be closed properly.public void setRuntimeContext(RuntimeContext context)
setRuntimeContext
in class RichInputFormat<OT,S extends InputSplit>
public RuntimeContext getRuntimeContext()
getRuntimeContext
in class RichInputFormat<OT,S extends InputSplit>
public void openInputFormat() throws IOException
RichInputFormat
openInputFormat
in class RichInputFormat<OT,S extends InputSplit>
IOException
- in case allocating the resources failed.InputFormat
public void closeInputFormat() throws IOException
RichInputFormat
RichInputFormat.openInputFormat()
should be closed in this method.closeInputFormat
in class RichInputFormat<OT,S extends InputSplit>
IOException
- in case closing the resources failedInputFormat
Copyright © 2014–2020 The Apache Software Foundation. All rights reserved.