T - The data type of the set that is the input and feedback of the iteration.
@Public public class IterativeDataSet<T> extends SingleInputOperator<T,T,IterativeDataSet<T>>
An IterativeDataSet represents the start of an iteration and is created via the DataSet.iterate(int) method.
See Also: DataSet.iterate(int)
Fields inherited from class Operator: minResources, name, parallelism, preferredResources
Constructor and Description |
---|
IterativeDataSet(ExecutionEnvironment context, TypeInformation<T> type, DataSet<T> input, int maxIterations) |
Modifier and Type | Method and Description |
---|---|
DataSet<T> | closeWith(DataSet<T> iterationResult): Closes the iteration. |
DataSet<T> | closeWith(DataSet<T> iterationResult, DataSet<?> terminationCriterion): Closes the iteration and specifies a termination criterion. |
AggregatorRegistry | getAggregators(): Gets the registry for aggregators. |
int | getMaxIterations(): Gets the maximum number of iterations. |
<X extends Value> IterativeDataSet<T> | registerAggregationConvergenceCriterion(String name, Aggregator<X> aggregator, ConvergenceCriterion<X> convergenceCheck): Registers an Aggregator for the iteration together with a ConvergenceCriterion. |
IterativeDataSet<T> | registerAggregator(String name, Aggregator<?> aggregator): Registers an Aggregator for the iteration. |
protected SingleInputOperator<T,T,?> | translateToDataFlow(Operator<T> input): Translates this operation to a data flow operator of the common data flow API. |
Methods inherited from class SingleInputOperator: getInput, getInputType
Methods inherited from class Operator: getMinResources, getName, getParallelism, getPreferredResources, getResultType, name, setParallelism
Methods inherited from class DataSet: aggregate, checkSameExecutionContext, clean, coGroup, collect, combineGroup, count, cross, crossWithHuge, crossWithTiny, distinct, distinct, distinct, distinct, fillInType, filter, first, flatMap, fullOuterJoin, fullOuterJoin, getExecutionEnvironment, getType, groupBy, groupBy, groupBy, iterate, iterateDelta, join, join, joinWithHuge, joinWithTiny, leftOuterJoin, leftOuterJoin, map, mapPartition, max, maxBy, min, minBy, output, partitionByHash, partitionByHash, partitionByHash, partitionByRange, partitionByRange, partitionByRange, partitionCustom, partitionCustom, partitionCustom, print, print, printOnTaskManager, printToErr, printToErr, project, rebalance, reduce, reduceGroup, rightOuterJoin, rightOuterJoin, runOperation, sortPartition, sortPartition, sortPartition, sum, union, write, write, writeAsCsv, writeAsCsv, writeAsCsv, writeAsCsv, writeAsFormattedText, writeAsFormattedText, writeAsText, writeAsText
public IterativeDataSet(ExecutionEnvironment context, TypeInformation<T> type, DataSet<T> input, int maxIterations)
public DataSet<T> closeWith(DataSet<T> iterationResult)
Closes the iteration.
iterationResult - The data set that will be fed back to the next iteration.
See Also: DataSet.iterate(int)
public DataSet<T> closeWith(DataSet<T> iterationResult, DataSet<?> terminationCriterion)
Closes the iteration and specifies a termination criterion. The termination criterion is a means of dynamically signaling the iteration to halt. It is expressed as a data set that halts the loop as soon as it is empty. A typical approach is to define the criterion with a filter that retains only the elements considered non-converged. As soon as no such elements remain, the iteration finishes.
iterationResult - The data set that will be fed back to the next iteration.
terminationCriterion - The data set that triggers the halt of the iteration once it is empty.
See Also: DataSet.iterate(int)
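The halting rule described for closeWith(DataSet, DataSet) can be modeled in plain Java. The sketch below is only an illustration and does not use the Flink API: TerminationSketch, Outcome, halve and notConverged are hypothetical names, and lists stand in for data sets. It assumes the criterion is evaluated once per step on that step's result, and that the loop also stops after maxIterations steps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.UnaryOperator;

/** Plain-Java model of the halting rule; hypothetical, not the Flink API. */
final class TerminationSketch {

    /** Final data of the modeled loop plus the number of steps that ran. */
    static final class Outcome<T> {
        final List<T> data;
        final int steps;

        Outcome(List<T> data, int steps) {
            this.data = data;
            this.steps = steps;
        }
    }

    /**
     * Applies step repeatedly. The loop ends when the list produced by
     * terminationCriterion is empty, or after maxIterations steps.
     */
    static <T> Outcome<T> iterate(List<T> initial, int maxIterations,
                                  UnaryOperator<List<T>> step,
                                  Function<List<T>, List<?>> terminationCriterion) {
        List<T> current = initial;
        int steps = 0;
        while (steps < maxIterations) {
            current = step.apply(current); // plays the role of iterationResult
            steps++;
            if (terminationCriterion.apply(current).isEmpty()) {
                break; // no non-converged elements remain
            }
        }
        return new Outcome<>(current, steps);
    }

    /** Example step function: halves every value. */
    static List<Double> halve(List<Double> values) {
        List<Double> out = new ArrayList<>();
        for (double v : values) {
            out.add(v / 2.0);
        }
        return out;
    }

    /** Example criterion: keeps only the values still above 1.0. */
    static List<Double> notConverged(List<Double> values) {
        List<Double> out = new ArrayList<>();
        for (double v : values) {
            if (v > 1.0) {
                out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Double> initial = Arrays.asList(8.0, 3.0);
        Outcome<Double> o = iterate(initial, 10,
                TerminationSketch::halve, TerminationSketch::notConverged);
        // prints: 3 steps, result [1.0, 0.375]
        System.out.println(o.steps + " steps, result " + o.data);
    }
}
```

With a maxIterations of 2 instead of 10, the same call would stop after two steps even though one value is still above 1.0, which mirrors how the maximum iteration count bounds the loop.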
public int getMaxIterations()
Gets the maximum number of iterations.
@PublicEvolving public IterativeDataSet<T> registerAggregator(String name, Aggregator<?> aggregator)
Registers an Aggregator for the iteration. Aggregators can be used to maintain simple statistics during the iteration, such as the number of elements processed. The aggregators compute global aggregates: after each iteration step, the values are globally aggregated to produce one aggregate that represents statistics across all parallel instances. The value of an aggregator can be accessed in the next iteration.
Aggregators can be accessed inside a function via the AbstractRichFunction.getIterationRuntimeContext() method.
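The aggregation flow described above (local values per parallel instance, merged into one global aggregate after each step, visible in the next step) can be sketched in plain Java. This is a model only and does not use the Flink API: AggregatorSketch and LongSum are hypothetical names, inner lists stand in for parallel instances, and the value visible in the first step is assumed to be 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of per-step global aggregation; hypothetical, not the Flink API. */
final class AggregatorSketch {

    /** Hypothetical stand-in for a summing aggregator. */
    static final class LongSum {
        private long sum;

        void aggregate(long value) {
            sum += value;
        }

        long getAggregate() {
            return sum;
        }
    }

    /**
     * Runs the given number of steps. Each step decrements every element and
     * drops those that reach zero, while counting the elements processed.
     * Returns the aggregate visible at the start of each step, followed by
     * the aggregate produced by the last step.
     */
    static List<Long> run(List<List<Integer>> partitions, int steps) {
        List<Long> visible = new ArrayList<>();
        List<List<Integer>> current = partitions;
        long previousGlobal = 0L; // assumption: nothing aggregated before step 1
        for (int s = 0; s < steps; s++) {
            visible.add(previousGlobal);
            LongSum global = new LongSum();
            List<List<Integer>> next = new ArrayList<>();
            for (List<Integer> partition : current) {
                LongSum local = new LongSum(); // one per parallel instance
                List<Integer> out = new ArrayList<>();
                for (int value : partition) {
                    local.aggregate(1L); // count one processed element
                    if (value - 1 > 0) {
                        out.add(value - 1);
                    }
                }
                next.add(out);
                global.aggregate(local.getAggregate()); // merge across instances
            }
            current = next;
            previousGlobal = global.getAggregate();
        }
        visible.add(previousGlobal);
        return visible;
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions =
                Arrays.asList(Arrays.asList(3, 1), Arrays.asList(2));
        // prints: [0, 3, 2, 1]
        System.out.println(run(partitions, 3));
    }
}
```

In the printed list, the second entry (3) is the count produced by step 1 and read during step 2, which is the "accessed in the next iteration" behavior the description refers to.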
name - The name under which the aggregator is registered.
aggregator - The aggregator class.
@PublicEvolving public <X extends Value> IterativeDataSet<T> registerAggregationConvergenceCriterion(String name, Aggregator<X> aggregator, ConvergenceCriterion<X> convergenceCheck)
Registers an Aggregator for the iteration together with a ConvergenceCriterion. For a general description of aggregators, see registerAggregator(String, Aggregator) and Aggregator.
At the end of each iteration, the convergence criterion takes the aggregator's global aggregate value and decides whether the iteration should terminate. A typical use case is to have an aggregator that sums up the total error of change in an iteration step and a convergence criterion that signals termination as soon as the aggregate value is below a certain threshold.
name - The name under which the aggregator is registered.
aggregator - The aggregator class.
convergenceCheck - The convergence criterion.
@PublicEvolving public AggregatorRegistry getAggregators()
Gets the registry for aggregators. On the registry, one can add Aggregators and an aggregator-based ConvergenceCriterion. This method offers an alternative to registering the aggregators via registerAggregator(String, Aggregator) and registerAggregationConvergenceCriterion(String, Aggregator, ConvergenceCriterion).
protected SingleInputOperator<T,T,?> translateToDataFlow(Operator<T> input)
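Description copied from class: SingleInputOperator
Translates this operation to a data flow operator of the common data flow API.
Specified by: translateToDataFlow in class SingleInputOperator<T,T,IterativeDataSet<T>>
input - The data flow operator that produces this operation's input data.
Copyright © 2014–2020 The Apache Software Foundation. All rights reserved.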