pyflink.datastream.functions.AggregateFunction#
- class AggregateFunction[source]#
The AggregateFunction is a flexible aggregation function, characterized by the following features:
The aggregates may use different types for input values, intermediate aggregates, and result type, to support a wide range of aggregation types.
Support for distributive aggregations: Different intermediate aggregates can be merged together, to allow for pre-aggregation/final-aggregation optimizations.
The AggregateFunction’s intermediate aggregate (in-progress aggregation state) is called the accumulator. Values are added to the accumulator, and final aggregates are obtained by finalizing the accumulator state. This supports aggregation functions where the intermediate state needs to be different than the aggregated values and the final result type, such as for example average (which typically keeps a count and sum). Merging intermediate aggregates (partial aggregates) means merging the accumulators.
The AggregationFunction itself is stateless. To allow a single AggregationFunction instance to maintain multiple aggregates (such as one aggregate per key), the AggregationFunction creates a new accumulator whenever a new aggregation is started.
Methods
add
(value, accumulator)Adds the given input value to the given accumulator, returning the new accumulator value.
close
()create_accumulator
()Creates a new accumulator, starting a new aggregate.
get_result
(accumulator)Gets the result of the aggregation from the accumulator.
merge
(acc_a, acc_b)Merges two accumulators, returning an accumulator with the merged state.
open
(runtime_context)