@Internal public class DataStreamUtils extends Object
DataStream
.Constructor and Description |
---|
DataStreamUtils() |
Modifier and Type | Method and Description |
---|---|
static <IN,ACC,OUT> |
aggregate(org.apache.flink.streaming.api.datastream.DataStream<IN> input,
org.apache.flink.api.common.functions.AggregateFunction<IN,ACC,OUT> func)
Aggregates the elements in each partition of the input bounded stream, and then merges the
partial results of all partitions.
|
static <IN,ACC,OUT> |
aggregate(org.apache.flink.streaming.api.datastream.DataStream<IN> input,
org.apache.flink.api.common.functions.AggregateFunction<IN,ACC,OUT> func,
org.apache.flink.api.common.typeinfo.TypeInformation<ACC> accType,
org.apache.flink.api.common.typeinfo.TypeInformation<OUT> outType)
Aggregates the elements in each partition of the input bounded stream, and then merges the
partial results of all partitions.
|
static org.apache.flink.streaming.api.datastream.DataStream<double[]> |
allReduceSum(org.apache.flink.streaming.api.datastream.DataStream<double[]> input)
Applies allReduceSum on the input data stream.
|
static <IN1,IN2,KEY extends Serializable,OUT> |
coGroup(org.apache.flink.streaming.api.datastream.DataStream<IN1> input1,
org.apache.flink.streaming.api.datastream.DataStream<IN2> input2,
org.apache.flink.api.java.functions.KeySelector<IN1,KEY> keySelector1,
org.apache.flink.api.java.functions.KeySelector<IN2,KEY> keySelector2,
org.apache.flink.api.common.typeinfo.TypeInformation<OUT> outTypeInformation,
org.apache.flink.api.common.functions.CoGroupFunction<IN1,IN2,OUT> func)
A CoGroup transformation combines the elements of two
DataStreams into one
DataStream. |
static <T> org.apache.flink.streaming.api.datastream.DataStream<T[]> |
generateBatchData(org.apache.flink.streaming.api.datastream.DataStream<T> inputData,
int downStreamParallelism,
int batchSize)
Splits the input data into global batches of batchSize.
|
static <IN,OUT> org.apache.flink.streaming.api.datastream.DataStream<OUT> |
mapPartition(org.apache.flink.streaming.api.datastream.DataStream<IN> input,
org.apache.flink.api.common.functions.MapPartitionFunction<IN,OUT> func)
Applies a
MapPartitionFunction on a bounded data stream. |
static <IN,OUT> org.apache.flink.streaming.api.datastream.DataStream<OUT> |
mapPartition(org.apache.flink.streaming.api.datastream.DataStream<IN> input,
org.apache.flink.api.common.functions.MapPartitionFunction<IN,OUT> func,
org.apache.flink.api.common.typeinfo.TypeInformation<OUT> outType)
Applies a
MapPartitionFunction on a bounded data stream. |
static <T> org.apache.flink.streaming.api.datastream.DataStream<T> |
reduce(org.apache.flink.streaming.api.datastream.DataStream<T> input,
org.apache.flink.api.common.functions.ReduceFunction<T> func)
Applies a
ReduceFunction on a bounded data stream. |
static <T> org.apache.flink.streaming.api.datastream.DataStream<T> |
reduce(org.apache.flink.streaming.api.datastream.DataStream<T> input,
org.apache.flink.api.common.functions.ReduceFunction<T> func,
org.apache.flink.api.common.typeinfo.TypeInformation<T> outType)
Applies a
ReduceFunction on a bounded data stream. |
static <T,K> org.apache.flink.streaming.api.datastream.DataStream<T> |
reduce(org.apache.flink.streaming.api.datastream.KeyedStream<T,K> input,
org.apache.flink.api.common.functions.ReduceFunction<T> func)
Applies a
ReduceFunction on a bounded keyed data stream. |
static <T,K> org.apache.flink.streaming.api.datastream.DataStream<T> |
reduce(org.apache.flink.streaming.api.datastream.KeyedStream<T,K> input,
org.apache.flink.api.common.functions.ReduceFunction<T> func,
org.apache.flink.api.common.typeinfo.TypeInformation<T> outType)
Applies a
ReduceFunction on a bounded keyed data stream. |
static <T> org.apache.flink.streaming.api.datastream.DataStream<T> |
sample(org.apache.flink.streaming.api.datastream.DataStream<T> input,
int numSamples,
long randomSeed)
Performs an approximate uniform sampling over the elements in a bounded data stream.
|
static <T> void |
setManagedMemoryWeight(org.apache.flink.streaming.api.datastream.DataStream<T> dataStream,
long memoryBytes)
Sets {Transformation#declareManagedMemoryUseCaseAtOperatorScope(ManagedMemoryUseCase, int)}
using the given bytes for
ManagedMemoryUseCase.OPERATOR . |
static <IN,OUT,W extends org.apache.flink.streaming.api.windowing.windows.Window> |
windowAllAndProcess(org.apache.flink.streaming.api.datastream.DataStream<IN> input,
Windows windows,
org.apache.flink.streaming.api.functions.windowing.ProcessAllWindowFunction<IN,OUT,W> function)
Creates windows from data in the non key grouped input stream and applies the given window
function to each window.
|
static <IN,OUT,W extends org.apache.flink.streaming.api.windowing.windows.Window> |
windowAllAndProcess(org.apache.flink.streaming.api.datastream.DataStream<IN> input,
Windows windows,
org.apache.flink.streaming.api.functions.windowing.ProcessAllWindowFunction<IN,OUT,W> function,
org.apache.flink.api.common.typeinfo.TypeInformation<OUT> outType)
Creates windows from data in the non key grouped input stream and applies the given window
function to each window.
|
public static org.apache.flink.streaming.api.datastream.DataStream<double[]> allReduceSum(org.apache.flink.streaming.api.datastream.DataStream<double[]> input)
Note that we throw exception when one of the following two cases happen:
input
- The input data stream.public static <IN,OUT> org.apache.flink.streaming.api.datastream.DataStream<OUT> mapPartition(org.apache.flink.streaming.api.datastream.DataStream<IN> input, org.apache.flink.api.common.functions.MapPartitionFunction<IN,OUT> func)
MapPartitionFunction
on a bounded data stream.IN
- The class type of the input.OUT
- The class type of output.input
- The input data stream.func
- The user defined mapPartition function.public static <IN,OUT> org.apache.flink.streaming.api.datastream.DataStream<OUT> mapPartition(org.apache.flink.streaming.api.datastream.DataStream<IN> input, org.apache.flink.api.common.functions.MapPartitionFunction<IN,OUT> func, org.apache.flink.api.common.typeinfo.TypeInformation<OUT> outType)
MapPartitionFunction
on a bounded data stream.IN
- The class type of the input.OUT
- The class type of output.input
- The input data stream.func
- The user defined mapPartition function.outType
- The type information of the output.public static <T> org.apache.flink.streaming.api.datastream.DataStream<T> reduce(org.apache.flink.streaming.api.datastream.DataStream<T> input, org.apache.flink.api.common.functions.ReduceFunction<T> func)
ReduceFunction
on a bounded data stream. The output stream contains at most
one stream record and its parallelism is one.T
- The class type of the input.input
- The input data stream.func
- The user defined reduce function.public static <T> org.apache.flink.streaming.api.datastream.DataStream<T> reduce(org.apache.flink.streaming.api.datastream.DataStream<T> input, org.apache.flink.api.common.functions.ReduceFunction<T> func, org.apache.flink.api.common.typeinfo.TypeInformation<T> outType)
ReduceFunction
on a bounded data stream. The output stream contains at most
one stream record and its parallelism is one.T
- The class type of the input.input
- The input data stream.func
- The user defined reduce function.outType
- The type information of the output.public static <T,K> org.apache.flink.streaming.api.datastream.DataStream<T> reduce(org.apache.flink.streaming.api.datastream.KeyedStream<T,K> input, org.apache.flink.api.common.functions.ReduceFunction<T> func)
ReduceFunction
on a bounded keyed data stream. The output stream contains
one stream record for each key.T
- The class type of input.K
- The key type of input.input
- The input keyed data stream.func
- The user defined reduce function.public static <T,K> org.apache.flink.streaming.api.datastream.DataStream<T> reduce(org.apache.flink.streaming.api.datastream.KeyedStream<T,K> input, org.apache.flink.api.common.functions.ReduceFunction<T> func, org.apache.flink.api.common.typeinfo.TypeInformation<T> outType)
ReduceFunction
on a bounded keyed data stream. The output stream contains
one stream record for each key.T
- The class type of input.K
- The key type of input.input
- The input keyed data stream.func
- The user defined reduce function.outType
- The type information of the output.public static <IN,ACC,OUT> org.apache.flink.streaming.api.datastream.DataStream<OUT> aggregate(org.apache.flink.streaming.api.datastream.DataStream<IN> input, org.apache.flink.api.common.functions.AggregateFunction<IN,ACC,OUT> func)
Note: If the parallelism of the input stream is N, this method would invoke AggregateFunction.createAccumulator()
N times and AggregateFunction.merge(Object,
Object)
N - 1 times. Thus the initial accumulator should be neutral (e.g. empty list for
list concatenation or `0` for summation), otherwise the aggregation result would be affected
by the parallelism of the input stream.
IN
- The class type of the input.ACC
- The class type of the accumulated values.OUT
- The class type of the output values.input
- The input data stream.func
- The user defined aggregate function.public static <IN,ACC,OUT> org.apache.flink.streaming.api.datastream.DataStream<OUT> aggregate(org.apache.flink.streaming.api.datastream.DataStream<IN> input, org.apache.flink.api.common.functions.AggregateFunction<IN,ACC,OUT> func, org.apache.flink.api.common.typeinfo.TypeInformation<ACC> accType, org.apache.flink.api.common.typeinfo.TypeInformation<OUT> outType)
Note: If the parallelism of the input stream is N, this method would invoke AggregateFunction.createAccumulator()
N times and AggregateFunction.merge(Object,
Object)
N - 1 times. Thus the initial accumulator should be neutral (e.g. empty list for
list concatenation or `0` for summation), otherwise the aggregation result would be affected
by the parallelism of the input stream.
IN
- The class type of the input.ACC
- The class type of the accumulated values.OUT
- The class type of the output values.input
- The input data stream.func
- The user defined aggregate function.accType
- The type of the accumulated values.outType
- The types of the output.public static <T> org.apache.flink.streaming.api.datastream.DataStream<T> sample(org.apache.flink.streaming.api.datastream.DataStream<T> input, int numSamples, long randomSeed)
This method takes samples without replacement. If the number of elements in the stream is smaller than expected number of samples, all elements will be included in the sample.
input
- The input data stream.numSamples
- The number of elements to be sampled.randomSeed
- The seed to randomly pick elements as sample.public static <T> void setManagedMemoryWeight(org.apache.flink.streaming.api.datastream.DataStream<T> dataStream, long memoryBytes)
ManagedMemoryUseCase.OPERATOR
.
This method is in reference to Flink's ExecNodeUtil.setManagedMemoryWeight. The provided bytes should be in the same scale as existing usage in Flink, for example, StreamExecWindowAggregate.WINDOW_AGG_MEMORY_RATIO.
public static <IN,OUT,W extends org.apache.flink.streaming.api.windowing.windows.Window> org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator<OUT> windowAllAndProcess(org.apache.flink.streaming.api.datastream.DataStream<IN> input, Windows windows, org.apache.flink.streaming.api.functions.windowing.ProcessAllWindowFunction<IN,OUT,W> function)
input
- The input data stream to be windowed and processed.windows
- The windowing strategy that defines how input data would be sliced into
batches.function
- The user defined process function.public static <IN,OUT,W extends org.apache.flink.streaming.api.windowing.windows.Window> org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator<OUT> windowAllAndProcess(org.apache.flink.streaming.api.datastream.DataStream<IN> input, Windows windows, org.apache.flink.streaming.api.functions.windowing.ProcessAllWindowFunction<IN,OUT,W> function, org.apache.flink.api.common.typeinfo.TypeInformation<OUT> outType)
input
- The input data stream to be windowed and processed.windows
- The windowing strategy that defines how input data would be sliced into
batches.function
- The user defined process function.outType
- The type information of the output.public static <IN1,IN2,KEY extends Serializable,OUT> org.apache.flink.streaming.api.datastream.DataStream<OUT> coGroup(org.apache.flink.streaming.api.datastream.DataStream<IN1> input1, org.apache.flink.streaming.api.datastream.DataStream<IN2> input2, org.apache.flink.api.java.functions.KeySelector<IN1,KEY> keySelector1, org.apache.flink.api.java.functions.KeySelector<IN2,KEY> keySelector2, org.apache.flink.api.common.typeinfo.TypeInformation<OUT> outTypeInformation, org.apache.flink.api.common.functions.CoGroupFunction<IN1,IN2,OUT> func)
DataStreams
into one
DataStream. It groups each DataStream individually on a key and gives groups of both
DataStreams with equal keys together into a CoGroupFunction
. If a DataStream has a group with no
matching key in the other DataStream, the CoGroupFunction is called with an empty group for
the non-existing group.
The CoGroupFunction can iterate over the elements of both groups and return any number of elements including none.
NOTE: This method assumes both inputs are bounded.
IN1
- The class type of the first input.IN2
- The class type of the second input.KEY
- The class type of the key.OUT
- The class type of the output values.input1
- The first data stream.input2
- The second data stream.keySelector1
- The KeySelector to be used for extracting the first input's key for
partitioning.keySelector2
- The KeySelector to be used for extracting the second input's key for
partitioning.outTypeInformation
- The type information describing the output type.func
- The user-defined co-group function.public static <T> org.apache.flink.streaming.api.datastream.DataStream<T[]> generateBatchData(org.apache.flink.streaming.api.datastream.DataStream<T> inputData, int downStreamParallelism, int batchSize)
Copyright © 2019–2023 The Apache Software Foundation. All rights reserved.