@Internal public class BatchArrowPythonGroupAggregateFunctionOperator extends AbstractArrowPythonAggregateFunctionOperator
The Batch Arrow Python AggregateFunction operator for group aggregation.
Fields inherited from superclasses:
arrowSerializer, currentBatchCount, pandasAggFunctions, reuseJoinedRow, rowDataWrapper
bais, baisWrapper, baos, baosWrapper, forwardedInputQueue, inputType, udfInputType, udfOutputType
pythonFunctionRunner
bundleFinishedCallback, config, elementCount, lastFinishBundleTime, maxBundleSize, systemEnvEnabled
chainingStrategy, latencyStats, LOG, metrics, output, processingTimeService
Constructor and Description |
---|
BatchArrowPythonGroupAggregateFunctionOperator(Configuration config, PythonFunctionInfo[] pandasAggFunctions, RowType inputType, RowType udfInputType, RowType udfOutputType, GeneratedProjection inputGeneratedProjection, GeneratedProjection groupKeyGeneratedProjection, GeneratedProjection groupSetGeneratedProjection) |
Modifier and Type | Method and Description |
---|---|
void | bufferInput(RowData input): Buffers the specified input; it will be used to construct the operator result together with the user-defined function execution result. |
void | emitResult(Tuple3<String,byte[],Integer> resultTuple): Sends the execution result to the downstream operator. |
void | endInput(): Called when no more data will arrive from the input. |
void | finish(): This method is called at the end of data processing. |
protected void | invokeCurrentBatch() |
void | open(): This method is called immediately before any elements are processed; it should contain the operator's initialization logic. |
void | processElementInternal(RowData value) |
Methods inherited from superclasses and interfaces:
close, createInputCoderInfoDescriptor, createOutputCoderInfoDescriptor, createUserDefinedFunctionsProto, getFunctionInput, getFunctionUrn, getPythonEnv, isBundleFinished, processElement
createPythonFunctionRunner
createPythonEnvironmentManager, emitResults, invokeFinishBundle
checkInvokeFinishBundleByCount, getConfiguration, getFlinkMetricContainer, prepareSnapshotPreBarrier, processWatermark, setCurrentKey
getChainingStrategy, getContainingTask, getCurrentKey, getExecutionConfig, getInternalTimerService, getKeyedStateBackend, getKeyedStateStore, getMetricGroup, getOperatorConfig, getOperatorID, getOperatorName, getOperatorStateBackend, getOrCreateKeyedState, getPartitionedState, getPartitionedState, getProcessingTimeService, getRuntimeContext, getTimeServiceManager, getUserCodeClassloader, hasKeyContext1, hasKeyContext2, initializeState, initializeState, isUsingCustomRawKeyedState, notifyCheckpointAborted, notifyCheckpointComplete, processLatencyMarker, processLatencyMarker1, processLatencyMarker2, processWatermark1, processWatermark2, processWatermarkStatus, processWatermarkStatus1, processWatermarkStatus2, reportOrForwardLatencyMarker, setChainingStrategy, setKeyContextElement1, setKeyContextElement2, setProcessingTimeService, setup, snapshotState, snapshotState
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
setKeyContextElement
getMetricGroup, getOperatorID, initializeState, prepareSnapshotPreBarrier, setKeyContextElement1, setKeyContextElement2, snapshotState
notifyCheckpointAborted, notifyCheckpointComplete
getCurrentKey, setCurrentKey
processLatencyMarker, processWatermark, processWatermarkStatus
hasKeyContext
public BatchArrowPythonGroupAggregateFunctionOperator(Configuration config, PythonFunctionInfo[] pandasAggFunctions, RowType inputType, RowType udfInputType, RowType udfOutputType, GeneratedProjection inputGeneratedProjection, GeneratedProjection groupKeyGeneratedProjection, GeneratedProjection groupSetGeneratedProjection)
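Each of the three GeneratedProjection arguments maps an input row to a narrower row. The Flink-free sketch below assumes a projection simply selects fields by index. The class name, the example row and the fields chosen for the group key, the group set and the UDF input are all hypothetical; the real projections are code-generated classes, not index arrays.

```java
import java.util.Arrays;

/**
 * Hypothetical model of the constructor's projections: a projection is
 * treated as an array of field indices applied to an Object[] row.
 * This is an illustration only, not Flink's generated projection code.
 */
public class ProjectionSketch {

    /** Copies the fields at the given indices, in order, into a new row. */
    static Object[] project(Object[] row, int[] fieldIndices) {
        Object[] out = new Object[fieldIndices.length];
        for (int i = 0; i < fieldIndices.length; i++) {
            out[i] = row[fieldIndices[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input row: (dept, region, salary)
        Object[] row = {"eng", "eu", 100L};

        int[] groupKeyFields = {0};    // hypothetical: rows are grouped by dept
        int[] groupSetFields = {0, 1}; // hypothetical: fields forwarded with the result
        int[] udfInputFields = {2};    // hypothetical: argument of the Pandas UDAF

        System.out.println(Arrays.toString(project(row, groupKeyFields))); // [eng]
        System.out.println(Arrays.toString(project(row, groupSetFields))); // [eng, eu]
        System.out.println(Arrays.toString(project(row, udfInputFields))); // [100]
    }
}
```

Keeping the projections separate lets the operator compare group keys, forward grouping columns and feed the Python function from the same input row without copying the whole row each time, at least in this simplified model.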
public void bufferInput(RowData input) throws Exception
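Description copied from class: AbstractStatelessFunctionOperator
Buffers the specified input; it will be used to construct the operator result together with the user-defined function execution result.
Specified by: bufferInput in class AbstractStatelessFunctionOperator<RowData,RowData,RowData>
Throws: Exception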
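The bufferInput contract (keep an input around until the matching user-defined function result arrives) can be modelled as a FIFO queue. This is a simplified sketch, not the operator's implementation: ForwardedInputSketch and onResult are hypothetical names, String and double are placeholders for RowData, and the queue mirrors the inherited forwardedInputQueue field by name only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical FIFO model of bufferInput: forwarded values are queued in
 * arrival order and later paired, in the same order, with UDF results.
 */
public class ForwardedInputSketch {
    private final Deque<String> forwardedInputQueue = new ArrayDeque<>();
    private final List<String> output = new ArrayList<>();

    /** Remembers a value that must be joined with a future UDF result. */
    void bufferInput(String forwarded) {
        forwardedInputQueue.add(forwarded);
    }

    /** Joins the oldest buffered value with the next result, preserving order. */
    void onResult(double udfResult) {
        String forwarded = forwardedInputQueue.poll();
        if (forwarded == null) {
            throw new IllegalStateException("result arrived without buffered input");
        }
        output.add(forwarded + "," + udfResult);
    }

    List<String> output() {
        return output;
    }

    public static void main(String[] args) {
        ForwardedInputSketch op = new ForwardedInputSketch();
        op.bufferInput("eng");
        op.bufferInput("ops");
        op.onResult(150.0);
        op.onResult(20.0);
        System.out.println(op.output()); // [eng,150.0, ops,20.0]
    }
}
```

The FIFO discipline only works if results come back in the same order the inputs were buffered; that ordering is an assumption of this sketch.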
public void processElementInternal(RowData value)
Specified by: processElementInternal in class AbstractStatelessFunctionOperator<RowData,RowData,RowData>
public void emitResult(Tuple3<String,byte[],Integer> resultTuple) throws Exception
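Description copied from class: AbstractExternalPythonFunctionOperator
Sends the execution result to the downstream operator.
Specified by: emitResult in class AbstractExternalPythonFunctionOperator<RowData>
Throws: Exception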
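emitResult receives a Tuple3<String,byte[],Integer>. The decoding sketch below assumes the Integer is the number of valid bytes at the start of the byte[] (for example when a buffer is reused) and ignores the String. Raw big-endian doubles stand in for the Arrow-encoded payload the real operator reads, and ResultDecodeSketch is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical decoder for the byte[] and Integer parts of a result tuple.
 * Assumption: only the first `length` bytes are valid; anything after that
 * offset (e.g. stale data in a reused buffer) must be ignored.
 */
public class ResultDecodeSketch {

    static List<Double> decode(byte[] payload, int length) {
        // Limit the view to the first `length` bytes of the array.
        ByteBuffer view = ByteBuffer.wrap(payload, 0, length);
        List<Double> results = new ArrayList<>();
        while (view.remaining() >= Double.BYTES) {
            results.add(view.getDouble());
        }
        return results;
    }

    public static void main(String[] args) {
        byte[] buffer = ByteBuffer.allocate(3 * Double.BYTES)
                .putDouble(150.0)
                .putDouble(20.0)
                .putDouble(-1.0) // stale bytes beyond the valid length
                .array();
        System.out.println(decode(buffer, 2 * Double.BYTES)); // [150.0, 20.0]
    }
}
```

Passing the length alongside the array lets a producer reuse one buffer across calls instead of allocating a right-sized array for every result, which is the design choice this sketch assumes.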
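public void open() throws Exception
Description copied from class: AbstractStreamOperator
This method is called immediately before any elements are processed; it should contain the operator's initialization logic.
The default implementation does nothing.
Specified by: open in interface StreamOperator<RowData>
Overrides: open in class AbstractArrowPythonAggregateFunctionOperator
Throws: Exception - An exception in this method causes the operator to fail.
public void endInput() throws Exception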
Description copied from interface: BoundedOneInput
Called when no more data will arrive from the input.
WARNING: It is not safe to use this method to commit any transactions or other side effects! You can use this method to flush any buffered data that can later on be committed, e.g. in a CheckpointListener.notifyCheckpointComplete(long).
NOTE: Given that it is semantically very similar to the StreamOperator.finish() method, it might be dropped in favour of the other method at some point in time.
Specified by: endInput in interface BoundedOneInput
Overrides: endInput in class AbstractOneInputPythonFunctionOperator<RowData,RowData>
Throws: Exception
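For a batch group aggregation, endInput matters because the last group never sees a following key. The sketch below is a hypothetical model that assumes the input arrives sorted by group key: values for the current key are collected, the batch is invoked when the key changes, and endInput() flushes whatever is still pending. A plain sum stands in for the Python UDAF; only the method names invokeCurrentBatch and endInput echo this page.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of batch group aggregation over key-sorted input.
 * Not Flink code: a sum replaces the Python aggregate function.
 */
public class GroupBatchSketch {
    private String lastGroupKey; // key of the batch currently being collected
    private final List<Long> currentBatch = new ArrayList<>();
    private final List<String> emitted = new ArrayList<>();

    void processElement(String groupKey, long value) {
        if (lastGroupKey != null && !lastGroupKey.equals(groupKey)) {
            invokeCurrentBatch(); // key changed: the previous group is complete
        }
        lastGroupKey = groupKey;
        currentBatch.add(value);
    }

    /** No-op when nothing is buffered, so calling it twice is harmless. */
    void invokeCurrentBatch() {
        if (currentBatch.isEmpty()) {
            return;
        }
        long sum = 0;
        for (long v : currentBatch) {
            sum += v;
        }
        emitted.add(lastGroupKey + "=" + sum);
        currentBatch.clear();
    }

    /** No more input: the last group never saw a key change, so flush it here. */
    void endInput() {
        invokeCurrentBatch();
    }

    List<String> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        GroupBatchSketch op = new GroupBatchSketch();
        op.processElement("eng", 100);
        op.processElement("eng", 50);
        op.processElement("ops", 20);
        System.out.println(op.emitted()); // [eng=150]
        op.endInput();
        System.out.println(op.emitted()); // [eng=150, ops=20]
    }
}
```

In this sketch the emptiness guard in invokeCurrentBatch() keeps a second flush (for example one from endInput() followed by one from finish()) from emitting an empty group.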
public void finish() throws Exception
Description copied from interface: StreamOperator
This method is called at the end of data processing.
The method is expected to flush all remaining buffered data. Exceptions raised while flushing buffered data should be propagated so that the operation is recognized as failed, because the last data items were not processed properly.
After this method is called, no more records can be produced for the downstream operators.
WARNING: It is not safe to use this method to commit any transactions or other side effects! You can use this method to flush any buffered data that can later on be committed, e.g. in a CheckpointListener.notifyCheckpointComplete(long).
NOTE: This method does not need to close any resources. You should release external resources in the StreamOperator.close() method.
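The WARNING above separates flushing from committing. A hypothetical sketch of that split follows: finish() moves buffered records to a staging list and lets any failure propagate, while records become visible only in notifyCheckpointComplete(long). The class, the staging list and the simulated empty-record failure are illustrative assumptions; the method here is a plain method, not an implementation of Flink's CheckpointListener.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the finish()/notifyCheckpointComplete split:
 * finish() only flushes (staging), commits happen on checkpoint completion.
 */
public class FlushThenCommitSketch {
    private final List<String> buffer = new ArrayList<>();
    private final List<String> staged = new ArrayList<>();    // flushed, not yet committed
    private final List<String> committed = new ArrayList<>(); // externally visible

    void process(String record) {
        buffer.add(record);
    }

    /** Flushes everything still buffered; failures are not swallowed. */
    void finish() throws Exception {
        for (String record : buffer) {
            if (record.isEmpty()) {
                // Simulated flush failure: propagate so the operator is marked failed.
                throw new Exception("cannot flush empty record");
            }
            staged.add(record);
        }
        buffer.clear();
    }

    /** Side effects become visible only once the checkpoint has completed. */
    void notifyCheckpointComplete(long checkpointId) {
        committed.addAll(staged);
        staged.clear();
    }

    List<String> staged() {
        return staged;
    }

    List<String> committed() {
        return committed;
    }

    public static void main(String[] args) throws Exception {
        FlushThenCommitSketch op = new FlushThenCommitSketch();
        op.process("a");
        op.process("b");
        op.finish();
        System.out.println(op.staged() + " " + op.committed()); // [a, b] []
        op.notifyCheckpointComplete(1L);
        System.out.println(op.staged() + " " + op.committed()); // [] [a, b]
    }
}
```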
Specified by: finish in interface StreamOperator<RowData>
Overrides: finish in class AbstractPythonFunctionOperator<RowData>
Throws: Exception - An exception in this method causes the operator to fail.
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.