public abstract class HashJoinOperator extends TableStreamOperator<RowData> implements TwoInputStreamOperator<RowData,RowData,RowData>, BoundedMultiInput, InputSelectable
This operator implements the join logic at runtime. It uses a hybrid hash join
internally to match records with equal keys. The build side of the hash table is
the first input of the match. It supports all join types in HashJoinType.
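The build and probe phases described above can be illustrated with a minimal sketch. It is a plain in-memory hash join, not the hybrid hash join this operator uses (there is no partitioning or spilling), and the class name SimpleHashJoin and the int[] row layout are hypothetical choices made only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified inner hash join; rows are int[]{key, value}. */
class SimpleHashJoin {

    /** Returns joined rows as int[]{key, buildValue, probeValue}. */
    static List<int[]> innerJoin(List<int[]> build, List<int[]> probe) {
        // Build phase: index every build-side row by its key.
        Map<Integer, List<int[]>> table = new HashMap<>();
        for (int[] row : build) {
            table.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
        }

        // Probe phase: look up each probe-side row and emit one result per match.
        List<int[]> out = new ArrayList<>();
        for (int[] probeRow : probe) {
            List<int[]> matches = table.get(probeRow[0]);
            if (matches == null) {
                continue; // inner join: unmatched probe rows are dropped
            }
            for (int[] buildRow : matches) {
                out.add(new int[] {probeRow[0], buildRow[1], probeRow[1]});
            }
        }
        return out;
    }
}
```

The build side is fully consumed before the first probe row is processed, which is why the build input must end before probing can start.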
Note: To mitigate data skew or an oversized hash table, this operator can fall back to sort merge join. By default, if any partition is spilled to disk more than three times during the hash join, the operator falls back to sort merge join to improve stability. More flexible adaptive hash join strategies are planned for the future; for example, falling back to sort merge join early if the amount of data written to disk while the hash table is being built reaches a certain threshold.
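The fallback rule in the note can be sketched as a small counter per partition. This is only an illustration of the "spilled more than three times" condition; the class SpillFallbackTracker, its method names, and the use of integer partition ids are assumptions and do not reflect how the operator tracks spills internally.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that counts how often each partition has been spilled. */
class SpillFallbackTracker {

    // Threshold taken from the note: fall back once a partition has spilled more than three times.
    static final int MAX_SPILL_ROUNDS = 3;

    private final Map<Integer, Integer> spillCounts = new HashMap<>();

    /** Records one spill of the given partition and returns its updated spill count. */
    int recordSpill(int partitionId) {
        return spillCounts.merge(partitionId, 1, Integer::sum);
    }

    /** Returns true once the partition has been spilled more than MAX_SPILL_ROUNDS times. */
    boolean shouldFallBackToSortMerge(int partitionId) {
        return spillCounts.getOrDefault(partitionId, 0) > MAX_SPILL_ROUNDS;
    }
}
```

A caller would check shouldFallBackToSortMerge after each spill and hand the affected partition to the sort merge join path when it returns true.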
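Nested classes/interfaces inherited from class TableStreamOperator: TableStreamOperator.ContextImpl
Fields inherited from class TableStreamOperator: ctx, currentWatermark
Fields inherited from class AbstractStreamOperator: chainingStrategy, config, lastRecordAttributes1, lastRecordAttributes2, latencyStats, metrics, output, processingTimeService, stateHandler, stateKeySelector1, stateKeySelector2, timeServiceManager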
Modifier and Type | Method and Description |
---|---|
void | close() This method is called at the very end of the operator's life, both in the case of a successful completion of the operation, and in the case of a failure and canceling. |
void | endInput(int inputId) It is notified that no more data will arrive from the input identified by the inputId. |
abstract void | join(RowIterator<BinaryRowData> buildIter, RowData probeRow) |
static HashJoinOperator | newHashJoinOperator(HashJoinType type, boolean leftIsBuild, boolean compressionEnable, int compressionBlockSize, GeneratedJoinCondition condFuncCode, boolean reverseJoinFunction, boolean[] filterNullKeys, GeneratedProjection buildProjectionCode, GeneratedProjection probeProjectionCode, boolean tryDistinctBuildRow, int buildRowSize, long buildRowCount, long probeRowCount, RowType keyType, SortMergeJoinFunction sortMergeJoinFunction) |
InputSelection | nextSelection() Returns the next InputSelection that wants to get the record. |
void | open() This method is called immediately before any elements are processed, it should contain the operator's initialization logic, e.g. state initialization. |
void | processElement1(StreamRecord<RowData> element) Processes one element that arrived on the first input of this two-input operator. |
void | processElement2(StreamRecord<RowData> element) Processes one element that arrived on the second input of this two-input operator. |
Methods inherited from class TableStreamOperator: computeMemorySize, processWatermark, useSplittableTimers
Methods inherited from class AbstractStreamOperator: finish, getChainingStrategy, getContainingTask, getCurrentKey, getExecutionConfig, getInternalTimerService, getKeyedStateBackend, getKeyedStateStore, getMetricGroup, getOperatorConfig, getOperatorID, getOperatorName, getOperatorStateBackend, getOrCreateKeyedState, getPartitionedState, getPartitionedState, getProcessingTimeService, getRuntimeContext, getStateKeySelector1, getStateKeySelector2, getTimeServiceManager, getUserCodeClassloader, hasKeyContext1, hasKeyContext2, initializeState, initializeState, isUsingCustomRawKeyedState, notifyCheckpointAborted, notifyCheckpointComplete, prepareSnapshotPreBarrier, processLatencyMarker, processLatencyMarker1, processLatencyMarker2, processRecordAttributes, processRecordAttributes1, processRecordAttributes2, processWatermark1, processWatermark2, processWatermarkStatus, processWatermarkStatus1, processWatermarkStatus2, reportOrForwardLatencyMarker, setChainingStrategy, setCurrentKey, setKeyContextElement1, setKeyContextElement2, setMailboxExecutor, setProcessingTimeService, setup, snapshotState, snapshotState
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface TwoInputStreamOperator: processLatencyMarker1, processLatencyMarker2, processRecordAttributes1, processRecordAttributes2, processWatermark1, processWatermark2, processWatermarkStatus1, processWatermarkStatus2
Methods inherited from interface StreamOperator: finish, getMetricGroup, getOperatorAttributes, getOperatorID, initializeState, prepareSnapshotPreBarrier, setKeyContextElement1, setKeyContextElement2, snapshotState
Methods inherited from interface CheckpointListener: notifyCheckpointAborted, notifyCheckpointComplete
Methods inherited from interface KeyContext: getCurrentKey, setCurrentKey
Methods inherited from interface KeyContextHandler: hasKeyContext
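public void open() throws Exception
Description copied from class: AbstractStreamOperator
This method is called immediately before any elements are processed; it should contain the operator's initialization logic, e.g. state initialization.
The default implementation does nothing.
Specified by: open in interface StreamOperator<RowData>
Overrides: open in class TableStreamOperator<RowData>
Throws: Exception - An exception in this method causes the operator to fail.
public void processElement1(StreamRecord<RowData> element) throws Exception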
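Description copied from interface: TwoInputStreamOperator
Processes one element that arrived on the first input of this two-input operator.
Specified by: processElement1 in interface TwoInputStreamOperator<RowData,RowData,RowData>
Throws: Exception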
public void processElement2(StreamRecord<RowData> element) throws Exception
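Description copied from interface: TwoInputStreamOperator
Processes one element that arrived on the second input of this two-input operator.
Specified by: processElement2 in interface TwoInputStreamOperator<RowData,RowData,RowData>
Throws: Exception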
public InputSelection nextSelection()
Description copied from interface: InputSelectable
Returns the next InputSelection that wants to get the record. This method is guaranteed to not be called concurrently with other methods of the operator.
Specified by: nextSelection in interface InputSelectable
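One way an operator whose build side is the first input might answer nextSelection is to request the first input until it has ended and only then request the second. The sketch below assumes that ordering; the class BuildThenProbeSelector and its nested Selection enum are hypothetical stand-ins and are not the InputSelection type named above.

```java
/** Hypothetical selector that reads the build input (1) to its end before the probe input (2). */
class BuildThenProbeSelector {

    /** Stand-in for an input selection; FIRST is input 1, SECOND is input 2. */
    enum Selection { FIRST, SECOND }

    private boolean buildEnded = false;

    /** Marks the build side as finished when input 1 ends; other inputs do not change the selection. */
    void endInput(int inputId) {
        if (inputId == 1) {
            buildEnded = true;
        }
    }

    /** Requests the first input until it has ended, then the second. */
    Selection nextSelection() {
        return buildEnded ? Selection.SECOND : Selection.FIRST;
    }
}
```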
public void endInput(int inputId) throws Exception
Description copied from interface: BoundedMultiInput
Notifies the operator that no more data will arrive from the input identified by the inputId. The inputId is numbered starting from 1, and `1` indicates the first input.
WARNING: It is not safe to use this method to commit any transactions or other side effects! You can use this method to e.g. flush data buffered for the given input or implement an ordered reading from multiple inputs via InputSelectable.
Specified by: endInput in interface BoundedMultiInput
Throws: Exception
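The 1-based inputId and the "flush data buffered for the given input" use case can be sketched as follows. The class BufferingTwoInputSketch, its String records, and the emitted() accessor are hypothetical and exist only to make the example self-contained; they are not part of this operator.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical two-input operator that buffers records and flushes them when an input ends. */
class BufferingTwoInputSketch {

    private final List<String> buffer1 = new ArrayList<>();
    private final List<String> buffer2 = new ArrayList<>();
    private final List<String> emitted = new ArrayList<>();

    void processElement1(String record) {
        buffer1.add(record);
    }

    void processElement2(String record) {
        buffer2.add(record);
    }

    /** inputId is 1-based: 1 is the first input, 2 is the second. */
    void endInput(int inputId) {
        List<String> buffer;
        if (inputId == 1) {
            buffer = buffer1;
        } else if (inputId == 2) {
            buffer = buffer2;
        } else {
            throw new IllegalArgumentException("Unexpected inputId: " + inputId);
        }
        // Flush only the data buffered for the input that just ended.
        emitted.addAll(buffer);
        buffer.clear();
    }

    List<String> emitted() {
        return emitted;
    }
}
```

Flushing buffered records is the kind of work the warning permits; committing transactions here would not be safe.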
public abstract void join(RowIterator<BinaryRowData> buildIter, RowData probeRow) throws Exception
Throws: Exception
public void close() throws Exception
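Description copied from interface: StreamOperator
This method is called at the very end of the operator's life, both on successful completion of the operation and on failure or cancellation.
This method is expected to make a thorough effort to release all resources that the operator has acquired.
NOTE: It cannot emit any records! If you need to emit records at the end of processing, do so in the StreamOperator.finish() method.
Specified by: close in interface StreamOperator<RowData>
Overrides: close in class AbstractStreamOperator<RowData>
Throws: Exception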
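The division of work between finish() and close() can be illustrated with a small sketch: finish() emits whatever is still pending, while close() only releases resources and emits nothing. The class LifecycleSketch and its members are hypothetical and do not extend any of the operator classes listed on this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical operator lifecycle: emit in finish(), release resources in close(). */
class LifecycleSketch {

    private final List<String> output = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();
    private boolean resourcesReleased = false;

    void process(String record) {
        pending.add(record);
    }

    /** Called at the end of successful processing; the last point where records may be emitted. */
    void finish() {
        output.addAll(pending);
        pending.clear();
    }

    /** Called on success, failure, and cancellation; releases resources and emits nothing. */
    void close() {
        pending.clear();
        resourcesReleased = true;
    }

    List<String> output() {
        return output;
    }

    boolean resourcesReleased() {
        return resourcesReleased;
    }
}
```

If close() is reached without finish(), for example after a failure, the pending records are discarded rather than emitted.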
public static HashJoinOperator newHashJoinOperator(HashJoinType type, boolean leftIsBuild, boolean compressionEnable, int compressionBlockSize, GeneratedJoinCondition condFuncCode, boolean reverseJoinFunction, boolean[] filterNullKeys, GeneratedProjection buildProjectionCode, GeneratedProjection probeProjectionCode, boolean tryDistinctBuildRow, int buildRowSize, long buildRowCount, long probeRowCount, RowType keyType, SortMergeJoinFunction sortMergeJoinFunction)
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.