OUT
- The output type of the operator.@Internal public class SourceOperator<OUT,SplitT extends SourceSplit> extends AbstractStreamOperator<OUT> implements OperatorEventHandler, PushingAsyncDataInput<OUT>, TimestampsAndWatermarks.WatermarkUpdateListener
PushingAsyncDataInput
which is naturally compatible with one
input processing in runtime stack.
Important Note on Serialization: The SourceOperator inherits the Serializable
interface from the StreamOperator, but is in fact NOT serializable. The
operator must only be instantiated in the StreamTask from its factory.
PushingAsyncDataInput.DataOutput<T>
AvailabilityProvider.AvailabilityHelper
chainingStrategy, config, latencyStats, LOG, metrics, output, processingTimeService
AVAILABLE
Constructor and Description |
---|
SourceOperator(FunctionWithException<SourceReaderContext,SourceReader<OUT,SplitT>,Exception> readerFactory,
OperatorEventGateway operatorEventGateway,
SimpleVersionedSerializer<SplitT> splitSerializer,
WatermarkStrategy<OUT> watermarkStrategy,
ProcessingTimeService timeService,
Configuration configuration,
String localHostname,
boolean emitProgressiveWatermarks,
StreamTask.CanEmitBatchOfRecordsChecker canEmitBatchOfRecords) |
Modifier and Type | Method and Description |
---|---|
void |
close()
This method is called at the very end of the operator's life, both in the case of a
successful completion of the operation, and in the case of a failure and canceling.
|
DataInputStatus |
emitNext(PushingAsyncDataInput.DataOutput<OUT> output)
Pushes elements to the output from current data input, and returns the input status to
indicate whether there are more available data in current input.
|
void |
finish()
This method is called at the end of data processing.
|
CompletableFuture<?> |
getAvailableFuture() |
InternalSourceReaderMetricGroup |
getSourceMetricGroup() |
SourceReader<OUT,SplitT> |
getSourceReader() |
void |
handleOperatorEvent(OperatorEvent event) |
void |
initializeState(StateInitializationContext context)
Stream operators with state which can be restored need to override this hook method.
|
void |
initReader()
Initializes the reader.
|
protected void |
initSourceMetricGroup() |
void |
notifyCheckpointAborted(long checkpointId)
This method is called as a notification once a distributed checkpoint has been aborted.
|
void |
notifyCheckpointComplete(long checkpointId)
Notifies the listener that the checkpoint with the given
checkpointId completed and
was committed. |
void |
open()
This method is called immediately before any elements are processed, it should contain the
operator's initialization logic, e.g.
|
void |
setup(StreamTask<?,?> containingTask,
StreamConfig config,
Output<StreamRecord<OUT>> output)
Initializes the operator.
|
void |
snapshotState(StateSnapshotContext context)
Stream operators with state, which want to participate in a snapshot need to override this
hook method.
|
CompletableFuture<Void> |
stop(StopMode mode) |
void |
updateCurrentEffectiveWatermark(long watermark)
Update the effective watermark.
|
void |
updateCurrentSplitWatermark(String splitId,
long watermark)
Notifies about changes to per split watermarks.
|
void |
updateIdle(boolean isIdle)
It should be called once the idle is changed.
|
getChainingStrategy, getContainingTask, getCurrentKey, getExecutionConfig, getInternalTimerService, getKeyedStateBackend, getKeyedStateStore, getMetricGroup, getOperatorConfig, getOperatorID, getOperatorName, getOperatorStateBackend, getOrCreateKeyedState, getPartitionedState, getPartitionedState, getProcessingTimeService, getRuntimeContext, getTimeServiceManager, getUserCodeClassloader, hasKeyContext1, hasKeyContext2, initializeState, isUsingCustomRawKeyedState, prepareSnapshotPreBarrier, processLatencyMarker, processLatencyMarker1, processLatencyMarker2, processWatermark, processWatermark1, processWatermark2, processWatermarkStatus, processWatermarkStatus1, processWatermarkStatus2, reportOrForwardLatencyMarker, setChainingStrategy, setCurrentKey, setKeyContextElement1, setKeyContextElement2, setProcessingTimeService, snapshotState
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
and, isApproximatelyAvailable, isAvailable, or
hasKeyContext
public SourceOperator(FunctionWithException<SourceReaderContext,SourceReader<OUT,SplitT>,Exception> readerFactory, OperatorEventGateway operatorEventGateway, SimpleVersionedSerializer<SplitT> splitSerializer, WatermarkStrategy<OUT> watermarkStrategy, ProcessingTimeService timeService, Configuration configuration, String localHostname, boolean emitProgressiveWatermarks, StreamTask.CanEmitBatchOfRecordsChecker canEmitBatchOfRecords)
public void setup(StreamTask<?,?> containingTask, StreamConfig config, Output<StreamRecord<OUT>> output)
SetupableStreamOperator
setup
in interface SetupableStreamOperator<OUT>
setup
in class AbstractStreamOperator<OUT>
@VisibleForTesting protected void initSourceMetricGroup()
public void initReader() throws Exception
Calling this method explicitly is an optional way to have the reader initialization a bit
earlier than in open(), as needed by the SourceOperatorStreamTask
This code should move to the constructor once the metric groups are available at task setup time.
Exception
public InternalSourceReaderMetricGroup getSourceMetricGroup()
public void open() throws Exception
AbstractStreamOperator
The default implementation does nothing.
open
in interface StreamOperator<OUT>
open
in class AbstractStreamOperator<OUT>
Exception
- An exception in this method causes the operator to fail.public void finish() throws Exception
StreamOperator
The method is expected to flush all remaining buffered data. Exceptions during this flushing of buffered data should be propagated, in order to cause the operation to be recognized as failed, because the last data items are not processed properly.
After this method is called, no more records can be produced for the downstream operators.
WARNING: It is not safe to use this method to commit any transactions or other side
effects! You can use this method to flush any buffered data that can later on be committed
e.g. in a CheckpointListener.notifyCheckpointComplete(long)
.
NOTE:This method does not need to close any resources. You should release external
resources in the StreamOperator.close()
method.
finish
in interface StreamOperator<OUT>
finish
in class AbstractStreamOperator<OUT>
Exception
- An exception in this method causes the operator to fail.public CompletableFuture<Void> stop(StopMode mode)
public void close() throws Exception
StreamOperator
This method is expected to make a thorough effort to release all resources that the operator has acquired.
NOTE:It can not emit any records! If you need to emit records at the end of
processing, do so in the StreamOperator.finish()
method.
close
in interface StreamOperator<OUT>
close
in class AbstractStreamOperator<OUT>
Exception
public DataInputStatus emitNext(PushingAsyncDataInput.DataOutput<OUT> output) throws Exception
PushingAsyncDataInput
This method should be non blocking.
emitNext
in interface PushingAsyncDataInput<OUT>
Exception
public void snapshotState(StateSnapshotContext context) throws Exception
AbstractStreamOperator
snapshotState
in interface StreamOperatorStateHandler.CheckpointedStreamOperator
snapshotState
in class AbstractStreamOperator<OUT>
context
- context that provides information and means required for taking a snapshotException
public CompletableFuture<?> getAvailableFuture()
getAvailableFuture
in interface AvailabilityProvider
public void initializeState(StateInitializationContext context) throws Exception
AbstractStreamOperator
initializeState
in interface StreamOperatorStateHandler.CheckpointedStreamOperator
initializeState
in class AbstractStreamOperator<OUT>
context
- context that allows to register different states.Exception
public void notifyCheckpointComplete(long checkpointId) throws Exception
CheckpointListener
checkpointId
completed and
was committed.
These notifications are "best effort", meaning they can sometimes be skipped. To behave
properly, implementers need to follow the "Checkpoint Subsuming Contract". Please see the
class-level JavaDocs
for details.
Please note that checkpoints may generally overlap, so you cannot assume that the notifyCheckpointComplete()
call is always for the latest prior checkpoint (or snapshot) that
was taken on the function/operator implementing this interface. It might be for a checkpoint
that was triggered earlier. Implementing the "Checkpoint Subsuming Contract" (see above)
properly handles this situation correctly as well.
Please note that throwing exceptions from this method will not cause the completed checkpoint to be revoked. Throwing exceptions will typically cause task/job failure and trigger recovery.
notifyCheckpointComplete
in interface CheckpointListener
notifyCheckpointComplete
in class AbstractStreamOperator<OUT>
checkpointId
- The ID of the checkpoint that has been completed.Exception
- This method can propagate exceptions, which leads to a failure/recovery for
the task. Note that this will NOT lead to the checkpoint being revoked.public void notifyCheckpointAborted(long checkpointId) throws Exception
CheckpointListener
Important: The fact that a checkpoint has been aborted does NOT mean that the data
and artifacts produced between the previous checkpoint and the aborted checkpoint are to be
discarded. The expected behavior is as if this checkpoint was never triggered in the first
place, and the next successful checkpoint simply covers a longer time span. See the
"Checkpoint Subsuming Contract" in the class-level JavaDocs
for
details.
These notifications are "best effort", meaning they can sometimes be skipped.
This method is very rarely necessary to implement. The "best effort" guarantee, together with the fact that this method should not result in discarding any data (per the "Checkpoint Subsuming Contract") means it is mainly useful for earlier cleanups of auxiliary resources. One example is to pro-actively clear a local per-checkpoint state cache upon checkpoint failure.
notifyCheckpointAborted
in interface CheckpointListener
notifyCheckpointAborted
in class AbstractStreamOperator<OUT>
checkpointId
- The ID of the checkpoint that has been aborted.Exception
- This method can propagate exceptions, which leads to a failure/recovery for
the task or job.public void handleOperatorEvent(OperatorEvent event)
handleOperatorEvent
in interface OperatorEventHandler
public void updateIdle(boolean isIdle)
TimestampsAndWatermarks.WatermarkUpdateListener
updateIdle
in interface TimestampsAndWatermarks.WatermarkUpdateListener
public void updateCurrentEffectiveWatermark(long watermark)
TimestampsAndWatermarks.WatermarkUpdateListener
this#updateIdle
instead of update the watermark to Long.MAX_VALUE
. Because the
output needs to distinguish between idle and real watermark.updateCurrentEffectiveWatermark
in interface TimestampsAndWatermarks.WatermarkUpdateListener
public void updateCurrentSplitWatermark(String splitId, long watermark)
TimestampsAndWatermarks.WatermarkUpdateListener
updateCurrentSplitWatermark
in interface TimestampsAndWatermarks.WatermarkUpdateListener
@VisibleForTesting public SourceReader<OUT,SplitT> getSourceReader()
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.