@Internal public class ContinuousFileReaderOperator<OUT,T extends TimestampedInputSplit> extends AbstractStreamOperator<OUT> implements OneInputStreamOperator<T,OUT>, OutputTypeConfigurable<OUT>
splits
received from the preceding
ContinuousFileMonitoringFunction
. Contrary to the ContinuousFileMonitoringFunction
which has a parallelism of 1, this operator can have DOP > 1.
This implementation uses MailboxExecutor
to execute each action and state machine
approach. The workflow is the following:
IDLE
OPENING
and enqueue a mail
to process
it
READING
, read one record, re-enqueue self
IDLE
On close:
IDLE
then close immediately
CLOSING
, call yield
in a loop until state is CLOSED
yield()
causes remaining records (and splits) to be
processed in the same way as above
Using MailboxExecutor
allows to avoid explicit synchronization. At most one mail
should be enqueued at any given time.
Using FSM approach allows to explicitly define states and enforce transitions
between them.
chainingStrategy, config, lastRecordAttributes1, lastRecordAttributes2, latencyStats, metrics, output, processingTimeService, stateHandler, stateKeySelector1, stateKeySelector2, timeServiceManager
Modifier and Type | Method and Description |
---|---|
void |
close()
This method is called at the very end of the operator's life, both in the case of a
successful completion of the operation, and in the case of a failure and canceling.
|
void |
finish()
This method is called at the end of data processing.
|
void |
initializeState(StateInitializationContext context)
Stream operators with state which can be restored need to override this hook method.
|
void |
open()
This method is called immediately before any elements are processed, it should contain the
operator's initialization logic, e.g. state initialization.
|
void |
processElement(StreamRecord<T> element)
Processes one element that arrived on this input of the
MultipleInputStreamOperator . |
void |
processWatermark(Watermark mark)
Processes a
Watermark that arrived on the first input of this two-input operator. |
void |
setOutputType(TypeInformation<OUT> outTypeInfo,
ExecutionConfig executionConfig)
Is called by the
org.apache.flink.streaming.api.graph.StreamGraph#addOperator(Integer,
String, StreamOperator, TypeInformation, TypeInformation, String) method when the org.apache.flink.streaming.api.graph.StreamGraph is generated. |
void |
snapshotState(StateSnapshotContext context)
Stream operators with state, which want to participate in a snapshot need to override this
hook method.
|
getChainingStrategy, getContainingTask, getCurrentKey, getExecutionConfig, getInternalTimerService, getKeyedStateBackend, getKeyedStateStore, getMetricGroup, getOperatorConfig, getOperatorID, getOperatorName, getOperatorStateBackend, getOrCreateKeyedState, getPartitionedState, getPartitionedState, getProcessingTimeService, getRuntimeContext, getStateKeySelector1, getStateKeySelector2, getTimeServiceManager, getUserCodeClassloader, hasKeyContext1, hasKeyContext2, initializeState, isUsingCustomRawKeyedState, notifyCheckpointAborted, notifyCheckpointComplete, prepareSnapshotPreBarrier, processLatencyMarker, processLatencyMarker1, processLatencyMarker2, processRecordAttributes, processRecordAttributes1, processRecordAttributes2, processWatermark1, processWatermark2, processWatermarkStatus, processWatermarkStatus1, processWatermarkStatus2, reportOrForwardLatencyMarker, setChainingStrategy, setCurrentKey, setKeyContextElement1, setKeyContextElement2, setMailboxExecutor, setProcessingTimeService, setup, snapshotState, useSplittableTimers
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
setKeyContextElement
getMetricGroup, getOperatorAttributes, getOperatorID, initializeState, prepareSnapshotPreBarrier, setKeyContextElement1, setKeyContextElement2, snapshotState
notifyCheckpointAborted, notifyCheckpointComplete
getCurrentKey, setCurrentKey
processLatencyMarker, processRecordAttributes, processWatermarkStatus
hasKeyContext
public void initializeState(StateInitializationContext context) throws Exception
AbstractStreamOperator
initializeState
in interface StreamOperatorStateHandler.CheckpointedStreamOperator
initializeState
in class AbstractStreamOperator<OUT>
context
- context that allows to register different states.Exception
public void open() throws Exception
AbstractStreamOperator
The default implementation does nothing.
open
in interface StreamOperator<OUT>
open
in class AbstractStreamOperator<OUT>
Exception
- An exception in this method causes the operator to fail.public void processElement(StreamRecord<T> element) throws Exception
Input
MultipleInputStreamOperator
.
This method is guaranteed to not be called concurrently with other methods of the operator.processElement
in interface Input<T extends TimestampedInputSplit>
Exception
public void processWatermark(Watermark mark) throws Exception
Input
Watermark
that arrived on the first input of this two-input operator.
This method is guaranteed to not be called concurrently with other methods of the operator.processWatermark
in interface Input<T extends TimestampedInputSplit>
processWatermark
in class AbstractStreamOperator<OUT>
Exception
Watermark
public void finish() throws Exception
StreamOperator
The method is expected to flush all remaining buffered data. Exceptions during this flushing of buffered data should be propagated, in order to cause the operation to be recognized as failed, because the last data items are not processed properly.
After this method is called, no more records can be produced for the downstream operators.
WARNING: It is not safe to use this method to commit any transactions or other side
effects! You can use this method to flush any buffered data that can later on be committed
e.g. in a CheckpointListener.notifyCheckpointComplete(long)
.
NOTE:This method does not need to close any resources. You should release external
resources in the StreamOperator.close()
method.
finish
in interface StreamOperator<OUT>
finish
in class AbstractStreamOperator<OUT>
Exception
- An exception in this method causes the operator to fail.public void close() throws Exception
StreamOperator
This method is expected to make a thorough effort to release all resources that the operator has acquired.
NOTE:It can not emit any records! If you need to emit records at the end of
processing, do so in the StreamOperator.finish()
method.
close
in interface StreamOperator<OUT>
close
in class AbstractStreamOperator<OUT>
Exception
public void snapshotState(StateSnapshotContext context) throws Exception
AbstractStreamOperator
snapshotState
in interface StreamOperatorStateHandler.CheckpointedStreamOperator
snapshotState
in class AbstractStreamOperator<OUT>
context
- context that provides information and means required for taking a snapshotException
public void setOutputType(TypeInformation<OUT> outTypeInfo, ExecutionConfig executionConfig)
OutputTypeConfigurable
org.apache.flink.streaming.api.graph.StreamGraph#addOperator(Integer,
String, StreamOperator, TypeInformation, TypeInformation, String)
method when the org.apache.flink.streaming.api.graph.StreamGraph
is generated. The method is called with the
output TypeInformation
which is also used for the org.apache.flink.streaming.runtime.tasks.StreamTask
output serializer.setOutputType
in interface OutputTypeConfigurable<OUT>
outTypeInfo
- Output type information of the org.apache.flink.streaming.runtime.tasks.StreamTask
executionConfig
- Execution configurationCopyright © 2014–2024 The Apache Software Foundation. All rights reserved.