Class BatchCompactOperator<T>
- java.lang.Object
-
- org.apache.flink.streaming.api.operators.AbstractStreamOperator<CompactMessages.CompactOutput>
-
- org.apache.flink.connector.file.table.batch.compact.BatchCompactOperator<T>
-
- All Implemented Interfaces:
Serializable, CheckpointListener, BoundedOneInput, Input<CompactMessages.CoordinatorOutput>, KeyContext, KeyContextHandler, OneInputStreamOperator<CompactMessages.CoordinatorOutput,CompactMessages.CompactOutput>, StreamOperator<CompactMessages.CompactOutput>, StreamOperatorStateHandler.CheckpointedStreamOperator, YieldingOperator<CompactMessages.CompactOutput>
public class BatchCompactOperator<T> extends AbstractStreamOperator<CompactMessages.CompactOutput> implements OneInputStreamOperator<CompactMessages.CoordinatorOutput,CompactMessages.CompactOutput>, BoundedOneInput
CompactOperator for compaction in batch mode. It compacts files into a target file and then emits the compacted file's path to the downstream operator. The main logic is similar to CompactOperator, but it skips some operations that are unnecessary in batch mode.
Note: if there is only one file to be compacted, this operator does nothing and just emits that file downstream. Also, the files to be compacted are not hidden files themselves; they are expected to be located in a hidden or temporary directory, and callers must ensure this. This assumption lets the operator skip renaming hidden files.
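The single-file shortcut described above can be sketched in plain Java. This is only an illustration under stated assumptions: the class `BatchCompactSketch`, the method `compact`, and the byte-level concatenation are hypothetical, and the real operator reads and writes records through its `CompactReader.Factory` and `CompactWriter.Factory` rather than copying bytes.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Plain-Java illustration only; not the Flink implementation. */
class BatchCompactSketch {

    /**
     * Returns the path that would be emitted downstream.
     * A single input file is passed through untouched; several files are
     * concatenated into the (hypothetical) target path.
     */
    static Path compact(List<Path> inputs, Path target) throws IOException {
        if (inputs.isEmpty()) {
            throw new IllegalArgumentException("nothing to compact");
        }
        if (inputs.size() == 1) {
            // Only one file: no rewrite is needed, emit it as is.
            return inputs.get(0);
        }
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path input : inputs) {
                // Append the bytes of each input file to the target.
                Files.copy(input, out);
            }
        }
        return target;
    }
}
```

With one input the returned path is the input itself and no target file is created; with several inputs the target holds their contents in list order.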
- See Also:
- Serialized Form
-
-
Field Summary
Fields
Modifier and Type, Field, Description
static String ATTEMPT_PREFIX
static String COMPACTED_PREFIX
static String UNCOMPACTED_PREFIX
-
Fields inherited from class org.apache.flink.streaming.api.operators.AbstractStreamOperator
config, lastRecordAttributes1, lastRecordAttributes2, latencyStats, LOG, metrics, output, processingTimeService, stateHandler, stateKeySelector1, stateKeySelector2, timeServiceManager
-
-
Constructor Summary
Constructors
Constructor, Description
BatchCompactOperator(SupplierWithException<FileSystem,IOException> fsFactory, CompactReader.Factory<T> readerFactory, CompactWriter.Factory<T> writerFactory)
-
Method Summary
Modifier and Type, Method, Description
void close()
This method is called at the very end of the operator's life, both on successful completion of the operation and on failure or cancellation.
static Path convertFromUncompacted(Path path)
void endInput()
It is notified that no more data will arrive from the input.
void open()
This method is called immediately before any elements are processed; it should contain the operator's initialization logic, e.g. state initialization.
void processElement(StreamRecord<CompactMessages.CoordinatorOutput> element)
Processes one element that arrived on this input of the MultipleInputStreamOperator.
-
Methods inherited from class org.apache.flink.streaming.api.operators.AbstractStreamOperator
finish, getContainingTask, getCurrentKey, getExecutionConfig, getInternalTimerService, getKeyedStateBackend, getKeyedStateStore, getMetricGroup, getOperatorConfig, getOperatorID, getOperatorName, getOperatorStateBackend, getOrCreateKeyedState, getPartitionedState, getPartitionedState, getProcessingTimeService, getRuntimeContext, getStateKeySelector1, getStateKeySelector2, getTimeServiceManager, getUserCodeClassloader, hasKeyContext1, hasKeyContext2, initializeState, initializeState, isUsingCustomRawKeyedState, notifyCheckpointAborted, notifyCheckpointComplete, prepareSnapshotPreBarrier, processLatencyMarker, processLatencyMarker1, processLatencyMarker2, processRecordAttributes, processRecordAttributes1, processRecordAttributes2, processWatermark, processWatermark1, processWatermark2, processWatermarkStatus, processWatermarkStatus1, processWatermarkStatus2, reportOrForwardLatencyMarker, setCurrentKey, setKeyContextElement1, setKeyContextElement2, setMailboxExecutor, setProcessingTimeService, setup, snapshotState, snapshotState, useSplittableTimers
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.flink.api.common.state.CheckpointListener
notifyCheckpointAborted, notifyCheckpointComplete
-
Methods inherited from interface org.apache.flink.streaming.api.operators.Input
processLatencyMarker, processRecordAttributes, processWatermark, processWatermarkStatus
-
Methods inherited from interface org.apache.flink.streaming.api.operators.KeyContext
getCurrentKey, setCurrentKey
-
Methods inherited from interface org.apache.flink.streaming.api.operators.KeyContextHandler
hasKeyContext
-
Methods inherited from interface org.apache.flink.streaming.api.operators.OneInputStreamOperator
setKeyContextElement
-
Methods inherited from interface org.apache.flink.streaming.api.operators.StreamOperator
finish, getMetricGroup, getOperatorAttributes, getOperatorID, initializeState, prepareSnapshotPreBarrier, setKeyContextElement1, setKeyContextElement2, snapshotState
-
-
-
-
Field Detail
-
UNCOMPACTED_PREFIX
public static final String UNCOMPACTED_PREFIX
- See Also:
- Constant Field Values
-
COMPACTED_PREFIX
public static final String COMPACTED_PREFIX
- See Also:
- Constant Field Values
-
ATTEMPT_PREFIX
public static final String ATTEMPT_PREFIX
- See Also:
- Constant Field Values
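The prefix constants and `convertFromUncompacted(Path path)` suggest a file-naming convention. This page does not state the constant values or the method's exact behavior, so the following is a sketch under assumptions: the value `"uncompacted-"` is assumed for illustration, the class `PrefixSketch` is hypothetical, `java.nio.file.Path` stands in for Flink's `Path`, and the method is read as "strip the uncompacted prefix from the file name".

```java
import java.nio.file.Path;

/** Illustration of a prefix-based naming convention; not the Flink source. */
class PrefixSketch {
    // Assumed value for illustration; see "Constant Field Values" for the real one.
    static final String UNCOMPACTED_PREFIX = "uncompacted-";

    /** Strips the uncompacted prefix from the file name, keeping the parent directory. */
    static Path convertFromUncompacted(Path path) {
        String name = path.getFileName().toString();
        if (!name.startsWith(UNCOMPACTED_PREFIX)) {
            throw new IllegalArgumentException("Not an uncompacted file: " + path);
        }
        return path.resolveSibling(name.substring(UNCOMPACTED_PREFIX.length()));
    }
}
```

Rejecting names without the prefix makes a misrouted file fail fast instead of being silently renamed.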
-
-
Constructor Detail
-
BatchCompactOperator
public BatchCompactOperator(SupplierWithException<FileSystem,IOException> fsFactory, CompactReader.Factory<T> readerFactory, CompactWriter.Factory<T> writerFactory)
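The constructor takes a factory for the `FileSystem` rather than a `FileSystem` instance. One common reason for this pattern, assumed here rather than stated on this page, is that the operator is serialized and shipped to workers, so the resource is created only once the operator is opened. The sketch below uses hypothetical stand-ins (`ThrowingSupplier`, `Holder`) and no Flink types.

```java
import java.io.IOException;
import java.io.Serializable;

/** Illustration of deferring resource creation to open(); not the Flink source. */
class LazyFactorySketch {

    /** Hypothetical stand-in for a supplier that may throw a checked exception. */
    @FunctionalInterface
    interface ThrowingSupplier<R, E extends Throwable> extends Serializable {
        R get() throws E;
    }

    /** Hypothetical operator-like holder: keeps only the factory until open(). */
    static class Holder<R> implements Serializable {
        private final ThrowingSupplier<R, IOException> factory;
        private transient R resource; // created in open(), never serialized

        Holder(ThrowingSupplier<R, IOException> factory) {
            this.factory = factory;
        }

        /** Creates the resource; any IOException propagates to the caller. */
        void open() throws IOException {
            resource = factory.get();
        }

        R resource() {
            return resource;
        }
    }
}
```

For example, `new LazyFactorySketch.Holder<String>(() -> "fs")` holds no resource until `open()` is called.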
-
-
Method Detail
-
open
public void open() throws Exception
Description copied from class: AbstractStreamOperator
This method is called immediately before any elements are processed; it should contain the operator's initialization logic, e.g. state initialization.
The default implementation does nothing.
- Specified by:
open in interface StreamOperator<T>
- Overrides:
open in class AbstractStreamOperator<CompactMessages.CompactOutput>
- Throws:
Exception
- An exception in this method causes the operator to fail.
-
processElement
public void processElement(StreamRecord<CompactMessages.CoordinatorOutput> element) throws Exception
Description copied from interface: Input
Processes one element that arrived on this input of the MultipleInputStreamOperator. This method is guaranteed to not be called concurrently with other methods of the operator.
- Specified by:
processElement in interface Input<T>
- Throws:
Exception
-
endInput
public void endInput() throws Exception
Description copied from interface: BoundedOneInput
It is notified that no more data will arrive from the input.
WARNING: It is not safe to use this method to commit any transactions or other side effects! You can use this method to flush any buffered data that can later on be committed, e.g. in a CheckpointListener.notifyCheckpointComplete(long).
NOTE: This method is semantically very similar to the StreamOperator.finish() method, so it might be dropped in favour of that method at some point in time.
- Specified by:
endInput in interface BoundedOneInput
- Throws:
Exception
-
close
public void close() throws Exception
Description copied from interface: StreamOperator
This method is called at the very end of the operator's life, both on successful completion of the operation and on failure or cancellation.
This method is expected to make a thorough effort to release all resources that the operator has acquired.
NOTE: It cannot emit any records! If you need to emit records at the end of processing, do so in the StreamOperator.finish() method.
- Specified by:
close in interface StreamOperator<T>
- Overrides:
close in class AbstractStreamOperator<CompactMessages.CompactOutput>
- Throws:
Exception
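The call order implied by the method descriptions (open, then processElement for each record, then endInput, then close) can be sketched with a plain-Java mock. The class `LifecycleSketch`, its `emitted` list, and the buffering behavior are hypothetical and only illustrate the contract that buffered data is flushed before close and that close releases resources without emitting.

```java
import java.util.ArrayList;
import java.util.List;

/** Mock of the operator lifecycle contract; not the Flink source. */
class LifecycleSketch {
    /** Stands in for the operator's output collector. */
    final List<String> emitted = new ArrayList<>();
    private final List<String> buffer = new ArrayList<>();
    private boolean open;

    /** Initialization hook: called before any element is processed. */
    void open() {
        open = true;
    }

    /** Buffers one element; fails if the operator is not open. */
    void processElement(String element) {
        if (!open) {
            throw new IllegalStateException("operator is not open");
        }
        buffer.add(element);
    }

    /** No more input will arrive: flush buffered data downstream. */
    void endInput() {
        emitted.addAll(buffer);
        buffer.clear();
    }

    /** Releases resources only; nothing is emitted here. */
    void close() {
        buffer.clear();
        open = false;
    }
}
```

Anything still buffered when `close()` runs is dropped in this mock, which mirrors the note that records must be emitted before the operator is closed.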
-
-