OperatorCoordinator (flink 1.11-SNAPSHOT API)

All Superinterfaces:

AutoCloseable, CheckpointListener

All Known Implementing Classes:

CollectSinkOperatorCoordinator, OperatorCoordinatorHolder, RecreateOnResetOperatorCoordinator, SourceCoordinator
```
public interface OperatorCoordinator
extends CheckpointListener, AutoCloseable
```
A coordinator for runtime operators. The OperatorCoordinator runs on the master, associated with the job vertex of the operator. It communicates with operators by sending operator events.
Examples of operator coordinators are source and sink coordinators, which discover and assign work or aggregate and commit metadata.
Thread Model

All coordinator methods are called by the JobManager's main thread (mailbox thread). These methods must therefore never perform blocking operations (such as I/O or waiting on locks or futures). Doing so runs a high risk of bringing down the entire JobManager.
Coordinators that involve more complex operations should hence spawn threads to handle the I/O work. The methods on the OperatorCoordinator.Context are safe to call from a thread other than the one that calls the coordinator's methods.
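
The thread model above can be illustrated with a minimal sketch that uses only JDK classes. The class `AsyncDiscoveryCoordinator` and its methods `startDiscovery` and `listSplits` are hypothetical names chosen for illustration and are not part of the Flink API; a real coordinator would implement OperatorCoordinator and report results through its Context.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical coordinator fragment: blocking work runs on a worker thread. */
class AsyncDiscoveryCoordinator implements AutoCloseable {

    // Assumption: a single worker thread is enough for the blocking I/O.
    private final ExecutorService ioExecutor = Executors.newSingleThreadExecutor();

    /** Meant to be called from the main thread; returns without blocking. */
    CompletableFuture<List<String>> startDiscovery() {
        return CompletableFuture.supplyAsync(this::listSplits, ioExecutor);
    }

    /** Stand-in for a blocking call, such as listing files or partitions. */
    private List<String> listSplits() {
        return Arrays.asList("split-0", "split-1");
    }

    /** Releases the worker thread when the coordinator is disposed. */
    @Override
    public void close() {
        ioExecutor.shutdownNow();
    }
}
```

The caller only receives a future, so the calling thread never waits on the I/O itself.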

Nested Class Summary

Nested Classes
Modifier and Type	Interface and Description
`static interface`	`OperatorCoordinator.Context` The context gives the OperatorCoordinator access to contextual information and provides a gateway to interact with other components, such as sending operator events.
`static interface`	`OperatorCoordinator.Provider` The provider creates an OperatorCoordinator and takes an `OperatorCoordinator.Context` to pass to the OperatorCoordinator.

Field Summary

Fields
Modifier and Type	Field and Description
`static long`	`NO_CHECKPOINT` The checkpoint ID passed to the restore methods when no completed checkpoint exists yet.

Method Summary

Modifier and Type	Method and Description
`void`	`checkpointCoordinator(long checkpointId, CompletableFuture<byte[]> resultFuture)` Takes a checkpoint of the coordinator.
`void`	`close()` This method is called when the coordinator is disposed.
`void`	`handleEventFromOperator(int subtask, OperatorEvent event)` Hands an OperatorEvent from a task (on the Task Manager) to this coordinator.
`default void`	`notifyCheckpointAborted(long checkpointId)` We override the method here to remove the checked exception.
`void`	`notifyCheckpointComplete(long checkpointId)` We override the method here to remove the checked exception.
`void`	`resetToCheckpoint(long checkpointId, byte[] checkpointData)` Resets the coordinator to the given checkpoint.
`void`	`start()` Starts the coordinator.
`void`	`subtaskFailed(int subtask, Throwable reason)` Called when one of the subtasks of the task running the coordinated operator goes through a failover (failure / recovery cycle).
`void`	`subtaskReset(int subtask, long checkpointId)` Called if a task is recovered as part of a partial failover, meaning a failover handled by the scheduler's failover strategy (by default recovering a pipelined region).

- Field Detail
  - NO_CHECKPOINT
```
static final long NO_CHECKPOINT
```
    The checkpoint ID passed to the restore methods when no completed checkpoint exists yet. It indicates that the restore is to the "initial state" of the coordinator or the failed subtask.
    
    See Also:
    
    Constant Field Values
- Method Detail
  - start
```
void start()
    throws Exception
```
    Starts the coordinator. This method is called once at the beginning, before any other methods.
    
    Throws:
    
    Exception - Any exception thrown from this method causes a full job failure.
  - close
```
void close()
    throws Exception
```
    This method is called when the coordinator is disposed. This method should release currently held resources. Exceptions in this method do not cause the job to fail.
    
    Specified by:
    
    close in interface AutoCloseable
    
    Throws:
    
    Exception
  - handleEventFromOperator
```
void handleEventFromOperator(int subtask,
                             OperatorEvent event)
                      throws Exception
```
    Hands an OperatorEvent from a task (on the Task Manager) to this coordinator.
    
    Throws:
    
    Exception - Any exception thrown by this method results in a full job failure and recovery.
  - checkpointCoordinator
```
void checkpointCoordinator(long checkpointId,
                           CompletableFuture<byte[]> resultFuture)
                    throws Exception
```
    Takes a checkpoint of the coordinator. The checkpoint is identified by the given ID.
    To confirm the checkpoint and store state in it, the given CompletableFuture must be completed with the state. To abort or decline the checkpoint, the given CompletableFuture must be completed exceptionally. In either case, the given CompletableFuture must be completed in some way; otherwise the checkpoint will not progress.
    Exactly-once Semantics
    
    The semantics are defined as follows:
    - The point in time when the checkpoint future is completed is considered the point in time when the coordinator's checkpoint takes place.
    - The OperatorCoordinator implementation must have a way of strictly ordering the sending of events and the completion of the checkpoint future (for example the same thread does both actions, or both actions are guarded by a mutex).
    - Every event sent before the checkpoint future is completed is considered to be before the checkpoint.
    - Every event sent after the checkpoint future is completed is considered to be after the checkpoint.
    Throws:
    
    Exception
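    
    The ordering requirement for checkpointCoordinator can be sketched with a mutex that guards both actions. This is a self-contained illustration using only JDK classes. The class `OrderedCheckpointing`, its `sendEvent` method, and the choice of an event counter as state are assumptions made for the example, not Flink API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical coordinator fragment; the names here are not Flink API. */
class OrderedCheckpointing {

    // One lock guards both sending events and completing the checkpoint future.
    private final Object lock = new Object();
    // Stand-in for events handed to the operator through the Context.
    private final List<String> sentEvents = new ArrayList<>();

    void sendEvent(String event) {
        synchronized (lock) {
            sentEvents.add(event);
        }
    }

    void checkpointCoordinator(long checkpointId, CompletableFuture<byte[]> resultFuture) {
        synchronized (lock) {
            try {
                // The state reflects exactly the events sent before this point.
                String state = Integer.toString(sentEvents.size());
                resultFuture.complete(state.getBytes(StandardCharsets.UTF_8));
            } catch (Throwable t) {
                // Always complete the future, so the checkpoint can progress or abort.
                resultFuture.completeExceptionally(t);
            }
        }
    }
}
```

    Because both methods take the same lock, every event is either counted in the checkpointed state or sent strictly after the future is completed.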
  - notifyCheckpointComplete
```
void notifyCheckpointComplete(long checkpointId)
```
    We override the method here to remove the checked exception. See the Javadoc of CheckpointListener.notifyCheckpointComplete(long) for the detailed semantics of the method.
    
    Specified by:
    
    notifyCheckpointComplete in interface CheckpointListener
    
    Parameters:
    
    checkpointId - The ID of the checkpoint that has been completed.
  - notifyCheckpointAborted
```
default void notifyCheckpointAborted(long checkpointId)
```
    We override the method here to remove the checked exception. See the Javadoc of CheckpointListener.notifyCheckpointAborted(long) for the detailed semantics of the method.
    
    Specified by:
    
    notifyCheckpointAborted in interface CheckpointListener
    
    Parameters:
    
    checkpointId - The ID of the checkpoint that has been aborted.
  - resetToCheckpoint
```
void resetToCheckpoint(long checkpointId,
                       @Nullable
                       byte[] checkpointData)
                throws Exception
```
    Resets the coordinator to the given checkpoint. When this method is called, the coordinator can discard all other in-flight working state. All subtasks will also have been reset to the same checkpoint.
    This method is called in the case of a global failover of the system, which means a failover of the coordinator (JobManager). This method is not invoked on a partial failover; partial failovers call the subtaskReset(int, long) method for the involved subtasks.
    This method is expected to behave synchronously with respect to other method calls and calls to Context methods. For example, events sent by the coordinator after this method returns are assumed to take place after the checkpoint that was restored.
    This method is called with a null state argument in the following situations:
    - There is a recovery and there was no completed checkpoint yet.
    - There is a recovery from a completed checkpoint/savepoint but it contained no state for the coordinator.
    In both cases, the coordinator should reset to an empty (new) state.
    Restoring implicitly notifies of Checkpoint Completion
    
    Restoring to a checkpoint is a way of confirming that the checkpoint is complete. It is safe to commit side-effects that are predicated on checkpoint completion after this call.
    Even if no call to notifyCheckpointComplete(long) happened, the checkpoint can still be complete (for example when a system failure happened directly after committing the checkpoint, before calling the notifyCheckpointComplete(long) method).
    Throws:
    
    Exception
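    
    The handling of a null state argument in resetToCheckpoint can be sketched as follows. The class `ResettableCounter` and its fields are hypothetical and exist only for this example; the state format (a decimal number encoded as UTF-8) is likewise an assumption.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical coordinator fragment whose whole state is one counter. */
class ResettableCounter {

    private long counter;
    // Stand-in for in-flight working state that a reset may discard.
    private long pendingUpdates;

    void increment() {
        pendingUpdates++;
        counter++;
    }

    void resetToCheckpoint(long checkpointId, byte[] checkpointData) {
        // All in-flight working state is discarded on reset.
        pendingUpdates = 0;
        if (checkpointData == null) {
            // No completed checkpoint, or no state for this coordinator: start empty.
            counter = 0;
        } else {
            counter = Long.parseLong(new String(checkpointData, StandardCharsets.UTF_8));
        }
    }

    long counter() {
        return counter;
    }
}
```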
  - subtaskFailed
```
void subtaskFailed(int subtask,
                   @Nullable
                   Throwable reason)
```
    Called when one of the subtasks of the task running the coordinated operator goes through a failover (failure / recovery cycle).
    This method is called every time there is a failover of a subtask, regardless of whether it is a partial failover or a global failover.
  - subtaskReset
```
void subtaskReset(int subtask,
                  long checkpointId)
```
    Called if a task is recovered as part of a partial failover, meaning a failover handled by the scheduler's failover strategy (by default recovering a pipelined region). The method is invoked for each subtask involved in that partial failover.
    In contrast to this method, the resetToCheckpoint(long, byte[]) method is called in the case of a global failover, which is the case when the coordinator (JobManager) is recovered.
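    
    One way a coordinator might keep track of these callbacks is sketched below. The class `SubtaskTracker` and its `canSendTo` method are hypothetical bookkeeping, not Flink API. The sketch assumes that subtaskFailed is reported before the matching subtaskReset or resetToCheckpoint call, which the text above does not state explicitly.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical bookkeeping of which subtasks are currently in a failover. */
class SubtaskTracker {

    private final Set<Integer> unavailable = new HashSet<>();

    /** Called on every failover of a subtask, partial or global. */
    void subtaskFailed(int subtask, Throwable reason) {
        unavailable.add(subtask);
    }

    /** Partial failover: only the given subtask was recovered. */
    void subtaskReset(int subtask, long checkpointId) {
        unavailable.remove(subtask);
    }

    /** Global failover: all subtasks were reset to the same checkpoint. */
    void resetToCheckpoint(long checkpointId, byte[] checkpointData) {
        unavailable.clear();
    }

    boolean canSendTo(int subtask) {
        return !unavailable.contains(subtask);
    }
}
```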
