Class RecreateOnResetOperatorCoordinator

    • Method Detail

      • start

        public void start()
                   throws Exception
        Description copied from interface: OperatorCoordinator
        Starts the coordinator. This method is called once at the beginning, before any other methods.
        Specified by:
        start in interface OperatorCoordinator
        Throws:
        Exception - Any exception thrown from this method causes a full job failure.
      • handleEventFromOperator

        public void handleEventFromOperator​(int subtask,
                                            int attemptNumber,
                                            OperatorEvent event)
                                     throws Exception
        Description copied from interface: OperatorCoordinator
        Hands an OperatorEvent coming from a parallel Operator instance (one attempt of the parallel subtasks).
        Specified by:
        handleEventFromOperator in interface OperatorCoordinator
        Throws:
        Exception - Any exception thrown by this method results in a full job failure and recovery.
      • executionAttemptFailed

        public void executionAttemptFailed​(int subtask,
                                           int attemptNumber,
                                           @Nullable
                                           Throwable reason)
        Description copied from interface: OperatorCoordinator
        Called when any subtask execution attempt of the task running the coordinated operator is failed/canceled.

        This method is called every time an execution attempt is failed/canceled, regardless of whether there it is caused by a partial failover or a global failover.

        Specified by:
        executionAttemptFailed in interface OperatorCoordinator
      • subtaskReset

        public void subtaskReset​(int subtask,
                                 long checkpointId)
        Description copied from interface: OperatorCoordinator
        Called if a subtask is recovered as part of a partial failover, meaning a failover handled by the scheduler's failover strategy (by default recovering a pipelined region). The method is invoked for each subtask involved in that partial failover.

        In contrast to this method, the OperatorCoordinator.resetToCheckpoint(long, byte[]) method is called in the case of a global failover, which is the case when the coordinator (JobManager) is recovered.

        Note that this method will not be called if an execution attempt of a subtask failed, if the subtask is not entirely failed, i.e. if the subtask has other execution attempts that are not failed/canceled.

        Specified by:
        subtaskReset in interface OperatorCoordinator
      • executionAttemptReady

        public void executionAttemptReady​(int subtask,
                                          int attemptNumber,
                                          OperatorCoordinator.SubtaskGateway gateway)
        Description copied from interface: OperatorCoordinator
        This is called when a subtask execution attempt of the Operator becomes ready to receive events. The given SubtaskGateway can be used to send events to the execution attempt.

        The given SubtaskGateway is bound to that specific execution attempt that became ready. All events sent through the gateway target that execution attempt; if the attempt is no longer running by the time the event is sent, then the events are failed.

        Specified by:
        executionAttemptReady in interface OperatorCoordinator
      • checkpointCoordinator

        public void checkpointCoordinator​(long checkpointId,
                                          CompletableFuture<byte[]> resultFuture)
                                   throws Exception
        Description copied from interface: OperatorCoordinator
        Takes a checkpoint of the coordinator. The checkpoint is identified by the given ID.

        To confirm the checkpoint and store state in it, the given CompletableFuture must be completed with the state. To abort or dis-confirm the checkpoint, the given CompletableFuture must be completed exceptionally. In any case, the given CompletableFuture must be completed in some way, otherwise the checkpoint will not progress.

        Exactly-once Semantics

        The semantics are defined as follows:

        • The point in time when the checkpoint future is completed is considered the point in time when the coordinator's checkpoint takes place.
        • The OperatorCoordinator implementation must have a way of strictly ordering the sending of events and the completion of the checkpoint future (for example the same thread does both actions, or both actions are guarded by a mutex).
        • Every event sent before the checkpoint future is completed is considered before the checkpoint.
        • Every event sent after the checkpoint future is completed is considered to be after the checkpoint.
        Specified by:
        checkpointCoordinator in interface OperatorCoordinator
        Throws:
        Exception - Any exception thrown by this method results in a full job failure and recovery.
      • resetToCheckpoint

        public void resetToCheckpoint​(long checkpointId,
                                      @Nullable
                                      byte[] checkpointData)
        Description copied from interface: OperatorCoordinator
        Resets the coordinator to the given checkpoint. When this method is called, the coordinator can discard all other in-flight working state. All subtasks will also have been reset to the same checkpoint.

        This method is called in the case of a global failover of the system, which means a failover of the coordinator (JobManager). This method is not invoked on a partial failover; partial failovers call the OperatorCoordinator.subtaskReset(int, long) method for the involved subtasks.

        This method is expected to behave synchronously with respect to other method calls and calls to Context methods. For example, Events being sent by the Coordinator after this method returns are assumed to take place after the checkpoint that was restored.

        This method is called with a null state argument in the following situations:

        • There is a recovery and there was no completed checkpoint yet.
        • There is a recovery from a completed checkpoint/savepoint but it contained no state for the coordinator.

        In both cases, the coordinator should reset to an empty (new) state.

        Restoring implicitly notifies of Checkpoint Completion

        Restoring to a checkpoint is a way of confirming that the checkpoint is complete. It is safe to commit side-effects that are predicated on checkpoint completion after this call.

        Even if no call to OperatorCoordinator.notifyCheckpointComplete(long) happened, the checkpoint can still be complete (for example when a system failure happened directly after committing the checkpoint, before calling the OperatorCoordinator.notifyCheckpointComplete(long) method).

        Specified by:
        resetToCheckpoint in interface OperatorCoordinator