Class CheckpointFailureManager
- java.lang.Object
-
- org.apache.flink.runtime.checkpoint.CheckpointFailureManager
-
public class CheckpointFailureManager extends Object
The checkpoint failure manager which centralized manage checkpoint failure processing logic.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static interface
CheckpointFailureManager.FailJobCallback
A callback interface about how to fail a job.
-
Field Summary
Fields Modifier and Type Field Description static String
EXCEEDED_CHECKPOINT_TOLERABLE_FAILURE_MESSAGE
static int
UNLIMITED_TOLERABLE_FAILURE_NUMBER
-
Constructor Summary
Constructors Constructor Description CheckpointFailureManager(int tolerableCpFailureNumber, CheckpointFailureManager.FailJobCallback failureCallback)
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description void
checkFailureCounter(CheckpointException exception, long checkpointId)
void
handleCheckpointException(PendingCheckpoint pendingCheckpoint, CheckpointProperties checkpointProperties, CheckpointException exception, ExecutionAttemptID executionAttemptID, JobID job, PendingCheckpointStats pendingCheckpointStats, CheckpointStatsTracker statsTracker)
Failures on JM: all checkpoints - go against failure counter.void
handleCheckpointSuccess(long checkpointId)
Handle checkpoint success.
-
-
-
Field Detail
-
UNLIMITED_TOLERABLE_FAILURE_NUMBER
public static final int UNLIMITED_TOLERABLE_FAILURE_NUMBER
- See Also:
- Constant Field Values
-
EXCEEDED_CHECKPOINT_TOLERABLE_FAILURE_MESSAGE
public static final String EXCEEDED_CHECKPOINT_TOLERABLE_FAILURE_MESSAGE
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
CheckpointFailureManager
public CheckpointFailureManager(int tolerableCpFailureNumber, CheckpointFailureManager.FailJobCallback failureCallback)
-
-
Method Detail
-
handleCheckpointException
public void handleCheckpointException(@Nullable PendingCheckpoint pendingCheckpoint, CheckpointProperties checkpointProperties, CheckpointException exception, @Nullable ExecutionAttemptID executionAttemptID, JobID job, @Nullable PendingCheckpointStats pendingCheckpointStats, CheckpointStatsTracker statsTracker)
Failures on JM:- all checkpoints - go against failure counter.
- any savepoints - don’t do anything, manual action, the failover will not help anyway.
Failures on TM:
- all checkpoints - go against failure counter (failover might help and we want to notify users).
- sync savepoints - we must always fail, otherwise we risk deadlock when the job cancelation waiting for finishing savepoint which never happens.
- non sync savepoints - go against failure counter (failover might help solve the problem).
- Parameters:
pendingCheckpoint
- the failed checkpoint if it was initialized already.checkpointProperties
- the checkpoint properties in order to determinate which handle strategy can be used.exception
- the checkpoint exception.executionAttemptID
- the execution attempt id, as a safe guard.job
- the JobID.pendingCheckpointStats
- the pending checkpoint statistics.statsTracker
- the tracker for checkpoint statistics.
-
checkFailureCounter
public void checkFailureCounter(CheckpointException exception, long checkpointId)
-
handleCheckpointSuccess
public void handleCheckpointSuccess(long checkpointId)
Handle checkpoint success.- Parameters:
checkpointId
- the failed checkpoint id used to count the continuous failure number based on checkpoint id sequence.
-
-