Interface CheckpointStorage
-
- All Superinterfaces:
Serializable
- All Known Subinterfaces:
ConfigurableCheckpointStorage
- All Known Implementing Classes:
AbstractFileStateBackend
,BatchExecutionCheckpointStorage
,FileSystemCheckpointStorage
,JobManagerCheckpointStorage
@PublicEvolving public interface CheckpointStorage extends Serializable
CheckpointStorage defines howStateBackend
's store their state for fault tolerance in streaming applications. Various implementations store their checkpoints in different fashions and have different requirements and availability guarantees.For example,
JobManagerCheckpointStorage
stores checkpoints in the memory of the JobManager. It is lightweight and without additional dependencies but is not scalable and only supports small state sizes. This checkpoint storage policy is convenient for local testing and development.FileSystemCheckpointStorage
stores checkpoints in a filesystem. For systems like HDFS, NFS Drives, S3, and GCS, this storage policy supports large state size, in the magnitude of many terabytes while providing a highly available foundation for stateful applications. This checkpoint storage policy is recommended for most production deployments.Raw Bytes Storage
The
CheckpointStorage
creates services for raw bytes storage.The raw bytes storage (through the
CheckpointStreamFactory
) is the fundamental service that simply stores bytes in a fault tolerant fashion. This service is used by the JobManager to store checkpoint and recovery metadata and is typically also used by the keyed- and operator state backends to store checkpointed state.Serializability
Implementations need to be
serializable
, because they distributed across parallel processes (for distributed execution) together with the streaming application code.Because of that,
CheckpointStorage
implementations are meant to be like factories that create the proper states stores that provide access to the persistent. That way, the Checkpoint Storage can be very lightweight (contain only configurations) which makes it easier to be serializable.Thread Safety
Checkpoint storage implementations have to be thread-safe. Multiple threads may be creating streams concurrently.
-
-
Method Summary
All Methods Instance Methods Abstract Methods Modifier and Type Method Description CheckpointStorageAccess
createCheckpointStorage(JobID jobId)
Creates a storage for checkpoints for the given job.CompletedCheckpointStorageLocation
resolveCheckpoint(String externalPointer)
Resolves the given pointer to a checkpoint/savepoint into a checkpoint location.
-
-
-
Method Detail
-
resolveCheckpoint
CompletedCheckpointStorageLocation resolveCheckpoint(String externalPointer) throws IOException
Resolves the given pointer to a checkpoint/savepoint into a checkpoint location. The location supports reading the checkpoint metadata, or disposing the checkpoint storage location.- Parameters:
externalPointer
- The external checkpoint pointer to resolve.- Returns:
- The checkpoint location handle.
- Throws:
IOException
- Thrown, if the state backend does not understand the pointer, or if the pointer could not be resolved due to an I/O error.
-
createCheckpointStorage
CheckpointStorageAccess createCheckpointStorage(JobID jobId) throws IOException
Creates a storage for checkpoints for the given job. The checkpoint storage is used to write checkpoint data and metadata.- Parameters:
jobId
- The job to store checkpoint data for.- Returns:
- A checkpoint storage for the given job.
- Throws:
IOException
- Thrown if the checkpoint storage cannot be initialized.
-
-