@PublicEvolving public interface CheckpointStorage extends Serializable
StateBackend
's store their state for fault tolerance in
streaming applications. Various implementations store their checkpoints in different fashions and
have different requirements and availability guarantees.
For example, JobManagerCheckpointStorage
stores checkpoints in the memory of the JobManager. It is
lightweight and without additional dependencies but is not scalable and only supports small state
sizes. This checkpoint storage policy is convenient for local testing and development.
FileSystemCheckpointStorage
stores checkpoints in a filesystem. For systems like HDFS, NFS
Drives, S3, and GCS, this storage policy supports large state size, in the magnitude of many
terabytes while providing a highly available foundation for stateful applications. This
checkpoint storage policy is recommended for most production deployments.
The CheckpointStorage
creates services for raw bytes storage.
The raw bytes storage (through the CheckpointStreamFactory
) is the fundamental
service that simply stores bytes in a fault tolerant fashion. This service is used by the
JobManager to store checkpoint and recovery metadata and is typically also used by the keyed- and
operator state backends to store checkpointed state.
Implementations need to be serializable
, because they distributed
across parallel processes (for distributed execution) together with the streaming application
code.
Because of that, CheckpointStorage
implementations are meant to be like
factories that create the proper states stores that provide access to the persistent. That
way, the Checkpoint Storage can be very lightweight (contain only configurations) which makes it
easier to be serializable.
Checkpoint storage implementations have to be thread-safe. Multiple threads may be creating streams concurrently.
Modifier and Type | Method and Description |
---|---|
CheckpointStorageAccess |
createCheckpointStorage(JobID jobId)
Creates a storage for checkpoints for the given job.
|
CompletedCheckpointStorageLocation |
resolveCheckpoint(String externalPointer)
Resolves the given pointer to a checkpoint/savepoint into a checkpoint location.
|
CompletedCheckpointStorageLocation resolveCheckpoint(String externalPointer) throws IOException
externalPointer
- The external checkpoint pointer to resolve.IOException
- Thrown, if the state backend does not understand the pointer, or if the
pointer could not be resolved due to an I/O error.CheckpointStorageAccess createCheckpointStorage(JobID jobId) throws IOException
jobId
- The job to store checkpoint data for.IOException
- Thrown if the checkpoint storage cannot be initialized.Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.