@PublicEvolving public interface StateBackend extends Serializable
For example, the memory state backend
keeps working state in the memory of the TaskManager and stores checkpoints in the memory of the
JobManager. The backend is lightweight and without additional dependencies, but not highly available
and supports only small state.
The file system state backend
keeps working state in the memory of the TaskManager and stores state checkpoints in a filesystem
(typically a replicated highly-available filesystem, like HDFS,
Ceph, S3,
GCS, etc).
The RocksDBStateBackend
stores working state in RocksDB,
and checkpoints the state by default to a filesystem (similar to the FsStateBackend
).
StateBackend
creates services for raw bytes storage and for keyed state
and operator state.
The raw bytes storage (through the CheckpointStreamFactory
) is the fundamental
service that simply stores bytes in a fault tolerant fashion. This service is used by the JobManager
to store checkpoint and recovery metadata and is typically also used by the keyed- and operator state
backends to store checkpointed state.
The AbstractKeyedStateBackend
and OperatorStateBackend
created by this state
backend define how to hold the working state for keys and operators. They also define how to checkpoint
that state, frequently using the raw bytes storage (via the CheckpointStreamFactory
).
However, it is also possible that for example a keyed state backend simply implements the bridge to
a key/value store, and that it does not need to store anything in the raw byte storage upon a
checkpoint.
serializable
, because they distributed
across parallel processes (for distributed execution) together with the streaming application code.
Because of that, StateBackend
implementations (typically subclasses
of AbstractStateBackend
) are meant to be like factories that create the proper
states stores that provide access to the persistent storage and hold the keyed- and operator
state data structures. That way, the State Backend can be very lightweight (contain only
configurations) which makes it easier to be serializable.
Modifier and Type | Method and Description |
---|---|
CheckpointStorage |
createCheckpointStorage(JobID jobId)
Creates a storage for checkpoints for the given job.
|
<K> AbstractKeyedStateBackend<K> |
createKeyedStateBackend(Environment env,
JobID jobID,
String operatorIdentifier,
TypeSerializer<K> keySerializer,
int numberOfKeyGroups,
KeyGroupRange keyGroupRange,
TaskKvStateRegistry kvStateRegistry,
TtlTimeProvider ttlTimeProvider,
MetricGroup metricGroup,
Collection<KeyedStateHandle> stateHandles,
CloseableRegistry cancelStreamRegistry)
Creates a new
AbstractKeyedStateBackend that is responsible for holding keyed state
and checkpointing it. |
OperatorStateBackend |
createOperatorStateBackend(Environment env,
String operatorIdentifier,
Collection<OperatorStateHandle> stateHandles,
CloseableRegistry cancelStreamRegistry)
Creates a new
OperatorStateBackend that can be used for storing operator state. |
CompletedCheckpointStorageLocation |
resolveCheckpoint(String externalPointer)
Resolves the given pointer to a checkpoint/savepoint into a checkpoint location.
|
CompletedCheckpointStorageLocation resolveCheckpoint(String externalPointer) throws IOException
If the state backend cannot understand the format of the pointer (for example because it
was created by a different state backend) this method should throw an IOException
.
externalPointer
- The external checkpoint pointer to resolve.IOException
- Thrown, if the state backend does not understand the pointer, or if
the pointer could not be resolved due to an I/O error.CheckpointStorage createCheckpointStorage(JobID jobId) throws IOException
jobId
- The job to store checkpoint data for.IOException
- Thrown if the checkpoint storage cannot be initialized.<K> AbstractKeyedStateBackend<K> createKeyedStateBackend(Environment env, JobID jobID, String operatorIdentifier, TypeSerializer<K> keySerializer, int numberOfKeyGroups, KeyGroupRange keyGroupRange, TaskKvStateRegistry kvStateRegistry, TtlTimeProvider ttlTimeProvider, MetricGroup metricGroup, @Nonnull Collection<KeyedStateHandle> stateHandles, CloseableRegistry cancelStreamRegistry) throws Exception
AbstractKeyedStateBackend
that is responsible for holding keyed state
and checkpointing it.
Keyed State is state where each value is bound to a key.
K
- The type of the keys by which the state is organized.env
- The environment of the task.jobID
- The ID of the job that the task belongs to.operatorIdentifier
- The identifier text of the operator.keySerializer
- The key-serializer for the operator.numberOfKeyGroups
- The number of key-groups aka max parallelism.keyGroupRange
- Range of key-groups for which the to-be-created backend is responsible.kvStateRegistry
- KvStateRegistry helper for this task.ttlTimeProvider
- Provider for TTL logic to judge about state expiration.metricGroup
- The parent metric group for all state backend metrics.stateHandles
- The state handles for restore.cancelStreamRegistry
- The registry to which created closeable objects will be registered during restore.Exception
- This method may forward all exceptions that occur while instantiating the backend.OperatorStateBackend createOperatorStateBackend(Environment env, String operatorIdentifier, @Nonnull Collection<OperatorStateHandle> stateHandles, CloseableRegistry cancelStreamRegistry) throws Exception
OperatorStateBackend
that can be used for storing operator state.
Operator state is state that is associated with parallel operator (or function) instances, rather than with keys.
env
- The runtime environment of the executing task.operatorIdentifier
- The identifier of the operator whose state should be stored.stateHandles
- The state handles for restore.cancelStreamRegistry
- The registry to register streams to close if task canceled.Exception
- This method may forward all exceptions that occur while instantiating the backend.Copyright © 2014–2020 The Apache Software Foundation. All rights reserved.