@PublicEvolving public interface StateBackend extends Serializable
For example, the hashmap
state backend
keeps working state in the memory of the TaskManager. The backend is lightweight
and without additional dependencies.
The EmbeddedRocksDBStateBackend
stores working state in an embedded RocksDB and is able to scale working state to many terabytes in
size, only limited by available disk space across all task managers.
The StateBackend
creates services for for keyed state and operator
state.
The CheckpointableKeyedStateBackend
and OperatorStateBackend
created by this
state backend define how to hold the working state for keys and operators. They also define how
to checkpoint that state, frequently using the raw bytes storage (via the CheckpointStreamFactory
). However, it is also possible that for example a keyed state backend
simply implements the bridge to a key/value store, and that it does not need to store anything in
the raw byte storage upon a checkpoint.
State Backends need to be serializable
, because they distributed
across parallel processes (for distributed execution) together with the streaming application
code.
Because of that, StateBackend
implementations (typically subclasses of AbstractStateBackend
) are meant to be like factories that create the proper states stores
that provide access to the persistent storage and hold the keyed- and operator state data
structures. That way, the State Backend can be very lightweight (contain only configurations)
which makes it easier to be serializable.
State backend implementations have to be thread-safe. Multiple threads may be creating keyed-/operator state backends concurrently.
Modifier and Type | Method and Description |
---|---|
<K> CheckpointableKeyedStateBackend<K> |
createKeyedStateBackend(Environment env,
JobID jobID,
String operatorIdentifier,
TypeSerializer<K> keySerializer,
int numberOfKeyGroups,
KeyGroupRange keyGroupRange,
TaskKvStateRegistry kvStateRegistry,
TtlTimeProvider ttlTimeProvider,
MetricGroup metricGroup,
Collection<KeyedStateHandle> stateHandles,
CloseableRegistry cancelStreamRegistry)
Creates a new
CheckpointableKeyedStateBackend that is responsible for holding
keyed state and checkpointing it. |
default <K> CheckpointableKeyedStateBackend<K> |
createKeyedStateBackend(Environment env,
JobID jobID,
String operatorIdentifier,
TypeSerializer<K> keySerializer,
int numberOfKeyGroups,
KeyGroupRange keyGroupRange,
TaskKvStateRegistry kvStateRegistry,
TtlTimeProvider ttlTimeProvider,
MetricGroup metricGroup,
Collection<KeyedStateHandle> stateHandles,
CloseableRegistry cancelStreamRegistry,
double managedMemoryFraction)
Creates a new
CheckpointableKeyedStateBackend with the given managed memory fraction. |
OperatorStateBackend |
createOperatorStateBackend(Environment env,
String operatorIdentifier,
Collection<OperatorStateHandle> stateHandles,
CloseableRegistry cancelStreamRegistry)
Creates a new
OperatorStateBackend that can be used for storing operator state. |
default String |
getName()
Return the name of this backend, default is simple class name.
|
default boolean |
supportsNoClaimRestoreMode()
Tells if a state backend supports the
RestoreMode.NO_CLAIM mode. |
default boolean |
supportsSavepointFormat(SavepointFormatType formatType) |
default boolean |
useManagedMemory()
Whether the state backend uses Flink's managed memory.
|
default String getName()
DelegatingStateBackend
may return the simple class
name of the delegated backend.<K> CheckpointableKeyedStateBackend<K> createKeyedStateBackend(Environment env, JobID jobID, String operatorIdentifier, TypeSerializer<K> keySerializer, int numberOfKeyGroups, KeyGroupRange keyGroupRange, TaskKvStateRegistry kvStateRegistry, TtlTimeProvider ttlTimeProvider, MetricGroup metricGroup, @Nonnull Collection<KeyedStateHandle> stateHandles, CloseableRegistry cancelStreamRegistry) throws Exception
CheckpointableKeyedStateBackend
that is responsible for holding
keyed state and checkpointing it.
Keyed State is state where each value is bound to a key.
K
- The type of the keys by which the state is organized.env
- The environment of the task.jobID
- The ID of the job that the task belongs to.operatorIdentifier
- The identifier text of the operator.keySerializer
- The key-serializer for the operator.numberOfKeyGroups
- The number of key-groups aka max parallelism.keyGroupRange
- Range of key-groups for which the to-be-created backend is responsible.kvStateRegistry
- KvStateRegistry helper for this task.ttlTimeProvider
- Provider for TTL logic to judge about state expiration.metricGroup
- The parent metric group for all state backend metrics.stateHandles
- The state handles for restore.cancelStreamRegistry
- The registry to which created closeable objects will be
registered during restore.Exception
- This method may forward all exceptions that occur while instantiating the
backend.default <K> CheckpointableKeyedStateBackend<K> createKeyedStateBackend(Environment env, JobID jobID, String operatorIdentifier, TypeSerializer<K> keySerializer, int numberOfKeyGroups, KeyGroupRange keyGroupRange, TaskKvStateRegistry kvStateRegistry, TtlTimeProvider ttlTimeProvider, MetricGroup metricGroup, @Nonnull Collection<KeyedStateHandle> stateHandles, CloseableRegistry cancelStreamRegistry, double managedMemoryFraction) throws Exception
CheckpointableKeyedStateBackend
with the given managed memory fraction.
Backends that use managed memory are required to implement this interface.Exception
OperatorStateBackend createOperatorStateBackend(Environment env, String operatorIdentifier, @Nonnull Collection<OperatorStateHandle> stateHandles, CloseableRegistry cancelStreamRegistry) throws Exception
OperatorStateBackend
that can be used for storing operator state.
Operator state is state that is associated with parallel operator (or function) instances, rather than with keys.
env
- The runtime environment of the executing task.operatorIdentifier
- The identifier of the operator whose state should be stored.stateHandles
- The state handles for restore.cancelStreamRegistry
- The registry to register streams to close if task canceled.Exception
- This method may forward all exceptions that occur while instantiating the
backend.default boolean useManagedMemory()
default boolean supportsNoClaimRestoreMode()
RestoreMode.NO_CLAIM
mode.
If a state backend supports NO_CLAIM
mode, it should create an independent
snapshot when it receives CheckpointType.FULL_CHECKPOINT
in Snapshotable.snapshot(long, long, CheckpointStreamFactory, CheckpointOptions)
.
RestoreMode.NO_CLAIM
mode.default boolean supportsSavepointFormat(SavepointFormatType formatType)
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.