State#
OperatorStateStore#
Fetches the |
State#
|
|
Base interface for partitioned state that supports adding elements and inspecting the current state. |
|
Extension of AppendingState that allows merging of state. |
|
|
|
|
|
|
|
|
|
A read-only view of the |
|
A type of state that can be created to store the state of a |
StateDescriptor#
|
StateDescriptor for ValueState. |
|
StateDescriptor for ListState. |
|
StateDescriptor for MapState. |
|
StateDescriptor for ReducingState. |
|
A StateDescriptor for AggregatingState. |
StateTtlConfig#
- class StateTtlConfig.UpdateType(value)[source]#
This option value configures when to update last access timestamp which prolongs state TTL.
- Disabled = 0#
TTL is disabled. State does not expire.
- OnCreateAndWrite = 1#
Last access timestamp is initialised when state is created and updated on every write operation.
- OnReadAndWrite = 2#
The same as OnCreateAndWrite but also updated on read.
- class StateTtlConfig.StateVisibility(value)[source]#
This option configures whether expired user value can be returned or not.
- NeverReturnExpired = 1#
Never return expired user value.
- ReturnExpiredIfNotCleanedUp = 0#
Return expired user value if it is not cleaned up yet.
- class StateTtlConfig.TtlTimeCharacteristic(value)[source]#
This option configures time scale to use for ttl.
- ProcessingTime = 0#
Processing time
- class StateTtlConfig.CleanupStrategies(strategies: Dict[pyflink.datastream.state.StateTtlConfig.CleanupStrategies.Strategies, pyflink.datastream.state.StateTtlConfig.CleanupStrategies.CleanupStrategy], is_cleanup_in_background: bool)[source]#
TTL cleanup strategies.
This class configures when to cleanup expired state with TTL. By default, state is always cleaned up on explicit read access if found expired. Currently cleanup of state full snapshot can be additionally activated.
- class IncrementalCleanupStrategy(cleanup_size: int, run_cleanup_for_every_record: bool)[source]#
Configuration of cleanup strategy while taking the full snapshot.
Sets the ttl update type. |
|
Sets the state visibility. |
|
Sets the time characteristic. |
|
Cleanup expired state in full snapshot on checkpoint. |
|
Cleanup expired state incrementally cleanup local state. |
|
|
Cleanup expired state while Rocksdb compaction is running. |
Disable default cleanup of expired state in background (enabled by default). |
|
Sets the ttl time. |
|
StateBackend#
A State Backend defines how the state of a streaming application is stored locally within the cluster. Different state backends store their state in different fashions, and use different data structures to hold the state of running applications.
For example, the HashMapStateBackend
keeps working state in the memory of the
TaskManager. The backend is lightweight and without additional dependencies.
The EmbeddedRocksDBStateBackend
keeps working state in the memory of the TaskManager
and stores state checkpoints in a filesystem(typically a replicated highly-available filesystem,
like HDFS, Ceph,
S3, GCS,
etc).
The EmbeddedRocksDBStateBackend
stores working state in an embedded
RocksDB, instance and is able to scale working state to many
terrabytes in size, only limited by available disk space across all task managers.
Raw Bytes Storage and Backends
The StateBackend
creates services for raw bytes storage and for keyed state
and operator state.
The org.apache.flink.runtime.state.AbstractKeyedStateBackend and `org.apache.flink.runtime.state.OperatorStateBackend created by this state backend define how to hold the working state for keys and operators. They also define how to checkpoint that state, frequently using the raw bytes storage (via the org.apache.flink.runtime.state.CheckpointStreamFactory). However, it is also possible that for example a keyed state backend simply implements the bridge to a key/value store, and that it does not need to store anything in the raw byte storage upon a checkpoint.
Serializability
State Backends need to be serializable(java.io.Serializable), because they distributed across parallel processes (for distributed execution) together with the streaming application code.
Because of that, StateBackend
implementations are meant to be like factories that
create the proper states stores that provide access to the persistent storage and hold the
keyed- and operator state data structures. That way, the State Backend can be very lightweight
(contain only configurations) which makes it easier to be serializable.
Thread Safety
State backend implementations have to be thread-safe. Multiple threads may be creating streams and keyed-/operator state backends concurrently.
|
This state backend holds the working state in the memory (JVM heap) of the TaskManagers and checkpoints based on the configured CheckpointStorage. |
|
A State Backend that stores its state in an embedded |
|
IMPORTANT MemoryStateBackend is deprecated in favor of HashMapStateBackend and JobManagerCheckpointStorage. |
|
IMPORTANT FsStateBackend is deprecated in favor of `HashMapStateBackend and FileSystemCheckpointStorage. |
|
IMPORTANT RocksDBStateBackend is deprecated in favor of EmbeddedRocksDBStateBackend and FileSystemCheckpointStorage. |
|
A wrapper of customized java state backend. |
|
The |