pyflink.datastream.stream_execution_environment.StreamExecutionEnvironment.set_state_backend#
- StreamExecutionEnvironment.set_state_backend(state_backend: pyflink.datastream.state_backend.StateBackend) pyflink.datastream.stream_execution_environment.StreamExecutionEnvironment [source]#
Sets the state backend that describes how to store and checkpoint operator state. It defines both which data structures hold state during execution (for example hash tables, RockDB, or other data stores) as well as where checkpointed data will be persisted.
The
MemoryStateBackend
for example maintains the state in heap memory, as objects. It is lightweight without extra dependencies, but can checkpoint only small states(some counters).In contrast, the
FsStateBackend
stores checkpoints of the state (also maintained as heap objects) in files. When using a replicated file system (like HDFS, S3, Alluxio, etc) this will guarantee that state is not lost upon failures of individual nodes and that streaming program can be executed highly available and strongly consistent(assuming that Flink is run in high-availability mode).- The build-in state backend includes:
MemoryStateBackend
,FsStateBackend
andRocksDBStateBackend
.
See also
Example:
>>> env.set_state_backend(EmbeddedRocksDBStateBackend())
- Parameters
state_backend – The
StateBackend
.- Returns
This object.