Ctrl+K
Logo image Logo image

Site Navigation

  • API Reference
  • Examples

Site Navigation

  • API Reference
  • Examples

Section Navigation

  • PyFlink Table
  • PyFlink DataStream
    • StreamExecutionEnvironment
    • DataStream
    • Functions
    • State
    • Timer
    • Window
    • Checkpoint
    • Side Outputs
    • Connectors
    • Formats
  • PyFlink Common

pyflink.datastream.stream_execution_environment.StreamExecutionEnvironment.set_state_backend#

StreamExecutionEnvironment.set_state_backend(state_backend: pyflink.datastream.state_backend.StateBackend) → pyflink.datastream.stream_execution_environment.StreamExecutionEnvironment[source]#

Sets the state backend that describes how to store and checkpoint operator state. It defines both which data structures hold state during execution (for example hash tables, RockDB, or other data stores) as well as where checkpointed data will be persisted.

The MemoryStateBackend for example maintains the state in heap memory, as objects. It is lightweight without extra dependencies, but can checkpoint only small states(some counters).

In contrast, the FsStateBackend stores checkpoints of the state (also maintained as heap objects) in files. When using a replicated file system (like HDFS, S3, Alluxio, etc) this will guarantee that state is not lost upon failures of individual nodes and that streaming program can be executed highly available and strongly consistent(assuming that Flink is run in high-availability mode).

The build-in state backend includes:

MemoryStateBackend, FsStateBackend and RocksDBStateBackend.

See also

get_state_backend()

Example:

>>> env.set_state_backend(EmbeddedRocksDBStateBackend())
Parameters

state_backend – The StateBackend.

Returns

This object.

previous

pyflink.datastream.stream_execution_environment.StreamExecutionEnvironment.get_state_backend

next

pyflink.datastream.stream_execution_environment.StreamExecutionEnvironment.enable_changelog_state_backend

Show Source

Created using Sphinx 4.5.0.