pyflink.datastream.checkpoint_storage.JobManagerCheckpointStorage#
- class JobManagerCheckpointStorage(checkpoint_path=None, max_state_size=None, j_jobmanager_checkpoint_storage=None)[source]#
The CheckpointStorage checkpoints state directly to the JobManager’s memory (hence the name), but savepoints will be persisted to a file system.
This checkpoint storage is primarily for experimentation, quick local setups, or for streaming applications that have very small state: Because it requires checkpoints to go through the JobManager’s memory, larger state will occupy larger portions of the JobManager’s main memory, reducing operational stability. For any other setup, the FileSystemCheckpointStorage should be used. The FileSystemCheckpointStorage but checkpoints state directly to files rather than to the JobManager’s memory, thus supporting larger state sizes and more highly available recovery.
State Size Considerations
State checkpointing with this checkpoint storage is subject to the following conditions:
Each individual state must not exceed the configured maximum state size (see
get_max_state_size()
.All state from one task (i.e., the sum of all operator states and keyed states from all chained operators of the task) must not exceed what the RPC system supports, which is be default < 10 MB. That limit can be configured up, but that is typically not advised.
The sum of all states in the application times all retained checkpoints must comfortably fit into the JobManager’s JVM heap space.
Persistence Guarantees
For the use cases where the state sizes can be handled by this storage, it does guarantee persistence for savepoints, externalized checkpoints (of configured), and checkpoints (when high-availability is configured).
Configuration
As for all checkpoint storage, this type can either be configured within the application (by creating the storage with the respective constructor parameters and setting it on the execution environment) or by specifying it in the Flink configuration.
If the storage was specified in the application, it may pick up additional configuration parameters from the Flink configuration. For example, if the backend if configured in the application without a default savepoint directory, it will pick up a default savepoint directory specified in the Flink configuration of the running job/cluster. That behavior is implemented via the
configure()
method.Methods
get_checkpoint_path
()Gets the base directory where all the checkpoints are stored.
get_max_state_size
()Gets the maximum size that an individual state can have, as configured in the constructor.
get_savepoint_path
()Gets the base directory where all the savepoints are stored.
Attributes
DEFAULT_MAX_STATE_SIZE