pyflink.datastream.checkpoint_storage.FileSystemCheckpointStorage

class FileSystemCheckpointStorage(checkpoint_path=None, file_state_size_threshold=None, write_buffer_size=-1, j_filesystem_checkpoint_storage=None)

FileSystemCheckpointStorage checkpoints state as files to a filesystem.

Each checkpoint stores all its files in a subdirectory that includes the checkpoint number, such as hdfs://namenode:port/flink-checkpoints/chk-17/.
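The naming convention can be illustrated with a small helper. This is only a sketch that mirrors the example path in the text: `checkpoint_subdirectory` is a hypothetical function, not part of PyFlink, and a real deployment may insert additional path components between the base directory and the `chk-` subdirectory.

```python
def checkpoint_subdirectory(base_path: str, checkpoint_number: int) -> str:
    """Build a checkpoint subdirectory path in the style shown above.

    Hypothetical helper for illustration; Flink creates these
    directories itself.
    """
    # Strip a trailing slash so the separator is not doubled.
    return f"{base_path.rstrip('/')}/chk-{checkpoint_number}/"


example = checkpoint_subdirectory("hdfs://namenode:port/flink-checkpoints", 17)
print(example)  # hdfs://namenode:port/flink-checkpoints/chk-17/
```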

State Size Considerations

This checkpoint storage stores small state chunks directly with the metadata to avoid creating many small files. The threshold for this is configurable. Increasing the threshold increases the size of the checkpoint metadata. The checkpoint metadata of all retained completed checkpoints needs to fit into the JobManager’s heap memory. This is typically not a problem unless the threshold (see get_min_file_size_threshold) is increased significantly.
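The trade-off can be made concrete with a toy model. The sketch below is not Flink's implementation: `CheckpointSketch`, the chunk sizes, and both threshold values are assumptions chosen for illustration. It shows only that a higher threshold moves more state into the metadata and produces fewer files.

```python
from dataclasses import dataclass, field


@dataclass
class CheckpointSketch:
    """Toy model of one checkpoint: inline metadata plus separate files."""

    threshold_bytes: int  # hypothetical threshold, not a Flink default
    inline_chunks: list = field(default_factory=list)  # kept with metadata
    file_chunks: list = field(default_factory=list)    # written as files

    def add_chunk(self, chunk: bytes) -> str:
        # State below the threshold is stored with the metadata;
        # everything else goes to its own file.
        if len(chunk) < self.threshold_bytes:
            self.inline_chunks.append(chunk)
            return "metadata"
        self.file_chunks.append(chunk)
        return "file"

    def metadata_size(self) -> int:
        # Bytes that would have to fit in JobManager memory in this model.
        return sum(len(c) for c in self.inline_chunks)


# Three state chunks of illustrative sizes.
chunks = [b"x" * 100, b"x" * 5_000, b"x" * 50_000]

low = CheckpointSketch(threshold_bytes=1_024)
high = CheckpointSketch(threshold_bytes=65_536)
for chunk in chunks:
    low.add_chunk(chunk)
    high.add_chunk(chunk)

print(low.metadata_size(), len(low.file_chunks))    # 100 2
print(high.metadata_size(), len(high.file_chunks))  # 55100 0
```

In this model the metadata cost applies to every retained completed checkpoint, which is why a large threshold combined with many retained checkpoints puts pressure on the JobManager heap.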

Persistence Guarantees

Checkpoints from this checkpoint storage are as persistent and available as the filesystem they are written to. If the filesystem is a persistent distributed filesystem, this checkpoint storage supports highly available setups. This checkpoint storage additionally supports savepoints and externalized checkpoints.

Configuration

As with all checkpoint storage policies, this checkpoint storage can be configured either within the application (by creating the storage with the respective constructor parameters and setting it on the execution environment) or by specifying it in the Flink configuration.

If the checkpoint storage was specified in the application, it may pick up additional configuration parameters from the Flink configuration. For example, if the storage is configured in the application without a default savepoint directory, it will pick up a default savepoint directory specified in the Flink configuration of the running job/cluster.
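Configuring the storage within the application might look like the following sketch. It assumes that `CheckpointConfig.set_checkpoint_storage` and `StreamExecutionEnvironment.enable_checkpointing` are available in the PyFlink version in use; the helper `configure_checkpointing`, the 10-second interval, and the `file:///tmp/flink-checkpoints` path are illustrative choices, not recommendations.

```python
def configure_checkpointing(env, storage, interval_ms=10_000):
    """Enable periodic checkpoints and set the checkpoint storage on env."""
    env.enable_checkpointing(interval_ms)
    env.get_checkpoint_config().set_checkpoint_storage(storage)
    return env


def main():
    # PyFlink imports stay inside main() so the helper above can be
    # imported and tested without a Flink installation.
    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.datastream.checkpoint_storage import (
        FileSystemCheckpointStorage,
    )

    env = StreamExecutionEnvironment.get_execution_environment()
    # Hypothetical local path; a persistent distributed filesystem is
    # needed for highly available setups.
    storage = FileSystemCheckpointStorage("file:///tmp/flink-checkpoints")
    return configure_checkpointing(env, storage)
```

Calling `main()` requires PyFlink and a Java runtime. The cluster-level alternative is set in the Flink configuration; the option names depend on the Flink release, so consult the configuration reference for the version in use.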

Methods

get_checkpoint_path()

Gets the base directory where all the checkpoints are stored.

get_min_file_size_threshold()

Gets the threshold below which state is stored as part of the metadata, rather than in files.

get_savepoint_path()

Gets the base directory where all the savepoints are stored.

get_write_buffer_size()

Gets the write buffer size for created checkpoint streams.
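The four getters can be read together to inspect how a storage instance is set up. The sketch below uses only the methods listed above; `describe_storage` is a hypothetical helper, and the checkpoint path is a placeholder.

```python
def describe_storage(storage):
    """Collect the settings exposed by a FileSystemCheckpointStorage."""
    return {
        "checkpoint_path": storage.get_checkpoint_path(),
        "savepoint_path": storage.get_savepoint_path(),
        "min_file_size_threshold": storage.get_min_file_size_threshold(),
        "write_buffer_size": storage.get_write_buffer_size(),
    }


def main():
    # Imported here so describe_storage can be used without PyFlink.
    from pyflink.datastream.checkpoint_storage import (
        FileSystemCheckpointStorage,
    )

    # Placeholder path for illustration only.
    storage = FileSystemCheckpointStorage("file:///tmp/flink-checkpoints")
    for name, value in describe_storage(storage).items():
        print(f"{name}: {value}")
```

Running `main()` requires PyFlink and a Java runtime. The values printed for settings that were not passed to the constructor depend on the Flink configuration in effect.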

Attributes

MAX_FILE_STATE_THRESHOLD
