pyflink.datastream.state_backend.PredefinedOptions

class PredefinedOptions(value)

The PredefinedOptions are configuration settings for the RocksDBStateBackend. The various pre-defined choices are configurations that have been empirically determined to be beneficial for performance under different settings.

Some of these settings are based on experiments by the Flink community; others follow guides from the RocksDB project. Configurations that should be enabled unconditionally are not part of any of the pre-defined options; see the documentation for RocksDBResourceContainer in the Java API for further details. Note that setUseFsync(false) is set by default, irrespective of the PredefinedOptions setting: because Flink does not rely on RocksDB data on disk for recovery, there is no need to sync data to stable storage.
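As a rough sketch of how one of these members might be applied from PyFlink: the example below assumes a PyFlink release that ships EmbeddedRocksDBStateBackend with a set_predefined_options method, and a working Flink/Java installation at call time. The helper name build_rocksdb_env is hypothetical and not part of PyFlink.

```python
def build_rocksdb_env(preset_name="SPINNING_DISK_OPTIMIZED"):
    """Return an execution environment whose RocksDB backend uses the named preset.

    Hypothetical helper, not part of PyFlink. PyFlink is imported lazily so that
    defining this function does not by itself require a Flink installation.
    """
    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.datastream.state_backend import (
        EmbeddedRocksDBStateBackend,
        PredefinedOptions,
    )

    # PredefinedOptions is an Enum, so members can be looked up by name.
    preset = PredefinedOptions[preset_name]

    # Assumption: the embedded RocksDB backend is the one being configured.
    backend = EmbeddedRocksDBStateBackend()
    backend.set_predefined_options(preset)

    env = StreamExecutionEnvironment.get_execution_environment()
    env.set_state_backend(backend)
    return env
```

Calling build_rocksdb_env("FLASH_SSD_OPTIMIZED") would select the SSD preset instead; an unknown name raises KeyError from the Enum lookup.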

DEFAULT:

Default options for all settings. No additional options are set.

SPINNING_DISK_OPTIMIZED:

Pre-defined options for regular spinning hard disks.

This constant configures RocksDB with some options that lead empirically to better performance when the machines executing the system use regular spinning hard disks.

The following options are set:

  • setCompactionStyle(CompactionStyle.LEVEL)

  • setLevelCompactionDynamicLevelBytes(true)

  • setMaxBackgroundJobs(4)

  • setMaxOpenFiles(-1)

SPINNING_DISK_OPTIMIZED_HIGH_MEM:

Pre-defined options for better performance on regular spinning hard disks, at the cost of higher memory consumption.

Note

These settings will cause RocksDB to consume a lot of memory for block caching and compactions. If you experience out-of-memory problems related to RocksDB, consider switching back to SPINNING_DISK_OPTIMIZED.

The following options are set:

  • BlockBasedTableConfig.setBlockCacheSize(256 MBytes)

  • BlockBasedTableConfig.setBlockSize(128 KBytes)

  • BlockBasedTableConfig.setFilterPolicy(BloomFilter(BLOOM_FILTER_BITS_PER_KEY, BLOOM_FILTER_BLOCK_BASED_MODE))

  • setLevelCompactionDynamicLevelBytes(true)

  • setMaxBackgroundJobs(4)

  • setMaxBytesForLevelBase(1 GByte)

  • setMaxOpenFiles(-1)

  • setMaxWriteBufferNumber(4)

  • setMinWriteBufferNumberToMerge(3)

  • setTargetFileSizeBase(256 MBytes)

  • setWriteBufferSize(64 MBytes)

The BLOOM_FILTER_BITS_PER_KEY and BLOOM_FILTER_BLOCK_BASED_MODE options are set via state.backend.rocksdb.bloom-filter.bits-per-key and state.backend.rocksdb.bloom-filter.block-based-mode, respectively.
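To give a feel for why this preset is memory-hungry, the following back-of-the-envelope sketch adds up the block cache and write buffer sizes listed above. It rests on several assumptions: that "MBytes" means 1024 * 1024 bytes, that each keyed state (RocksDB column family) gets its own block cache and write buffers, and that Flink is not bounding RocksDB memory itself, in which case these sizes may not apply. Index blocks, filter blocks, and compaction memory are ignored, and the function name is hypothetical.

```python
MIB = 1024 * 1024  # assumption: "MBytes" in the list above means 2**20 bytes

# Values copied from the SPINNING_DISK_OPTIMIZED_HIGH_MEM option list.
BLOCK_CACHE_SIZE = 256 * MIB
WRITE_BUFFER_SIZE = 64 * MIB
MAX_WRITE_BUFFER_NUMBER = 4


def estimate_memory_bytes(num_states):
    """Rough upper bound: per-state block cache plus all of its write buffers."""
    per_state = BLOCK_CACHE_SIZE + MAX_WRITE_BUFFER_NUMBER * WRITE_BUFFER_SIZE
    return num_states * per_state


print(estimate_memory_bytes(1) // MIB)  # 512
print(estimate_memory_bytes(3) // MIB)  # 1536
```

Under these assumptions a single state can account for roughly half a gigabyte, which is consistent with the note above about out-of-memory problems.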

FLASH_SSD_OPTIMIZED:

Pre-defined options for Flash SSDs.

This constant configures RocksDB with some options that lead empirically to better performance when the machines executing the system use SSDs.

The following options are set:

  • setMaxBackgroundJobs(4)

  • setMaxOpenFiles(-1)
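The option lists above can be compared mechanically. The sketch below transcribes the DEFAULT, SPINNING_DISK_OPTIMIZED, and FLASH_SSD_OPTIMIZED lists into plain dictionaries (SPINNING_DISK_OPTIMIZED_HIGH_MEM is left out for brevity) and reports which options one preset sets beyond another. The dictionary and the helper extra_options are illustrative only; they are not part of PyFlink and do not configure RocksDB.

```python
# Transcribed from the option lists in this page; values are for comparison only.
PRESET_OPTIONS = {
    "DEFAULT": {},
    "SPINNING_DISK_OPTIMIZED": {
        "setCompactionStyle": "CompactionStyle.LEVEL",
        "setLevelCompactionDynamicLevelBytes": True,
        "setMaxBackgroundJobs": 4,
        "setMaxOpenFiles": -1,
    },
    "FLASH_SSD_OPTIMIZED": {
        "setMaxBackgroundJobs": 4,
        "setMaxOpenFiles": -1,
    },
}


def extra_options(preset, baseline):
    """Options that `preset` sets and `baseline` does not set to the same value."""
    base = PRESET_OPTIONS[baseline]
    return {k: v for k, v in PRESET_OPTIONS[preset].items() if base.get(k) != v}


print(sorted(extra_options("SPINNING_DISK_OPTIMIZED", "FLASH_SSD_OPTIMIZED")))
# ['setCompactionStyle', 'setLevelCompactionDynamicLevelBytes']
```

Going by the lists in this page, the spinning-disk preset differs from the SSD preset only in its compaction settings.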

Attributes

DEFAULT

SPINNING_DISK_OPTIMIZED

SPINNING_DISK_OPTIMIZED_HIGH_MEM

FLASH_SSD_OPTIMIZED
