.. ################################################################################ Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. ################################################################################ ===== State ===== State ----- .. currentmodule:: pyflink.datastream.state .. autosummary:: :toctree: api/ ValueState AppendingState MergingState ReducingState AggregatingState ListState MapState StateDescriptor --------------- .. currentmodule:: pyflink.datastream.state .. autosummary:: :toctree: api/ ValueStateDescriptor ListStateDescriptor MapStateDescriptor ReducingStateDescriptor AggregatingStateDescriptor StateTtlConfig -------------- .. currentmodule:: pyflink.datastream.state .. autoclass:: pyflink.datastream.state::StateTtlConfig.UpdateType :members: .. autoclass:: pyflink.datastream.state::StateTtlConfig.StateVisibility :members: .. autoclass:: pyflink.datastream.state::StateTtlConfig.TtlTimeCharacteristic :members: .. autoclass:: pyflink.datastream.state::StateTtlConfig.CleanupStrategies :members: .. currentmodule:: pyflink.datastream.state .. autosummary:: :toctree: api/ StateTtlConfig.new_builder StateTtlConfig.get_update_type StateTtlConfig.get_state_visibility StateTtlConfig.get_ttl StateTtlConfig.get_ttl_time_characteristic StateTtlConfig.is_enabled StateTtlConfig.get_cleanup_strategies StateTtlConfig.Builder.set_update_type StateTtlConfig.Builder.update_ttl_on_create_and_write StateTtlConfig.Builder.update_ttl_on_read_and_write StateTtlConfig.Builder.set_state_visibility StateTtlConfig.Builder.return_expired_if_not_cleaned_up StateTtlConfig.Builder.never_return_expired StateTtlConfig.Builder.set_ttl_time_characteristic StateTtlConfig.Builder.use_processing_time StateTtlConfig.Builder.cleanup_full_snapshot StateTtlConfig.Builder.cleanup_incrementally StateTtlConfig.Builder.cleanup_in_rocksdb_compact_filter StateTtlConfig.Builder.disable_cleanup_in_background StateTtlConfig.Builder.set_ttl StateTtlConfig.Builder.build StateBackend ------------ A **State Backend** defines how the state of a streaming application is stored locally within the cluster. Different state backends store their state in different fashions, and use different data structures to hold the state of running applications. For example, the :class:`HashMapStateBackend` keeps working state in the memory of the TaskManager. The backend is lightweight and without additional dependencies. The :class:`EmbeddedRocksDBStateBackend` keeps working state in the memory of the TaskManager and stores state checkpoints in a filesystem(typically a replicated highly-available filesystem, like `HDFS `_, `Ceph `_, `S3 `_, `GCS `_, etc). The :class:`EmbeddedRocksDBStateBackend` stores working state in an embedded `RocksDB `_, instance and is able to scale working state to many terrabytes in size, only limited by available disk space across all task managers. **Raw Bytes Storage and Backends** The :class:`StateBackend` creates services for *raw bytes storage* and for *keyed state* and *operator state*. The `org.apache.flink.runtime.state.AbstractKeyedStateBackend and `org.apache.flink.runtime.state.OperatorStateBackend` created by this state backend define how to hold the working state for keys and operators. They also define how to checkpoint that state, frequently using the raw bytes storage (via the `org.apache.flink.runtime.state.CheckpointStreamFactory`). However, it is also possible that for example a keyed state backend simply implements the bridge to a key/value store, and that it does not need to store anything in the raw byte storage upon a checkpoint. **Serializability** State Backends need to be serializable(`java.io.Serializable`), because they distributed across parallel processes (for distributed execution) together with the streaming application code. Because of that, :class:`StateBackend` implementations are meant to be like *factories* that create the proper states stores that provide access to the persistent storage and hold the keyed- and operator state data structures. That way, the State Backend can be very lightweight (contain only configurations) which makes it easier to be serializable. **Thread Safety** State backend implementations have to be thread-safe. Multiple threads may be creating streams and keyed-/operator state backends concurrently. .. currentmodule:: pyflink.datastream.state_backend .. autosummary:: :toctree: api/ HashMapStateBackend EmbeddedRocksDBStateBackend MemoryStateBackend FsStateBackend RocksDBStateBackend CustomStateBackend PredefinedOptions