This documentation is for an unreleased version of Apache Flink Table Store. We recommend you use the latest stable version.
Basic Concepts #
Snapshot #
A snapshot captures the state of a table at some point in time. Users can access the latest data of a table through the latest snapshot. By time traveling, users can also access the previous state of a table through an earlier snapshot.
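As a rough illustration of the idea, the sketch below models a table's history as a list of snapshots ordered by commit time and resolves either the latest snapshot or the one visible at an earlier timestamp. The `Snapshot` class, its fields, and both helper functions are hypothetical names chosen for this example; they are not Table Store's API or its on-disk format.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int     # hypothetical: increasing id assigned per commit
    commit_time_ms: int  # hypothetical: commit timestamp in milliseconds


def latest_snapshot(snapshots):
    """Return the most recent snapshot, or None if the table has none."""
    return snapshots[-1] if snapshots else None


def snapshot_as_of(snapshots, timestamp_ms):
    """Return the newest snapshot committed at or before timestamp_ms.

    Assumes `snapshots` is sorted by commit_time_ms in ascending order.
    """
    times = [s.commit_time_ms for s in snapshots]
    idx = bisect_right(times, timestamp_ms)
    return snapshots[idx - 1] if idx > 0 else None


history = [Snapshot(1, 1000), Snapshot(2, 2000), Snapshot(3, 3000)]
current = latest_snapshot(history)        # latest state of the table
earlier = snapshot_as_of(history, 2500)   # time travel to an earlier state
```

A timestamp earlier than the first commit resolves to no snapshot at all, which is why `snapshot_as_of` can return `None`.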
Partition #
Table Store adopts the same partitioning concept as Apache Hive to separate data.
Partitioning is an optional way of dividing a table into related parts based on the values of particular columns like date, city, and department. Each table can have one or more partition keys to identify a particular partition.
By partitioning, users can efficiently operate on a slice of records in the table. See file layouts for how files are divided into multiple partitions.
Partition keys must be a subset of primary keys if primary keys are defined.
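The subset rule can be expressed as a small check. This is a minimal sketch with a hypothetical helper, `validate_partition_keys`, and made-up column names; it only mirrors the rule stated above and is not how Table Store itself validates a table definition.

```python
def validate_partition_keys(partition_keys, primary_keys):
    """Raise ValueError if a partition key is missing from the primary key.

    The rule only applies when primary keys are defined, so an empty
    primary key list is always accepted.
    """
    if not primary_keys:
        return
    missing = [key for key in partition_keys if key not in primary_keys]
    if missing:
        raise ValueError(
            f"partition keys {missing} are not part of primary keys {primary_keys}"
        )


# Accepted: the partition key `dt` is part of the primary key (dt, user_id).
validate_partition_keys(["dt"], ["dt", "user_id"])
```

Calling the helper with partition key `dt` and primary key `user_id` alone would raise `ValueError`, because `dt` is not part of the primary key.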
Bucket #
Unpartitioned tables, or partitions in partitioned tables, are sub-divided into buckets, to provide extra structure to the data that may be used for more efficient querying.
The range for a bucket is determined by the hash value of one or more columns in the records. Users can specify bucketing columns by providing the bucket-key option. If no bucket-key option is specified, the primary key (if defined) or the complete record will be used as the bucket key.
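The fallback order for the bucket key and the hash-based assignment can be sketched as follows. Several details here are assumptions made for illustration: the text above does not say which hash function Table Store uses or how a hash value maps to a bucket, so this sketch substitutes CRC32 and a simple modulo. The function names `choose_bucket_key` and `assign_bucket` are hypothetical.

```python
import zlib


def choose_bucket_key(columns, primary_keys=None, bucket_key=None):
    """Pick the bucketing columns using the fallback order described above."""
    if bucket_key:
        return list(bucket_key)    # explicit bucket-key option
    if primary_keys:
        return list(primary_keys)  # fall back to the primary key
    return list(columns)           # fall back to the complete record


def assign_bucket(record, key_columns, num_buckets):
    """Map a record (a dict) to a bucket index in [0, num_buckets).

    Assumption: CRC32 plus modulo stands in for the real hash scheme.
    """
    encoded = "\x1f".join(repr(record[c]) for c in key_columns).encode("utf-8")
    return zlib.crc32(encoded) % num_buckets


record = {"dt": "2023-01-01", "user_id": 42, "city": "Berlin"}
key = choose_bucket_key(list(record), primary_keys=["dt", "user_id"])
bucket = assign_bucket(record, key, num_buckets=4)
```

Because the assignment depends only on the bucket key columns, two records with the same key values always land in the same bucket, regardless of their other columns.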
A bucket is the smallest storage unit for reads and writes, so the number of buckets limits the maximum processing parallelism. The number should not be too large, though, because too many buckets produce many small files and degrade read performance. In general, the recommended data size in each bucket is about 1 GB.
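The sizing guideline translates into simple arithmetic. The sketch below is a back-of-the-envelope helper, not a Table Store utility: `suggest_bucket_count` is a hypothetical name, and it assumes "about 1 GB" means 1024^3 bytes. For a partitioned table, the estimate would apply to the data in a single partition, since buckets subdivide each partition.

```python
import math

GIB = 1024 ** 3  # assumption: treat the "about 1 GB" target as 1 GiB


def suggest_bucket_count(data_size_bytes, target_bucket_bytes=GIB):
    """Estimate a bucket count so each bucket holds roughly the target size.

    Always returns at least 1, since a table or partition needs one bucket.
    """
    return max(1, math.ceil(data_size_bytes / target_bucket_bytes))


# Roughly 10 GiB of data in an unpartitioned table (or in one partition).
buckets = suggest_bucket_count(10 * GIB)
```

The result is only a starting point: it also caps the read and write parallelism, so a workload that needs more parallel tasks may justify smaller buckets.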
See file layouts for how files are divided into buckets. Also, see rescale bucket if you want to adjust the number of buckets after a table is created.
Consistency Guarantees #
Table Store writers use a two-phase commit protocol to atomically commit a batch of records to the table. Each commit produces at most two snapshots at commit time.
For any two writers modifying a table at the same time, as long as they do not modify the same bucket, their commits are serializable. If they modify the same bucket, only snapshot isolation is guaranteed. That is, the final table state may be a mix of the two commits, but no changes are lost.
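The rule for two concurrent writers can be summarized in a few lines. This sketch only restates the guarantee as a lookup; `commit_guarantee` is a hypothetical helper, and identifying a bucket by a `(partition, bucket index)` tuple is an assumption made for the example.

```python
def commit_guarantee(buckets_a, buckets_b):
    """Return the guarantee for two writers committing at the same time.

    Each argument is the collection of buckets one writer modifies.
    Disjoint bucket sets give serializable commits; any shared bucket
    means only snapshot isolation is guaranteed.
    """
    shared = set(buckets_a) & set(buckets_b)
    return "snapshot isolation" if shared else "serializable"


# Hypothetical bucket ids: (partition value, bucket index).
writer_a = [("2023-01-01", 0), ("2023-01-01", 1)]
writer_b = [("2023-01-01", 2)]
writer_c = [("2023-01-01", 1)]

disjoint = commit_guarantee(writer_a, writer_b)     # no shared bucket
overlapping = commit_guarantee(writer_a, writer_c)  # both touch bucket 1
```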