Lookup Join

Lookup Join #

A Lookup Join is used to enrich a table with data that is queried from Flink Table Store. The join requires one table to have a processing time attribute and the other table to be backed by a lookup source connector.

First, create a table, and update it in real-time.

-- Create a table store catalog
  'warehouse'='hdfs://nn:8020/warehouse/path' -- or 'file://tmp/foo/bar'

USE CATALOG my_catalog;

-- Create a table in table-store catalog
CREATE TABLE customers (
  name STRING,
  country STRING,
  zip STRING

-- Launch a streaming job to update customers table
INSERT INTO customers ...

Then, you can use this table in lookup join.

-- enrich each order with customer information
SELECT o.order_id, o.total, c.country, c.zip
FROM Orders AS o
         JOIN customers FOR SYSTEM_TIME AS OF o.proc_time AS c
              ON o.customer_id = c.id;

The lookup join node will maintain the rocksdb cache locally and pull the latest updates of the table in real time, and only pull the necessary data. Therefore, your filter conditions are also very important.

In order to avoid excessive use of local disks, the lookup join feature is only suitable for table sizes below tens of millions.

Note: Partitioned or non-pk tables are not supported now.

Project pushdown can effectively reduce the overhead, FLINK-29138 fixed the bug that the project cannot be pushed down to the source. So it is preferable to use a version greater than or equal to flink 1.14.6, flink 1.15.3. Or you can cherry-pick the commit to your own Flink branch.

RocksDBOptions #

Options for rocksdb cache, you can configure options in WITH or dynamic table hints.

SELECT o.order_id, o.total, c.country, c.zip
  FROM Orders AS o JOIN customers /*+ OPTIONS('lookup.cache-rows'='20000') */
  FOR SYSTEM_TIME AS OF o.proc_time AS c ON o.customer_id = c.id;
Key Default Type Description
10000 Long The maximum number of rows to store in the cache
4 kb MemorySize The approximate size (in bytes) of user data packed per block. The default blocksize is '4KB'.
8 mb MemorySize The amount of the cache for data blocks in RocksDB. The default block-cache size is '8MB'.
4 kb MemorySize Approximate size of partitioned metadata packed per block. Currently applied to indexes block when partitioned index/filters option is enabled. The default blocksize is '4KB'.
10.0 Double Bits per key that bloom filter will use, this only take effect when bloom filter is used. The default value is 10.0.
false Boolean If true, RocksDB will use block-based filter instead of full filter, this only take effect when bloom filter is used. The default value is 'false'.
256 mb MemorySize The upper-bound of the total size of level base files in bytes. The default value is '256MB'.
64 mb MemorySize The target file size for compaction, which determines a level-1 file size. The default value is '64MB'.
false Boolean If true, RocksDB will pick target size of each level dynamically. From an empty DB, RocksDB would make last level the base level, which means merging L0 data into the last level, until it exceeds max_bytes_for_level_base. And then repeat this process for second last level and so on. The default value is 'false'. For more information, please refer to RocksDB's doc.


The specified compaction style for DB. Candidate compaction style is LEVEL, FIFO, UNIVERSAL or NONE, and Flink chooses 'LEVEL' as default style.

Possible values:
  • "LEVEL"
  • "FIFO"
  • "NONE"


The compression type.

Possible values:
-1 Integer The maximum number of open files (per stateful operator) that can be used by the DB, '-1' means no limit. The default value is '-1'.
2 Integer The maximum number of concurrent background flush and compaction jobs (per stateful operator). The default value is '2'.
false Boolean If true, every newly created SST file will contain a Bloom filter. It is disabled by default.
2 Integer The maximum number of write buffers that are built up in memory. The default value is '2'.
1 Integer The minimum number of write buffers that will be merged together before writing to storage. The default value is '1'.
64 mb MemorySize The amount of data built up in memory (backed by an unsorted log on disk) before converting to a sorted on-disk files. The default writebuffer size is '64MB'.