@PublicEvolving public interface SupportsPartitioning
DynamicTableSink
.
Partitions split the data stored in an external system into smaller portions that are
identified by one or more string-based partition keys. A single partition is represented as a
Map < String, String >
which maps each partition key to a partition value. Partition keys
and their order are defined by the catalog table.
For example, data can be partitioned by region and within a region partitioned by month. The order of the partition keys (in the example: first by region then by month) is defined by the catalog table. A list of partitions could be:
List( ['region'='europe', 'month'='2020-01'], ['region'='europe', 'month'='2020-02'], ['region'='asia', 'month'='2020-01'], ['region'='asia', 'month'='2020-02'] )
Given the following partitioned table:
CREATE TABLE t (a INT, b STRING, c DOUBLE, region STRING, month STRING) PARTITION BY (region, month);
We can insert data into static table partitions using the INSERT INTO ...
PARTITION
syntax:
INSERT INTO t PARTITION (region='europe', month='2020-01') SELECT a, b, c FROM my_view;
If all partition keys get a value assigned in the PARTITION
clause, the operation is
considered an "insertion into a static partition". In the above example, the query result should
be written into the static partition region='europe', month='2020-01'
which will be
passed by the planner into applyStaticPartition(Map)
. The planner is also able to
derived static partitions from literals of a query:
INSERT INTO t SELECT a, b, c, 'asia' AS region, '2020-01' AS month FROM my_view;
Alternatively, we can insert data into dynamic table partitions using the SQL syntax:
INSERT INTO t PARTITION (region='europe') SELECT a, b, c, month FROM another_view;
If only a subset of all partition keys get a static value assigned in the PARTITION
clause or with a constant part in a SELECT
clause, the operation is considered an
"insertion into a dynamic partition". In the above example, the static partition part is region='europe'
which will be passed by the planner into applyStaticPartition(Map)
. The
remaining values for partition keys should be obtained from each individual record by the sink
during runtime. In the two examples above, the month
field is the dynamic partition key.
If the PARTITION
clause contains no static assignments or is omitted entirely, all
values for partition keys are either derived from static parts of the query or obtained
dynamically.
Modifier and Type | Method and Description |
---|---|
void |
applyStaticPartition(Map<String,String> partition)
Provides the static part of a partition.
|
default boolean |
requiresPartitionGrouping(boolean supportsGrouping)
Returns whether data needs to be grouped by partition before it is consumed by the sink.
|
void applyStaticPartition(Map<String,String> partition)
A single partition maps each partition key to a partition value. Depending on the user-defined statement, the partition might not include all partition keys.
See the documentation of SupportsPartitioning
for more information.
partition
- user-defined (possibly partial) static partitiondefault boolean requiresPartitionGrouping(boolean supportsGrouping)
If this method returns true, the sink can expect that all records will be grouped by the partition keys before consumed by the sink. In other words: The sink will receive all elements of one partition and then all elements of another partition. Elements of different partitions will not be mixed. For some sinks, this can be used to reduce the number of partition writers and improve writing performance by writing one partition at a time.
The given argument indicates whether the current execution mode supports grouping or not. For example, depending on the execution mode a sorting operation might not be available during runtime.
supportsGrouping
- whether the current execution mode supports groupingsupportsGrouping
is false, it should never return true, otherwise the planner will fail.Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.