@PublicEvolving public interface SupportsPartitioning
Partitions split the data stored in an external system into smaller portions that are
identified by one or more string-based partition keys. A single partition is represented as a
Map<String, String> which maps each partition key to a partition value. Partition keys
and their order are defined by the catalog table.
For example, data can be partitioned by region and within a region partitioned by month. The order of the partition keys (in the example: first by region then by month) is defined by the catalog table. A list of partitions could be:
List(
  ['region'='europe', 'month'='2020-01'],
  ['region'='europe', 'month'='2020-02'],
  ['region'='asia', 'month'='2020-01'],
  ['region'='asia', 'month'='2020-02']
)
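As a minimal sketch of this representation in Java, the list above can be built from ordered maps. The class and method names here (PartitionListSketch, partition, partitions) are hypothetical and not part of Flink; a LinkedHashMap is used only so that the key order (region, then month) is preserved.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration; not part of Flink.
final class PartitionListSketch {

    // Builds a single partition. LinkedHashMap keeps the key order: region, then month.
    static Map<String, String> partition(String region, String month) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("region", region);
        p.put("month", month);
        return p;
    }

    // The four partitions from the example list.
    static List<Map<String, String>> partitions() {
        return List.of(
                partition("europe", "2020-01"),
                partition("europe", "2020-02"),
                partition("asia", "2020-01"),
                partition("asia", "2020-02"));
    }

    public static void main(String[] args) {
        partitions().forEach(System.out::println);
    }
}
```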
Given the following partitioned table:
CREATE TABLE t (a INT, b STRING, c DOUBLE, region STRING, month STRING) PARTITIONED BY (region, month);
We can insert data into static table partitions using the INSERT INTO ... PARTITION syntax:
INSERT INTO t PARTITION (region='europe', month='2020-01') SELECT a, b, c FROM my_view;
If all partition keys get a value assigned in the PARTITION clause, the operation is
considered an "insertion into a static partition". In the above example, the query result should
be written into the static partition region='europe', month='2020-01', which will be
passed by the planner into applyStaticPartition(Map). The planner is also able to
derive static partitions from literals of a query:
INSERT INTO t SELECT a, b, c, 'asia' AS region, '2020-01' AS month FROM my_view;
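The rule "all partition keys get a value assigned" can be read as a simple containment check. The following is only a sketch under that reading; InsertionKindSketch and isStaticInsertion are hypothetical names, and the real decision is made inside the planner, not by user code.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; the actual classification happens in the planner.
final class InsertionKindSketch {

    // True if every partition key of the table has a static value assigned.
    static boolean isStaticInsertion(
            List<String> partitionKeys, Map<String, String> staticPartition) {
        return staticPartition.keySet().containsAll(partitionKeys);
    }

    public static void main(String[] args) {
        List<String> keys = List.of("region", "month");
        // Both keys assigned: insertion into a static partition.
        System.out.println(
                isStaticInsertion(keys, Map.of("region", "europe", "month", "2020-01")));
        // Only region assigned: not a fully static insertion.
        System.out.println(isStaticInsertion(keys, Map.of("region", "europe")));
    }
}
```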
Alternatively, we can insert data into dynamic table partitions using the SQL syntax:
INSERT INTO t PARTITION (region='europe') SELECT a, b, c, month FROM another_view;
If only a subset of all partition keys gets a static value assigned in the PARTITION
clause or with a constant part in a SELECT clause, the operation is considered an
"insertion into a dynamic partition". In the above example, the static partition part is
region='europe', which will be passed by the planner into applyStaticPartition(Map). The
remaining values for partition keys should be obtained from each individual record by the sink
during runtime. In the example above, the month field is the dynamic partition key.
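To illustrate how a sink might combine the static part with per-record values at runtime, here is a minimal sketch. It assumes, purely for illustration, that a record is available as a Map<String, String>; a real sink works on Flink's internal row types. DynamicPartitionSketch and resolve are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; records are simplified to string maps.
final class DynamicPartitionSketch {

    // Resolves the full partition of one record, in partition key order.
    // Static values take precedence; the remaining keys are read from the record.
    static Map<String, String> resolve(
            List<String> partitionKeys,
            Map<String, String> staticPartition,
            Map<String, String> record) {
        Map<String, String> full = new LinkedHashMap<>();
        for (String key : partitionKeys) {
            String value = staticPartition.containsKey(key)
                    ? staticPartition.get(key)
                    : record.get(key);
            if (value == null) {
                throw new IllegalArgumentException("No value for partition key: " + key);
            }
            full.put(key, value);
        }
        return full;
    }

    public static void main(String[] args) {
        // region is static; month is taken from the record.
        System.out.println(resolve(
                List.of("region", "month"),
                Map.of("region", "europe"),
                Map.of("a", "1", "month", "2020-02")));
    }
}
```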
If the PARTITION clause contains no static assignments or is omitted entirely, all
values for partition keys are either derived from static parts of the query or obtained
dynamically.
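|Modifier and Type|Method and Description|
|void|applyStaticPartition(Map<String,String> partition): Provides the static part of a partition.|
|default boolean|requiresPartitionGrouping(boolean supportsGrouping): Returns whether data needs to be grouped by partition before it is consumed by the sink.|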
A single partition maps each partition key to a partition value. Depending on the user-defined statement, the partition might not include all partition keys.
See the documentation of SupportsPartitioning for more information.
partition - user-defined (possibly partial) static partition
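A sink typically remembers the static part it is given so that it can be applied to every written record. The sketch below uses a stand-in class (StaticPartitionSinkSketch, a hypothetical name) that only mirrors the shape of applyStaticPartition(Map) and does not depend on Flink; the key validation and the dynamicKeys helper are assumptions added for illustration, not behavior required by the interface.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for a sink; a real sink would implement Flink's SupportsPartitioning.
final class StaticPartitionSinkSketch {

    private final List<String> partitionKeys;
    private Map<String, String> staticPartition = new LinkedHashMap<>();

    StaticPartitionSinkSketch(List<String> partitionKeys) {
        this.partitionKeys = partitionKeys;
    }

    // Same shape as applyStaticPartition(Map): stores the (possibly partial) static part.
    void applyStaticPartition(Map<String, String> partition) {
        for (String key : partition.keySet()) {
            if (!partitionKeys.contains(key)) {
                throw new IllegalArgumentException("Unknown partition key: " + key);
            }
        }
        this.staticPartition = new LinkedHashMap<>(partition); // defensive copy
    }

    // Keys without a static value; these must be read from each record at runtime.
    List<String> dynamicKeys() {
        List<String> keys = new ArrayList<>(partitionKeys);
        keys.removeAll(staticPartition.keySet());
        return keys;
    }
}
```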
default boolean requiresPartitionGrouping(boolean supportsGrouping)
If this method returns true, the sink can expect that all records will be grouped by the partition keys before they are consumed by the sink. In other words, the sink will receive all elements of one partition and then all elements of another partition; elements of different partitions will not be mixed. For some sinks, this can be used to reduce the number of partition writers and improve writing performance by writing one partition at a time.
The given argument indicates whether the current execution mode supports grouping or not. For example, depending on the execution mode a sorting operation might not be available during runtime.
supportsGrouping - whether the current execution mode supports grouping
If supportsGrouping is false, this method should never return true; otherwise the planner will fail.
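One way to respect this contract is to request grouping only when the execution mode supports it. The following sketch uses a stand-in class (GroupingSinkSketch, a hypothetical name, as is the prefersGrouping flag) that mirrors the method shape without depending on Flink.

```java
// Stand-in for a sink; a real sink would implement Flink's SupportsPartitioning.
final class GroupingSinkSketch {

    // Hypothetical setting: whether this sink would like grouped input at all.
    private final boolean prefersGrouping;
    private boolean groupingEnabled;

    GroupingSinkSketch(boolean prefersGrouping) {
        this.prefersGrouping = prefersGrouping;
    }

    // Same shape as requiresPartitionGrouping(boolean).
    // Never returns true when supportsGrouping is false.
    boolean requiresPartitionGrouping(boolean supportsGrouping) {
        groupingEnabled = prefersGrouping && supportsGrouping;
        return groupingEnabled;
    }

    // Lets the writer choose between one writer at a time and one writer per partition.
    boolean isGroupingEnabled() {
        return groupingEnabled;
    }
}
```

Storing the result lets the runtime writer decide later whether it can keep a single open partition writer or has to keep one per partition.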
Copyright © 2014–2022 The Apache Software Foundation. All rights reserved.