Interface SupportsPartitioning
-
- All Known Implementing Classes:
FileSystemTableSink
@PublicEvolving public interface SupportsPartitioning
Enables to write partitioned data in aDynamicTableSink
.Partitions split the data stored in an external system into smaller portions that are identified by one or more string-based partition keys. A single partition is represented as a
Map < String, String >
which maps each partition key to a partition value. Partition keys and their order are defined by the catalog table.For example, data can be partitioned by region and within a region partitioned by month. The order of the partition keys (in the example: first by region then by month) is defined by the catalog table. A list of partitions could be:
List( ['region'='europe', 'month'='2020-01'], ['region'='europe', 'month'='2020-02'], ['region'='asia', 'month'='2020-01'], ['region'='asia', 'month'='2020-02'] )
Given the following partitioned table:
CREATE TABLE t (a INT, b STRING, c DOUBLE, region STRING, month STRING) PARTITION BY (region, month);
We can insert data into static table partitions using the
INSERT INTO ... PARTITION
syntax:INSERT INTO t PARTITION (region='europe', month='2020-01') SELECT a, b, c FROM my_view;
If all partition keys get a value assigned in the
PARTITION
clause, the operation is considered an "insertion into a static partition". In the above example, the query result should be written into the static partitionregion='europe', month='2020-01'
which will be passed by the planner intoapplyStaticPartition(Map)
. The planner is also able to derived static partitions from literals of a query:INSERT INTO t SELECT a, b, c, 'asia' AS region, '2020-01' AS month FROM my_view;
Alternatively, we can insert data into dynamic table partitions using the SQL syntax:
INSERT INTO t PARTITION (region='europe') SELECT a, b, c, month FROM another_view;
If only a subset of all partition keys get a static value assigned in the
PARTITION
clause or with a constant part in aSELECT
clause, the operation is considered an "insertion into a dynamic partition". In the above example, the static partition part isregion='europe'
which will be passed by the planner intoapplyStaticPartition(Map)
. The remaining values for partition keys should be obtained from each individual record by the sink during runtime. In the two examples above, themonth
field is the dynamic partition key.If the
PARTITION
clause contains no static assignments or is omitted entirely, all values for partition keys are either derived from static parts of the query or obtained dynamically.A sink can implement both
SupportsPartitioning
andSupportsBucketing
. Conceptually, a partition can be seen as kind of "directory" whereas buckets correspond to "files" per directory. Partitioning splits the data on a small, human-readable keyspace (e.g. by year or by geographical region). This enables efficient selection via equality, inequality, or ranges due to knowledge about existing partitions. Bucketing operates within partitions on a potentially large and infinite keyspace.- See Also:
SupportsBucketing
-
-
Method Summary
All Methods Instance Methods Abstract Methods Default Methods Modifier and Type Method Description void
applyStaticPartition(Map<String,String> partition)
Provides the static part of a partition.default boolean
requiresPartitionGrouping(boolean supportsGrouping)
Returns whether data needs to be grouped by partition before it is consumed by the sink.
-
-
-
Method Detail
-
applyStaticPartition
void applyStaticPartition(Map<String,String> partition)
Provides the static part of a partition.A single partition maps each partition key to a partition value. Depending on the user-defined statement, the partition might not include all partition keys.
See the documentation of
SupportsPartitioning
for more information.- Parameters:
partition
- user-defined (possibly partial) static partition
-
requiresPartitionGrouping
default boolean requiresPartitionGrouping(boolean supportsGrouping)
Returns whether data needs to be grouped by partition before it is consumed by the sink. By default, this is not required from the runtime and records arrive in arbitrary partition order.If this method returns true, the sink can expect that all records will be grouped by the partition keys before consumed by the sink. In other words: The sink will receive all elements of one partition and then all elements of another partition. Elements of different partitions will not be mixed. For some sinks, this can be used to reduce the number of partition writers and improve writing performance by writing one partition at a time.
The given argument indicates whether the current execution mode supports grouping or not. For example, depending on the execution mode a sorting operation might not be available during runtime.
- Parameters:
supportsGrouping
- whether the current execution mode supports grouping- Returns:
- whether data need to be grouped by partition before consumed by the sink. If
supportsGrouping
is false, it should never return true, otherwise the planner will fail.
-
-