@PublicEvolving public interface SupportsDynamicFiltering
Pushes dynamic filters into a ScanTableSource so that the table source can filter partitions, or even the input data, at runtime to reduce scan I/O.
Given the following SQL:
SELECT * FROM partitioned_fact_table t1, non_partitioned_dim_table t2 WHERE t1.part_key = t2.col1 AND t2.col2 = 100
In the example above, `partitioned_fact_table` is a partitioned table whose partition key is
`part_key`, and `non_partitioned_dim_table` is a non-partitioned table whose data contains all
partition values of `partitioned_fact_table`. With the filter `t2.col2 = 100`, only a small
part of the partitions needs to be scanned to perform the join. The specific partitions
are not known in the optimization phase, only in the execution phase.
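The runtime pruning described above can be sketched in plain Java. This is only an illustration under stated assumptions, not how Flink implements it: the class `RuntimePartitionPruningSketch`, the `DimRow` type, the partition names, and the sample rows are all hypothetical. It shows that the partitions to scan depend on the dimension table's data, which is only available at execution time.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class RuntimePartitionPruningSketch {

    /** Hypothetical row of non_partitioned_dim_table with columns col1 and col2. */
    static final class DimRow {
        final String col1;
        final int col2;

        DimRow(String col1, int col2) {
            this.col1 = col1;
            this.col2 = col2;
        }
    }

    /** Evaluates "t2.col2 = ?" on the dimension rows and collects the matching col1 values. */
    static Set<String> collectPartKeys(List<DimRow> dimRows, int col2Value) {
        return dimRows.stream()
                .filter(row -> row.col2 == col2Value)
                .map(row -> row.col1)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    /** Keeps only the fact-table partitions whose part_key can match the join condition. */
    static List<String> prunePartitions(List<String> allPartitions, Set<String> wantedPartKeys) {
        return allPartitions.stream()
                .filter(wantedPartKeys::contains)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample data, invented for this sketch.
        List<DimRow> dimRows = Arrays.asList(
                new DimRow("p1", 100), new DimRow("p2", 200), new DimRow("p3", 100));
        List<String> allPartitions = Arrays.asList("p1", "p2", "p3", "p4");

        Set<String> wanted = collectPartKeys(dimRows, 100);
        // Only these partitions need to be scanned for the join.
        System.out.println(prunePartitions(allPartitions, wanted)); // prints [p1, p3]
    }
}
```

Because the set of wanted partition keys comes from rows of the dimension table, it cannot be computed from the query text alone.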
This differs from SupportsPartitionPushDown, where the conditions in the WHERE clause are analyzed in the optimization phase to determine in advance which partitions can be safely skipped. For queries like the one above, the specific partitions are not known in the optimization phase, only in the execution phase.
By default, if this interface is not implemented, all data is read and then filtered by a subsequent filter operation after the source.
If this interface is implemented, it only tells the source which fields can be used for filtering; the source must pick the fields it can support and return them to the planner. The planner then builds the plan and constructs the operator that sends the filter data to the source at runtime.
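The negotiation between source and planner might look roughly like the following sketch. It does not use the Flink classes: `DynamicFilteringSource` is a local stand-in for the interface described here, `PartitionedFactSource` and `negotiate` are hypothetical, and the method name `listAcceptedFilterFields` is an assumption because this page text does not show the name of the method that returns the supported fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DynamicFilteringSketch {

    /** Local stand-in for the interface described on this page (not the Flink type itself). */
    interface DynamicFilteringSource {
        // Assumed name: returns the filter fields this source supports.
        List<String> listAcceptedFilterFields();

        // Receives the fields the planner decided to use for dynamic filtering.
        void applyDynamicFiltering(List<String> candidateFilterFields);
    }

    /** Hypothetical partitioned source that accepts only its partition keys. */
    static class PartitionedFactSource implements DynamicFilteringSource {
        private final List<String> partitionKeys;
        private List<String> appliedFields = new ArrayList<>();

        PartitionedFactSource(List<String> partitionKeys) {
            this.partitionKeys = new ArrayList<>(partitionKeys);
        }

        @Override
        public List<String> listAcceptedFilterFields() {
            return new ArrayList<>(partitionKeys);
        }

        @Override
        public void applyDynamicFiltering(List<String> candidateFilterFields) {
            // Candidates are expected to come from the accepted fields; reject anything else.
            if (!partitionKeys.containsAll(candidateFilterFields)) {
                throw new IllegalArgumentException(
                        "Unsupported filter fields: " + candidateFilterFields);
            }
            appliedFields = new ArrayList<>(candidateFilterFields);
        }

        List<String> getAppliedFields() {
            return appliedFields;
        }
    }

    /** Hypothetical planner step: keep the join keys the source accepts, then apply them. */
    static List<String> negotiate(DynamicFilteringSource source, List<String> joinKeys) {
        List<String> candidates = new ArrayList<>(joinKeys);
        candidates.retainAll(source.listAcceptedFilterFields());
        if (!candidates.isEmpty()) {
            source.applyDynamicFiltering(candidates);
        }
        return candidates;
    }

    public static void main(String[] args) {
        PartitionedFactSource source = new PartitionedFactSource(Arrays.asList("part_key"));
        List<String> chosen = negotiate(source, Arrays.asList("part_key", "other_col"));
        System.out.println(chosen); // prints [part_key]
    }
}
```

In this sketch the source only records which fields were applied; the filter values themselves would arrive later, at runtime, through the operator the planner constructs.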
In the future, more flexible filtering can be pushed into the source connectors through this interface.
|Modifier and Type|Method and Description|
|void|applyDynamicFiltering(List<String> candidateFilterFields): Applies the candidate filter fields to the table source.|
|List<String>|Returns the filter fields that this partition table source supports.|
void applyDynamicFiltering(List<String> candidateFilterFields)
NOTE: the candidate filter fields are always taken from the filter fields that this source reported as supported.
Copyright © 2014–2023 The Apache Software Foundation. All rights reserved.