@PublicEvolving public interface SupportsReadingMetadata
Interface for ScanTableSources that support reading metadata columns.
Metadata columns add additional columns to the table's schema. A table source is responsible for adding requested metadata columns at the end of produced rows. This includes potentially forwarding metadata columns from contained formats.
Examples in SQL look like:
// reads the column from corresponding metadata key `timestamp`
CREATE TABLE t1 (i INT, s STRING, timestamp TIMESTAMP(3) WITH LOCAL TIME ZONE METADATA, d DOUBLE)
// reads the column from metadata key `timestamp` and casts to INT
CREATE TABLE t2 (i INT, s STRING, myTimestamp INT METADATA FROM 'timestamp', d DOUBLE)
By default, if this interface is not implemented, the statements above would fail because the table source does not provide a metadata key called `timestamp`.
If this interface is implemented, listReadableMetadata() lists all metadata keys and their corresponding data types that the source exposes to the planner. The planner will use this information for validation and insertion of explicit casts if necessary.
The planner will select required metadata columns (i.e. perform projection push down) and will call applyReadableMetadata(List, DataType) with a list of metadata keys. An implementation must ensure that metadata columns are appended at the end of the physical row in the order of the provided list after the apply method has been called.
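The append contract can be illustrated with a minimal, plain-Java sketch. It does not use Flink's classes: `MetadataAppender`, its method signatures, and the use of `Object[]` as a row are hypothetical stand-ins for a source's runtime reader and for RowData. Only the behavior follows the description above: metadata values go after the physical fields, in the order of the list passed to the apply method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a source's runtime reader; not part of Flink. */
class MetadataAppender {
    // Keys requested by the planner, in the order the planner passed them.
    private List<String> metadataKeys = new ArrayList<>();

    /** Plays the role of applyReadableMetadata: remembers the requested keys. */
    void applyReadableMetadata(List<String> keys) {
        this.metadataKeys = new ArrayList<>(keys);
    }

    /** Appends the requested metadata values after the physical fields. */
    Object[] toOutputRow(Object[] physicalRow, Map<String, Object> recordMetadata) {
        Object[] out = Arrays.copyOf(physicalRow, physicalRow.length + metadataKeys.size());
        for (int i = 0; i < metadataKeys.size(); i++) {
            // Position follows the order of the requested key list.
            out[physicalRow.length + i] = recordMetadata.get(metadataKeys.get(i));
        }
        return out;
    }
}
```

Metadata that the record carries but the planner did not request is simply not copied, which matches the projection behavior described above.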
Note: The final output data type emitted by a source changes from the physically produced data type to a data type with metadata columns. applyReadableMetadata(List, DataType) will pass the updated data type for convenience. If a source implements SupportsProjectionPushDown, the projection must be applied to the physical data in the first step. The passed updated data type will have considered information from SupportsProjectionPushDown already.
The metadata column's data type must match listReadableMetadata(). For the examples above, this means that a table source for `t2` returns a TIMESTAMP(3) WITH LOCAL TIME ZONE and not an INT. The cast to INT will be performed by the planner in a subsequent operation:
// for t1 and t2
ROW < i INT, s STRING, d DOUBLE > // physical output
ROW < i INT, s STRING, d DOUBLE, timestamp TIMESTAMP(3) WITH LOCAL TIME ZONE > // final output
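The transition from the physical to the final output type shown above can be sketched in plain Java. This is only a model under stated assumptions: `FinalRowType` is a hypothetical helper, and row types are represented as ordered lists of "name TYPE" strings rather than Flink's DataType. It applies the projection first and then appends the metadata columns with the type declared by the source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; models row types as ordered "name TYPE" strings. */
class FinalRowType {
    static List<String> of(
            Map<String, String> physicalFields,   // field name -> type
            List<String> projectedFields,         // fields kept by projection push down
            Map<String, String> readableMetadata, // metadata key -> type declared by the source
            List<String> metadataKeys) {          // keys requested by the planner
        List<String> fields = new ArrayList<>();
        // Step 1: apply the projection to the physical data.
        for (String name : projectedFields) {
            fields.add(name + " " + physicalFields.get(name));
        }
        // Step 2: append metadata columns in the requested order, using the
        // type declared by the source and not the type of the SQL column.
        for (String key : metadataKeys) {
            fields.add(key + " " + readableMetadata.get(key));
        }
        return fields;
    }
}
```

For both `t1` and `t2`, the model yields the same final row, because the cast to INT in `t2` happens later in the planner and not in the source.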
Modifier and Type | Method and Description |
---|---|
void | applyReadableMetadata(List<String> metadataKeys, DataType producedDataType) Provides a list of metadata keys that the produced RowData must contain as appended metadata columns. |
Map<String,DataType> | listReadableMetadata() Returns the map of metadata keys and their corresponding data types that can be produced by this table source for reading. |
Map<String,DataType> listReadableMetadata()
The returned map will be used by the planner for validation and insertion of explicit casts (see LogicalTypeCasts.supportsExplicitCast(LogicalType, LogicalType)) if necessary.
The iteration order of the returned map determines the order of metadata keys in the list passed in applyReadableMetadata(List, DataType). Therefore, it might be beneficial to return a LinkedHashMap if a strict metadata column order is required.
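A short plain-Java sketch of why a LinkedHashMap helps here. The class `MetadataKeyOrder`, the keys `topic` and `partition`, and the type strings are hypothetical, and `select` only models how a key list could be derived from the map's iteration order; it is not the planner's actual code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of key ordering; types are plain strings, not DataType. */
class MetadataKeyOrder {
    /** Builds a map whose iteration order equals the insertion order. */
    static Map<String, String> readableMetadata() {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("topic", "STRING");
        metadata.put("partition", "INT");
        metadata.put("timestamp", "TIMESTAMP(3) WITH LOCAL TIME ZONE");
        return metadata;
    }

    /** Keeps only the required keys, ordered by the map's iteration order. */
    static List<String> select(Map<String, String> readable, Set<String> required) {
        List<String> keys = new ArrayList<>();
        for (String key : readable.keySet()) {
            if (required.contains(key)) {
                keys.add(key);
            }
        }
        return keys;
    }
}
```

With a plain HashMap, the iteration order is unspecified, so the order of the selected keys could differ between runs or JVM versions.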
If a source forwards metadata from one or more formats, we recommend the following column order for consistency:
KEY FORMAT METADATA COLUMNS + VALUE FORMAT METADATA COLUMNS + SOURCE METADATA COLUMNS
Metadata key names follow the same pattern as mentioned in Factory. In case of duplicate names in format and source keys, format keys shall have higher precedence.
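The recommended order and the precedence rule can be combined in one merge step. The following plain-Java sketch is an illustration only: `MetadataMerge` is a hypothetical helper, types are modeled as strings, and letting the key format win over the value format on a name clash is an assumption of this sketch that the text above does not specify.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; merges metadata maps in the recommended column order. */
class MetadataMerge {
    static Map<String, String> merge(
            Map<String, String> keyFormat,
            Map<String, String> valueFormat,
            Map<String, String> source) {
        // LinkedHashMap keeps insertion order: key format, value format, source.
        Map<String, String> all = new LinkedHashMap<>();
        // putIfAbsent keeps the first entry for a name, so format keys
        // registered earlier win over a source key with the same name.
        keyFormat.forEach(all::putIfAbsent);
        valueFormat.forEach(all::putIfAbsent);
        source.forEach(all::putIfAbsent);
        return all;
    }
}
```

The input maps should themselves have a defined iteration order (for example LinkedHashMap), otherwise the order within each group is not deterministic.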
Regardless of the returned DataTypes, a metadata column is always represented using internal data structures (see RowData).
See Also: DecodingFormat.listReadableMetadata()
void applyReadableMetadata(List<String> metadataKeys, DataType producedDataType)
Provides a list of metadata keys that the produced RowData must contain as appended metadata columns.
Note: Use the passed data type instead of TableSchema.toPhysicalRowDataType() for describing the final output data type when creating TypeInformation. If the source implements SupportsProjectionPushDown, the projection is already considered in the given output data type.
Parameters:
metadataKeys - a subset of the keys returned by listReadableMetadata(), ordered by the iteration order of the returned map
producedDataType - the final output type of the source
See Also: DecodingFormat.applyReadableMetadata(List)
Copyright © 2014–2022 The Apache Software Foundation. All rights reserved.