Class Schema.Builder

  • Enclosing class:
    Schema

    @PublicEvolving
    public static final class Schema.Builder
    extends Object
    A builder for constructing an immutable but still unresolved Schema.
    • Method Detail

      • fromSchema

        public Schema.Builder fromSchema​(Schema unresolvedSchema)
        Adopts all members from the given unresolved schema.
      • fromResolvedSchema

        public Schema.Builder fromResolvedSchema​(ResolvedSchema resolvedSchema)
        Adopts all members from the given resolved schema.
      • fromRowDataType

        public Schema.Builder fromRowDataType​(DataType dataType)
        Adopts all fields of the given row as physical columns of the schema.
      • fromFields

        public Schema.Builder fromFields​(String[] fieldNames,
                                         AbstractDataType<?>[] fieldDataTypes)
        Adopts the given field names and field data types as physical columns of the schema.
      • column

        public Schema.Builder column​(String columnName,
                                     AbstractDataType<?> dataType)
        Declares a physical column that is appended to this schema.

        Physical columns are regular columns known from databases. They define the names, the types, and the order of fields in the physical data. Thus, physical columns represent the payload that is read from and written to an external system. Connectors and formats use these columns (in the defined order) to configure themselves. Other kinds of columns can be declared between physical columns but will not influence the final physical schema.

        Parameters:
        columnName - column name
        dataType - data type of the column
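
        The ordering rule above can be modelled in a few lines of plain Java. This is only a sketch: the Kind enum and the Declaration class are hypothetical stand-ins invented for illustration, not Flink classes. It shows that columns of other kinds may sit between physical columns without changing the resulting physical schema.

```java
import java.util.ArrayList;
import java.util.List;

public class PhysicalColumnOrderSketch {

    // Hypothetical column kinds, mirroring the kinds of columns a schema can declare.
    enum Kind { PHYSICAL, COMPUTED, METADATA }

    // Hypothetical holder for one declared column; not a Flink class.
    static final class Declaration {
        final String name;
        final Kind kind;

        Declaration(String name, Kind kind) {
            this.name = name;
            this.kind = kind;
        }
    }

    /** Returns the names of the physical columns, preserving declaration order. */
    static List<String> physicalColumns(List<Declaration> declared) {
        List<String> result = new ArrayList<>();
        for (Declaration d : declared) {
            if (d.kind == Kind.PHYSICAL) {
                result.add(d.name);
            }
        }
        return result;
    }

    /** Declares four columns of mixed kinds and extracts the physical ones. */
    static List<String> demo() {
        List<Declaration> declared = new ArrayList<>();
        declared.add(new Declaration("id", Kind.PHYSICAL));
        declared.add(new Declaration("ts", Kind.COMPUTED));
        declared.add(new Declaration("name", Kind.PHYSICAL));
        declared.add(new Declaration("offset", Kind.METADATA));
        return physicalColumns(declared);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [id, name]
    }
}
```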
      • columnByExpression

        public Schema.Builder columnByExpression​(String columnName,
                                                 Expression expression)
        Declares a computed column that is appended to this schema.

        Computed columns are virtual columns that are generated by evaluating an expression that can reference other columns declared in the same table. Both physical columns and metadata columns can be accessed. The column itself is not physically stored within the table. The column’s data type is derived automatically from the given expression and does not have to be declared manually.

        Computed columns are commonly used for defining time attributes. For example, a computed column can be used if the original field is not of type TIMESTAMP(3) or is nested in a JSON string.

        Any scalar expression can be used for in-memory/temporary tables. However, currently, only SQL expressions can be persisted in a catalog. User-defined functions (also defined in different catalogs) are supported.

        Example: .columnByExpression("ts", $("json_obj").get("ts").cast(TIMESTAMP(3)))

        Parameters:
        columnName - column name
        expression - computation of the column
      • columnByExpression

        public Schema.Builder columnByExpression​(String columnName,
                                                 String sqlExpression)
        Declares a computed column that is appended to this schema.

        See columnByExpression(String, Expression) for a detailed explanation.

        This method uses a SQL expression that can be easily persisted in a durable catalog.

        Example: .columnByExpression("ts", "CAST(json_obj.ts AS TIMESTAMP(3))")

        Parameters:
        columnName - column name
        sqlExpression - computation of the column using SQL
      • columnByMetadata

        public Schema.Builder columnByMetadata​(String columnName,
                                               AbstractDataType<?> dataType)
        Declares a metadata column that is appended to this schema.

        Metadata columns allow access to connector- and/or format-specific fields for every row of a table. For example, a metadata column can be used to read and write the timestamp from and to Kafka records for time-based operations. The connector and format documentation lists the available metadata fields for every component.

        Every metadata field is identified by a string-based key and has a documented data type. For convenience, the runtime will perform an explicit cast if the data type of the column differs from the data type of the metadata field. Of course, this requires that the two data types are compatible.

        Note: This method assumes that the metadata key is equal to the column name and the metadata column can be used for both reading and writing.

        Parameters:
        columnName - column name
        dataType - data type of the column
      • columnByMetadata

        public Schema.Builder columnByMetadata​(String columnName,
                                               String serializableTypeString)
        Declares a metadata column that is appended to this schema.

        See columnByMetadata(String, AbstractDataType) for a detailed explanation.

        This method uses a type string that can be easily persisted in a durable catalog.

        Parameters:
        columnName - column name
        serializableTypeString - data type of the column
      • columnByMetadata

        public Schema.Builder columnByMetadata​(String columnName,
                                               AbstractDataType<?> dataType,
                                               boolean isVirtual)
        Declares a metadata column that is appended to this schema.

        Metadata columns allow access to connector- and/or format-specific fields for every row of a table. For example, a metadata column can be used to read and write the timestamp from and to Kafka records for time-based operations. The connector and format documentation lists the available metadata fields for every component.

        Every metadata field is identified by a string-based key and has a documented data type. For convenience, the runtime will perform an explicit cast if the data type of the column differs from the data type of the metadata field. Of course, this requires that the two data types are compatible.

        By default, a metadata column can be used for both reading and writing. However, in many cases an external system provides more read-only metadata fields than writable fields. Therefore, it is possible to exclude metadata columns from persisting by setting the isVirtual flag to true.

        Note: This method assumes that the metadata key is equal to the column name.

        Parameters:
        columnName - column name
        dataType - data type of the column
        isVirtual - whether the column is virtual, i.e. excluded from persisting (true), or persisted (false)
      • columnByMetadata

        public Schema.Builder columnByMetadata​(String columnName,
                                               AbstractDataType<?> dataType,
                                               @Nullable
                                               String metadataKey)
        Declares a metadata column that is appended to this schema.

        Metadata columns allow access to connector- and/or format-specific fields for every row of a table. For example, a metadata column can be used to read and write the timestamp from and to Kafka records for time-based operations. The connector and format documentation lists the available metadata fields for every component.

        Every metadata field is identified by a string-based key and has a documented data type. The metadata key can be omitted if the column name should be used as the identifying metadata key. For convenience, the runtime will perform an explicit cast if the data type of the column differs from the data type of the metadata field. Of course, this requires that the two data types are compatible.

        Note: This method assumes that a metadata column can be used for both reading and writing.

        Parameters:
        columnName - column name
        dataType - data type of the column
        metadataKey - identifying metadata key, if null the column name will be used as metadata key
      • columnByMetadata

        public Schema.Builder columnByMetadata​(String columnName,
                                               AbstractDataType<?> dataType,
                                               @Nullable
                                               String metadataKey,
                                               boolean isVirtual)
        Declares a metadata column that is appended to this schema.

        Metadata columns allow access to connector- and/or format-specific fields for every row of a table. For example, a metadata column can be used to read and write the timestamp from and to Kafka records for time-based operations. The connector and format documentation lists the available metadata fields for every component.

        Every metadata field is identified by a string-based key and has a documented data type. The metadata key can be omitted if the column name should be used as the identifying metadata key. For convenience, the runtime will perform an explicit cast if the data type of the column differs from the data type of the metadata field. Of course, this requires that the two data types are compatible.

        By default, a metadata column can be used for both reading and writing. However, in many cases an external system provides more read-only metadata fields than writable fields. Therefore, it is possible to exclude metadata columns from persisting by setting the isVirtual flag to true.

        Parameters:
        columnName - column name
        dataType - data type of the column
        metadataKey - identifying metadata key, if null the column name will be used as metadata key
        isVirtual - whether the column is virtual, i.e. excluded from persisting (true), or persisted (false)
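
        The two rules described for this overload (the metadata key falls back to the column name when it is null, and a virtual column is excluded from persisting) can be sketched in plain Java. The class below is a hypothetical model written for illustration, not part of Flink; the names timestamp, kafka_offset, and offset are illustrative values only.

```java
public class MetadataColumnSketch {

    private final String columnName;
    private final String metadataKey; // may be null, as permitted by @Nullable
    private final boolean isVirtual;

    MetadataColumnSketch(String columnName, String metadataKey, boolean isVirtual) {
        this.columnName = columnName;
        this.metadataKey = metadataKey;
        this.isVirtual = isVirtual;
    }

    /** The key that identifies the metadata field: the explicit key, else the column name. */
    String effectiveKey() {
        return metadataKey != null ? metadataKey : columnName;
    }

    /** A virtual metadata column is read-only, so it is not used when writing. */
    boolean usedForWriting() {
        return !isVirtual;
    }

    public static void main(String[] args) {
        MetadataColumnSketch readWrite = new MetadataColumnSketch("timestamp", null, false);
        MetadataColumnSketch readOnly = new MetadataColumnSketch("kafka_offset", "offset", true);

        // prints "timestamp writable=true"
        System.out.println(readWrite.effectiveKey() + " writable=" + readWrite.usedForWriting());
        // prints "offset writable=false"
        System.out.println(readOnly.effectiveKey() + " writable=" + readOnly.usedForWriting());
    }
}
```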
      • columnByMetadata

        public Schema.Builder columnByMetadata​(String columnName,
                                               String serializableTypeString,
                                               @Nullable
                                               String metadataKey,
                                               boolean isVirtual)
        Declares a metadata column that is appended to this schema.

        See columnByMetadata(String, AbstractDataType, String, boolean) for a detailed explanation.

        This method uses a type string that can be easily persisted in a durable catalog.

        Parameters:
        columnName - column name
        serializableTypeString - data type of the column
        metadataKey - identifying metadata key, if null the column name will be used as metadata key
        isVirtual - whether the column is virtual, i.e. excluded from persisting (true), or persisted (false)
      • withComment

        public Schema.Builder withComment​(@Nullable
                                          String comment)
        Applies a comment to the previously declared column.
      • watermark

        public Schema.Builder watermark​(String columnName,
                                        Expression watermarkExpression)
        Declares that the given column should serve as an event-time (i.e. rowtime) attribute and specifies a corresponding watermark strategy as an expression.

        The column must be of type TIMESTAMP(3) or TIMESTAMP_LTZ(3) and be a top-level column in the schema. It may be a computed column.

        The watermark generation expression is evaluated by the framework for every record during runtime. The framework will periodically emit the largest generated watermark. If the current watermark is still identical to the previous one, or is null, or the value of the returned watermark is smaller than that of the last emitted one, then no new watermark will be emitted. A watermark is emitted in an interval defined by the configuration.

        Any scalar expression can be used for declaring a watermark strategy for in-memory/temporary tables. However, currently, only SQL expressions can be persisted in a catalog. The expression's return data type must be TIMESTAMP(3). User-defined functions (also defined in different catalogs) are supported.

        Example: .watermark("ts", $("ts").minus(lit(5).seconds()))

        Parameters:
        columnName - the column name used as a rowtime attribute
        watermarkExpression - the expression used for watermark generation
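
        The emission rule described above can be modelled as a small state machine. This is a simplified sketch under stated assumptions, not Flink's implementation: the class and method names are hypothetical, watermarks are represented as epoch milliseconds, and the periodic trigger is invoked by hand rather than by a configured interval.

```java
import java.util.ArrayList;
import java.util.List;

public class WatermarkEmissionSketch {

    private Long largestGenerated = null; // largest watermark generated so far
    private Long lastEmitted = null;      // last watermark that was actually emitted

    /** Called for every record with the result of the watermark expression (may be null). */
    void onRecord(Long generated) {
        if (generated == null) {
            return; // a null watermark never advances anything
        }
        if (largestGenerated == null || generated > largestGenerated) {
            largestGenerated = generated;
        }
    }

    /** Called periodically; returns the emitted watermark, or null if nothing new is emitted. */
    Long onPeriodicEmit() {
        if (largestGenerated == null) {
            return null;
        }
        if (lastEmitted != null && largestGenerated <= lastEmitted) {
            return null; // identical to or smaller than the last emitted watermark
        }
        lastEmitted = largestGenerated;
        return lastEmitted;
    }

    /** Runs three emission periods and collects the watermarks that were emitted. */
    static List<Long> demo() {
        WatermarkEmissionSketch sketch = new WatermarkEmissionSketch();
        List<Long> emitted = new ArrayList<>();

        sketch.onRecord(1000L);
        sketch.onRecord(3000L);
        addIfPresent(emitted, sketch.onPeriodicEmit()); // emits 3000

        sketch.onRecord(2000L);
        sketch.onRecord(null);
        addIfPresent(emitted, sketch.onPeriodicEmit()); // nothing new to emit

        sketch.onRecord(5000L);
        addIfPresent(emitted, sketch.onPeriodicEmit()); // emits 5000

        return emitted;
    }

    private static void addIfPresent(List<Long> target, Long value) {
        if (value != null) {
            target.add(value);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [3000, 5000]
    }
}
```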
      • watermark

        public Schema.Builder watermark​(String columnName,
                                        String sqlExpression)
        Declares that the given column should serve as an event-time (i.e. rowtime) attribute and specifies a corresponding watermark strategy as an expression.

        See watermark(String, Expression) for a detailed explanation.

        This method uses a SQL expression that can be easily persisted in a durable catalog.

        Example: .watermark("ts", "ts - INTERVAL '5' SECOND")

      • primaryKey

        public Schema.Builder primaryKey​(String... columnNames)
        Declares a primary key constraint for a set of given columns. A primary key uniquely identifies a row in a table. None of the columns in a primary key can be nullable. The primary key is informational only: it is not enforced, but it can be used for optimizations. It is the data owner's responsibility to ensure uniqueness of the data.

        The primary key will be assigned a generated name in the format PK_col1_col2.

        Parameters:
        columnNames - columns that form a unique primary key
      • primaryKey

        public Schema.Builder primaryKey​(List<String> columnNames)
        Declares a primary key constraint for a set of given columns. A primary key uniquely identifies a row in a table. None of the columns in a primary key can be nullable. The primary key is informational only: it is not enforced, but it can be used for optimizations. It is the data owner's responsibility to ensure uniqueness of the data.

        The primary key will be assigned a generated name in the format PK_col1_col2.

        Parameters:
        columnNames - columns that form a unique primary key
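
        The generated-name format PK_col1_col2 can be reproduced with a one-line join. This is a sketch of the naming pattern only; the helper class and method are hypothetical and are not claimed to be the code Flink uses internally.

```java
import java.util.Arrays;
import java.util.List;

public class PrimaryKeyNameSketch {

    /** Builds a constraint name of the form PK_col1_col2 from the key columns. */
    static String generatedName(List<String> columnNames) {
        return "PK_" + String.join("_", columnNames);
    }

    public static void main(String[] args) {
        System.out.println(generatedName(Arrays.asList("col1", "col2"))); // prints PK_col1_col2
    }
}
```

        Because the name is derived from the column names, two schemas that declare the same key columns in the same order end up with the same constraint name.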
      • primaryKeyNamed

        public Schema.Builder primaryKeyNamed​(String constraintName,
                                              String... columnNames)
        Declares a primary key constraint for a set of given columns. A primary key uniquely identifies a row in a table. None of the columns in a primary key can be nullable. The primary key is informational only: it is not enforced, but it can be used for optimizations. It is the data owner's responsibility to ensure uniqueness of the data.

        Parameters:
        constraintName - name for the primary key, can be used to reference the constraint
        columnNames - columns that form a unique primary key
      • primaryKeyNamed

        public Schema.Builder primaryKeyNamed​(String constraintName,
                                              List<String> columnNames)
        Declares a primary key constraint for a set of given columns. A primary key uniquely identifies a row in a table. None of the columns in a primary key can be nullable. The primary key is informational only: it is not enforced, but it can be used for optimizations. It is the data owner's responsibility to ensure uniqueness of the data.

        Parameters:
        constraintName - name for the primary key, can be used to reference the constraint
        columnNames - columns that form a unique primary key
      • build

        public Schema build()
        Returns an instance of an unresolved Schema.
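
        As a usage sketch, the builder methods documented on this page can be chained into one unresolved schema. The example assumes that the Flink Table API (flink-table-api-java) is on the classpath and that the builder is obtained through Schema.newBuilder(). All column names, the constraint name, and the metadata key "offset" are hypothetical choices made for illustration.

```java
import static org.apache.flink.table.api.Expressions.$;

import java.util.List;
import java.util.stream.Collectors;

import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.api.Schema;

public class SchemaBuilderExample {

    /** Builds an unresolved schema that uses each kind of column once. */
    static Schema buildSchema() {
        return Schema.newBuilder()
                // Physical columns: the payload read from and written to the external system.
                .column("user_id", DataTypes.BIGINT().notNull())
                .column("json_obj", DataTypes.ROW(DataTypes.FIELD("ts", DataTypes.STRING())))
                .withComment("nested payload that carries the event time")
                // Computed column declared with the Expression overload.
                .columnByExpression("ts", $("json_obj").get("ts").cast(DataTypes.TIMESTAMP(3)))
                // Read-only metadata column: explicit key, isVirtual = true.
                .columnByMetadata("event_offset", DataTypes.BIGINT(), "offset", true)
                // Watermark declared with the SQL string overload.
                .watermark("ts", "ts - INTERVAL '5' SECOND")
                .primaryKeyNamed("pk_user", "user_id")
                .build();
    }

    /** Returns the declared column names in declaration order. */
    static List<String> columnNames() {
        return buildSchema().getColumns().stream()
                .map(Schema.UnresolvedColumn::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(columnNames());
    }
}
```

        The schema returned by build() is still unresolved, so expressions and data types are not validated at this point. Mixing the Expression and SQL string overloads is fine for in-memory or temporary tables, but as noted for columnByExpression and watermark, only SQL expressions can currently be persisted in a catalog.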