Schema Evolution

Definition #

Schema Evolution feature could synchronize upstream schema DDL changes to downstream, including creating new table, appending new columns, renaming columns or changing column types, and dropping columns.

Parameters #

Schema evolution behavior could be specified with the following pipeline option:

pipeline:
  schema.change.behavior: evolve

schema.change.behavior is of enum type, and could be set to exception, evolve, try_evolve, lenient or ignore.

Behaviors #

Exception Mode #

In this mode, all schema change behaviors are forbidden. An exception will be thrown from SchemaOperator once it was captured. This is useful when your downstream sink is not expected to handle any schema changes.

Evolve Mode #

In this mode, CDC pipeline schema operator will apply all upstream schema change events to downstream sink. If the attempt fails, an exception will be thrown from the SchemaRegistry and trigger a global failover.

TryEvolve Mode #

In this mode, schema operator will also try to apply upstream schema change events to downstream sink. However, if specific schema change events are not supported by downstream sink, the failure will be tolerated and SchemaOperator will try to convert all following data records in case of schema discrepancy.

Warning: such data casting and converting isn’t guaranteed to be lossless. Some fields with incompatible data types might be lost.

Lenient Mode #

In this mode, schema operator will convert all upstream schema change events to downstream sink after converting them to ensure no data will be lost. For example, an AlterColumnTypeEvent will be converted to two individual schema change events including RenameColumnEvent and AddColumnEvent: Previous column (with the unchanged type) will be kept and a new column (with the new type) will be added.

This is the default schema evolution behavior.

Ignore Mode #

In this mode, all schema change events will be silently swallowed by SchemaOperator and never attempt to apply them to downstream sink. This is useful when your downstream sink is unready for any schema changes, but wants to keep receiving data from unchanged columns.

Per-Event Type Control #

Sometimes, it may not be suitable to synchronize all schema change events to downstream. For example, allowing AddColumnEvent but disallowing DropColumnEvent is a common scenario to avoid deleting existing data. This could be achieved by setting include.schema.changes and exclude.schema.changes option in sink block.

Options #

Option Key meaning optional/required
include.schema.changes Schema change event types to be included. Include all types by default if not specified. optional
exclude.schema.changes Schema change event types not to be included. It has a higher priority than include.schema.changes. optional

Here’s a full list of configurable schema change event types:

Event Type Description
add.column Add a new column to a table.
alter.column.type Change the type of column.
create.table Create a new table.
drop.column Drop a column.
rename.column Rename a column.

Partial matching is supported. For example, passing column into the options above is equivalent to passing add.column, alter.column.type, drop.column, and rename.column.

Example #

The following YAML configuration is set to include CreateTableEvent and column related events, except DropColumnEvent.

sink:
  include.schema.changes: [create.table, column] # This matches CreateTable, AddColumn, AlterColumnType, RenameColumn, and DropColumn Events
  exclude.schema.changes: [drop.column] # This excludes DropColumn Events