Introduction
This documentation is for an unreleased version of Apache Flink CDC. We recommend you use the latest stable version.

Welcome to Flink CDC 🎉 #

Flink CDC is a streaming data integration tool that aims to provide users with a more robust API. It allows users to elegantly describe their ETL pipeline logic via YAML, and it automatically generates customized Flink operators and submits the job for them. Flink CDC prioritizes optimizing the task submission process and offers enhanced functionalities such as schema evolution, data transformation, full database synchronization, and exactly-once semantics.

Deeply integrated with and powered by Apache Flink, Flink CDC provides:

  • ✅ End-to-end data integration framework
  • ✅ API for data integration users to build jobs easily
  • ✅ Multi-table support in Source / Sink
  • ✅ Synchronization of entire databases
  • ✅ Schema evolution capability

Flink CDC provides a YAML-formatted user API that is more suitable for data integration scenarios. Here’s an example YAML file defining a data pipeline that ingests real-time changes from MySQL and synchronizes them to Apache Doris:

source:
  type: mysql
  hostname: localhost
  port: 3306
  username: root
  password: 123456
  tables: app_db.\.*
  server-id: 5400-5404
  server-time-zone: UTC

sink:
  type: doris
  fenodes: 127.0.0.1:8030
  username: root
  password: ""
  table.create.properties.light_schema_change: true
  table.create.properties.replication_num: 1

pipeline:
  name: Sync MySQL Database to Doris
  parallelism: 2

By submitting the YAML file with flink-cdc.sh, a Flink job will be compiled and deployed to a designated Flink cluster. Please refer to Core Concept for full documentation of all supported functionalities of a pipeline.
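As a sketch of the submission step, assuming the pipeline definition above is saved as mysql-to-doris.yaml (the file name is an assumption for this example) and the commands are run from the root of a Flink CDC distribution with FLINK_HOME pointing at a Flink installation:

```shell
# Point the CLI at the target Flink installation (path is an example).
export FLINK_HOME=/opt/flink

# Submit the pipeline definition; flink-cdc.sh compiles it into a
# Flink job and deploys it to the designated Flink cluster.
bash bin/flink-cdc.sh mysql-to-doris.yaml
```

Once submitted, the job appears in the Flink Web UI like any other Flink job, so it can be monitored and managed with the usual Flink tooling.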

Explore the Flink CDC documentation to get hands-on with your first real-time data integration pipeline:

Quickstart #

Check out the quickstart guide to learn how to establish a Flink CDC pipeline:

Understand Core Concepts #

Get familiar with the core concepts introduced in Flink CDC and try building more complex pipelines:
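For instance, a pipeline definition can go beyond plain source-to-sink replication by adding transform and route blocks. The sketch below illustrates the idea under the assumption of a hypothetical app_db.orders table; the column names and target table names are made up for this example:

```yaml
transform:
  - source-table: app_db.orders
    # Keep two columns and derive a third one from an expression.
    projection: id, amount, amount * 0.1 AS tax
    # Only forward rows with a positive amount.
    filter: amount > 0

route:
  - source-table: app_db.orders
    # Write the transformed stream into a differently named sink table.
    sink-table: ods_db.ods_orders
```

These blocks sit alongside the source, sink, and pipeline blocks shown earlier in this page; see the Core Concept documentation for the authoritative list of supported options.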

Learn how to submit the pipeline to a Flink cluster running in different deployment modes:

Development and Contribution #

If you want to connect Flink CDC to your customized external system, or contribute to the framework itself, these sections could be helpful: