Format: Serialization Schema | Format: Deserialization Schema
The Apache Avro format allows reading and writing Avro data based on an Avro schema. Currently, the Avro schema is derived from the table schema.
In order to set up the Avro format, the following dependency is required for both projects using a build automation tool (such as Maven or SBT) and the SQL Client with SQL JAR bundles.
You can download flink-avro from the Downloads page; cluster execution additionally requires the Hadoop dependency.
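A minimal Maven sketch of that dependency, assuming the standard flink-avro artifact; `${flink.version}` is a placeholder for your Flink release:

```xml
<!-- Avro format for the Table/SQL API; ${flink.version} is a placeholder -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-avro</artifactId>
  <version>${flink.version}</version>
</dependency>
```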
Here is an example of creating a table using the Kafka connector and the Avro format.
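The following is a minimal sketch of such a DDL; the topic, bootstrap servers, and columns are illustrative:

```sql
-- Register a Kafka-backed table that reads and writes Avro records
CREATE TABLE user_behavior (
  user_id BIGINT,
  item_id BIGINT,
  category_id BIGINT,
  behavior STRING,
  ts TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'user_behavior',                          -- illustrative topic
  'properties.bootstrap.servers' = 'localhost:9092',  -- illustrative brokers
  'properties.group.id' = 'testGroup',
  'format' = 'avro'
);
```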
Option | Required | Default | Type | Description
---|---|---|---|---
format | required | (none) | String | Specify what format to use; here it should be `'avro'`.
avro.codec | optional | (none) | String | For the Filesystem connector only: the compression codec for Avro. No compression by default. The valid enumerations are: `deflate`, `snappy`, `bzip2`, `xz`.
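For instance, here is a hedged sketch of a Filesystem table that enables compression through `avro.codec`; the table name and path are illustrative:

```sql
-- Write Snappy-compressed Avro files to a local directory
CREATE TABLE users_archive (
  user_id BIGINT,
  name STRING
) WITH (
  'connector' = 'filesystem',
  'path' = 'file:///tmp/users_archive',  -- illustrative path
  'format' = 'avro',
  'avro.codec' = 'snappy'                -- optional; Filesystem connector only
);
```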
Currently, the Avro schema is always derived from the table schema; explicitly defining an Avro schema is not yet supported. The following table lists the type mapping from Flink types to Avro types.
Flink SQL type | Avro type | Avro logical type
---|---|---
CHAR / VARCHAR / STRING | string |
BOOLEAN | boolean |
BINARY / VARBINARY | bytes |
DECIMAL | fixed | decimal
TINYINT | int |
SMALLINT | int |
INT | int |
BIGINT | long |
FLOAT | float |
DOUBLE | double |
DATE | int | date
TIME | int | time-millis
TIMESTAMP | long | timestamp-millis
ARRAY | array |
MAP (key must be string/char/varchar type) | map |
MULTISET (element must be string/char/varchar type) | map |
ROW | record |
In addition to the types listed above, Flink supports reading/writing nullable types. Flink maps nullable types to Avro `union(something, null)`, where `something` is the Avro type converted from the Flink type.
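As a sketch of how nullability affects the derived schema (the table and column names are hypothetical), the comments note the Avro type each column would map to:

```sql
CREATE TABLE orders (
  order_id BIGINT NOT NULL,  -- derived Avro type: long
  note     STRING,           -- nullable, so derived as union(string, null)
  amount   DECIMAL(10, 2)    -- nullable, so derived as union(fixed/decimal, null)
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',                                 -- illustrative topic
  'properties.bootstrap.servers' = 'localhost:9092',  -- illustrative brokers
  'format' = 'avro'
);
```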
You can refer to the Avro Specification for more information about Avro types.