Format: Serialization Schema / Deserialization Schema
The Apache Parquet format allows reading and writing Parquet data.
In order to set up the Parquet format, the following table provides dependency information both for projects using a build automation tool (such as Maven or SBT) and for the SQL Client with SQL JAR bundles.
| Maven dependency | SQL Client JAR |
| --- | --- |
| flink-parquet_2.11 | Download |
Here is an example of creating a table using the Filesystem connector and the Parquet format.
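A minimal sketch follows; the table name, schema, and path are illustrative, not prescribed by the format:

```sql
CREATE TABLE user_behavior (
  user_id BIGINT,
  item_id BIGINT,
  category_id BIGINT,
  behavior STRING,
  ts TIMESTAMP(3),
  dt STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'filesystem',      -- Filesystem connector
  'path' = '/tmp/user_behavior',   -- illustrative output path
  'format' = 'parquet'             -- use the Parquet format
)
```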
| Option | Required | Default | Type | Description |
| --- | --- | --- | --- | --- |
| format | required | (none) | String | Specify what format to use; here it should be 'parquet'. |
| parquet.utc-timezone | optional | false | Boolean | Use UTC timezone or local timezone for the conversion between epoch time and LocalDateTime. Hive 0.x/1.x/2.x use the local timezone, but Hive 3.x uses the UTC timezone. |
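As a sketch (the table name, column, and path are invented), the option is set next to the format in the table's WITH clause:

```sql
CREATE TABLE hive3_compatible (
  ts TIMESTAMP(3)
) WITH (
  'connector' = 'filesystem',
  'path' = '/tmp/hive3_compatible',  -- illustrative path
  'format' = 'parquet',
  'parquet.utc-timezone' = 'true'    -- use UTC, matching Hive 3.x behavior
)
```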
The Parquet format also supports the configuration options of ParquetOutputFormat. For example, you can configure parquet.compression=GZIP to enable gzip compression.
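For instance (the table name, columns, and path are again invented), gzip compression could be enabled like this:

```sql
CREATE TABLE gzip_output (
  id BIGINT,
  msg STRING
) WITH (
  'connector' = 'filesystem',
  'path' = '/tmp/gzip_output',     -- illustrative path
  'format' = 'parquet',
  'parquet.compression' = 'GZIP'   -- forwarded to ParquetOutputFormat
)
```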
Currently, the Parquet format type mapping is compatible with Apache Hive, but different from Apache Spark:

- Timestamp: the timestamp type is mapped to INT96 regardless of precision.
- Decimal: the decimal type is mapped to a fixed-length byte array according to the precision.

The following table lists the type mapping from Flink data types to Parquet types.
| Flink Data Type | Parquet type | Parquet logical type |
| --- | --- | --- |
| CHAR / VARCHAR / STRING | BINARY | UTF8 |
| BOOLEAN | BOOLEAN | |
| BINARY / VARBINARY | BINARY | |
| DECIMAL | FIXED_LEN_BYTE_ARRAY | DECIMAL |
| TINYINT | INT32 | INT_8 |
| SMALLINT | INT32 | INT_16 |
| INT | INT32 | |
| BIGINT | INT64 | |
| FLOAT | FLOAT | |
| DOUBLE | DOUBLE | |
| DATE | INT32 | DATE |
| TIME | INT32 | TIME_MILLIS |
| TIMESTAMP | INT96 | |
Attention: composite data types (Array, Map, and Row) are not supported.
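To make the mapping concrete, here is a hypothetical DDL (table name, columns, and path are invented) that declares only types with a defined Parquet mapping above; an ARRAY, MAP, or ROW column would not be supported:

```sql
CREATE TABLE mapped_types (
  name       STRING,         -- BINARY (UTF8)
  amount     DECIMAL(10, 2), -- FIXED_LEN_BYTE_ARRAY (DECIMAL)
  quantity   INT,            -- INT32
  event_date DATE,           -- INT32 (DATE)
  event_ts   TIMESTAMP(3)    -- INT96
) WITH (
  'connector' = 'filesystem',
  'path' = '/tmp/mapped_types',  -- illustrative path
  'format' = 'parquet'
)
```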