
Orc Format #

Format: Serialization Schema Format: Deserialization Schema

The Apache ORC format allows reading and writing ORC data.

Dependencies #

To use the ORC format, the following dependencies are required, both for projects using a build automation tool (such as Maven or SBT) and for the SQL Client with SQL JAR bundles.

Maven dependency and SQL Client JAR: only available for stable releases.

How to create a table with Orc format #

The following example creates a table using the Filesystem connector and the ORC format.

CREATE TABLE user_behavior (
  user_id BIGINT,
  item_id BIGINT,
  category_id BIGINT,
  behavior STRING,
  ts TIMESTAMP(3),
  dt STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'filesystem',
  'path' = '/tmp/user_behavior',
  'format' = 'orc'
)
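
As a usage sketch, the table defined above can be written to and queried with ordinary SQL. The row values below are invented for illustration, and the example assumes batch execution mode; in streaming mode the filesystem sink typically only finalizes files when a checkpoint completes.

```sql
-- Write one row; the partition column dt takes the last value (dynamic partitioning).
INSERT INTO user_behavior
VALUES (1, 1001, 10, 'pv', TIMESTAMP '2024-01-01 12:00:00.000', '2024-01-01');

-- Read the row back, restricting the scan to a single partition.
SELECT user_id, item_id, behavior, ts
FROM user_behavior
WHERE dt = '2024-01-01';
```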

Format Options #

Option    Required    Default    Type      Description
format    required    (none)     String    Specify which format to use; here it should be 'orc'.

The ORC format also supports the table properties defined by ORC itself (see Table properties). For example, you can configure orc.compress=SNAPPY to enable Snappy compression.
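
A minimal sketch of how such a property might be set, assuming it is passed in the WITH clause next to the format option. The table name user_behavior_snappy and its path are hypothetical.

```sql
-- 'orc.compress' is an ORC table property; it is listed alongside the format options.
CREATE TABLE user_behavior_snappy (
  user_id BIGINT,
  behavior STRING,
  dt STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'filesystem',
  'path' = '/tmp/user_behavior_snappy',
  'format' = 'orc',
  'orc.compress' = 'SNAPPY'
);
```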

Data Type Mapping #

The ORC format's type mapping is compatible with Apache Hive. The following table lists how each Flink data type maps to its ORC physical and logical types.

Flink Data Type    Orc physical type    Orc logical type
CHAR               bytes                CHAR
VARCHAR            bytes                VARCHAR
STRING             bytes                STRING
BOOLEAN            long                 BOOLEAN
BYTES              bytes                BINARY
DECIMAL            decimal              DECIMAL
TINYINT            long                 BYTE
SMALLINT           long                 SHORT
INT                long                 INT
BIGINT             long                 LONG
FLOAT              double               FLOAT
DOUBLE             double               DOUBLE
DATE               long                 DATE
TIMESTAMP          timestamp            TIMESTAMP
ARRAY              -                    LIST
MAP                -                    MAP
ROW                -                    STRUCT
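
To illustrate the mapping, the sketch below declares a table that mixes primitive and composite Flink types; each comment names the ORC logical type from the table above. The table name orc_type_demo, its columns, and its path are hypothetical.

```sql
CREATE TABLE orc_type_demo (
  id BIGINT,                             -- ORC LONG
  name STRING,                           -- ORC STRING
  active BOOLEAN,                        -- ORC BOOLEAN
  price DECIMAL(10, 2),                  -- ORC DECIMAL
  created TIMESTAMP(3),                  -- ORC TIMESTAMP
  tags ARRAY<STRING>,                    -- ORC LIST
  attributes MAP<STRING, INT>,           -- ORC MAP
  address ROW<city STRING, zip STRING>   -- ORC STRUCT
) WITH (
  'connector' = 'filesystem',
  'path' = '/tmp/orc_type_demo',
  'format' = 'orc'
);
```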