This documentation is for an out-of-date version of Apache Flink. We recommend you use the latest stable version.

Table API

The Table API is a unified, relational API for stream and batch processing. Table API queries can be run on batch or streaming input without modifications. The Table API is a super set of the SQL language and is specially designed for working with Apache Flink. The Table API is a language-integrated API for Scala and Java. Instead of specifying queries as String values as common with SQL, Table API queries are defined in a language-embedded style in Java or Scala with IDE support like autocompletion and syntax validation.

The Table API shares many concepts and parts of its API with Flinkā€™s SQL integration. Have a look at the Common Concepts & API to learn how to register tables or to create a Table object. The Streaming Concepts page discusses streaming specific concepts such as dynamic tables and time attributes.

The following examples assume a registered table called Orders with attributes (a, b, c, rowtime). The rowtime field is either a logical time attribute in streaming or a regular timestamp field in batch.

Overview & Examples

The Table API is available for Scala and Java. The Scala Table API leverages on Scala expressions, the Java Table API is based on strings which are parsed and converted into equivalent expressions.

The following example shows the differences between the Scala and Java Table API. The table program is executed in a batch environment. It scans the Orders table, groups by field a, and counts the resulting rows per group. The result of the table program is converted into a DataSet of type Row and printed.

The Java Table API is enabled by importing org.apache.flink.table.api.java.*. The following example shows how a Java Table API program is constructed and how expressions are specified as strings.

// environment configuration
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
BatchTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

// register Orders table in table environment
// ...

// specify table program
Table orders = tEnv.scan("Orders"); // schema (a, b, c, rowtime)

Table counts = orders
        .groupBy("a")
        .select("a, b.count as cnt");

// conversion to DataSet
DataSet<Row> result = tableEnv.toDataSet(counts, Row.class);
result.print();

The Scala Table API is enabled by importing org.apache.flink.api.scala._ and org.apache.flink.table.api.scala._.

The following example shows how a Scala Table API program is constructed. Table attributes are referenced using Scala Symbols, which start with an apostrophe character (').

import org.apache.flink.api.scala._
import org.apache.flink.table.api.scala._

// environment configuration
val env = ExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(env)

// register Orders table in table environment
// ...

// specify table program
val orders = tEnv.scan("Orders") // schema (a, b, c, rowtime)

val result = orders
               .groupBy('a)
               .select('a, 'b.count as 'cnt)
               .toDataSet[Row] // conversion to DataSet
               .print()

The next example shows a more complex Table API program. The program scans again the Orders table. It filters null values, normalizes the field a of type String, and calculates for each hour and product a the average billing amount b.

// environment configuration
// ...

// specify table program
Table orders = tEnv.scan("Orders"); // schema (a, b, c, rowtime)

Table result = orders
        .filter("a.isNotNull && b.isNotNull && c.isNotNull")
        .select("a.lowerCase(), b, rowtime")
        .window(Tumble.over("1.hour").on("rowtime").as("hourlyWindow"))
        .groupBy("hourlyWindow, a")
        .select("a, hourlyWindow.end as hour, b.avg as avgBillingAmount");
// environment configuration
// ...

// specify table program
val orders: Table = tEnv.scan("Orders") // schema (a, b, c, rowtime)

val result: Table = orders
        .filter('a.isNotNull && 'b.isNotNull && 'c.isNotNull)
        .select('a.lowerCase(), 'b, 'rowtime)
        .window(Tumble over 1.hour on 'rowtime as 'hourlyWindow)
        .groupBy('hourlyWindow, 'a)
        .select('a, 'hourlyWindow.end as 'hour, 'b.avg as 'avgBillingAmount)

Since the Table API is a unified API for batch and streaming data, both example programs can be executed on batch and streaming inputs without any modification of the table program itself. In both cases, the program produces the same results given that streaming records are not late (see Streaming Concepts for details).

Back to top

Operations

The Table API supports the following operations. Please note that not all operations are available in both batch and streaming yet; they are tagged accordingly.

Scan, Projection, and Filter

Operators Description
Scan
Batch Streaming

Similar to the FROM clause in a SQL query. Performs a scan of a registered table.

Table orders = tableEnv.scan("Orders");
Select
Batch Streaming

Similar to a SQL SELECT statement. Performs a select operation.

Table orders = tableEnv.scan("Orders");
Table result = orders.select("a, c as d");

You can use star (*) to act as a wild card, selecting all of the columns in the table.

Table result = orders.select("*");
As
Batch Streaming

Renames fields.

Table orders = tableEnv.scan("Orders");
Table result = orders.as("x, y, z, t");
Where / Filter
Batch Streaming

Similar to a SQL WHERE clause. Filters out rows that do not pass the filter predicate.

Table orders = tableEnv.scan("Orders");
Table result = orders.where("b === 'red'");
or
Table orders = tableEnv.scan("Orders");
Table result = orders.filter("a % 2 === 0");
Operators Description
Scan
Batch Streaming

Similar to the FROM clause in a SQL query. Performs a scan of a registered table.

val orders: Table = tableEnv.scan("Orders")
Select
Batch Streaming

Similar to a SQL SELECT statement. Performs a select operation.

val orders: Table = tableEnv.scan("Orders")
val result = orders.select('a, 'c as 'd)

You can use star (*) to act as a wild card, selecting all of the columns in the table.

val orders: Table = tableEnv.scan("Orders")
val result = orders.select('*)
As
Batch Streaming

Renames fields.

val orders: Table = tableEnv.scan("Orders").as('x, 'y, 'z, 't)
Where / Filter
Batch Streaming

Similar to a SQL WHERE clause. Filters out rows that do not pass the filter predicate.

val orders: Table = tableEnv.scan("Orders")
val result = orders.filter('a % 2 === 0)
or
val orders: Table = tableEnv.scan("Orders")
val result = orders.where('b === "red")

Back to top

Aggregations

Operators Description
GroupBy Aggregation
Batch Streaming
Result Updating

Similar to a SQL GROUP BY clause. Groups the rows on the grouping keys with a following running aggregation operator to aggregate rows group-wise.

Table orders = tableEnv.scan("Orders");
Table result = orders.groupBy("a").select("a, b.sum as d");

Note: For streaming queries the required state to compute the query result might grow infinitely depending on the type of aggregation and the number of distinct grouping keys. Please provide a query configuration with valid retention interval to prevent excessive state size. See Streaming Concepts for details.

GroupBy Window Aggregation
Batch Streaming

Groups and aggregates a table on a group window and possibly one or more grouping keys.

Table orders = tableEnv.scan("Orders");
Table result = orders
    .window(Tumble.over("5.minutes").on("rowtime").as("w")) // define window
    .groupBy("a, w") // group by key and window
    .select("a, w.start, w.end, w.rowtime, b.sum as d"); // access window properties and aggregate
Over Window Aggregation
Streaming

Similar to a SQL OVER clause. Over window aggregates are computed for each row, based on a window (range) of preceding and succeeding rows. See the over windows section for more details.

Table orders = tableEnv.scan("Orders");
Table result = orders
    // define window
    .window(Over  
      .partitionBy("a")
      .orderBy("rowtime")
      .preceding("UNBOUNDED_RANGE")
      .following("CURRENT_RANGE")
      .as("w"))
    .select("a, b.avg over w, b.max over w, b.min over w"); // sliding aggregate

Note: All aggregates must be defined over the same window, i.e., same partitioning, sorting, and range. Currently, only windows with PRECEDING (UNBOUNDED and bounded) to CURRENT ROW range are supported. Ranges with FOLLOWING are not supported yet. ORDER BY must be specified on a single time attribute.

Distinct
Batch Streaming
Result Updating

Similar to a SQL DISTINCT clause. Returns records with distinct value combinations.

Table orders = tableEnv.scan("Orders");
Table result = orders.distinct();

Note: For streaming queries the required state to compute the query result might grow infinitely depending on the number of distinct fields. Please provide a query configuration with valid retention interval to prevent excessive state size. See Streaming Concepts for details.

Operators Description
GroupBy Aggregation
Batch Streaming
Result Updating

Similar to a SQL GROUP BY clause. Groups the rows on the grouping keys with a following running aggregation operator to aggregate rows group-wise.

val orders: Table = tableEnv.scan("Orders")
val result = orders.groupBy('a).select('a, 'b.sum as 'd)

Note: For streaming queries the required state to compute the query result might grow infinitely depending on the type of aggregation and the number of distinct grouping keys. Please provide a query configuration with valid retention interval to prevent excessive state size. See Streaming Concepts for details.

GroupBy Window Aggregation
Batch Streaming

Groups and aggregates a table on a group window and possibly one or more grouping keys.

val orders: Table = tableEnv.scan("Orders")
val result: Table = orders
    .window(Tumble over 5.minutes on 'rowtime as 'w) // define window
    .groupBy('a, 'w) // group by key and window
    .select('a, w.start, 'w.end, 'w.rowtime, 'b.sum as 'd) // access window properties and aggregate
Over Window Aggregation
Streaming

Similar to a SQL OVER clause. Over window aggregates are computed for each row, based on a window (range) of preceding and succeeding rows. See the over windows section for more details.

val orders: Table = tableEnv.scan("Orders")
val result: Table = orders
    // define window
    .window(Over  
      partitionBy 'a
      orderBy 'rowtime
      preceding UNBOUNDED_RANGE
      following CURRENT_RANGE
      as 'w)
    .select('a, 'b.avg over 'w, 'b.max over 'w, 'b.min over 'w) // sliding aggregate

Note: All aggregates must be defined over the same window, i.e., same partitioning, sorting, and range. Currently, only windows with PRECEDING (UNBOUNDED and bounded) to CURRENT ROW range are supported. Ranges with FOLLOWING are not supported yet. ORDER BY must be specified on a single time attribute.

Distinct
Batch

Similar to a SQL DISTINCT clause. Returns records with distinct value combinations.

val orders: Table = tableEnv.scan("Orders")
val result = orders.distinct()

Note: For streaming queries the required state to compute the query result might grow infinitely depending on the number of distinct fields. Please provide a query configuration with valid retention interval to prevent excessive state size. See Streaming Concepts for details.

Back to top

Joins

Operators Description
Inner Join
Batch Streaming

Similar to a SQL JOIN clause. Joins two tables. Both tables must have distinct field names and at least one equality join predicate must be defined through join operator or using a where or filter operator.

Table left = tableEnv.fromDataSet(ds1, "a, b, c");
Table right = tableEnv.fromDataSet(ds2, "d, e, f");
Table result = left.join(right).where("a = d").select("a, b, e");

Note: For streaming queries the required state to compute the query result might grow infinitely depending on the number of distinct input rows. Please provide a query configuration with valid retention interval to prevent excessive state size. See Streaming Concepts for details.

Outer Joins
Batch

Similar to SQL LEFT/RIGHT/FULL OUTER JOIN clauses. Joins two tables. Both tables must have distinct field names and at least one equality join predicate must be defined.

Table left = tableEnv.fromDataSet(ds1, "a, b, c");
Table right = tableEnv.fromDataSet(ds2, "d, e, f");

Table leftOuterResult = left.leftOuterJoin(right, "a = d").select("a, b, e");
Table rightOuterResult = left.rightOuterJoin(right, "a = d").select("a, b, e");
Table fullOuterResult = left.fullOuterJoin(right, "a = d").select("a, b, e");
Time-windowed Join
Batch Streaming

Note: Time-windowed joins are a subset of regular joins that can be processed in a streaming fashion.

A time-windowed join requires at least one equi-join predicate and a join condition that bounds the time on both sides. Such a condition can be defined by two appropriate range predicates (<, <=, >=, >) or a single equality predicate that compares time attributes of the same type (i.e., processing time or event time) of both input tables.

For example, the following predicates are valid window join conditions:

  • ltime === rtime
  • ltime >= rtime && ltime < rtime + 10.minutes
Table left = tableEnv.fromDataSet(ds1, "a, b, c, ltime.rowtime");
Table right = tableEnv.fromDataSet(ds2, "d, e, f, rtime.rowtime");

Table result = left.join(right)
  .where("a = d && ltime >= rtime - 5.minutes && ltime < rtime + 10.minutes")
  .select("a, b, e, ltime");
TableFunction Inner Join
Batch Streaming

Joins a table with a the results of a table function. Each row of the left (outer) table is joined with all rows produced by the corresponding call of the table function. A row of the left (outer) table is dropped, if its table function call returns an empty result.

// register function
TableFunction<String> split = new MySplitUDTF();
tEnv.registerFunction("split", split);

// join
Table orders = tableEnv.scan("Orders");
Table result = orders
    .join(new Table(tEnv, "split(c)").as("s", "t", "v"))
    .select("a, b, s, t, v");
TableFunction Left Outer Join
Batch Streaming

Joins a table with a the results of a table function. Each row of the left (outer) table is joined with all rows produced by the corresponding call of the table function. If a table function call returns an empty result, the corresponding outer row is preserved and the result padded with null values.

Note: Currently, the predicate of a table function left outer join can only be empty or literal true.

// register function
TableFunction<String> split = new MySplitUDTF();
tEnv.registerFunction("split", split);

// join
Table orders = tableEnv.scan("Orders");
Table result = orders
    .leftOuterJoin(new Table(tEnv, "split(c)").as("s", "t", "v"))
    .select("a, b, s, t, v");
Operators Description
Inner Join
Batch

Similar to a SQL JOIN clause. Joins two tables. Both tables must have distinct field names and at least one equality join predicate must be defined through join operator or using a where or filter operator.

val left = ds1.toTable(tableEnv, 'a, 'b, 'c)
val right = ds2.toTable(tableEnv, 'd, 'e, 'f)
val result = left.join(right).where('a === 'd).select('a, 'b, 'e)
Outer Joins
Batch

Similar to SQL LEFT/RIGHT/FULL OUTER JOIN clauses. Joins two tables. Both tables must have distinct field names and at least one equality join predicate must be defined.

val left = tableEnv.fromDataSet(ds1, 'a, 'b, 'c)
val right = tableEnv.fromDataSet(ds2, 'd, 'e, 'f)

val leftOuterResult = left.leftOuterJoin(right, 'a === 'd).select('a, 'b, 'e)
val rightOuterResult = left.rightOuterJoin(right, 'a === 'd).select('a, 'b, 'e)
val fullOuterResult = left.fullOuterJoin(right, 'a === 'd).select('a, 'b, 'e)
Time-windowed Join
Batch Streaming

Note: Time-windowed joins are a subset of regular joins that can be processed in a streaming fashion.

A time-windowed join requires at least one equi-join predicate and a join condition that bounds the time on both sides. Such a condition can be defined by two appropriate range predicates (<, <=, >=, >) or a single equality predicate that compares time attributes of the same type (i.e., processing time or event time) of both input tables.

For example, the following predicates are valid window join conditions:

  • 'ltime === 'rtime
  • 'ltime >= 'rtime && 'ltime < 'rtime + 10.minutes
val left = ds1.toTable(tableEnv, 'a, 'b, 'c, 'ltime.rowtime)
val right = ds2.toTable(tableEnv, 'd, 'e, 'f, 'rtime.rowtime)

val result = left.join(right)
  .where('a === 'd && 'ltime >= 'rtime - 5.minutes && 'ltime < 'rtime + 10.minutes)
  .select('a, 'b, 'e, 'ltime)
TableFunction Inner Join
Batch Streaming

Joins a table with a the results of a table function. Each row of the left (outer) table is joined with all rows produced by the corresponding call of the table function. A row of the left (outer) table is dropped, if its table function call returns an empty result.

// instantiate function
val split: TableFunction[_] = new MySplitUDTF()

// join
val result: Table = table
    .join(split('c) as ('s, 't, 'v))
    .select('a, 'b, 's, 't, 'v)
TableFunction Left Outer Join
Batch Streaming

Joins a table with a the results of a table function. Each row of the left (outer) table is joined with all rows produced by the corresponding call of the table function. If a table function call returns an empty result, the corresponding outer row is preserved and the result padded with null values.

Note: Currently, the predicate of a table function left outer join can only be empty or literal true.

// instantiate function
val split: TableFunction[_] = new MySplitUDTF()

// join
val result: Table = table
    .leftOuterJoin(split('c) as ('s, 't, 'v))
    .select('a, 'b, 's, 't, 'v)

Back to top

Set Operations

Operators Description
Union
Batch

Similar to a SQL UNION clause. Unions two tables with duplicate records removed. Both tables must have identical field types.

Table left = tableEnv.fromDataSet(ds1, "a, b, c");
Table right = tableEnv.fromDataSet(ds2, "a, b, c");
Table result = left.union(right);
UnionAll
Batch Streaming

Similar to a SQL UNION ALL clause. Unions two tables. Both tables must have identical field types.

Table left = tableEnv.fromDataSet(ds1, "a, b, c");
Table right = tableEnv.fromDataSet(ds2, "a, b, c");
Table result = left.unionAll(right);
Intersect
Batch

Similar to a SQL INTERSECT clause. Intersect returns records that exist in both tables. If a record is present one or both tables more than once, it is returned just once, i.e., the resulting table has no duplicate records. Both tables must have identical field types.

Table left = tableEnv.fromDataSet(ds1, "a, b, c");
Table right = tableEnv.fromDataSet(ds2, "d, e, f");
Table result = left.intersect(right);
IntersectAll
Batch

Similar to a SQL INTERSECT ALL clause. IntersectAll returns records that exist in both tables. If a record is present in both tables more than once, it is returned as many times as it is present in both tables, i.e., the resulting table might have duplicate records. Both tables must have identical field types.

Table left = tableEnv.fromDataSet(ds1, "a, b, c");
Table right = tableEnv.fromDataSet(ds2, "d, e, f");
Table result = left.intersectAll(right);
Minus
Batch

Similar to a SQL EXCEPT clause. Minus returns records from the left table that do not exist in the right table. Duplicate records in the left table are returned exactly once, i.e., duplicates are removed. Both tables must have identical field types.

Table left = tableEnv.fromDataSet(ds1, "a, b, c");
Table right = tableEnv.fromDataSet(ds2, "a, b, c");
Table result = left.minus(right);
MinusAll
Batch

Similar to a SQL EXCEPT ALL clause. MinusAll returns the records that do not exist in the right table. A record that is present n times in the left table and m times in the right table is returned (n - m) times, i.e., as many duplicates as are present in the right table are removed. Both tables must have identical field types.

Table left = tableEnv.fromDataSet(ds1, "a, b, c");
Table right = tableEnv.fromDataSet(ds2, "a, b, c");
Table result = left.minusAll(right);
In
Batch

Similar to a SQL IN clause. In returns true if an expression exists in a given table sub-query. The sub-query table must consist of one column. This column must have the same data type as the expression.

Table left = ds1.toTable(tableEnv, "a, b, c");
Table right = ds2.toTable(tableEnv, "a");

// using implicit registration
Table result = left.select("a, b, c").where("a.in(" + right + ")");

// using explicit registration
tableEnv.registerTable("RightTable", right);
Table result = left.select("a, b, c").where("a.in(RightTable)");
Operators Description
Union
Batch

Similar to a SQL UNION clause. Unions two tables with duplicate records removed, both tables must have identical field types.

val left = ds1.toTable(tableEnv, 'a, 'b, 'c)
val right = ds2.toTable(tableEnv, 'a, 'b, 'c)
val result = left.union(right)
UnionAll
Batch Streaming

Similar to a SQL UNION ALL clause. Unions two tables, both tables must have identical field types.

val left = ds1.toTable(tableEnv, 'a, 'b, 'c)
val right = ds2.toTable(tableEnv, 'a, 'b, 'c)
val result = left.unionAll(right)
Intersect
Batch

Similar to a SQL INTERSECT clause. Intersect returns records that exist in both tables. If a record is present in one or both tables more than once, it is returned just once, i.e., the resulting table has no duplicate records. Both tables must have identical field types.

val left = ds1.toTable(tableEnv, 'a, 'b, 'c)
val right = ds2.toTable(tableEnv, 'e, 'f, 'g)
val result = left.intersect(right)
IntersectAll
Batch

Similar to a SQL INTERSECT ALL clause. IntersectAll returns records that exist in both tables. If a record is present in both tables more than once, it is returned as many times as it is present in both tables, i.e., the resulting table might have duplicate records. Both tables must have identical field types.

val left = ds1.toTable(tableEnv, 'a, 'b, 'c)
val right = ds2.toTable(tableEnv, 'e, 'f, 'g)
val result = left.intersectAll(right)
Minus
Batch

Similar to a SQL EXCEPT clause. Minus returns records from the left table that do not exist in the right table. Duplicate records in the left table are returned exactly once, i.e., duplicates are removed. Both tables must have identical field types.

val left = ds1.toTable(tableEnv, 'a, 'b, 'c)
val right = ds2.toTable(tableEnv, 'a, 'b, 'c)
val result = left.minus(right)
MinusAll
Batch

Similar to a SQL EXCEPT ALL clause. MinusAll returns the records that do not exist in the right table. A record that is present n times in the left table and m times in the right table is returned (n - m) times, i.e., as many duplicates as are present in the right table are removed. Both tables must have identical field types.

val left = ds1.toTable(tableEnv, 'a, 'b, 'c)
val right = ds2.toTable(tableEnv, 'a, 'b, 'c)
val result = left.minusAll(right)
In
Batch

Similar to a SQL IN clause. In returns true if an expression exists in a given table sub-query. The sub-query table must consist of one column. This column must have the same data type as the expression.

val left = ds1.toTable(tableEnv, 'a, 'b, 'c)
val right = ds2.toTable(tableEnv, 'a)
val result = left.select('a, 'b, 'c).where('a.in(right))

Back to top

OrderBy, Offset & Fetch

Operators Description
Order By
Batch

Similar to a SQL ORDER BY clause. Returns records globally sorted across all parallel partitions.

Table in = tableEnv.fromDataSet(ds, "a, b, c");
Table result = in.orderBy("a.asc");
Offset & Fetch
Batch

Similar to the SQL OFFSET and FETCH clauses. Offset and Fetch limit the number of records returned from a sorted result. Offset and Fetch are technically part of the Order By operator and thus must be preceded by it.

Table in = tableEnv.fromDataSet(ds, "a, b, c");

// returns the first 5 records from the sorted result
Table result1 = in.orderBy("a.asc").fetch(5); 

// skips the first 3 records and returns all following records from the sorted result
Table result2 = in.orderBy("a.asc").offset(3);

// skips the first 10 records and returns the next 5 records from the sorted result
Table result3 = in.orderBy("a.asc").offset(10).fetch(5);
Operators Description
Order By
Batch

Similar to a SQL ORDER BY clause. Returns records globally sorted across all parallel partitions.

val in = ds.toTable(tableEnv, 'a, 'b, 'c)
val result = in.orderBy('a.asc)
Offset & Fetch
Batch

Similar to the SQL OFFSET and FETCH clauses. Offset and Fetch limit the number of records returned from a sorted result. Offset and Fetch are technically part of the Order By operator and thus must be preceded by it.

val in = ds.toTable(tableEnv, 'a, 'b, 'c)

// returns the first 5 records from the sorted result
val result1: Table = in.orderBy('a.asc).fetch(5)

// skips the first 3 records and returns all following records from the sorted result
val result2: Table = in.orderBy('a.asc).offset(3)

// skips the first 10 records and returns the next 5 records from the sorted result
val result3: Table = in.orderBy('a.asc).offset(10).fetch(5)

Insert

Operators Description
Insert Into
Batch Streaming

Similar to the INSERT INTO clause in a SQL query. Performs a insertion into a registered output table.

Output tables must be registered in the TableEnvironment (see Register a TableSink). Moreover, the schema of the registered table must match the schema of the query.

Table orders = tableEnv.scan("Orders");
orders.insertInto("OutOrders");
Operators Description
Insert Into
Batch Streaming

Similar to the INSERT INTO clause in a SQL query. Performs a insertion into a registered output table.

Output tables must be registered in the TableEnvironment (see Register a TableSink). Moreover, the schema of the registered table must match the schema of the query.

val orders: Table = tableEnv.scan("Orders")
orders.insertInto("OutOrders")

Back to top

Group Windows

Group window aggregates group rows into finite groups based on time or row-count intervals and evaluate aggregation functions once per group. For batch tables, windows are a convenient shortcut to group records by time intervals.

Windows are defined using the window(w: Window) clause and require an alias, which is specified using the as clause. In order to group a table by a window, the window alias must be referenced in the groupBy(...) clause like a regular grouping attribute. The following example shows how to define a window aggregation on a table.

Table table = input
  .window([Window w].as("w"))  // define window with alias w
  .groupBy("w")  // group the table by window w
  .select("b.sum");  // aggregate
val table = input
  .window([w: Window] as 'w)  // define window with alias w
  .groupBy('w)   // group the table by window w
  .select('b.sum)  // aggregate

In streaming environments, window aggregates can only be computed in parallel if they group on one or more attributes in addition to the window, i.e., the groupBy(...) clause references a window alias and at least one additional attribute. A groupBy(...) clause that only references a window alias (such as in the example above) can only be evaluated by a single, non-parallel task. The following example shows how to define a window aggregation with additional grouping attributes.

Table table = input
  .window([Window w].as("w"))  // define window with alias w
  .groupBy("w, a")  // group the table by attribute a and window w 
  .select("a, b.sum");  // aggregate
val table = input
  .window([w: Window] as 'w) // define window with alias w
  .groupBy('w, 'a)  // group the table by attribute a and window w 
  .select('a, 'b.sum)  // aggregate

Window properties such as the start, end, or rowtime timestamp of a time window can be added in the select statement as a property of the window alias as w.start, w.end, and w.rowtime, respectively. The window start and rowtime timestamps are the inclusive lower and uppper window boundaries. In contrast, the window end timestamp is the exclusive upper window boundary. For example a tumbling window of 30 minutes that starts at 2pm would have 14:00:00.000 as start timestamp, 14:29:59.999 as rowtime timestamp, and 14:30:00.000 as end timestamp.

Table table = input
  .window([Window w].as("w"))  // define window with alias w
  .groupBy("w, a")  // group the table by attribute a and window w 
  .select("a, w.start, w.end, w.rowtime, b.count"); // aggregate and add window start, end, and rowtime timestamps
val table = input
  .window([w: Window] as 'w)  // define window with alias w
  .groupBy('w, 'a)  // group the table by attribute a and window w 
  .select('a, 'w.start, 'w.end, 'w.rowtime, 'b.count) // aggregate and add window start, end, and rowtime timestamps

The Window parameter defines how rows are mapped to windows. Window is not an interface that users can implement. Instead, the Table API provides a set of predefined Window classes with specific semantics, which are translated into underlying DataStream or DataSet operations. The supported window definitions are listed below.

Tumble (Tumbling Windows)

A tumbling window assigns rows to non-overlapping, continuous windows of fixed length. For example, a tumbling window of 5 minutes groups rows in 5 minutes intervals. Tumbling windows can be defined on event-time, processing-time, or on a row-count.

Tumbling windows are defined by using the Tumble class as follows:

Method Description
over Defines the length the window, either as time or row-count interval.
on The time attribute to group (time interval) or sort (row count) on. For batch queries this might be any Long or Timestamp attribute. For streaming queries this must be a declared event-time or processing-time time attribute.
as Assigns an alias to the window. The alias is used to reference the window in the following groupBy() clause and optionally to select window properties such as window start, end, or rowtime timestamps in the select() clause.
// Tumbling Event-time Window
.window(Tumble.over("10.minutes").on("rowtime").as("w"));

// Tumbling Processing-time Window (assuming a processing-time attribute "proctime")
.window(Tumble.over("10.minutes").on("proctime").as("w"));

// Tumbling Row-count Window (assuming a processing-time attribute "proctime")
.window(Tumble.over("10.rows").on("proctime").as("w"));
// Tumbling Event-time Window
.window(Tumble over 10.minutes on 'rowtime as 'w)

// Tumbling Processing-time Window (assuming a processing-time attribute "proctime")
.window(Tumble over 10.minutes on 'proctime as 'w)

// Tumbling Row-count Window (assuming a processing-time attribute "proctime")
.window(Tumble over 10.rows on 'proctime as 'w)

Slide (Sliding Windows)

A sliding window has a fixed size and slides by a specified slide interval. If the slide interval is smaller than the window size, sliding windows are overlapping. Thus, rows can be assigned to multiple windows. For example, a sliding window of 15 minutes size and 5 minute slide interval assigns each row to 3 different windows of 15 minute size, which are evaluated in an interval of 5 minutes. Sliding windows can be defined on event-time, processing-time, or on a row-count.

Sliding windows are defined by using the Slide class as follows:

Method Description
over Defines the length of the window, either as time or row-count interval.
every Defines the slide interval, either as time or row-count interval. The slide interval must be of the same type as the size interval.
on The time attribute to group (time interval) or sort (row count) on. For batch queries this might be any Long or Timestamp attribute. For streaming queries this must be a declared event-time or processing-time time attribute.
as Assigns an alias to the window. The alias is used to reference the window in the following groupBy() clause and optionally to select window properties such as window start, end, or rowtime timestamps in the select() clause.
// Sliding Event-time Window
.window(Slide.over("10.minutes").every("5.minutes").on("rowtime").as("w"));

// Sliding Processing-time window (assuming a processing-time attribute "proctime")
.window(Slide.over("10.minutes").every("5.minutes").on("proctime").as("w"));

// Sliding Row-count window (assuming a processing-time attribute "proctime")
.window(Slide.over("10.rows").every("5.rows").on("proctime").as("w"));
// Sliding Event-time Window
.window(Slide over 10.minutes every 5.minutes on 'rowtime as 'w)

// Sliding Processing-time window (assuming a processing-time attribute "proctime")
.window(Slide over 10.minutes every 5.minutes on 'proctime as 'w)

// Sliding Row-count window (assuming a processing-time attribute "proctime")
.window(Slide over 10.rows every 5.rows on 'proctime as 'w)

Session (Session Windows)

Session windows do not have a fixed size but their bounds are defined by an interval of inactivity, i.e., a session window is closes if no event appears for a defined gap period. For example a session window with a 30 minute gap starts when a row is observed after 30 minutes inactivity (otherwise the row would be added to an existing window) and is closed if no row is added within 30 minutes. Session windows can work on event-time or processing-time.

A session window is defined by using the Session class as follows:

Method Description
withGap Defines the gap between two windows as time interval.
on The time attribute to group (time interval) or sort (row count) on. For batch queries this might be any Long or Timestamp attribute. For streaming queries this must be a declared event-time or processing-time time attribute.
as Assigns an alias to the window. The alias is used to reference the window in the following groupBy() clause and optionally to select window properties such as window start, end, or rowtime timestamps in the select() clause.
// Session Event-time Window
.window(Session.withGap("10.minutes").on("rowtime").as("w"));

// Session Processing-time Window (assuming a processing-time attribute "proctime")
.window(Session.withGap("10.minutes").on("proctime").as("w"));
// Session Event-time Window
.window(Session withGap 10.minutes on 'rowtime as 'w)

// Session Processing-time Window (assuming a processing-time attribute "proctime")
.window(Session withGap 10.minutes on 'proctime as 'w)

Back to top

Over Windows

Over window aggregates are known from standard SQL (OVER clause) and defined in the SELECT clause of a query. Unlike group windows, which are specified in the GROUP BY clause, over windows do not collapse rows. Instead over window aggregates compute an aggregate for each input row over a range of its neighboring rows.

Over windows are defined using the window(w: OverWindow*) clause and referenced via an alias in the select() method. The following example shows how to define an over window aggregation on a table.

Table table = input
  .window([OverWindow w].as("w"))           // define over window with alias w
  .select("a, b.sum over w, c.min over w"); // aggregate over the over window w
val table = input
  .window([w: OverWindow] as 'w)              // define over window with alias w
  .select('a, 'b.sum over 'w, 'c.min over 'w) // aggregate over the over window w

The OverWindow defines a range of rows over which aggregates are computed. OverWindow is not an interface that users can implement. Instead, the Table API provides the Over class to configure the properties of the over window. Over windows can be defined on event-time or processing-time and on ranges specified as time interval or row-count. The supported over window definitions are exposed as methods on Over (and other classes) and are listed below:

Method Required Description
partitionBy Optional

Defines a partitioning of the input on one or more attributes. Each partition is individually sorted and aggregate functions are applied to each partition separately.

Note: In streaming environments, over window aggregates can only be computed in parallel if the window includes a partition by clause. Without partitionBy(...) the stream is processed by a single, non-parallel task.

orderBy Required

Defines the order of rows within each partition and thereby the order in which the aggregate functions are applied to rows.

Note: For streaming queries this must be a declared event-time or processing-time time attribute. Currently, only a single sort attribute is supported.

preceding Required

Defines the interval of rows that are included in the window and precede the current row. The interval can either be specified as time or row-count interval.

Bounded over windows are specified with the size of the interval, e.g., 10.minutes for a time interval or 10.rows for a row-count interval.

Unbounded over windows are specified using a constant, i.e., UNBOUNDED_RANGE for a time interval or UNBOUNDED_ROW for a row-count interval. Unbounded over windows start with the first row of a partition.

following Optional

Defines the window interval of rows that are included in the window and follow the current row. The interval must be specified in the same unit as the preceding interval (time or row-count).

At the moment, over windows with rows following the current row are not supported. Instead you can specify one of two constants:

  • CURRENT_ROW sets the upper bound of the window to the current row.
  • CURRENT_RANGE sets the upper bound of the window to sort key of the current row, i.e., all rows with the same sort key as the current row are included in the window.

If the following clause is omitted, the upper bound of a time interval window is defined as CURRENT_RANGE and the upper bound of a row-count interval window is defined as CURRENT_ROW.

as Required

Assigns an alias to the over window. The alias is used to reference the over window in the following select() clause.

Note: Currently, all aggregation functions in the same select() call must be computed of the same over window.

Unbounded Over Windows

// Unbounded Event-time over window (assuming an event-time attribute "rowtime")
.window(Over.partitionBy("a").orderBy("rowtime").preceding("unbounded_range").as("w"));

// Unbounded Processing-time over window (assuming a processing-time attribute "proctime")
.window(Over.partitionBy("a").orderBy("proctime").preceding("unbounded_range").as("w"));

// Unbounded Event-time Row-count over window (assuming an event-time attribute "rowtime")
.window(Over.partitionBy("a").orderBy("rowtime").preceding("unbounded_row").as("w"));
 
// Unbounded Processing-time Row-count over window (assuming a processing-time attribute "proctime")
.window(Over.partitionBy("a").orderBy("proctime").preceding("unbounded_row").as("w"));
// Unbounded Event-time over window (assuming an event-time attribute "rowtime")
.window(Over partitionBy 'a orderBy 'rowtime preceding UNBOUNDED_RANGE as 'w)

// Unbounded Processing-time over window (assuming a processing-time attribute "proctime")
.window(Over partitionBy 'a orderBy 'proctime preceding UNBOUNDED_RANGE as 'w)

// Unbounded Event-time Row-count over window (assuming an event-time attribute "rowtime")
.window(Over partitionBy 'a orderBy 'rowtime preceding UNBOUNDED_ROW as 'w)
 
// Unbounded Processing-time Row-count over window (assuming a processing-time attribute "proctime")
.window(Over partitionBy 'a orderBy 'proctime preceding UNBOUNDED_ROW as 'w)

Bounded Over Windows

// Bounded Event-time over window (assuming an event-time attribute "rowtime")
.window(Over.partitionBy("a").orderBy("rowtime").preceding("1.minutes").as("w"))

// Bounded Processing-time over window (assuming a processing-time attribute "proctime")
.window(Over.partitionBy("a").orderBy("proctime").preceding("1.minutes").as("w"))

// Bounded Event-time Row-count over window (assuming an event-time attribute "rowtime")
.window(Over.partitionBy("a").orderBy("rowtime").preceding("10.rows").as("w"))
 
// Bounded Processing-time Row-count over window (assuming a processing-time attribute "proctime")
.window(Over.partitionBy("a").orderBy("proctime").preceding("10.rows").as("w"))
// Bounded Event-time over window (assuming an event-time attribute "rowtime")
.window(Over partitionBy 'a orderBy 'rowtime preceding 1.minutes as 'w)

// Bounded Processing-time over window (assuming a processing-time attribute "proctime")
.window(Over partitionBy 'a orderBy 'proctime preceding 1.minutes as 'w)

// Bounded Event-time Row-count over window (assuming an event-time attribute "rowtime")
.window(Over partitionBy 'a orderBy 'rowtime preceding 10.rows as 'w)
  
// Bounded Processing-time Row-count over window (assuming a processing-time attribute "proctime")
.window(Over partitionBy 'a orderBy 'proctime preceding 10.rows as 'w)

Back to top

Data Types

The Table API is built on top of Flinkā€™s DataSet and DataStream APIs. Internally, it also uses Flinkā€™s TypeInformation to define data types. Fully supported types are listed in org.apache.flink.table.api.Types. The following table summarizes the relation between Table API types, SQL types, and the resulting Java class.

Table API SQL Java type
Types.STRING VARCHAR java.lang.String
Types.BOOLEAN BOOLEAN java.lang.Boolean
Types.BYTE TINYINT java.lang.Byte
Types.SHORT SMALLINT java.lang.Short
Types.INT INTEGER, INT java.lang.Integer
Types.LONG BIGINT java.lang.Long
Types.FLOAT REAL, FLOAT java.lang.Float
Types.DOUBLE DOUBLE java.lang.Double
Types.DECIMAL DECIMAL java.math.BigDecimal
Types.SQL_DATE DATE java.sql.Date
Types.SQL_TIME TIME java.sql.Time
Types.SQL_TIMESTAMP TIMESTAMP(3) java.sql.Timestamp
Types.INTERVAL_MONTHS INTERVAL YEAR TO MONTH java.lang.Integer
Types.INTERVAL_MILLIS INTERVAL DAY TO SECOND(3) java.lang.Long
Types.PRIMITIVE_ARRAY ARRAY e.g. int[]
Types.OBJECT_ARRAY ARRAY e.g. java.lang.Byte[]
Types.MAP MAP java.util.HashMap
Types.MULTISET MULTISET e.g. java.util.HashMap<String, Integer> for a multiset of String

Generic types and composite types (e.g., POJOs or Tuples) can be fields of a row as well. Generic types are treated as a black box and can be passed on or processed by user-defined functions. Composite types can be accessed with built-in functions (see Value access functions section).

Back to top

Expression Syntax

Some of the operators in previous sections expect one or more expressions. Expressions can be specified using an embedded Scala DSL or as Strings. Please refer to the examples above to learn how expressions can be specified.

This is the EBNF grammar for expressions:

expressionList = expression , { "," , expression } ;

expression = timeIndicator | overConstant | alias ;

alias = logic | ( logic , "as" , fieldReference ) | ( logic , "as" , "(" , fieldReference , { "," , fieldReference } , ")" ) ;

logic = comparison , [ ( "&&" | "||" ) , comparison ] ;

comparison = term , [ ( "=" | "==" | "===" | "!=" | "!==" | ">" | ">=" | "<" | "<=" ) , term ] ;

term = product , [ ( "+" | "-" ) , product ] ;

product = unary , [ ( "*" | "/" | "%") , unary ] ;

unary = [ "!" | "-" ] , composite ;

composite = over | nullLiteral | suffixed | atom ;

suffixed = interval | cast | as | if | functionCall ;

interval = timeInterval | rowInterval ;

timeInterval = composite , "." , ("year" | "years" | "month" | "months" | "day" | "days" | "hour" | "hours" | "minute" | "minutes" | "second" | "seconds" | "milli" | "millis") ;

rowInterval = composite , "." , "rows" ;

cast = composite , ".cast(" , dataType , ")" ;

dataType = "BYTE" | "SHORT" | "INT" | "LONG" | "FLOAT" | "DOUBLE" | "BOOLEAN" | "STRING" | "DECIMAL" | "SQL_DATE" | "SQL_TIME" | "SQL_TIMESTAMP" | "INTERVAL_MONTHS" | "INTERVAL_MILLIS" | ( "MAP" , "(" , dataType , "," , dataType , ")" ) | ( "PRIMITIVE_ARRAY" , "(" , dataType , ")" ) | ( "OBJECT_ARRAY" , "(" , dataType , ")" ) ;

as = composite , ".as(" , fieldReference , ")" ;

if = composite , ".?(" , expression , "," , expression , ")" ;

functionCall = composite , "." , functionIdentifier , [ "(" , [ expression , { "," , expression } ] , ")" ] ;

atom = ( "(" , expression , ")" ) | literal | fieldReference ;

fieldReference = "*" | identifier ;

nullLiteral = "Null(" , dataType , ")" ;

timeIntervalUnit = "YEAR" | "YEAR_TO_MONTH" | "MONTH" | "DAY" | "DAY_TO_HOUR" | "DAY_TO_MINUTE" | "DAY_TO_SECOND" | "HOUR" | "HOUR_TO_MINUTE" | "HOUR_TO_SECOND" | "MINUTE" | "MINUTE_TO_SECOND" | "SECOND" ;

timePointUnit = "YEAR" | "MONTH" | "DAY" | "HOUR" | "MINUTE" | "SECOND" | "QUARTER" | "WEEK" | "MILLISECOND" | "MICROSECOND" ;

over = composite , "over" , fieldReference ;

overConstant = "current_row" | "current_range" | "unbounded_row" | "unbounded_row" ;

timeIndicator = fieldReference , "." , ( "proctime" | "rowtime" ) ;

Here, literal is a valid Java literal, fieldReference specifies a column in the data (or all columns if * is used), and functionIdentifier specifies a supported scalar function. The column names and function names follow Java identifier syntax. Expressions specified as Strings can also use prefix notation instead of suffix notation to call operators and functions.

If working with exact numeric values or large decimals is required, the Table API also supports Javaā€™s BigDecimal type. In the Scala Table API decimals can be defined by BigDecimal("123456") and in Java by appending a ā€œpā€ for precise e.g. 123456p.

In order to work with temporal values the Table API supports Java SQLā€™s Date, Time, and Timestamp types. In the Scala Table API literals can be defined by using java.sql.Date.valueOf("2016-06-27"), java.sql.Time.valueOf("10:10:42"), or java.sql.Timestamp.valueOf("2016-06-27 10:10:42.123"). The Java and Scala Table API also support calling "2016-06-27".toDate(), "10:10:42".toTime(), and "2016-06-27 10:10:42.123".toTimestamp() for converting Strings into temporal types. Note: Since Javaā€™s temporal SQL types are time zone dependent, please make sure that the Flink Client and all TaskManagers use the same time zone.

Temporal intervals can be represented as number of months (Types.INTERVAL_MONTHS) or number of milliseconds (Types.INTERVAL_MILLIS). Intervals of same type can be added or subtracted (e.g. 1.hour + 10.minutes). Intervals of milliseconds can be added to time points (e.g. "2016-08-10".toDate + 5.days).

Back to top

Built-In Functions

The Table API comes with a set of built-in functions for data transformations. This section gives a brief overview of the available functions.

Comparison functions Description
ANY === ANY

Equals.

ANY !== ANY

Not equal.

ANY > ANY

Greater than.

ANY >= ANY

Greater than or equal.

ANY < ANY

Less than.

ANY <= ANY

Less than or equal.

ANY.isNull

Returns true if the given expression is null.

ANY.isNotNull

Returns true if the given expression is not null.

STRING.like(STRING)

Returns true, if a string matches the specified LIKE pattern. E.g. "Jo_n%" matches all strings that start with "Jo(arbitrary letter)n".

STRING.similar(STRING)

Returns true, if a string matches the specified SQL regex pattern. E.g. "A+" matches all strings that consist of at least one "A".

ANY.in(ANY, ANY, ...)

Returns true if an expression exists in a given list of expressions. This is a shorthand for multiple OR conditions. If the testing set contains null, the result will be null if the element can not be found and true if it can be found. If element is null, the result is always null. E.g. "42.in(1, 2, 3)" leads to false.

ANY.in(TABLE)

Returns true if an expression exists in a given table sub-query. The sub-query table must consist of one column. This column must have the same data type as the expression. Note: This operation is not supported in a streaming environment yet.

Logical functions Description
boolean1 || boolean2

Returns true if boolean1 is true or boolean2 is true. Supports three-valued logic.

boolean1 && boolean2

Returns true if boolean1 and boolean2 are both true. Supports three-valued logic.

!BOOLEAN

Returns true if boolean expression is not true; returns null if boolean is null.

BOOLEAN.isTrue

Returns true if the given boolean expression is true. False otherwise (for null and false).

BOOLEAN.isFalse

Returns true if given boolean expression is false. False otherwise (for null and true).

BOOLEAN.isNotTrue

Returns true if the given boolean expression is not true (for null and false). False otherwise.

BOOLEAN.isNotFalse

Returns true if given boolean expression is not false (for null and true). False otherwise.

Arithmetic functions Description
+ numeric

Returns numeric.

- numeric

Returns negative numeric.

numeric1 + numeric2

Returns numeric1 plus numeric2.

numeric1 - numeric2

Returns numeric1 minus numeric2.

numeric1 * numeric2

Returns numeric1 multiplied by numeric2.

numeric1 / numeric2

Returns numeric1 divided by numeric2.

numeric1.power(numeric2)

Returns numeric1 raised to the power of numeric2.

NUMERIC.abs()

Calculates the absolute value of given value.

numeric1 % numeric2

Returns the remainder (modulus) of numeric1 divided by numeric2. The result is negative only if numeric1 is negative.

NUMERIC.sqrt()

Calculates the square root of a given value.

NUMERIC.ln()

Calculates the natural logarithm of given value.

NUMERIC.log10()

Calculates the base 10 logarithm of given value.

NUMERIC.exp()

Calculates the Euler's number raised to the given power.

NUMERIC.ceil()

Calculates the smallest integer greater than or equal to a given number.

NUMERIC.floor()

Calculates the largest integer less than or equal to a given number.

NUMERIC.sin()

Calculates the sine of a given number.

NUMERIC.cos()

Calculates the cosine of a given number.

NUMERIC.tan()

Calculates the tangent of a given number.

NUMERIC.cot()

Calculates the cotangent of a given number.

NUMERIC.asin()

Calculates the arc sine of a given number.

NUMERIC.acos()

Calculates the arc cosine of a given number.

NUMERIC.atan()

Calculates the arc tangent of a given number.

NUMERIC.degrees()

Converts numeric from radians to degrees.

NUMERIC.radians()

Converts numeric from degrees to radians.

NUMERIC.sign()

Calculates the signum of a given number.

NUMERIC.round(INT)

Rounds the given number to integer places right to the decimal point.

pi()

Returns a value that is closer than any other value to pi.

e()

Returns a value that is closer than any other value to e.

rand()

Returns a pseudorandom double value between 0.0 (inclusive) and 1.0 (exclusive).

rand(seed integer)

Returns a pseudorandom double value between 0.0 (inclusive) and 1.0 (exclusive) with a initial seed. Two rand functions will return identical sequences of numbers if they have same initial seed.

randInteger(bound integer)

Returns a pseudorandom integer value between 0.0 (inclusive) and the specified value (exclusive).

randInteger(seed integer, bound integer)

Returns a pseudorandom integer value between 0.0 (inclusive) and the specified value (exclusive) with a initial seed. Two randInteger functions will return identical sequences of numbers if they have same initial seed and same bound.

NUMERIC.bin()

Returns a string representation of an integer numeric value in binary format. Returns null if numeric is null. E.g. "4" leads to "100", "12" leads to "1100".

String functions Description
STRING + STRING

Concatenates two character strings.

STRING.charLength()

Returns the length of a String.

STRING.upperCase()

Returns all of the characters in a string in upper case using the rules of the default locale.

STRING.lowerCase()

Returns all of the characters in a string in lower case using the rules of the default locale.

STRING.position(STRING)

Returns the position of string in an other string starting at 1. Returns 0 if string could not be found. E.g. 'a'.position('bbbbba') leads to 6.

STRING.trim(LEADING, STRING)
STRING.trim(TRAILING, STRING)
STRING.trim(BOTH, STRING)
STRING.trim(BOTH)
STRING.trim()

Removes leading and/or trailing characters from the given string. By default, whitespaces at both sides are removed.

STRING.overlay(STRING, INT)
STRING.overlay(STRING, INT, INT)

Replaces a substring of string with a string starting at a position (starting at 1). An optional length specifies how many characters should be removed. E.g. 'xxxxxtest'.overlay('xxxx', 6) leads to "xxxxxxxxx", 'xxxxxtest'.overlay('xxxx', 6, 2) leads to "xxxxxxxxxst".

STRING.substring(INT)

Creates a substring of the given string beginning at the given index to the end. The start index starts at 1 and is inclusive.

STRING.substring(INT, INT)

Creates a substring of the given string at the given index for the given length. The index starts at 1 and is inclusive, i.e., the character at the index is included in the substring. The substring has the specified length or less.

STRING.initCap()

Converts the initial letter of each word in a string to uppercase. Assumes a string containing only [A-Za-z0-9], everything else is treated as whitespace.

STRING.lpad(len INT, pad STRING)

Returns a string left-padded with the given pad string to a length of len characters. If the string is longer than len, the return value is shortened to len characters. E.g. "hi".lpad(4, '??') returns "??hi", "hi".lpad(1, '??') returns "h".

STRING.rpad(len INT, pad STRING)

Returns a string right-padded with the given pad string to a length of len characters. If the string is longer than len, the return value is shortened to len characters. E.g. "hi".rpad(4, '??') returns "hi??", "hi".rpad(1, '??') returns "h".

concat(string1, string2,...)

Returns the string that results from concatenating the arguments. Returns NULL if any argument is NULL. E.g. concat("AA", "BB", "CC") returns AABBCC.

concat_ws(separator, string1, string2,...)

Returns the string that results from concatenating the arguments using a separator. The separator is added between the strings to be concatenated. Returns NULL If the separator is NULL. concat_ws() does not skip empty strings. However, it does skip any NULL argument. E.g. concat_ws("~", "AA", "BB", "", "CC") returns AA~BB~~CC

Conditional functions Description
BOOLEAN.?(value1, value2)

Ternary conditional operator that decides which of two other expressions should be evaluated based on a evaluated boolean condition. E.g. (42 > 5).?("A", "B") leads to "A".

Type conversion functions Description
ANY.cast(TYPE)

Converts a value to a given type. E.g. "42".cast(INT) leads to 42.

Value constructor functions Description
NUMERIC.rows

Creates an interval of rows.

Temporal functions Description
STRING.toDate()

Parses a date string in the form "yy-mm-dd" to a SQL date.

STRING.toTime()

Parses a time string in the form "hh:mm:ss" to a SQL time.

STRING.toTimestamp()

Parses a timestamp string in the form "yy-mm-dd hh:mm:ss.fff" to a SQL timestamp.

NUMERIC.year
NUMERIC.years

Creates an interval of months for a given number of years.

NUMERIC.month
NUMERIC.months

Creates an interval of months for a given number of months.

NUMERIC.day
NUMERIC.days

Creates an interval of milliseconds for a given number of days.

NUMERIC.hour
NUMERIC.hours

Creates an interval of milliseconds for a given number of hours.

NUMERIC.minute
NUMERIC.minutes

Creates an interval of milliseconds for a given number of minutes.

NUMERIC.second
NUMERIC.seconds

Creates an interval of milliseconds for a given number of seconds.

NUMERIC.milli
NUMERIC.millis

Creates an interval of milliseconds.

currentDate()

Returns the current SQL date in UTC time zone.

currentTime()

Returns the current SQL time in UTC time zone.

currentTimestamp()

Returns the current SQL timestamp in UTC time zone.

localTime()

Returns the current SQL time in local time zone.

localTimestamp()

Returns the current SQL timestamp in local time zone.

TEMPORAL.extract(TIMEINTERVALUNIT)

Extracts parts of a time point or time interval. Returns the part as a long value. E.g. '2006-06-05'.toDate.extract(DAY) leads to 5.

TIMEPOINT.floor(TIMEINTERVALUNIT)

Rounds a time point down to the given unit. E.g. '12:44:31'.toDate.floor(MINUTE) leads to 12:44:00.

TIMEPOINT.ceil(TIMEINTERVALUNIT)

Rounds a time point up to the given unit. E.g. '12:44:31'.toTime.floor(MINUTE) leads to 12:45:00.

DATE.quarter()

Returns the quarter of a year from a SQL date. E.g. '1994-09-27'.toDate.quarter() leads to 3.

temporalOverlaps(TIMEPOINT, TEMPORAL, TIMEPOINT, TEMPORAL)

Determines whether two anchored time intervals overlap. Time point and temporal are transformed into a range defined by two time points (start, end). The function evaluates leftEnd >= rightStart && rightEnd >= leftStart. E.g. temporalOverlaps("2:55:00".toTime, 1.hour, "3:30:00".toTime, 2.hour) leads to true.

dateFormat(TIMESTAMP, STRING)

Formats timestamp as a string using a specified format. The format must be compatible with MySQL's date formatting syntax as used by the date_parse function. The format specification is given in the Date Format Specifier table below.

For example dateFormat(ts, '%Y, %d %M') results in strings formatted as "2017, 05 May".

Aggregate functions Description
FIELD.count

Returns the number of input rows for which the field is not null.

FIELD.avg

Returns the average (arithmetic mean) of the numeric field across all input values.

FIELD.sum

Returns the sum of the numeric field across all input values. If all values are null, null is returned.

FIELD.sum0

Returns the sum of the numeric field across all input values. If all values are null, 0 is returned.

FIELD.max

Returns the maximum value of field across all input values.

FIELD.min

Returns the minimum value of field across all input values.

FIELD.stddevPop

Returns the population standard deviation of the numeric field across all input values.

FIELD.stddevSamp

Returns the sample standard deviation of the numeric field across all input values.

FIELD.varPop

Returns the population variance (square of the population standard deviation) of the numeric field across all input values.

FIELD.varSamp

Returns the sample variance (square of the sample standard deviation) of the numeric field across all input values.

FIELD.collect
        

Returns the multiset aggregate of the input value.

Value access functions Description
COMPOSITE.get(STRING)
COMPOSITE.get(INT)

Accesses the field of a Flink composite type (such as Tuple, POJO, etc.) by index or name and returns it's value. E.g. pojo.get('myField') or tuple.get(0).

ANY.flatten()

Converts a Flink composite type (such as Tuple, POJO, etc.) and all of its direct subtypes into a flat representation where every subtype is a separate field. In most cases the fields of the flat representation are named similarly to the original fields but with a dollar separator (e.g. mypojo$mytuple$f0).

Array functions Description
array(ANY [, ANY ]*)

Creates an array from a list of values. The array will be an array of objects (not primitives).

ARRAY.cardinality()

Returns the number of elements of an array.

ARRAY.at(INT)

Returns the element at a particular position in an array. The index starts at 1.

ARRAY.element()

Returns the sole element of an array with a single element. Returns null if the array is empty. Throws an exception if the array has more than one element.

Map functions Description
map(ANY, ANY [, ANY, ANY ]*)

Creates a map from a list of key-value pairs.

MAP.cardinality()

Returns the number of entries of a map.

MAP.at(ANY)

Returns the value specified by a particular key in a map.

Hash functions Description
STRING.md5()

Returns the MD5 hash of the string argument as a string of 32 hexadecimal digits; null if string is null.

STRING.sha1()

Returns the SHA-1 hash of the string argument as a string of 40 hexadecimal digits; null if string is null.

STRING.sha256()

Returns the SHA-256 hash of the string argument as a string of 64 hexadecimal digits; null if string is null.

Row functions Description
row(ANY, [, ANY]*)

Creates a row from a list of values. Row is composite type and can be access via value access functions.

Auxiliary functions Description
ANY.as(name [, name ]* )

Specifies a name for an expression i.e. a field. Additional names can be specified if the expression expands to multiple fields.

Comparison functions Description
ANY === ANY

Equals.

ANY !== ANY

Not equal.

ANY > ANY

Greater than.

ANY >= ANY

Greater than or equal.

ANY < ANY

Less than.

ANY <= ANY

Less than or equal.

ANY.isNull

Returns true if the given expression is null.

ANY.isNotNull

Returns true if the given expression is not null.

STRING.like(STRING)

Returns true, if a string matches the specified LIKE pattern. E.g. "Jo_n%" matches all strings that start with "Jo(arbitrary letter)n".

STRING.similar(STRING)

Returns true, if a string matches the specified SQL regex pattern. E.g. "A+" matches all strings that consist of at least one "A".

ANY.in(ANY, ANY, ...)

Returns true if an expression exists in a given list of expressions. This is a shorthand for multiple OR conditions. If the testing set contains null, the result will be null if the element can not be found and true if it can be found. If element is null, the result is always null. E.g. "42".in(1, 2, 3) leads to false.

ANY.in(TABLE)

Returns true if an expression exists in a given table sub-query. The sub-query table must consist of one column. This column must have the same data type as the expression. Note: This operation is not supported in a streaming environment yet.

Logical functions Description
boolean1 || boolean2

Returns true if boolean1 is true or boolean2 is true. Supports three-valued logic.

boolean1 && boolean2

Returns true if boolean1 and boolean2 are both true. Supports three-valued logic.

!BOOLEAN

Returns true if boolean expression is not true; returns null if boolean is null.

BOOLEAN.isTrue

Returns true if the given boolean expression is true. False otherwise (for null and false).

BOOLEAN.isFalse

Returns true if given boolean expression is false. False otherwise (for null and true).

BOOLEAN.isNotTrue

Returns true if the given boolean expression is not true (for null and false). False otherwise.

BOOLEAN.isNotFalse

Returns true if given boolean expression is not false (for null and true). False otherwise.

Arithmetic functions Description
+ numeric

Returns numeric.

- numeric

Returns negative numeric.

numeric1 + numeric2

Returns numeric1 plus numeric2.

numeric1 - numeric2

Returns numeric1 minus numeric2.

numeric1 * numeric2

Returns numeric1 multiplied by numeric2.

numeric1 / numeric2

Returns numeric1 divided by numeric2.

numeric1.power(numeric2)

Returns numeric1 raised to the power of numeric2.

NUMERIC.abs()

Calculates the absolute value of given value.

numeric1 % numeric2

Returns the remainder (modulus) of numeric1 divided by numeric2. The result is negative only if numeric1 is negative.

NUMERIC.sqrt()

Calculates the square root of a given value.

NUMERIC.ln()

Calculates the natural logarithm of given value.

NUMERIC.log10()

Calculates the base 10 logarithm of given value.

NUMERIC.exp()

Calculates the Euler's number raised to the given power.

NUMERIC.ceil()

Calculates the smallest integer greater than or equal to a given number.

NUMERIC.floor()

Calculates the largest integer less than or equal to a given number.

NUMERIC.sin()

Calculates the sine of a given number.

NUMERIC.cos()

Calculates the cosine of a given number.

NUMERIC.tan()

Calculates the cotangent of a given number.

NUMERIC.cot()

Calculates the arc sine of a given number.

NUMERIC.asin()

Calculates the arc cosine of a given number.

NUMERIC.acos()

Calculates the arc tangent of a given number.

NUMERIC.atan()

Calculates the tangent of a given number.

NUMERIC.degrees()

Converts numeric from radians to degrees.

NUMERIC.radians()

Converts numeric from degrees to radians.

NUMERIC.sign()

Calculates the signum of a given number.

NUMERIC.round(INT)

Rounds the given number to integer places right to the decimal point.

pi()

Returns a value that is closer than any other value to pi.

e()

Returns a value that is closer than any other value to e.

rand()

Returns a pseudorandom double value between 0.0 (inclusive) and 1.0 (exclusive).

rand(seed integer)

Returns a pseudorandom double value between 0.0 (inclusive) and 1.0 (exclusive) with a initial seed. Two rand functions will return identical sequences of numbers if they have same initial seed.

randInteger(bound integer)

Returns a pseudorandom integer value between 0.0 (inclusive) and the specified value (exclusive).

randInteger(seed integer, bound integer)

Returns a pseudorandom integer value between 0.0 (inclusive) and the specified value (exclusive) with a initial seed. Two randInteger functions will return identical sequences of numbers if they have same initial seed and same bound.

NUMERIC.bin()

Returns a string representation of an integer numeric value in binary format. Returns null if numeric is null. E.g. "4" leads to "100", "12" leads to "1100".

Arithmetic functions Description
STRING + STRING

Concatenates two character strings.

STRING.charLength()

Returns the length of a String.

STRING.upperCase()

Returns all of the characters in a string in upper case using the rules of the default locale.

STRING.lowerCase()

Returns all of the characters in a string in lower case using the rules of the default locale.

STRING.position(STRING)

Returns the position of string in an other string starting at 1. Returns 0 if string could not be found. E.g. "a".position("bbbbba") leads to 6.

STRING.trim(
  leading = true,
  trailing = true,
  character = " ")

Removes leading and/or trailing characters from the given string.

STRING.overlay(STRING, INT)
STRING.overlay(STRING, INT, INT)

Replaces a substring of string with a string starting at a position (starting at 1). An optional length specifies how many characters should be removed. E.g. "xxxxxtest".overlay("xxxx", 6) leads to "xxxxxxxxx", "xxxxxtest".overlay('xxxx', 6, 2) leads to "xxxxxxxxxst".

STRING.substring(INT)

Creates a substring of the given string beginning at the given index to the end. The start index starts at 1 and is inclusive.

STRING.substring(INT, INT)

Creates a substring of the given string at the given index for the given length. The index starts at 1 and is inclusive, i.e., the character at the index is included in the substring. The substring has the specified length or less.

STRING.initCap()

Converts the initial letter of each word in a string to uppercase. Assumes a string containing only [A-Za-z0-9], everything else is treated as whitespace.

Conditional functions Description
BOOLEAN.?(value1, value2)

Ternary conditional operator that decides which of two other expressions should be evaluated based on a evaluated boolean condition. E.g. (42 > 5).?("A", "B") leads to "A".

Type conversion functions Description
ANY.cast(TYPE)

Converts a value to a given type. E.g. "42".cast(Types.INT) leads to 42.

Value constructor functions Description
NUMERIC.rows

Creates an interval of rows.

Temporal functions Description
STRING.toDate

Parses a date string in the form "yy-mm-dd" to a SQL date.

STRING.toTime

Parses a time string in the form "hh:mm:ss" to a SQL time.

STRING.toTimestamp

Parses a timestamp string in the form "yy-mm-dd hh:mm:ss.fff" to a SQL timestamp.

NUMERIC.year
NUMERIC.years

Creates an interval of months for a given number of years.

NUMERIC.month
NUMERIC.months

Creates an interval of months for a given number of months.

NUMERIC.day
NUMERIC.days

Creates an interval of milliseconds for a given number of days.

NUMERIC.hour
NUMERIC.hours

Creates an interval of milliseconds for a given number of hours.

NUMERIC.minute
NUMERIC.minutes

Creates an interval of milliseconds for a given number of minutes.

NUMERIC.second
NUMERIC.seconds

Creates an interval of milliseconds for a given number of seconds.

NUMERIC.milli
NUMERIC.millis

Creates an interval of milliseconds.

currentDate()

Returns the current SQL date in UTC time zone.

currentTime()

Returns the current SQL time in UTC time zone.

currentTimestamp()

Returns the current SQL timestamp in UTC time zone.

localTime()

Returns the current SQL time in local time zone.

localTimestamp()

Returns the current SQL timestamp in local time zone.

TEMPORAL.extract(TimeIntervalUnit)

Extracts parts of a time point or time interval. Returns the part as a long value. E.g. "2006-06-05".toDate.extract(TimeIntervalUnit.DAY) leads to 5.

TIMEPOINT.floor(TimeIntervalUnit)

Rounds a time point down to the given unit. E.g. "12:44:31".toTime.floor(TimeIntervalUnit.MINUTE) leads to 12:44:00.

TIMEPOINT.ceil(TimeIntervalUnit)

Rounds a time point up to the given unit. E.g. "12:44:31".toTime.floor(TimeIntervalUnit.MINUTE) leads to 12:45:00.

DATE.quarter()

Returns the quarter of a year from a SQL date. E.g. "1994-09-27".toDate.quarter() leads to 3.

temporalOverlaps(TIMEPOINT, TEMPORAL, TIMEPOINT, TEMPORAL)

Determines whether two anchored time intervals overlap. Time point and temporal are transformed into a range defined by two time points (start, end). The function evaluates leftEnd >= rightStart && rightEnd >= leftStart. E.g. temporalOverlaps('2:55:00'.toTime, 1.hour, '3:30:00'.toTime, 2.hours) leads to true.

Aggregate functions Description
FIELD.count

Returns the number of input rows for which the field is not null.

FIELD.avg

Returns the average (arithmetic mean) of the numeric field across all input values.

FIELD.sum

Returns the sum of the numeric field across all input values. If all values are null, null is returned.

FIELD.sum0

Returns the sum of the numeric field across all input values. If all values are null, 0 is returned.

FIELD.max

Returns the maximum value of field across all input values.

FIELD.min

Returns the minimum value of field across all input values.

FIELD.stddevPop

Returns the population standard deviation of the numeric field across all input values.

FIELD.stddevSamp

Returns the sample standard deviation of the numeric field across all input values.

FIELD.varPop

Returns the population variance (square of the population standard deviation) of the numeric field across all input values.

FIELD.varSamp

Returns the sample variance (square of the sample standard deviation) of the numeric field across all input values.

FIELD.collect
        

Returns the multiset aggregate of the input value.

Value access functions Description
COMPOSITE.get(STRING)
COMPOSITE.get(INT)

Accesses the field of a Flink composite type (such as Tuple, POJO, etc.) by index or name and returns it's value. E.g. 'pojo.get("myField") or 'tuple.get(0).

ANY.flatten()

Converts a Flink composite type (such as Tuple, POJO, etc.) and all of its direct subtypes into a flat representation where every subtype is a separate field. In most cases the fields of the flat representation are named similarly to the original fields but with a dollar separator (e.g. mypojo$mytuple$f0).

dateFormat(TIMESTAMP, STRING)

Formats timestamp as a string using a specified format. The format must be compatible with MySQL's date formatting syntax as used by the date_parse function. The format specification is given in the Date Format Specifier table below.

For example dateFormat('ts, "%Y, %d %M") results in strings formatted as "2017, 05 May".

Array functions Description
array(ANY [, ANY ]*)

Creates an array from a list of values. The array will be an array of objects (not primitives).

ARRAY.cardinality()

Returns the number of elements of an array.

ARRAY.at(INT)

Returns the element at a particular position in an array. The index starts at 1.

ARRAY.element()

Returns the sole element of an array with a single element. Returns null if the array is empty. Throws an exception if the array has more than one element.

Map functions Description
map(ANY, ANY [, ANY, ANY ]*)

Creates a map from a list of key-value pairs.

MAP.cardinality()

Returns the number of entries of a map.

MAP.at(ANY)

Returns the value specified by a particular key in a map.

Hash functions Description
STRING.md5()

Returns the MD5 hash of the string argument as a string of 32 hexadecimal digits; null if string is null.

STRING.sha1()

Returns the SHA-1 hash of the string argument as a string of 40 hexadecimal digits; null if string is null.

STRING.sha256()

Returns the SHA-256 hash of the string argument as a string of 64 hexadecimal digits; null if string is null.

Row functions Description
row(ANY, [, ANY]*)

Creates a row from a list of values. Row is composite type and can be access via value access functions.

Auxiliary functions Description
ANY.as(name [, name ]* )

Specifies a name for an expression i.e. a field. Additional names can be specified if the expression expands to multiple fields.

Unsupported Functions

The following operations are not supported yet:

  • Binary string operators and functions
  • System functions
  • Aggregate functions like REGR_xxx
  • Distinct aggregate functions like COUNT DISTINCT

Back to top