Table#

A Table object is the core abstraction of the Table API. Similar to how the DataStream API has DataStream, the Table API is built around Table.

A Table object describes a pipeline of data transformations. It does not contain the data itself. Instead, it describes how to read data from a table source and how to eventually write data to a table sink. The declared pipeline can be printed, optimized, and eventually executed in a cluster. The pipeline can work with bounded or unbounded streams, which enables both streaming and batch scenarios.

By this definition, a Table object can be considered a view in SQL terms.

The initial Table object is constructed by a TableEnvironment. For example, from_path() obtains a table from a catalog. Every Table object has a schema that is available through get_schema(). A Table object is always associated with its original table environment during programming.

Every transformation (e.g. select() or filter()) on a Table object leads to a new Table object.

Use execute() to execute the pipeline and retrieve the transformed data locally during development. Otherwise, use execute_insert() to write the data into a table sink.
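The lazy, immutable behavior described above can be illustrated with a toy model in plain Python. This is not PyFlink code and makes no claim about how Flink is implemented; the `ToyTable` class, the `read_source` function, and the `reads` list are hypothetical names chosen only to mirror select(), filter(), and execute().

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class ToyTable:
    """Hypothetical stand-in for a Table: a source plus pending steps."""
    source: Callable[[], Iterable[dict]]   # how to read the data, not the data
    steps: Tuple[Callable, ...] = ()       # transformations declared so far

    def filter(self, predicate):
        def step(rows):
            return (r for r in rows if predicate(r))
        # Returns a NEW ToyTable; the receiver is left unchanged.
        return ToyTable(self.source, self.steps + (step,))

    def select(self, *names):
        def step(rows):
            return ({n: r[n] for n in names} for r in rows)
        return ToyTable(self.source, self.steps + (step,))

    def execute(self):
        # Only here is the source read and the declared pipeline run.
        rows = self.source()
        for step in self.steps:
            rows = step(rows)
        return list(rows)


reads = []  # records every time the source is actually read


def read_source():
    reads.append("read")
    return [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]


base = ToyTable(read_source)
derived = base.filter(lambda r: r["id"] > 1).select("name")

reads_before_execute = len(reads)   # 0: declaring the pipeline read nothing
result = derived.execute()          # [{'name': 'b'}, {'name': 'c'}]
```

In this sketch `base` still has no steps after `derived` is built, which corresponds to every transformation returning a new object rather than modifying the original.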

Many methods of this class take one or more Expression objects as parameters. For fluent definition of expressions and better readability, we recommend adding a star import:

Example:

>>> from pyflink.table.expressions import *

Check the documentation for more programming-language-specific APIs.

The following example shows how to work with a Table object.

Example:

>>> from pyflink.table import EnvironmentSettings, TableEnvironment
>>> from pyflink.table.expressions import *
>>> env_settings = EnvironmentSettings.in_streaming_mode()
>>> t_env = TableEnvironment.create(env_settings)
>>> table = t_env.from_path("my_table").select(col("colA").trim(), col("colB") + 12)
>>> table.execute().print()

Table.add_columns(*fields)

Adds additional columns.

Table.add_or_replace_columns(*fields)

Adds additional columns, replacing any existing column that has the same name.

Table.aggregate(func)

Performs a global aggregate operation with an aggregate function.

Table.alias(field, *fields)

Renames the fields of the expression result.

Table.distinct()

Removes duplicate values and returns only distinct (different) values.

Table.drop_columns(*fields)

Drops existing columns.

Table.execute()

Collects the contents of the current table to the local client.

Table.execute_insert(table_path_or_descriptor)

Writes the table into a table sink. table_path_or_descriptor is either a table path or a table descriptor.

Table.explain(*extra_details)

Returns the AST of this table and the execution plan.

Table.fetch(fetch)

Limits a (possibly sorted) result to the first n rows.

Table.filter(predicate)

Filters out elements that don't pass the filter predicate.

Table.flat_aggregate(func)

Performs a global flat_aggregate operation without group_by.

Table.flat_map(func)

Performs a flatMap operation with a user-defined table function.

Table.full_outer_join(right, join_predicate)

Joins two Table objects with a full outer join.

Table.get_schema()

Returns the TableSchema of this table.

Table.group_by(*fields)

Groups the elements on some grouping keys.

Table.intersect(right)

Intersects two Table objects, removing duplicate records.

Table.intersect_all(right)

Intersects two Table objects, keeping duplicate records.

Table.join(right[, join_predicate])

Joins two Table objects with an inner join.

Table.join_lateral(table_function_call[, ...])

Joins this Table with a user-defined TableFunction.

Table.left_outer_join(right[, join_predicate])

Joins two Table objects with a left outer join.

Table.left_outer_join_lateral(...[, ...])

Joins this Table with a user-defined TableFunction, using a left outer join.

Table.limit(fetch[, offset])

Limits a (possibly sorted) result to the first n rows.

Table.map(func)

Performs a map operation with a user-defined scalar function.

Table.minus(right)

Subtracts the right Table from the left Table, removing duplicate records.

Table.minus_all(right)

Subtracts the right Table from the left Table, keeping duplicate records.

Table.offset(offset)

Limits a (possibly sorted) result from an offset position.
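The relationship between offset(), fetch(), and limit() can be pictured with list slicing. The following is a plain-Python sketch, not PyFlink code: the three helper functions are hypothetical, and treating limit(fetch, offset) as offset() followed by fetch() is an assumption drawn from the method signatures listed in this section.

```python
# Stands in for an already sorted result, e.g. after order_by().
rows = sorted([5, 3, 8, 1, 9, 2])   # [1, 2, 3, 5, 8, 9]


def offset(rows, n):
    """Skip the first n rows."""
    return rows[n:]


def fetch(rows, n):
    """Keep only the first n rows."""
    return rows[:n]


def limit(rows, fetch_n, offset_n=0):
    """Assumed to behave like offset(offset_n) followed by fetch(fetch_n)."""
    return fetch(offset(rows, offset_n), fetch_n)


skipped = offset(rows, 2)     # [3, 5, 8, 9]
first = fetch(rows, 3)        # [1, 2, 3]
page = limit(rows, 3, 2)      # [3, 5, 8]
```

Without a preceding sort, which rows count as "first" is generally not well defined, which is why these methods are described as limiting a possibly sorted result.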

Table.order_by(*fields)

Sorts the given Table.

Table.over_window(*over_windows)

Defines over-windows on the records of a table.

Table.print_schema()

Prints the schema of this table to the console in a tree format.

Table.rename_columns(*fields)

Renames existing columns.

Table.right_outer_join(right, join_predicate)

Joins two Table objects with a right outer join.
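The four join methods in this list (join, left_outer_join, right_outer_join, full_outer_join) differ only in what happens to rows without a match. The plain-Python sketch below models the usual SQL semantics on small lists of tuples. It is not PyFlink code; the `toy_join` helper and its `how` parameter are hypothetical, and the join predicate is assumed to be equality on the first field.

```python
left = [(1, "a"), (2, "b")]     # (left_id, left_value)
right = [(2, "x"), (3, "y")]    # (right_id, right_value)


def toy_join(left, right, how="inner"):
    """Equi-join on the first field; unmatched sides are padded with None."""
    out = []
    matched_right = set()
    for lk, lv in left:
        hits = [i for i, (rk, _) in enumerate(right) if rk == lk]
        for i in hits:
            matched_right.add(i)
            out.append((lk, lv) + right[i])
        if not hits and how in ("left", "full"):
            out.append((lk, lv, None, None))
    if how in ("right", "full"):
        for i, row in enumerate(right):
            if i not in matched_right:
                out.append((None, None) + row)
    return out


inner = toy_join(left, right, "inner")       # [(2, 'b', 2, 'x')]
left_outer = toy_join(left, right, "left")   # adds (1, 'a', None, None)
right_outer = toy_join(left, right, "right") # adds (None, None, 3, 'y')
full_outer = toy_join(left, right, "full")   # keeps unmatched rows of both sides
```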

Table.select(*fields)

Performs a selection operation.

Table.to_pandas()

Converts the table to a pandas DataFrame.

Table.union(right)

Unions two Table objects, removing duplicate records.

Table.union_all(right)

Unions two Table objects, keeping duplicate records.
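The set operations in this list come in pairs: union, intersect, and minus remove duplicates, while union_all, intersect_all, and minus_all keep them. The sketch below models that difference in plain Python with `collections.Counter`, assuming the usual SQL UNION / INTERSECT / EXCEPT semantics (with and without ALL). It is not PyFlink code, and the variable names are illustrative only.

```python
from collections import Counter

left = ["a", "a", "a", "b", "c"]
right = ["a", "a", "b", "d"]

l_counts, r_counts = Counter(left), Counter(right)

# Variants that remove duplicates behave like operations on sets.
union = sorted(set(left) | set(right))          # ['a', 'b', 'c', 'd']
intersect = sorted(set(left) & set(right))      # ['a', 'b']
minus = sorted(set(left) - set(right))          # ['c']

# *_all variants behave like operations on multisets (bags).
union_all = sorted(left + right)                            # every record from both sides
intersect_all = sorted((l_counts & r_counts).elements())    # ['a', 'a', 'b']
minus_all = sorted((l_counts - r_counts).elements())        # ['a', 'c']
```

Here "a" appears three times on the left and twice on the right, so the `_all` intersection keeps it twice and the `_all` difference keeps it once, whereas the duplicate-removing variants only ask whether a record is present at all.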

Table.where(predicate)

Filters out elements that don't pass the filter predicate.

Table.window(window)

Defines a group window on the records of a table.

GroupedTable#

A table that has been grouped on a set of grouping keys.

GroupedTable.select(*fields)

Performs a selection operation on a grouped table.

GroupedTable.aggregate(func)

Performs an aggregate operation with an aggregate function.

GroupedTable.flat_aggregate(func)

Performs a flat_aggregate operation on a grouped table.

GroupWindowedTable#

A table that has been windowed for GroupWindow.

GroupWindowedTable.group_by(*fields)

Groups the elements by a mandatory window and one or more optional grouping attributes.

WindowGroupedTable#

A table that has been windowed and grouped for GroupWindow.

WindowGroupedTable.select(*fields)

Performs a selection operation on a window grouped table.

WindowGroupedTable.aggregate(func)

Performs an aggregate operation on a window grouped table.

OverWindowedTable#

A table that has been windowed for OverWindow.

Unlike group windows, which are specified in the GROUP BY clause, over windows do not collapse rows. Instead, over-window aggregates compute an aggregate for each input row over a range of its neighboring rows.
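The difference between a grouped aggregate and an over-window aggregate can be shown on a handful of rows. The following is a plain-Python model, not PyFlink code. The helper names and the sample data are hypothetical, and the over window is assumed to cover all preceding rows of the same user up to and including the current row, ordered by `ts`.

```python
from collections import defaultdict

rows = [
    {"user": "u1", "ts": 1, "amount": 10},
    {"user": "u2", "ts": 2, "amount": 5},
    {"user": "u1", "ts": 3, "amount": 20},
    {"user": "u1", "ts": 4, "amount": 30},
]


def group_sum(rows):
    """Grouped aggregate: collapses the input to one row per user."""
    totals = defaultdict(int)
    for r in rows:
        totals[r["user"]] += r["amount"]
    return [{"user": u, "total": t} for u, t in totals.items()]


def over_running_sum(rows):
    """Over-window aggregate: emits one output row per input row."""
    running = defaultdict(int)
    out = []
    for r in sorted(rows, key=lambda r: r["ts"]):
        running[r["user"]] += r["amount"]
        out.append({**r, "running_total": running[r["user"]]})
    return out


grouped = group_sum(rows)          # 2 rows: one per user
windowed = over_running_sum(rows)  # 4 rows: same count as the input
```

In this sketch the grouped result has two rows (totals 60 and 5), while the over-window result keeps all four input rows and attaches running totals of 10, 5, 30, and 60.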

OverWindowedTable.select(*fields)

Performs a selection operation on an over-windowed table.

AggregatedTable#

A table to which an aggregate function has been applied.

AggregatedTable.select(*fields)

Performs a selection operation after an aggregate operation.

FlatAggregateTable#

A table that performs flat_aggregate on a Table, a GroupedTable, or a WindowGroupedTable.

FlatAggregateTable.select(*fields)

Performs a selection operation on a FlatAggregateTable.