Flink Table API & SQL empowers users to do data transformations with functions.
There are two dimensions to classify functions in Flink.
One dimension is system (or built-in) functions v.s. catalog functions. System functions have no namespace and can be
referenced with just their names. Catalog functions belong to a catalog and database therefore they have catalog and database
namespaces, they can be referenced by either fully/partially qualified name (catalog.db.func
or db.func
) or just the
function name.
The other dimension is temporary functions v.s. persistent functions. Temporary functions are volatile and only live up to lifespan of a session, they are always created by users. Persistent functions live across lifespan of sessions, they are either provided by the system or persisted in catalogs.
The two dimensions give Flink users 4 categories of functions:
There are two ways users can reference a function in Flink - referencing function precisely or ambiguously.
Precise function reference empowers users to use catalog functions specifically, and across catalog and across database,
e.g. select mycatalog.mydb.myfunc(x) from mytable
and select mydb.myfunc(x) from mytable
.
This is only supported starting from Flink 1.10.
In ambiguous function reference, users just specify the function’s name in SQL query, e.g. select myfunc(x) from mytable
.
The resolution order only matters when there are functions of different types but the same name, e.g. when there’re three functions all named “myfunc” but are of temporary catalog, catalog, and system function respectively. If there’s no function name collision, functions will just be resolved to the sole one.
Because system functions don’t have namespaces, a precise function reference in Flink must be pointing to either a temporary catalog function or a catalog function.
The resolution order is:
The resolution order is: