Release notes - Flink 1.16 #

These release notes discuss important aspects, such as configuration, behavior, or dependencies, that changed between Flink 1.15 and Flink 1.16. Please read these notes carefully if you are planning to upgrade your Flink version to 1.16.

Clusters & Deployment #

Deprecate host/web-ui-port parameter of jobmanager.sh #

The host/web-ui-port parameters of the jobmanager.sh script have been deprecated. These should instead be specified via the corresponding configuration options as dynamic properties.
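As a sketch, assuming the host argument maps to jobmanager.rpc.address and the web UI port to rest.port (check the configuration reference for your setup):

```bash
# Deprecated: positional host and web UI port arguments
./bin/jobmanager.sh start <host> <web-ui-port>

# Preferred: pass the equivalent configuration options as dynamic properties
./bin/jobmanager.sh start -D jobmanager.rpc.address=<host> -D rest.port=<web-ui-port>
```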

Table API & SQL #

Remove string expression DSL #

The deprecated String expression DSL has been removed from Java/Scala/Python Table API.

Support Retryable Lookup Join To Solve Delayed Updates Issue In External Systems #

A retryable lookup join has been added, supporting both async and sync lookups, in order to solve the delayed updates issue in external systems.

Make AsyncDataStream.OutputMode configurable for table module #

It is recommended to set the new option ‘table.exec.async-lookup.output-mode’ to ‘ALLOW_UNORDERED’ when strict output ordering is not needed; this can yield significant performance gains on append-only streams.
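In the SQL client, for example, the option can be set per session (a sketch using the standard SET syntax):

```sql
-- Allow unordered async lookup output; safe only when strict ordering is not required
SET 'table.exec.async-lookup.output-mode' = 'ALLOW_UNORDERED';
```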

Harden correctness for non-deterministic updates present in the changelog pipeline #

For complex streaming jobs, it is now possible to detect and resolve potential correctness issues before running them.

Connectors #

Move Elasticsearch connector to external connector repository #

The Elasticsearch connector has been copied from the Flink repository to its own individual repository at https://github.com/apache/flink-connector-elasticsearch. For this release, identical Elasticsearch connector artifacts will be available from both repositories, but with different versions: the artifact released from the Flink repository will be versioned 1.16.0, while the externally versioned and maintained artifact will be 3.0.0. Developers are encouraged to move to the latter during this release cycle.

Drop support for Hive versions 1.*, 2.1.* and 2.2.* #

Support for Hive 1.*, 2.1.* and 2.2.* has been dropped from Flink. These Hive versions are no longer supported by the Hive community and therefore are also no longer supported by Flink.

Hive sink report statistics to Hive metastore #

In batch mode, the Hive sink will now report statistics to the Hive metastore by default for written tables and partitions. This might be time-consuming when many files are written. You can disable this feature by setting table.exec.hive.sink.statistic-auto-gather.enable to false.
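For example, the option can be disabled per session in the SQL client (a sketch using the standard SET syntax):

```sql
-- Disable automatic statistics gathering for the Hive sink in batch mode
SET 'table.exec.hive.sink.statistic-auto-gather.enable' = 'false';
```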

Remove a number of Pulsar cursor APIs #

A number of breaking changes were made to the Pulsar Connector cursor APIs:

  • CursorPosition#seekPosition() has been removed.
  • StartCursor#seekPosition() has been removed.
  • StopCursor#shouldStop now returns a StopCondition instead of a boolean.

Mark StreamingFileSink as deprecated #

The StreamingFileSink has been deprecated in favor of the unified FileSink since Flink 1.12.

Avro schemas generated by Flink now use the “org.apache.flink.avro.generated” namespace for compatibility with the Avro Python SDK.

Introduce configurable RateLimitingStrategy for Async Sink #

The AsyncSinkWriter now supports a configurable RateLimitingStrategy. This change allows sink implementers to change the behavior of an AsyncSink when requests fail, for a specific sink. If no RateLimitingStrategy is specified, the current default, AIMDRateLimitingStrategy, is used.
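The default strategy follows the classic AIMD (additive increase, multiplicative decrease) pattern. A minimal, self-contained Python sketch of the idea (not Flink's actual implementation; class name and parameters are illustrative):

```python
class AIMDRateLimiter:
    """Additive-increase/multiplicative-decrease rate limiting sketch.

    On success the allowed request budget grows linearly (slow probing
    for capacity); on failure it shrinks multiplicatively (fast backoff).
    """

    def __init__(self, initial_rate=10, increase_step=1,
                 decrease_factor=0.5, min_rate=1):
        self.rate = initial_rate
        self.increase_step = increase_step
        self.decrease_factor = decrease_factor
        self.min_rate = min_rate

    def on_success(self):
        # Additive increase: probe for more capacity slowly.
        self.rate += self.increase_step

    def on_failure(self):
        # Multiplicative decrease: back off aggressively, but never below min_rate.
        self.rate = max(self.min_rate, self.rate * self.decrease_factor)


limiter = AIMDRateLimiter(initial_rate=10)
limiter.on_success()   # rate: 11
limiter.on_failure()   # rate: 5.5
print(limiter.rate)    # 5.5
```

A custom strategy plugged into the Async Sink would implement the analogous success/failure hooks to shape the request rate.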

Runtime & Coordination #

Deprecate reflection-based reporter instantiation #

Configuring reporters by their class has been deprecated. Reporter implementations should provide a MetricReporterFactory, and all configurations should be migrated to such a factory.
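For example, assuming a reporter registered under the hypothetical name my_reporter, the migration looks like this (keys follow Flink's metrics.reporter.&lt;name&gt;.* configuration scheme):

```yaml
# Deprecated: instantiating the reporter via reflection on its class
# metrics.reporter.my_reporter.class: com.example.MyReporter

# Preferred: reference the reporter's MetricReporterFactory
metrics.reporter.my_reporter.factory.class: com.example.MyReporterFactory
```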

If the reporter is loaded from the plugins directory, setting metrics.reporter.reporter_name.class no longer works.

Deprecate Datadog reporter ’tags’ option #

The ’tags’ option from the DatadogReporter has been deprecated in favor of the generic ‘scope.variables.additional’ option.

Return 503 Service Unavailable if endpoint is not ready yet #

The REST API now returns a 503 Service Unavailable error when a request is made but the backing component is not ready yet. Previously this returned a 500 Internal Server Error.
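The distinction matters for clients: 503 signals a transient "not ready yet" state worth retrying, while 500 signals a genuine server-side failure. A small, self-contained Python sketch of a client that retries only on 503 (the request function here is simulated; it is not Flink's REST client):

```python
import time

RETRYABLE = {503}  # Service Unavailable: the endpoint is not ready yet


def call_with_retry(request_fn, max_attempts=3, backoff_seconds=0.0):
    """Retry a REST call while the server reports it is not ready (HTTP 503)."""
    for attempt in range(1, max_attempts + 1):
        status, body = request_fn()
        if status not in RETRYABLE or attempt == max_attempts:
            return status, body
        time.sleep(backoff_seconds)  # wait before asking again


# Simulated endpoint: not ready for the first two calls, then healthy.
responses = iter([(503, "not ready"), (503, "not ready"), (200, "ok")])
status, body = call_with_retry(lambda: next(responses))
print(status, body)  # 200 ok
```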

Make the job id distinct in application mode when HA is enabled #

The JobID in application mode is no longer a fixed all-zeros value, but is instead based on the cluster ID.

Checkpoints #

Add the overdraft buffer in BufferPool to reduce unaligned checkpoint being blocked #

A new concept of overdraft network buffers was introduced to mitigate the effects of uninterruptibly blocking a subtask thread during back pressure. Starting from 1.16.0, a Flink subtask can by default request up to 5 extra (overdraft) buffers over the regularly configured amount (you can read more about this in the documentation: https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/memory/network_mem_tuning/#overdraft-buffers). This change can slightly increase the memory consumption of a Flink job. To restore the older behavior, you can set taskmanager.network.memory.max-overdraft-buffers-per-gate to zero.
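To restore the pre-1.16 behavior, the option can be set in flink-conf.yaml (a sketch using the option key from the note above):

```yaml
# flink-conf.yaml: disable overdraft buffers to restore pre-1.16 behavior
taskmanager.network.memory.max-overdraft-buffers-per-gate: 0
```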

Non-deterministic UID generation might cause issues during restore #

Flink 1.15.0 and 1.15.1 generated non-deterministic UIDs for operators, which made it difficult or impossible to restore state or upgrade to the next patch version. The new table.exec.uid.generation config option (with a corrected default behavior) disables setting a UID for new pipelines from non-compiled plans. Existing pipelines can set table.exec.uid.generation=ALWAYS if the 1.15.0/1.15.1 behavior was acceptable.
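For an existing pipeline that depends on the old behavior, the option can be set per session in the SQL client (a sketch using the standard SET syntax):

```sql
-- Restore the 1.15.0/1.15.1 UID generation behavior for an existing pipeline
SET 'table.exec.uid.generation' = 'ALWAYS';
```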

Python #

Python 3.6 reached the end of its extended support on 23 December 2021. PyFlink 1.16 is planned to be the last version that supports Python 3.6.

Dependency upgrades #

Update the Hadoop implementation for filesystems to 3.3.2 #

The Hadoop implementation used for Flink’s filesystem implementation has been updated. This provides Flink users with the features that are listed under https://issues.apache.org/jira/browse/HADOOP-17566. One of these features is to enable client-side encryption of Flink state via https://issues.apache.org/jira/browse/HADOOP-13887.

Upgrade Kafka Client to 3.1.1 #

The Kafka connector now uses Kafka client 3.1.1 by default.

Upgrade Hive 2.3 connector to version 2.3.9 #

The Hive 2.3 connector has been upgraded to version 2.3.9.

To support Python 3.9 and Apple M1, PyFlink updated a series of dependency versions: apache-beam==2.38.0, arrow==5.0.0, pemja==0.2.6.

Update dependency version for system resources metrics #

The system resource metrics dependencies have been updated to the following:

com.github.oshi:oshi-core:6.1.5 (licensed under MIT license)
net.java.dev.jna:jna-platform:jar:5.10.0
net.java.dev.jna:jna:jar:5.10.0