Execution Mode #
The Python API supports different runtime execution modes from which you can choose depending on the requirements of your use case and the characteristics of your job. The Python runtime execution mode defines how the Python user-defined functions will be executed.
Prior to release-1.15, the only execution mode was PROCESS execution mode. In PROCESS mode, Python user-defined functions are executed in separate Python processes.
Release-1.15 introduced a new execution mode called THREAD execution mode. In THREAD mode, Python user-defined functions are executed in the JVM.
NOTE: Multiple Python user-defined functions running in the same JVM are still affected by the GIL (Global Interpreter Lock).
When can/should I use THREAD execution mode? #
THREAD mode was introduced to avoid the serialization/deserialization and network communication overhead that inter-process communication incurs in PROCESS mode.
If performance is not a concern, or if the computing logic of your Python user-defined functions is itself the performance bottleneck of the job, PROCESS mode is the better choice because it provides better isolation than THREAD mode.
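To make the overhead concrete, the following minimal sketch compares calling a function directly with calling it across a simulated process boundary, where every input and every result is serialized and deserialized. It is only an illustration: pickle stands in for whatever serialization Flink actually uses, and `to_upper`, `call_in_process`, and `call_across_boundary` are hypothetical names that are not part of PyFlink.

```python
import pickle


def to_upper(value):
    """Hypothetical user-defined function: upper-case a string."""
    return value.upper()


def call_in_process(func, records):
    """Call func directly; objects are passed by reference, nothing is copied."""
    return [func(record) for record in records]


def call_across_boundary(func, records):
    """Simulate a process boundary by serializing each input and each result.

    Returns the results and the total number of bytes that crossed the boundary.
    """
    results = []
    bytes_transferred = 0
    for record in records:
        request = pickle.dumps(record)          # sender serializes the input
        result = func(pickle.loads(request))    # worker deserializes and computes
        response = pickle.dumps(result)         # worker serializes the result
        results.append(pickle.loads(response))  # sender deserializes the result
        bytes_transferred += len(request) + len(response)
    return results, bytes_transferred


records = ["flink", "python", "udf"]
direct = call_in_process(to_upper, records)
remote, transferred = call_across_boundary(to_upper, records)
print(direct == remote, transferred > 0)
```

Both paths produce the same results, but the second one pays a per-record encoding and decoding cost. When the function body itself dominates the runtime, that extra cost matters little, which is why PROCESS mode remains a reasonable default.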
Configuring Python execution mode #
The execution mode can be configured via the python.execution-mode setting.
There are two possible values:
PROCESS: The Python user-defined functions will be executed in a separate Python process. (default)
THREAD: The Python user-defined functions will be executed in the JVM.
You can specify the execution mode in Python Table API or Python DataStream API jobs as follows:
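```python
from pyflink.common import Configuration
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import EnvironmentSettings, TableEnvironment

## Python Table API
# Assumed setup: the original snippet uses an already existing table_env
table_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Specify `PROCESS` mode
table_env.get_config().set("python.execution-mode", "process")

# Specify `THREAD` mode
table_env.get_config().set("python.execution-mode", "thread")

## Python DataStream API
config = Configuration()

# Specify `PROCESS` mode
config.set_string("python.execution-mode", "process")

# Specify `THREAD` mode
config.set_string("python.execution-mode", "thread")

# Create the corresponding StreamExecutionEnvironment
env = StreamExecutionEnvironment.get_execution_environment(config)
```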
Supported Cases #
Python Table API #
The following table shows where THREAD execution mode is supported in the Python Table API.
|UDFs|PROCESS|THREAD|
|---|---|---|
|Pandas UDF & Pandas UDAF|Yes|No|
Python DataStream API #
The following table shows the supported cases in the Python DataStream API.
Currently, executing Python UDFs in THREAD execution mode is not supported in all places. In unsupported cases, the job falls back to PROCESS execution mode. As a result, a job that you configure to run in THREAD execution mode may actually be executed in PROCESS execution mode.
THREAD execution mode is only supported in Python 3.7+.
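The fallback described above can be pictured with the following minimal sketch. It is not PyFlink's implementation: `resolve_execution_mode` and the `THREAD_UNSUPPORTED` set are hypothetical, and the set encodes only the Pandas UDF & Pandas UDAF row of the Python Table API table. Function kinds that are not listed are simply assumed to be supported, which this sketch does not verify.

```python
# Hypothetical set of function kinds that cannot run in THREAD mode.
# Only the Pandas UDF & Pandas UDAF row of the Table API table is encoded.
THREAD_UNSUPPORTED = {"pandas_udf", "pandas_udaf"}


def resolve_execution_mode(configured_mode, function_kind):
    """Return the mode a function would actually run in (illustrative only)."""
    if configured_mode not in ("process", "thread"):
        raise ValueError(f"unknown execution mode: {configured_mode!r}")
    if configured_mode == "thread" and function_kind in THREAD_UNSUPPORTED:
        # THREAD was requested but is not supported here: fall back to PROCESS.
        return "process"
    return configured_mode


print(resolve_execution_mode("thread", "pandas_udf"))
print(resolve_execution_mode("process", "pandas_udf"))
```

The point of the sketch is that the configured value is a request rather than a guarantee, so the mode a job actually runs in can differ from the one in its configuration.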
Execution Behavior #
This section provides an overview of the execution behavior of THREAD execution mode and contrasts it with PROCESS execution mode. For more details, please refer to the FLIP that introduced this feature:
PROCESS Execution Mode #
In PROCESS execution mode, Python user-defined functions are executed in a separate Python worker process.
The Java operator process communicates with the Python worker process through several gRPC services.
THREAD Execution Mode #
In THREAD execution mode, Python user-defined functions are executed in the same process as the Java operators. PyFlink uses the third-party library PEMJA to embed Python in the Java application.
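The difference between the two modes can be illustrated with plain Python, independent of Flink. In this minimal sketch, a thread stands in for THREAD mode (same process as the caller) and a child interpreter started with `subprocess` stands in for the PROCESS mode worker. It uses neither PEMJA nor gRPC, and `pid_in_thread` and `pid_in_worker_process` are hypothetical helpers introduced only for this illustration.

```python
import os
import subprocess
import sys
import threading


def pid_in_thread():
    """Run a function in a thread of the current process and return its PID."""
    seen = []
    worker = threading.Thread(target=lambda: seen.append(os.getpid()))
    worker.start()
    worker.join()
    return seen[0]


def pid_in_worker_process():
    """Start a separate Python interpreter and return the PID it reports."""
    completed = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getpid())"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(completed.stdout.strip())


parent_pid = os.getpid()
thread_pid = pid_in_thread()
worker_pid = pid_in_worker_process()
print(thread_pid == parent_pid)  # the thread runs inside the caller's process
print(worker_pid == parent_pid)  # the worker is a different process
```

Running in the same process removes the need for inter-process communication. It also means the function shares the caller's process, which is why PROCESS mode offers stronger isolation.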