This page answers some common questions from PyFlink users.
You can download a convenience script to prepare a Python virtual environment zip file that can be used on Mac OS and most Linux distributions. Pass the version parameter to generate the Python virtual environment required for the corresponding PyFlink version; otherwise the most recent version is installed.
After setting up a Python virtual environment as described in the previous section, activate the environment before executing the PyFlink job.
For details on the usage of add_python_archive and set_python_executable, refer to the relevant documentation.
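As a rough sketch of how these two calls are typically combined, the helper below ships an archive with the job and points the Python workers at the interpreter inside it. The helper name `use_archived_venv`, the archive name `venv.zip`, and the inner path `venv/bin/python` are illustrative assumptions, not values taken from this page; adjust them to match the archive you actually built.

```python
def use_archived_venv(table_env, archive_path="venv.zip",
                      interpreter="venv.zip/venv/bin/python"):
    """Ship a zipped virtual environment with the job and use its interpreter.

    `table_env` is expected to be a PyFlink TableEnvironment. The default
    archive name and interpreter path are assumptions for illustration.
    """
    # Upload the archive; without a target directory it is extracted on the
    # workers into a directory named after the archive file itself.
    table_env.add_python_archive(archive_path)
    # Point the workers at the interpreter inside the extracted archive.
    table_env.get_config().set_python_executable(interpreter)
    return table_env
```

The interpreter path is relative to the worker's working directory, which is why it starts with the archive file name.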
A PyFlink job may depend on jar files, e.g. connectors, Java UDFs, etc. How you add the jar files depends on the deployment mode.
You need to copy the jar files to the path site-packages/pyflink/lib of the Python interpreter in use.
You can execute the following command to find the path:
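One possible way to locate that directory from Python is sketched below. The helper name `package_subdir` is hypothetical; it only relies on the standard library's `importlib` to find where a package is installed, and it returns `None` when the package is not available to the current interpreter.

```python
import importlib.util
import os


def package_subdir(package="pyflink", subdir="lib"):
    """Return <package directory>/<subdir>, or None if the package is not installed."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at the package's __init__.py; its parent is the
    # package directory inside site-packages.
    package_dir = os.path.dirname(os.path.abspath(spec.origin))
    return os.path.join(package_dir, subdir)


if __name__ == "__main__":
    print(package_subdir())
```

Run it with the same interpreter that will execute the PyFlink job, since each interpreter has its own site-packages directory.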
You can use the command-line argument -j <jarFile> to specify the jar file to use. For more details about -j <jarFile>, refer to the relevant documentation.
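For illustration, the sketch below assembles such a command line as an argument list that could be handed to `subprocess.run`. The helper name `build_flink_run_command` and the file names `my_job.py` and `my_connector.jar` are placeholders, and it assumes the `flink` executable is on the PATH and that the job is submitted with `flink run -py`.

```python
import shlex


def build_flink_run_command(job_file, jar_file, flink_bin="flink"):
    """Build the argument list for submitting a Python job together with a jar file."""
    # -py names the Python job entry point; -j names the jar file to ship.
    return [flink_bin, "run", "-py", job_file, "-j", jar_file]


if __name__ == "__main__":
    cmd = build_flink_run_command("my_job.py", "my_connector.jar")
    # Print the command in a form that can be pasted into a shell.
    print(shlex.join(cmd))
```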
Note: Currently, the Flink CLI accepts only one jar file. If the job depends on several jar files, you can package them into one zip file as follows:
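A minimal sketch of such packaging, using only Python's standard `zipfile` module, is given below. The helper name `package_jars`, the output name `lib.zip`, and the `lib/` directory inside the archive are assumptions made for illustration; the archive layout that Flink expects is not stated on this page, so check it against the relevant documentation before relying on it.

```python
import os
import zipfile


def package_jars(jar_paths, zip_path="lib.zip", arc_dir="lib"):
    """Bundle several jar files into one zip archive and return its path.

    The arc_dir layout is an assumption, not a documented Flink requirement.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for jar in jar_paths:
            # Store each jar under arc_dir/ using only its file name.
            zf.write(jar, arcname=os.path.join(arc_dir, os.path.basename(jar)))
    return zip_path
```

The resulting archive is then the single file passed to -j <jarFile>.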
You can use the command-line argument -pyfs or the API add_python_file of TableEnvironment to add Python file dependencies, which can be Python files, Python packages, or local directories.
For example, if you have a directory named myDir which has the following hierarchy:
myDir
├──utils
    ├──__init__.py
    ├──my_util.py
You can add the Python files of directory myDir as follows:
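A minimal sketch using the add_python_file API might look like the following. The helper name `register_python_dir` is hypothetical, and `table_env` is assumed to be an already created PyFlink TableEnvironment.

```python
def register_python_dir(table_env, path="myDir"):
    """Make the contents of `path` importable inside Python UDFs.

    `table_env` is assumed to be a PyFlink TableEnvironment.
    """
    # The directory is shipped with the job and added to the PYTHONPATH of
    # the Python workers, so packages inside it become importable there.
    table_env.add_python_file(path)
    return table_env


# Inside a UDF body, the package from myDir could then be imported, e.g.:
#     from utils import my_util
```

Importing inside the UDF body rather than at module level keeps the import from running on the client, where myDir may not be on the path.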