pyflink.datastream.stream_execution_environment.StreamExecutionEnvironment.add_python_archive#

StreamExecutionEnvironment.add_python_archive(archive_path: str, target_dir: Optional[str] = None)[source]#

Adds a python archive file. The file will be extracted to the working directory of python UDF worker.

If the parameter “target_dir” is specified, the archive file will be extracted to a directory named ${target_dir}. Otherwise, the archive file will be extracted to a directory with the same name of the archive file.

If python UDF depends on a specific python version which does not exist in the cluster, this method can be used to upload the virtual environment. Note that the path of the python interpreter contained in the uploaded environment should be specified via the method pyflink.table.TableConfig.set_python_executable().

The files uploaded via this method are also accessible in UDFs via relative path.

Example:

# command executed in shell
# assert the relative path of python interpreter is py_env/bin/python
$ zip -r py_env.zip py_env

# python code
>>> stream_env.add_python_archive("py_env.zip")
>>> stream_env.set_python_executable("py_env.zip/py_env/bin/python")

# or
>>> stream_env.add_python_archive("py_env.zip", "myenv")
>>> stream_env.set_python_executable("myenv/py_env/bin/python")

# the files contained in the archive file can be accessed in UDF
>>> def my_udf():
...     with open("myenv/py_env/data/data.txt") as f:
...         ...

Note

Please make sure the uploaded python environment matches the platform that the cluster is running on and that the python version must be 3.6 or higher.

Note

Currently only zip-format is supported. i.e. zip, jar, whl, egg, etc. The other archive formats such as tar, tar.gz, 7z, rar, etc are not supported.

Parameters

archive_path – The archive file path.
target_dir – Optional, the target dir name that the archive file extracted to.