It supports creating a PyFlink Table from a Pandas DataFrame. Internally, it will serialize the Pandas DataFrame
using Arrow columnar format at client side and the serialized data will be processed and deserialized in Arrow source
during execution. The Arrow source could also be used in streaming jobs and it will properly handle the checkpoint
and provides the exactly once guarantees.
The following example shows how to create a PyFlink Table from a Pandas DataFrame:
Convert PyFlink Table to Pandas DataFrame
It also supports converting a PyFlink Table to a Pandas DataFrame. Internally, it will materialize the results of the
table and serialize them into multiple Arrow batches of Arrow columnar format at client side. The maximum Arrow batch size
is determined by the config option python.fn-execution.arrow.batch.size.
The serialized data will then be converted to Pandas DataFrame. It will collect the content of the table to
the client side and so please make sure that the content of the table could fit in memory before calling this method.
The following example shows how to convert a PyFlink Table to a Pandas DataFrame: