pyflink.table.table_environment.StreamTableEnvironment.from_pandas#

StreamTableEnvironment.from_pandas(pdf, schema: Optional[Union[pyflink.table.types.RowType, List[str], Tuple[str], List[pyflink.table.types.DataType], Tuple[pyflink.table.types.DataType]]] = None, splits_num: int = 1) → pyflink.table.table.Table#

Creates a table from a pandas DataFrame.

Example:

>>> pdf = pd.DataFrame(np.random.rand(1000, 2))
# use the second parameter to specify custom field names
>>> table_env.from_pandas(pdf, ["a", "b"])
# use the second parameter to specify custom field types
>>> table_env.from_pandas(pdf, [DataTypes.DOUBLE(), DataTypes.DOUBLE()]))
# use the second parameter to specify custom table schema
>>> table_env.from_pandas(pdf,
...                       DataTypes.ROW([DataTypes.FIELD("a", DataTypes.DOUBLE()),
...                                      DataTypes.FIELD("b", DataTypes.DOUBLE())]))

Parameters

pdf – The pandas DataFrame.
schema – The schema of the converted table.
splits_num – The number of splits the given Pandas DataFrame will be split into. It determines the number of parallel source tasks. If not specified, the default parallelism will be used.

Returns

The result table.

New in version 1.11.0.