pyflink.ml package¶

Module contents¶

pyflink.ml.api module¶

class pyflink.ml.api.MLEnvironment(exe_env=None, stream_exe_env=None, batch_tab_env=None, stream_tab_env=None)[source]¶

The MLEnvironment stores the necessary context in Flink. Each MLEnvironment will be associated with a unique ID. The operations associated with the same MLEnvironment ID will share the same Flink job context. Both MLEnvironment ID and MLEnvironment can only be retrieved from MLEnvironmentFactory.

New in version 1.11.0.

get_batch_table_environment() → pyflink.table.table_environment.BatchTableEnvironment[source]¶

Get the BatchTableEnvironment. If the BatchTableEnvironment has not been set, it initial the BatchTableEnvironment with default Configuration.

Returns: the BatchTableEnvironment.

New in version 1.11.0.

get_execution_environment() → pyflink.dataset.execution_environment.ExecutionEnvironment[source]¶

Get the ExecutionEnvironment. If the ExecutionEnvironment has not been set, it initial the ExecutionEnvironment with default Configuration.

Returns: the batch ExecutionEnvironment.

New in version 1.11.0.

get_stream_execution_environment() → pyflink.datastream.stream_execution_environment.StreamExecutionEnvironment[source]¶

Get the StreamExecutionEnvironment. If the StreamExecutionEnvironment has not been set, it initial the StreamExecutionEnvironment with default Configuration.

Returns: the StreamExecutionEnvironment.

New in version 1.11.0.

get_stream_table_environment() → pyflink.table.table_environment.StreamTableEnvironment[source]¶

Get the StreamTableEnvironment. If the StreamTableEnvironment has not been set, it initial the StreamTableEnvironment with default Configuration.

Returns: the StreamTableEnvironment.

New in version 1.11.0.

class pyflink.ml.api.MLEnvironmentFactory[source]¶

Factory to get the MLEnvironment using a MLEnvironmentId.

New in version 1.11.0.

static get(ml_env_id: int) → Optional[pyflink.ml.api.ml_environment.MLEnvironment][source]¶

Get the MLEnvironment using a MLEnvironmentId.

Parameters: ml_env_id – the MLEnvironmentId
Returns: the MLEnvironment

New in version 1.11.0.

static get_default() → Optional[pyflink.ml.api.ml_environment.MLEnvironment][source]¶

Get the MLEnvironment use the default MLEnvironmentId.

Returns: the default MLEnvironment.

New in version 1.11.0.

static get_new_ml_environment_id() → int[source]¶

Create a unique MLEnvironment id and register a new MLEnvironment in the factory.

Returns: the MLEnvironment id.

New in version 1.11.0.

static register_ml_environment(ml_environment: pyflink.ml.api.ml_environment.MLEnvironment) → int[source]¶

Register a new MLEnvironment to the factory and return a new MLEnvironment id.

Parameters: ml_environment – the MLEnvironment that will be stored in the factory.
Returns: the MLEnvironment id.

New in version 1.11.0.

static remove(ml_env_id: int) → pyflink.ml.api.ml_environment.MLEnvironment[source]¶

Remove the MLEnvironment using the MLEnvironmentId.

Parameters: ml_env_id – the id.
Returns: the removed MLEnvironment

New in version 1.11.0.

class pyflink.ml.api.Transformer(params=None)[source]¶

A transformer is a PipelineStage that transforms an input Table to a result Table.

New in version 1.11.0.

abstract transform(table_env: pyflink.table.table_environment.TableEnvironment, table: pyflink.table.table.Table) → pyflink.table.table.Table[source]¶

Applies the transformer on the input table, and returns the result table.

Parameters

table_env – the table environment to which the input table is bound.
table – the table to be transformed

Returns

the transformed table

New in version 1.11.0.

class pyflink.ml.api.Estimator(params=None)[source]¶

Estimators are PipelineStages responsible for training and generating machine learning models.

The implementations are expected to take an input table as training samples and generate a Model which fits these samples.

New in version 1.11.0.

fit(table_env: pyflink.table.table_environment.TableEnvironment, table: pyflink.table.table.Table) → pyflink.ml.api.base.Model[source]¶

Train and produce a Model which fits the records in the given Table.

Parameters

table_env – the table environment to which the input table is bound.
table – the table with records to train the Model.

Returns

a model trained to fit on the given Table.

New in version 1.11.0.

class pyflink.ml.api.Model(params=None)[source]¶

Abstract class for models that are fitted by estimators.

A model is an ordinary Transformer except how it is created. While ordinary transformers are defined by specifying the parameters directly, a model is usually generated by an Estimator when Estimator.fit(table_env, table) is invoked.

New in version 1.11.0.

class pyflink.ml.api.Pipeline(stages=None, pipeline_json=None)[source]¶

A pipeline is a linear workflow which chains Estimators and Transformers to execute an algorithm.

A pipeline itself can either act as an Estimator or a Transformer, depending on the stages it includes. More specifically:

If a Pipeline has an Estimator, one needs to call Pipeline.fit(TableEnvironment, Table) before use the pipeline as a Transformer. In this case the Pipeline is an Estimator and can produce a Pipeline as a Model.

If a Pipeline has noEstimator, it is a Transformer and can be applied to a Table directly. In this case, Pipeline#fit(TableEnvironment, Table) will simply return the pipeline itself.

In addition, a pipeline can also be used as a PipelineStage in another pipeline, just like an ordinaryEstimator or Transformer as describe above.

New in version 1.11.0.

append_stage(stage: pyflink.ml.api.base.PipelineStage) → pyflink.ml.api.base.Pipeline[source]¶

fit(t_env: pyflink.table.table_environment.TableEnvironment, input: pyflink.table.table.Table) → pyflink.ml.api.base.Pipeline[source]¶

Train the pipeline to fit on the records in the given Table.

Parameters

t_env – the table environment to which the input table is bound.
input – the table with records to train the Pipeline.

Returns

a pipeline with same stages as this Pipeline except all Estimators replaced with their corresponding Models.

get_stages() → tuple[source]¶

load_json(json: str) → None[source]¶: This method can either load from a Java Pipeline json or a Python Pipeline json.

need_fit()[source]¶

to_json() → str[source]¶: If all PipelineStages in this Pipeline are Java ones, this method will return a Java json string, which can be loaded either from a Python Pipeline or a Java Pipeline, otherwise, it returns a Python json string which can only be loaded from a Python Pipeline.

transform(t_env: pyflink.table.table_environment.TableEnvironment, input: pyflink.table.table.Table) → pyflink.table.table.Table[source]¶

Generate a result table by applying all the stages in this pipeline to the input table in order.

Parameters

t_env – the table environment to which the input table is bound.
input – the table to be transformed.

Returns

a result table with all the stages applied to the input tables in order.

class pyflink.ml.api.PipelineStage(params=None)[source]¶

Base class for a stage in a pipeline. The interface is only a concept, and does not have any actual functionality. Its subclasses must be either Estimator or Transformer. No other classes should inherit this interface directly.

Each pipeline stage is with parameters, and requires a public empty constructor for restoration in Pipeline.

New in version 1.11.0.

get_params() → pyflink.ml.api.param.base.Params[source]¶

Returns all the parameters.

Returns: all the parameters.

load_json(json: str) → None[source]¶

to_json() → str[source]¶

class pyflink.ml.api.JavaTransformer(j_obj)[source]¶

Base class for Transformer that wrap Java implementations. Subclasses should ensure they have the transformer Java object available as j_obj.

New in version 1.11.0.

transform(table_env: pyflink.table.table_environment.TableEnvironment, table: pyflink.table.table.Table) → pyflink.table.table.Table[source]¶

Applies the transformer on the input table, and returns the result table.

Parameters

table_env – the table environment to which the input table is bound.
table – the table to be transformed

Returns

the transformed table

New in version 1.11.0.

class pyflink.ml.api.JavaEstimator(j_obj)[source]¶

Base class for Estimator that wrap Java implementations. Subclasses should ensure they have the estimator Java object available as j_obj.

New in version 1.11.0.

fit(table_env: pyflink.table.table_environment.TableEnvironment, table: pyflink.table.table.Table) → pyflink.ml.api.base.JavaModel[source]¶

Train and produce a Model which fits the records in the given Table.

Parameters

table_env – the table environment to which the input table is bound.
table – the table with records to train the Model.

Returns

a model trained to fit on the given Table.

New in version 1.11.0.

class pyflink.ml.api.JavaModel(j_obj)[source]¶: Base class for JavaTransformer that wrap Java implementations. Subclasses should ensure they have the model Java object available as j_obj.

New in version 1.11.0.

pyflink.ml.api.param module¶

class pyflink.ml.api.param.WithParams[source]¶

Parameters are widely used in machine learning realm. This class defines a common interface to interact with classes with parameters.

get(info: pyflink.ml.api.param.base.ParamInfo) → V[source]¶

Returns the value of the specific param.

Parameters: info – the info of the specific param, usually with default value.
Returns: the value of the specific param, or default value defined in the ParamInfo if the inner Params doesn’t contains this param.

get_params() → pyflink.ml.api.param.base.Params[source]¶

Returns all the parameters.

Returns: all the parameters.

set(info: pyflink.ml.api.param.base.ParamInfo, value: V) → pyflink.ml.api.param.base.WithParams[source]¶

Set the value of a specific parameter.

Parameters

info – the info of the specific param to set.
value – the value to be set to the specific param.

Returns

the WithParams itself.

class pyflink.ml.api.param.Params[source]¶

The map-like container class for parameter. This class is provided to unify the interaction with parameters.

clear() → None[source]¶

Removes all of the params. The params will be empty after this call returns.

Returns: None.

clone() → pyflink.ml.api.param.base.Params[source]¶

Creates and returns a deep clone of this Params.

Returns: a clone of this Params.

contains(info: pyflink.ml.api.param.base.ParamInfo) → bool[source]¶

Check whether this params has a value set for the given info.

Parameters: info – the info of the specific parameter to check.
Returns: True if this params has a value set for the specified info, false otherwise.

static from_json(json) → pyflink.ml.api.param.base.Params[source]¶

Factory method for constructing params.

Parameters: json – the json string to load.
Returns: the Params loaded from the json string.

get(info: pyflink.ml.api.param.base.ParamInfo) → V[source]¶

Returns the value of the specific parameter, or default value defined in the info if this Params doesn’t have a value set for the parameter. An exception will be thrown in the following cases because no value could be found for the specified parameter.

Parameters: info – the info of the specific parameter to set.
Returns: the value of the specific param, or default value defined in the info if this Params doesn’t contain the parameter.

is_empty() → bool[source]¶

Returns true if this params contains no mappings.

Returns: true if this params contains no mappings.

load_json(json: str) → None[source]¶

Restores the parameters from the given json. The parameters should be exactly the same with the one who was serialized to the input json after the restoration.

Parameters: json – the json String to restore from.
Returns: None.

merge(other_params: pyflink.ml.api.param.base.Params) → pyflink.ml.api.param.base.Params[source]¶

Merge other params into this.

Parameters: other_params – other params.
Returns: return this Params.

remove(info: pyflink.ml.api.param.base.ParamInfo) → V[source]¶

Removes the specific parameter from this Params.

Parameters: info – the info of the specific parameter to remove.
Returns: the type of the specific parameter.

set(info: pyflink.ml.api.param.base.ParamInfo, value: V) → pyflink.ml.api.param.base.Params[source]¶

Return the number of params.

Parameters

info – the info of the specific parameter to set.
value – the value to be set to the specific parameter.

Returns

return the current Params.

size() → int[source]¶

Return the number of params.

Returns: Return the number of params.

to_json() → str[source]¶

Returns a json containing all parameters in this Params. The json should be human-readable if possible.

Returns: a json containing all parameters in this Params.

class pyflink.ml.api.param.ParamInfo(name, description, is_optional=True, has_default_value=False, default_value=None, type_converter=None)[source]¶: Definition of a parameter, including name, description, type_converter and so on.

class pyflink.ml.api.param.TypeConverters[source]¶

Factory methods for common type conversion functions for Param.typeConverter. The TypeConverter makes PyFlink ML pipeline support more types of parameters. For example, a list could be a list, a range or an array. Validation can also be done in the converters.

static identity(value)[source]¶: Dummy converter that just returns value.

static to_boolean(value: bool) → bool[source]¶: Convert a value to a boolean, if possible.

static to_float(value: float) → float[source]¶: Convert a value to a float, if possible.

static to_int(value: int) → int[source]¶: Convert a value to an int, if possible.

static to_list(value) → List[source]¶: Convert a value to a list, if possible.

static to_list_float(value) → List[float][source]¶: Convert a value to list of floats, if possible.

static to_list_int(value) → List[int][source]¶: Convert a value to list of ints, if possible.

static to_list_string(value) → List[str][source]¶: Convert a value to list of strings, if possible.

static to_string(value: str) → str[source]¶: Convert a value to a string, if possible.

pyflink.ml.lib module¶

pyflink.ml.lib.param module¶

class pyflink.ml.lib.param.HasSelectedCols[source]¶

An interface for classes with a parameter specifying the name of multiple table columns.

New in version 1.11.0.

get_selected_cols() → list[source]¶

selected_cols = Param(name='selectedCols', description='Names of the columns used for processing')¶

set_selected_cols(v: list) → pyflink.ml.lib.param.colname.HasSelectedCols[source]¶

class pyflink.ml.lib.param.HasOutputCol[source]¶

An interface for classes with a parameter specifying the name of the output column.

New in version 1.11.0.

get_output_col() → str[source]¶

output_col = Param(name='outputCol', description='Name of the output column')¶

set_output_col(v: str) → pyflink.ml.lib.param.colname.HasOutputCol[source]¶

pyflink.ml package¶

Module contents¶

pyflink.ml.api module¶

pyflink.ml.api.param module¶

pyflink.ml.lib module¶

pyflink.ml.lib.param module¶

Table of Contents

Previous topic

Next topic

This Page