pyflink.ml package¶
Module contents¶
pyflink.ml.api module¶
-
class
pyflink.ml.api.
MLEnvironment
(exe_env=None, stream_exe_env=None, batch_tab_env=None, stream_tab_env=None)[source]¶ The MLEnvironment stores the necessary context in Flink. Each MLEnvironment will be associated with a unique ID. The operations associated with the same MLEnvironment ID will share the same Flink job context. Both MLEnvironment ID and MLEnvironment can only be retrieved from MLEnvironmentFactory.
New in version 1.11.0.
-
get_batch_table_environment
() → pyflink.table.table_environment.BatchTableEnvironment[source]¶ Get the BatchTableEnvironment. If the BatchTableEnvironment has not been set, it is initialized with the default Configuration.
- Returns
the BatchTableEnvironment.
New in version 1.11.0.
-
get_execution_environment
() → pyflink.dataset.execution_environment.ExecutionEnvironment[source]¶ Get the ExecutionEnvironment. If the ExecutionEnvironment has not been set, it is initialized with the default Configuration.
- Returns
the batch ExecutionEnvironment.
New in version 1.11.0.
-
get_stream_execution_environment
() → pyflink.datastream.stream_execution_environment.StreamExecutionEnvironment[source]¶ Get the StreamExecutionEnvironment. If the StreamExecutionEnvironment has not been set, it is initialized with the default Configuration.
- Returns
the StreamExecutionEnvironment.
New in version 1.11.0.
-
get_stream_table_environment
() → pyflink.table.table_environment.StreamTableEnvironment[source]¶ Get the StreamTableEnvironment. If the StreamTableEnvironment has not been set, it is initialized with the default Configuration.
- Returns
the StreamTableEnvironment.
New in version 1.11.0.
-
-
class
pyflink.ml.api.
MLEnvironmentFactory
[source]¶ Factory to get the MLEnvironment using an MLEnvironmentId.
New in version 1.11.0.
-
static
get
(ml_env_id: int) → Optional[pyflink.ml.api.ml_environment.MLEnvironment][source]¶ Get the MLEnvironment using an MLEnvironmentId.
- Parameters
ml_env_id – the MLEnvironmentId
- Returns
the MLEnvironment
New in version 1.11.0.
-
static
get_default
() → Optional[pyflink.ml.api.ml_environment.MLEnvironment][source]¶ Get the MLEnvironment using the default MLEnvironmentId.
- Returns
the default MLEnvironment.
New in version 1.11.0.
-
static
get_new_ml_environment_id
() → int[source]¶ Create a unique MLEnvironment id and register a new MLEnvironment in the factory.
- Returns
the MLEnvironment id.
New in version 1.11.0.
-
static
register_ml_environment
(ml_environment: pyflink.ml.api.ml_environment.MLEnvironment) → int[source]¶ Register a new MLEnvironment to the factory and return a new MLEnvironment id.
- Parameters
ml_environment – the MLEnvironment that will be stored in the factory.
- Returns
the MLEnvironment id.
New in version 1.11.0.
-
-
class
pyflink.ml.api.
Transformer
(params=None)[source]¶ A transformer is a PipelineStage that transforms an input Table to a result Table.
New in version 1.11.0.
-
abstract
transform
(table_env: pyflink.table.table_environment.TableEnvironment, table: pyflink.table.table.Table) → pyflink.table.table.Table[source]¶ Applies the transformer on the input table, and returns the result table.
- Parameters
table_env – the table environment to which the input table is bound.
table – the table to be transformed
- Returns
the transformed table
New in version 1.11.0.
-
-
class
pyflink.ml.api.
Estimator
(params=None)[source]¶ Estimators are PipelineStages responsible for training and generating machine learning models.
The implementations are expected to take an input table as training samples and generate a Model which fits these samples.
New in version 1.11.0.
-
fit
(table_env: pyflink.table.table_environment.TableEnvironment, table: pyflink.table.table.Table) → pyflink.ml.api.base.Model[source]¶ Train and produce a Model which fits the records in the given Table.
- Parameters
table_env – the table environment to which the input table is bound.
table – the table with records to train the Model.
- Returns
a model trained to fit on the given Table.
New in version 1.11.0.
-
-
class
pyflink.ml.api.
Model
(params=None)[source]¶ Abstract class for models that are fitted by estimators.
A model is an ordinary Transformer except for how it is created. While ordinary transformers are defined by specifying the parameters directly, a model is usually generated by an Estimator when Estimator.fit(table_env, table) is invoked.
New in version 1.11.0.
-
class
pyflink.ml.api.
Pipeline
(stages=None, pipeline_json=None)[source]¶ A pipeline is a linear workflow which chains Estimators and Transformers to execute an algorithm.
A pipeline itself can either act as an Estimator or a Transformer, depending on the stages it includes. More specifically:
If a Pipeline has an Estimator, one needs to call Pipeline.fit(TableEnvironment, Table) before using the pipeline as a Transformer. In this case the Pipeline is an Estimator and can produce a Pipeline as a Model.
If a Pipeline has no Estimator, it is a Transformer and can be applied to a Table directly. In this case, Pipeline.fit(TableEnvironment, Table) will simply return the pipeline itself.
In addition, a pipeline can also be used as a PipelineStage in another pipeline, just like an ordinary Estimator or Transformer as described above.
New in version 1.11.0.
-
fit
(t_env: pyflink.table.table_environment.TableEnvironment, input: pyflink.table.table.Table) → pyflink.ml.api.base.Pipeline[source]¶ Train the pipeline to fit on the records in the given Table.
- Parameters
t_env – the table environment to which the input table is bound.
input – the table with records to train the Pipeline.
- Returns
a pipeline with the same stages as this Pipeline, except that all Estimators are replaced with their corresponding Models.
-
load_json
(json: str) → None[source]¶ This method can load from either a Java Pipeline JSON string or a Python Pipeline JSON string.
-
to_json
() → str[source]¶ If all PipelineStages in this Pipeline are Java ones, this method returns a Java JSON string, which can be loaded by either a Python Pipeline or a Java Pipeline. Otherwise, it returns a Python JSON string, which can only be loaded by a Python Pipeline.
-
transform
(t_env: pyflink.table.table_environment.TableEnvironment, input: pyflink.table.table.Table) → pyflink.table.table.Table[source]¶ Generate a result table by applying all the stages in this pipeline to the input table in order.
- Parameters
t_env – the table environment to which the input table is bound.
input – the table to be transformed.
- Returns
a result table with all the stages applied to the input table in order.
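The fit and transform behavior described for Pipeline can be illustrated without a Flink cluster. The sketch below is not PyFlink code: tables are replaced by plain lists of floats, the table environment argument is dropped, and every class name (SketchTransformer, SketchEstimator, AddOne, Shift, CenterEstimator, SketchPipeline) is hypothetical. It also assumes that each Estimator is trained on the output of the stages before it, and that transforming an unfitted pipeline is an error.

```python
from typing import List

Table = List[float]  # stand-in for pyflink.table.Table in this sketch


class SketchTransformer:
    def transform(self, table: Table) -> Table:
        raise NotImplementedError


class SketchEstimator:
    def fit(self, table: Table) -> SketchTransformer:
        raise NotImplementedError


class AddOne(SketchTransformer):
    """A plain transformer defined directly, with no training step."""

    def transform(self, table: Table) -> Table:
        return [x + 1.0 for x in table]


class Shift(SketchTransformer):
    """The model produced by CenterEstimator: subtracts a learned offset."""

    def __init__(self, offset: float) -> None:
        self.offset = offset

    def transform(self, table: Table) -> Table:
        return [x - self.offset for x in table]


class CenterEstimator(SketchEstimator):
    """Learns the mean of the training records and returns a Shift model."""

    def fit(self, table: Table) -> Shift:
        return Shift(sum(table) / len(table))


class SketchPipeline:
    def __init__(self, stages) -> None:
        self.stages = list(stages)

    def fit(self, table: Table) -> "SketchPipeline":
        # With no Estimator there is nothing to train: return the pipeline itself.
        if not any(isinstance(s, SketchEstimator) for s in self.stages):
            return self
        fitted = []
        current = table
        for stage in self.stages:
            if isinstance(stage, SketchEstimator):
                stage = stage.fit(current)  # replace the Estimator with its Model
            fitted.append(stage)
            current = stage.transform(current)  # feed the result to later stages
        return SketchPipeline(fitted)

    def transform(self, table: Table) -> Table:
        for stage in self.stages:
            if isinstance(stage, SketchEstimator):
                # Assumption of this sketch: an unfitted pipeline cannot transform.
                raise RuntimeError("pipeline contains an Estimator; call fit first")
            table = stage.transform(table)
        return table


pipeline = SketchPipeline([AddOne(), CenterEstimator()])
fitted_pipeline = pipeline.fit([1.0, 2.0, 3.0])
result = fitted_pipeline.transform([1.0, 2.0, 3.0])
transformer_only = SketchPipeline([AddOne()])
```

Here `fitted_pipeline` holds AddOne followed by a Shift model, while `transformer_only.fit(...)` hands back the same pipeline object because there is nothing to train.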
-
-
class
pyflink.ml.api.
PipelineStage
(params=None)[source]¶ Base class for a stage in a pipeline. The interface is only a concept, and does not have any actual functionality. Its subclasses must be either Estimator or Transformer. No other classes should inherit this interface directly.
Each pipeline stage has parameters and requires a public no-argument constructor so that it can be restored in a Pipeline.
New in version 1.11.0.
-
class
pyflink.ml.api.
JavaTransformer
(j_obj)[source]¶ Base class for Transformers that wrap Java implementations. Subclasses should ensure they have the transformer Java object available as j_obj.
New in version 1.11.0.
-
transform
(table_env: pyflink.table.table_environment.TableEnvironment, table: pyflink.table.table.Table) → pyflink.table.table.Table[source]¶ Applies the transformer on the input table, and returns the result table.
- Parameters
table_env – the table environment to which the input table is bound.
table – the table to be transformed
- Returns
the transformed table
New in version 1.11.0.
-
-
class
pyflink.ml.api.
JavaEstimator
(j_obj)[source]¶ Base class for Estimators that wrap Java implementations. Subclasses should ensure they have the estimator Java object available as j_obj.
New in version 1.11.0.
-
fit
(table_env: pyflink.table.table_environment.TableEnvironment, table: pyflink.table.table.Table) → pyflink.ml.api.base.JavaModel[source]¶ Train and produce a Model which fits the records in the given Table.
- Parameters
table_env – the table environment to which the input table is bound.
table – the table with records to train the Model.
- Returns
a model trained to fit on the given Table.
New in version 1.11.0.
-
pyflink.ml.api.param module¶
-
class
pyflink.ml.api.param.
WithParams
[source]¶ Parameters are widely used in machine learning. This class defines a common interface for interacting with classes that have parameters.
-
get
(info: pyflink.ml.api.param.base.ParamInfo) → V[source]¶ Returns the value of the specific param.
- Parameters
info – the info of the specific param, usually with default value.
- Returns
the value of the specific param, or the default value defined in the ParamInfo if the inner Params does not contain this param.
-
-
class
pyflink.ml.api.param.
Params
[source]¶ The map-like container class for parameters. This class is provided to unify the interaction with parameters.
-
clear
() → None[source]¶ Removes all of the params. The params will be empty after this call returns.
- Returns
None.
-
clone
() → pyflink.ml.api.param.base.Params[source]¶ Creates and returns a deep clone of this Params.
- Returns
a clone of this Params.
-
contains
(info: pyflink.ml.api.param.base.ParamInfo) → bool[source]¶ Check whether this params has a value set for the given info.
- Parameters
info – the info of the specific parameter to check.
- Returns
True if this params has a value set for the specified info, False otherwise.
-
static
from_json
(json) → pyflink.ml.api.param.base.Params[source]¶ Factory method for constructing params.
- Parameters
json – the json string to load.
- Returns
the Params loaded from the json string.
-
get
(info: pyflink.ml.api.param.base.ParamInfo) → V[source]¶ Returns the value of the specific parameter, or the default value defined in the info if this Params doesn’t have a value set for the parameter. An exception is thrown if no value can be found for the specified parameter.
- Parameters
info – the info of the specific parameter to get.
- Returns
the value of the specific param, or default value defined in the info if this Params doesn’t contain the parameter.
-
is_empty
() → bool[source]¶ Returns true if this params contains no mappings.
- Returns
true if this params contains no mappings.
-
load_json
(json: str) → None[source]¶ Restores the parameters from the given JSON. After restoration, the parameters should be exactly the same as those that were serialized to the input JSON.
- Parameters
json – the JSON string to restore from.
- Returns
None.
-
merge
(other_params: pyflink.ml.api.param.base.Params) → pyflink.ml.api.param.base.Params[source]¶ Merge other params into this.
- Parameters
other_params – other params.
- Returns
this Params.
-
remove
(info: pyflink.ml.api.param.base.ParamInfo) → V[source]¶ Removes the specific parameter from this Params.
- Parameters
info – the info of the specific parameter to remove.
- Returns
the value of the removed parameter.
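The default-value fallback described for Params.get can be sketched with a small dictionary-backed container. Everything here is hypothetical and stands apart from PyFlink: SketchParamInfo and SketchParams are illustrative names, the `set` helper is added only so the example has something to store, values are keyed by parameter name, and raising ValueError when no value or default exists is an assumption of the sketch.

```python
from typing import Any, Dict, NamedTuple


class SketchParamInfo(NamedTuple):
    """Hypothetical, reduced version of a parameter definition."""

    name: str
    has_default_value: bool = False
    default_value: Any = None


class SketchParams:
    """Hypothetical map-like container keyed by parameter name."""

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}

    def set(self, info: SketchParamInfo, value: Any) -> "SketchParams":
        # Helper invented for this sketch so that values can be stored.
        self._values[info.name] = value
        return self

    def contains(self, info: SketchParamInfo) -> bool:
        return info.name in self._values

    def get(self, info: SketchParamInfo) -> Any:
        if info.name in self._values:
            return self._values[info.name]
        if info.has_default_value:
            return info.default_value  # fall back to the default in the info
        # Assumed error type: nothing stored and no default to fall back on.
        raise ValueError("no value found for parameter " + info.name)

    def remove(self, info: SketchParamInfo) -> Any:
        # Hand back the value that was stored for the removed parameter.
        return self._values.pop(info.name)

    def merge(self, other: "SketchParams") -> "SketchParams":
        self._values.update(other._values)
        return self

    def is_empty(self) -> bool:
        return not self._values


MAX_ITER = SketchParamInfo("maxIter", has_default_value=True, default_value=10)
SEED = SketchParamInfo("seed")  # no default value

params = SketchParams()
default_iter = params.get(MAX_ITER)   # nothing set yet, so the default is used
params.set(MAX_ITER, 50)
explicit_iter = params.get(MAX_ITER)  # the stored value wins over the default
removed_value = params.remove(MAX_ITER)
```

Asking for `SEED` at this point raises the sketch's ValueError, since it has neither a stored value nor a default.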
-
-
class
pyflink.ml.api.param.
ParamInfo
(name, description, is_optional=True, has_default_value=False, default_value=None, type_converter=None)[source]¶ Definition of a parameter, including name, description, type_converter and so on.
-
class
pyflink.ml.api.param.
TypeConverters
[source]¶ Factory methods for common type conversion functions for Param.typeConverter. A TypeConverter lets a PyFlink ML pipeline accept more types of parameter values. For example, a list parameter could be supplied as a list, a range or an array. Validation can also be done in the converters.
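As an illustration of what such a converter does, the following stand-alone function normalizes several list-like inputs to a plain list and validates the elements on the way. It is a sketch only: `to_list_of_int` is a hypothetical name, not a method of TypeConverters, and the choice of accepted input types and of TypeError for rejection is an assumption.

```python
from array import array
from typing import Any, List


def to_list_of_int(value: Any) -> List[int]:
    """Hypothetical converter: accept a list, tuple, range or array of ints."""
    if not isinstance(value, (list, tuple, range, array)):
        raise TypeError("cannot convert %r to a list of ints" % (value,))
    items = list(value)
    for item in items:
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if isinstance(item, bool) or not isinstance(item, int):
            raise TypeError("element %r is not an int" % (item,))
    return items


from_range = to_list_of_int(range(3))
from_array = to_list_of_int(array("i", [4, 5]))
from_tuple = to_list_of_int((7,))
```

A converter of this shape could be passed wherever a parameter definition takes a type_converter, so that callers may supply whichever list-like form is convenient.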
pyflink.ml.lib module¶
pyflink.ml.lib.param module¶
-
class
pyflink.ml.lib.param.
HasSelectedCols
[source]¶ An interface for classes with a parameter specifying the names of multiple table columns.
New in version 1.11.0.
-
selected_cols
= Param(name='selectedCols', description='Names of the columns used for processing')¶
-