This documentation is for an out-of-date version of Apache Flink. We recommend you use the latest stable version.
$$ \newcommand{\R}{\mathbb{R}} \newcommand{\E}{\mathbb{E}} \newcommand{\x}{\mathbf{x}} \newcommand{\y}{\mathbf{y}} \newcommand{\wv}{\mathbf{w}} \newcommand{\av}{\mathbf{\alpha}} \newcommand{\bv}{\mathbf{b}} \newcommand{\N}{\mathbb{N}} \newcommand{\id}{\mathbf{I}} \newcommand{\ind}{\mathbf{1}} \newcommand{\0}{\mathbf{0}} \newcommand{\unit}{\mathbf{e}} \newcommand{\one}{\mathbf{1}} \newcommand{\zero}{\mathbf{0}} \newcommand\rfrac[2]{^{#1}\!/_{#2}} \newcommand{\norm}[1]{\left\lVert#1\right\rVert} $$

MinMax Scaler

Description

The MinMax scaler scales the given data set, so that all values will lie between a user specified range [min,max]. In case the user does not provide a specific minimum and maximum value for the scaling range, the MinMax scaler transforms the features of the input data set to lie in the [0,1] interval. Given a set of input data $x_1, x_2,… x_n$, with minimum value:

and maximum value:

The scaled data set $z_1, z_2,…,z_n$ will be:

where $\textit{min}$ and $\textit{max}$ are the user specified minimum and maximum values of the range to scale.

Operations

MinMaxScaler is a Transformer. As such, it supports the fit and transform operation.

Fit

MinMaxScaler is trained on all subtypes of Vector or LabeledVector:

  • fit[T <: Vector]: DataSet[T] => Unit
  • fit: DataSet[LabeledVector] => Unit

Transform

MinMaxScaler transforms all subtypes of Vector or LabeledVector into the respective type:

  • transform[T <: Vector]: DataSet[T] => DataSet[T]
  • transform: DataSet[LabeledVector] => DataSet[LabeledVector]

Parameters

The MinMax scaler implementation can be controlled by the following two parameters:

Parameters Description
Min

The minimum value of the range for the scaled data set. (Default value: 0.0)

Max

The maximum value of the range for the scaled data set. (Default value: 1.0)

Examples

// Create MinMax scaler transformer
val minMaxscaler = MinMaxScaler()
  .setMin(-1.0)

// Obtain data set to be scaled
val dataSet: DataSet[Vector] = ...

// Learn the minimum and maximum values of the training data
minMaxscaler.fit(dataSet)

// Scale the provided data set to have min=-1.0 and max=1.0
val scaledDS = minMaxscaler.transform(dataSet)