$$ \newcommand{\R}{\mathbb{R}} \newcommand{\E}{\mathbb{E}} \newcommand{\x}{\mathbf{x}} \newcommand{\y}{\mathbf{y}} \newcommand{\wv}{\mathbf{w}} \newcommand{\av}{\mathbf{\alpha}} \newcommand{\bv}{\mathbf{b}} \newcommand{\N}{\mathbb{N}} \newcommand{\id}{\mathbf{I}} \newcommand{\ind}{\mathbf{1}} \newcommand{\0}{\mathbf{0}} \newcommand{\unit}{\mathbf{e}} \newcommand{\one}{\mathbf{1}} \newcommand{\zero}{\mathbf{0}} \newcommand\rfrac[2]{^{#1}\!/_{#2}} \newcommand{\norm}[1]{\left\lVert#1\right\rVert} $$

Multiple Linear Regression

This documentation is for an out-of-date version of Apache Flink. We recommend you use the latest stable version.

Description
Operations
- Fit
- Predict
Parameters
Examples

Description

Multiple linear regression tries to find a linear function which best fits the provided input data. Given a set of input data with its value $(\mathbf{x}, y)$, multiple linear regression finds a vector $\mathbf{w}$ such that the sum of the squared residuals is minimized:

$S(\mathbf{w}) = \sum_{i=1} \left(y - \mathbf{w}^T\mathbf{x_i} \right)^2$

Written in matrix notation, we obtain the following formulation:

$\mathbf{w}^* = \arg \min_{\mathbf{w}} (\mathbf{y} - X\mathbf{w})^2$

This problem has a closed form solution which is given by:

$\mathbf{w}^* = \left(X^TX\right)^{-1}X^T\mathbf{y}$

However, in cases where the input data set is so huge that a complete parse over the whole data set is prohibitive, one can apply stochastic gradient descent (SGD) to approximate the solution. SGD first calculates for a random subset of the input data set the gradients. The gradient for a given point $\mathbf{x}_i$ is given by:

$\nabla_{\mathbf{w}} S(\mathbf{w}, \mathbf{x_i}) = 2\left(\mathbf{w}^T\mathbf{x_i} - y\right)\mathbf{x_i}$

The gradients are averaged and scaled. The scaling is defined by $\gamma = \frac{s}{\sqrt{j}}$ with $s$ being the initial step size and $j$ being the current iteration number. The resulting gradient is subtracted from the current weight vector giving the new weight vector for the next iteration:

$\mathbf{w}_{t+1} = \mathbf{w}_t - \gamma \frac{1}{n}\sum_{i=1}^n \nabla_{\mathbf{w}} S(\mathbf{w}, \mathbf{x_i})$

The multiple linear regression algorithm computes either a fixed number of SGD iterations or terminates based on a dynamic convergence criterion. The convergence criterion is the relative change in the sum of squared residuals:

$\frac{S_{k-1} - S_k}{S_{k-1}} < \rho$

Operations

MultipleLinearRegression is a Predictor. As such, it supports the fit and predict operation.

Fit

MultipleLinearRegression is trained on a set of LabeledVector:

fit: DataSet[LabeledVector] => Unit

Predict

MultipleLinearRegression predicts for all subtypes of Vector the corresponding regression value:

predict[T <: Vector]: DataSet[T] => DataSet[(T, Double)]

Parameters

The multiple linear regression implementation can be controlled by the following parameters:

Parameters	Description
Iterations	The maximum number of iterations. (Default value: 10)
Stepsize	Initial step size for the gradient descent method. This value controls how far the gradient descent method moves in the opposite direction of the gradient. Tuning this parameter might be crucial to make it stable and to obtain a better performance. (Default value: 0.1)
ConvergenceThreshold	Threshold for relative change of the sum of squared residuals until the iteration is stopped. (Default value: None)
LearningRateMethod	Learning rate method used to calculate the effective learning rate for each iteration. See the list of supported learning rate methods. (Default value: LearningRateMethod.Default)

Examples

// Create multiple linear regression learner
val mlr = MultipleLinearRegression()
.setIterations(10)
.setStepsize(0.5)
.setConvergenceThreshold(0.001)

// Obtain training and testing data set
val trainingDS: DataSet[LabeledVector] = ...
val testingDS: DataSet[Vector] = ...

// Fit the linear model to the provided data
mlr.fit(trainingDS)

// Calculate the predictions for the test data
val predictions = mlr.predict(testingDS)