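Multiple linear regression finds the linear function that best fits the provided input data. Given a set of input points with their target values $(\mathbf{x}_i, y_i)$, $i = 1, \dots, n$, multiple linear regression finds a weight vector $\mathbf{w}$ that minimizes the sum of the squared residuals:

$$ S(\mathbf{w}) = \sum_{i=1}^{n} \left(y_i - \mathbf{w}^T\mathbf{x}_i\right)^2 $$

In matrix notation, with $X$ the matrix whose rows are the input vectors $\mathbf{x}_i^T$ and $\mathbf{y}$ the vector of target values, the problem reads:

$$ \mathbf{w}^* = \arg\min_{\mathbf{w}} \lVert \mathbf{y} - X\mathbf{w} \rVert_2^2 $$

This problem has a closed-form solution, provided that $X^TX$ is invertible:

$$ \mathbf{w}^* = \left(X^TX\right)^{-1}X^T\mathbf{y} $$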
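To make the closed-form solution concrete, the following is a minimal sketch in plain Scala that forms $X^TX$ and $X^T\mathbf{y}$ and solves the resulting linear system by Gaussian elimination, which yields $\mathbf{w}^* = (X^TX)^{-1}X^T\mathbf{y}$ without explicitly inverting the matrix. The object `NormalEquationsSketch` and its helpers are hypothetical names chosen for illustration and are not part of the library; the sketch assumes a small, dense, full-rank design matrix.

```scala
object NormalEquationsSketch {
  type Matrix = Array[Array[Double]]

  // Computes X^T X (d x d) and X^T y (length d) for an n x d design matrix X.
  def gram(x: Matrix, y: Array[Double]): (Matrix, Array[Double]) = {
    val d = x.head.length
    val xtx = Array.tabulate(d, d)((a, b) => x.map(row => row(a) * row(b)).sum)
    val xty = Array.tabulate(d)(a => x.indices.map(i => x(i)(a) * y(i)).sum)
    (xtx, xty)
  }

  // Solves A w = b by Gaussian elimination with partial pivoting.
  def solve(a: Matrix, b: Array[Double]): Array[Double] = {
    val n = b.length
    // Augmented matrix [A | b]; ":+" copies each row, so the inputs stay untouched.
    val m = Array.tabulate(n)(i => a(i) :+ b(i))
    for (col <- 0 until n) {
      val pivot = (col until n).maxBy(r => math.abs(m(r)(col)))
      val tmp = m(col); m(col) = m(pivot); m(pivot) = tmp
      require(math.abs(m(col)(col)) > 1e-12, "X^T X is (numerically) singular")
      for (r <- col + 1 until n) {
        val f = m(r)(col) / m(col)(col)
        for (c <- col to n) m(r)(c) -= f * m(col)(c)
      }
    }
    // Back substitution on the upper-triangular system.
    val w = new Array[Double](n)
    for (i <- n - 1 to 0 by -1) {
      val s = (i + 1 until n).map(c => m(i)(c) * w(c)).sum
      w(i) = (m(i)(n) - s) / m(i)(i)
    }
    w
  }

  // Least-squares weights: the solution of (X^T X) w = X^T y.
  def fit(x: Matrix, y: Array[Double]): Array[Double] = {
    val (xtx, xty) = gram(x, y)
    solve(xtx, xty)
  }
}
```

For example, calling `NormalEquationsSketch.fit` on rows of the form `Array(1.0, x)` models an intercept through the constant first column; that column is a convention of this sketch, since the formulas above contain no separate intercept term.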
However, when the input data set is so large that a complete pass over the whole data set is prohibitive, one can apply stochastic gradient descent (SGD) to approximate the solution. SGD first calculates the gradients for a random subset of the input data set. The gradient of the squared residual at a given point $\mathbf{x}_i$ with target value $y_i$ is given by:

$$ \nabla_{\mathbf{w}} S(\mathbf{w}, \mathbf{x}_i) = 2\left(\mathbf{w}^T\mathbf{x}_i - y_i\right)\mathbf{x}_i $$
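The gradients are averaged and scaled. The scaling is defined by $\gamma = \frac{s}{\sqrt{j}}$, with $s$ being the initial step size and $j$ being the current iteration number. The resulting gradient is subtracted from the current weight vector, giving the new weight vector for the next iteration. With $B$ denoting the random subset used in iteration $j$:

$$ \mathbf{w}_{j+1} = \mathbf{w}_j - \gamma \frac{1}{|B|}\sum_{i \in B} \nabla_{\mathbf{w}} S(\mathbf{w}_j, \mathbf{x}_i) $$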
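The update described above can be sketched in plain Scala as follows. This is an illustration under stated assumptions, not the library's implementation: the object `SgdStepSketch` and its functions are hypothetical names, the per-point gradient is taken to be $2(\mathbf{w}^T\mathbf{x}_i - y_i)\mathbf{x}_i$ (the gradient of the squared residual), and iterations are assumed to be numbered from $j = 1$.

```scala
object SgdStepSketch {
  // Gradient of the squared residual for one point: 2 * (w . x - y) * x.
  def gradient(w: Array[Double], x: Array[Double], y: Double): Array[Double] = {
    val residual = w.zip(x).map { case (wi, xi) => wi * xi }.sum - y
    x.map(xi => 2.0 * residual * xi)
  }

  // Effective step size gamma = s / sqrt(j) for iteration j >= 1.
  def stepSize(s: Double, j: Int): Double = s / math.sqrt(j.toDouble)

  // One update: average the batch gradients, scale by gamma, subtract from w.
  def step(
      w: Array[Double],
      batch: Seq[(Array[Double], Double)],
      s: Double,
      j: Int): Array[Double] = {
    require(batch.nonEmpty, "the random subset must not be empty")
    val gradientSum = batch
      .map { case (x, y) => gradient(w, x, y) }
      .reduce((a, b) => a.zip(b).map { case (p, q) => p + q })
    val gamma = stepSize(s, j)
    w.zip(gradientSum).map { case (wi, gi) => wi - gamma * gi / batch.size }
  }
}
```

Because $\gamma$ shrinks with $\sqrt{j}$, later iterations take smaller steps; how the random subset is drawn is left to the caller in this sketch.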
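The multiple linear regression algorithm either runs a fixed number of SGD iterations or terminates early based on a dynamic convergence criterion. The convergence criterion is the relative change in the sum of squared residuals, where $S_j$ denotes the sum of squared residuals after iteration $j$ and $\rho$ the convergence threshold:

$$ \frac{S_{j-1} - S_j}{S_{j-1}} < \rho $$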
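The stopping rule can be sketched in plain Scala as shown here. The object `ConvergenceSketch` and its functions are hypothetical names for illustration, and the criterion is assumed to be $(S_{j-1} - S_j) / S_{j-1} < \rho$, with $\rho$ being the convergence threshold. Treating a previous residual sum of exactly zero as converged is a choice of this sketch to avoid dividing by zero.

```scala
object ConvergenceSketch {
  // Sum of squared residuals S(w) over all (x, y) pairs.
  def sumOfSquaredResiduals(
      w: Array[Double],
      data: Seq[(Array[Double], Double)]): Double =
    data.map { case (x, y) =>
      val r = y - w.zip(x).map { case (wi, xi) => wi * xi }.sum
      r * r
    }.sum

  // Relative change of the residual sum between two consecutive iterations.
  // Note: an increase in the residual sum gives a negative ratio and
  // therefore also counts as "converged" under this formula.
  def hasConverged(previous: Double, current: Double, threshold: Double): Boolean =
    if (previous == 0.0) true
    else (previous - current) / previous < threshold

  // Stop after maxIterations, or earlier if a threshold is set and met.
  def shouldStop(
      iteration: Int,
      maxIterations: Int,
      previous: Double,
      current: Double,
      threshold: Option[Double]): Boolean =
    iteration >= maxIterations ||
      threshold.exists(t => hasConverged(previous, current, t))
}
```

Modelling the threshold as an `Option[Double]` mirrors the two modes described above: with `None`, only the iteration limit applies.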
`MultipleLinearRegression` is a `Predictor`. As such, it supports the `fit` and `predict` operations.
`MultipleLinearRegression` is trained on a set of `LabeledVector`:

`fit: DataSet[LabeledVector] => Unit`
`MultipleLinearRegression` predicts the corresponding regression value for all subtypes of `Vector`:

`predict[T <: Vector]: DataSet[T] => DataSet[(T, Double)]`
The multiple linear regression implementation can be controlled by the following parameters:
| Parameter | Description |
|---|---|
| Iterations | The maximum number of iterations. (Default value: 10) |
| Stepsize | Initial step size for the gradient descent method. This value controls how far the gradient descent method moves in the opposite direction of the gradient. Tuning this parameter can be crucial for keeping the optimization stable and for obtaining better performance. (Default value: 0.1) |
| ConvergenceThreshold | Threshold for the relative change of the sum of squared residuals below which the iteration is stopped. (Default value: None) |
| LearningRateMethod | Learning rate method used to calculate the effective learning rate for each iteration. See the list of supported learning rate methods. (Default value: LearningRateMethod.Default) |
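
The following sketch shows how the parameters and the `fit` and `predict` operations might be combined. It assumes the legacy `flink-ml` Scala library and the DataSet API are on the classpath, and that the setters `setIterations`, `setStepsize`, and `setConvergenceThreshold` correspond to the parameters listed above. The object name `MlrUsageSketch`, the toy data, and the chosen parameter values are hypothetical and serve only as an illustration.

```scala
import org.apache.flink.api.scala._
import org.apache.flink.ml.common.LabeledVector
import org.apache.flink.ml.math.DenseVector
import org.apache.flink.ml.regression.MultipleLinearRegression

object MlrUsageSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Hypothetical toy training data: each label is paired with one feature.
    val trainingDS: DataSet[LabeledVector] = env.fromCollection(Seq(
      LabeledVector(2.0, DenseVector(1.0)),
      LabeledVector(4.0, DenseVector(2.0)),
      LabeledVector(6.0, DenseVector(3.0))
    ))

    // Unlabeled points for which regression values are requested.
    val testingDS: DataSet[DenseVector] =
      env.fromCollection(Seq(DenseVector(4.0)))

    // Configure the learner; the values here are illustrative, not tuned.
    val mlr = MultipleLinearRegression()
      .setIterations(10)
      .setStepsize(0.1)
      .setConvergenceThreshold(0.001)

    // Fit the linear model to the training data.
    mlr.fit(trainingDS)

    // Pair each test vector with its predicted regression value.
    val predictions = mlr.predict(testingDS)
    predictions.print()
  }
}
```

Whether the fit is good after ten iterations depends on the step size and the scale of the data, so the printed predictions should be treated as a smoke test rather than a reference result.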