Class ALS

WithParameters, Estimator<ALS>, Predictor<ALS>

public class ALS
extends Object
implements Predictor<ALS>
Alternating least squares algorithm to calculate a matrix factorization.

Given a matrix R, ALS calculates two matrices U and V such that R ≈ U^T V. The unknown row dimension is given by the number of latent factors. Since matrix factorization is often used in the context of recommendation, we'll call the first matrix the user matrix and the second matrix the item matrix. The ith column of the user matrix is u_i and the ith column of the item matrix is v_i. The matrix R is called the ratings matrix and (R)_{i,j} = r_{i,j}.

In order to find the user and item matrix, the following problem is solved:

\arg\min_{U,V} \sum_{(i,j)\,:\,r_{i,j} \neq 0} \left(r_{i,j} - u_{i}^{T} v_{j}\right)^{2} + \lambda \left(\sum_{i} n_{u_i} \|u_i\|^{2} + \sum_{j} n_{v_j} \|v_j\|^{2}\right)

with \lambda being the regularization factor, n_{u_i} being the number of items that user i has rated and n_{v_j} being the number of times that item j has been rated. This regularization scheme, which is used to avoid overfitting, is called weighted-lambda-regularization. Details can be found in the work of Zhou et al.
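The objective above can be evaluated directly for a small example. The following is a minimal sketch, not the library's implementation: the class `WeightedLambdaCost` and its method names are hypothetical, ratings are passed as parallel arrays of (i, j, r), and every listed entry is assumed to be an observed (non-zero) rating.

```java
/** Hypothetical helper that evaluates the weighted-lambda-regularized objective. */
final class WeightedLambdaCost {

    /** Dot product of two vectors of equal length. */
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            sum += a[k] * b[k];
        }
        return sum;
    }

    /**
     * Evaluates the objective for the observed ratings (rows[k], cols[k], vals[k]).
     * users[i] holds the factor vector u_i, items[j] holds v_j.
     */
    static double cost(int[] rows, int[] cols, double[] vals,
                       double[][] users, double[][] items, double lambda) {
        int[] nUser = new int[users.length]; // n_{u_i}: ratings given by user i
        int[] nItem = new int[items.length]; // n_{v_j}: ratings received by item j
        double squaredError = 0.0;
        for (int k = 0; k < vals.length; k++) {
            double residual = vals[k] - dot(users[rows[k]], items[cols[k]]);
            squaredError += residual * residual;
            nUser[rows[k]]++;
            nItem[cols[k]]++;
        }
        double penalty = 0.0;
        for (int i = 0; i < users.length; i++) {
            penalty += nUser[i] * dot(users[i], users[i]);
        }
        for (int j = 0; j < items.length; j++) {
            penalty += nItem[j] * dot(items[j], items[j]);
        }
        return squaredError + lambda * penalty;
    }
}
```

Because each squared norm is weighted by its rating count, users and items with many ratings are regularized more strongly than those with few.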

By fixing one of the matrices U or V, one obtains a quadratic problem in the other matrix, which can be solved in closed form. The solution of this modified problem is guaranteed not to increase the overall cost function. By applying this step alternately to the matrices U and V, we can iteratively improve the matrix factorization.
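The alternating step is easiest to see with a single latent factor, where each u_i and v_j is a scalar. With V fixed, setting the derivative of the cost with respect to u_i to zero gives u_i = (sum_j r_{i,j} v_j) / (sum_j v_j^2 + lambda n_{u_i}), with the sums running over the items rated by user i. The sketch below is an illustration under that one-factor assumption, not the blocked algorithm this class implements; `RankOneAls` and its methods are hypothetical names.

```java
/** Hypothetical one-factor ALS used only to illustrate the alternating updates. */
final class RankOneAls {

    /**
     * Cost for scalar factors. Summing lambda * (u_i^2 + v_j^2) once per rating
     * equals lambda * (sum_i n_{u_i} u_i^2 + sum_j n_{v_j} v_j^2).
     */
    static double cost(int[] rows, int[] cols, double[] vals,
                       double[] u, double[] v, double lambda) {
        double total = 0.0;
        for (int k = 0; k < vals.length; k++) {
            double ui = u[rows[k]];
            double vj = v[cols[k]];
            double residual = vals[k] - ui * vj;
            total += residual * residual + lambda * (ui * ui + vj * vj);
        }
        return total;
    }

    /** Recomputes every entry of target while the entries of fixed stay constant. */
    static void solveSide(int[] targetIdx, int[] fixedIdx, double[] vals,
                          double[] target, double[] fixed, double lambda) {
        double[] numerator = new double[target.length];
        double[] denominator = new double[target.length];
        int[] count = new int[target.length];
        for (int k = 0; k < vals.length; k++) {
            int t = targetIdx[k];
            double f = fixed[fixedIdx[k]];
            numerator[t] += vals[k] * f;
            denominator[t] += f * f;
            count[t]++;
        }
        for (int t = 0; t < target.length; t++) {
            double d = denominator[t] + lambda * count[t];
            if (d > 0.0) { // entries without usable ratings are left unchanged
                target[t] = numerator[t] / d;
            }
        }
    }

    /** Runs the alternating updates and records the cost after every half-step. */
    static double[] run(int[] rows, int[] cols, double[] vals,
                        double[] u, double[] v, double lambda, int iterations) {
        double[] history = new double[2 * iterations + 1];
        history[0] = cost(rows, cols, vals, u, v, lambda);
        for (int it = 0; it < iterations; it++) {
            solveSide(rows, cols, vals, u, v, lambda); // update users, items fixed
            history[2 * it + 1] = cost(rows, cols, vals, u, v, lambda);
            solveSide(cols, rows, vals, v, u, lambda); // update items, users fixed
            history[2 * it + 2] = cost(rows, cols, vals, u, v, lambda);
        }
        return history;
    }
}
```

Each half-step minimizes the cost exactly over one side, so the recorded cost history never increases. With more than one latent factor, the scalar division becomes the solution of a small linear system per user or item.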

The matrix R is given in its sparse representation as a collection of tuples (i, j, r), where i is the row index, j is the column index and r is the matrix value at position (i,j).
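As an illustration of this representation, the sketch below converts a small dense ratings matrix into (i, j, r) entries. The class `SparseRatings` and its nested `Entry` type are hypothetical and are not part of this API; zero is assumed to mean "no rating", matching the r_{i,j} != 0 condition in the objective.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical converter from a dense matrix to (row, column, value) entries. */
final class SparseRatings {

    /** One non-zero entry (i, j, r) of the ratings matrix. */
    static final class Entry {
        final int row;
        final int col;
        final double value;

        Entry(int row, int col, double value) {
            this.row = row;
            this.col = col;
            this.value = value;
        }
    }

    /** Collects all non-zero entries of dense in row-major order. */
    static List<Entry> fromDense(double[][] dense) {
        List<Entry> entries = new ArrayList<>();
        for (int i = 0; i < dense.length; i++) {
            for (int j = 0; j < dense[i].length; j++) {
                if (dense[i][j] != 0.0) { // zero is treated as "not rated"
                    entries.add(new Entry(i, j, dense[i][j]));
                }
            }
        }
        return entries;
    }
}
```

For example, the 2x3 matrix with rows (5, 0, 3) and (0, 0, 4) yields the three entries (0, 0, 5.0), (0, 2, 3.0) and (1, 2, 4.0).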

static class  ALS.BlockedFactorization
static class  ALS.Factors
Latent factor model vector
static class  ALS.OutLinks
static class  ALS.Rating
Representation of a user-item rating
public scala.Option<scala.Tuple2<DataSet<ALS.Factors>,DataSet<ALS.Factors>>> factorsOption()
• setNumFactors

public ALS setNumFactors(int numFactors)
Sets the number of latent factors/row dimension of the latent model

• setLambda

public ALS setLambda(double lambda)
Sets the regularization coefficient lambda

• setIterations

public ALS setIterations(int iterations)
Sets the number of iterations of the ALS algorithm

• setBlocks

public ALS setBlocks(int blocks)
Sets the number of blocks into which the user and item matrix shall be partitioned

• setSeed

public ALS setSeed(long seed)
Sets the random seed used to initialize the item matrix

• setTemporaryPath

public ALS setTemporaryPath(String temporaryPath)
Sets the temporary path into which intermediate results are written in order to increase performance.

• empiricalRisk

public DataSet<Object> empiricalRisk(DataSet<scala.Tuple3<Object,Object,Object>> labeledData,
ParameterMap riskParameters)
Empirical risk of the trained model (matrix factorization).

labeledData - Reference data
riskParameters - Additional parameters for the empirical risk calculation
