## Class ALS

• All Implemented Interfaces:
WithParameters, Estimator<ALS>, Predictor<ALS>

public class ALS
extends Object
implements Predictor<ALS>
Alternating least squares algorithm to calculate a matrix factorization.

Given a matrix $R$, ALS calculates two matrices $U$ and $V$ such that $R \approx U^T V$. The unknown row dimension of $U$ and $V$ is given by the number of latent factors. Since matrix factorization is often used in the context of recommendation, we call the first matrix the user matrix and the second the item matrix. The $i$th column of the user matrix is $u_i$ and the $i$th column of the item matrix is $v_i$. The matrix $R$ is called the ratings matrix, with $(R)_{i,j} = r_{i,j}$.

In order to find the user and item matrix, the following problem is solved:

$$\arg\min_{U,V} \sum_{(i,j)\,:\,r_{i,j} \neq 0} \left(r_{i,j} - u_i^T v_j\right)^2 + \lambda \left(\sum_i n_{u_i} \lVert u_i \rVert^2 + \sum_j n_{v_j} \lVert v_j \rVert^2\right)$$

with $\lambda$ being the regularization factor, $n_{u_i}$ being the number of items user $i$ has rated, and $n_{v_j}$ being the number of times item $j$ has been rated. This regularization scheme, which avoids overfitting, is called weighted-lambda-regularization. Details can be found in the work of Zhou et al.
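
To make the objective concrete, the following is a minimal sketch in plain Java that evaluates the weighted-lambda-regularized cost for a handful of ratings. The class `WeightedLambdaCost`, its method names, and the array layout (`u[i]` holds the column u_i, `v[j]` holds the column v_j) are assumptions made for illustration and are not part of the ALS class.

```java
final class WeightedLambdaCost {
    /**
     * Evaluates the squared error over the observed ratings plus
     * lambda * (sum_i n_{u_i} ||u_i||^2 + sum_j n_{v_j} ||v_j||^2).
     * The k-th rating is (users[k], items[k], ratings[k]).
     */
    static double cost(int[] users, int[] items, double[] ratings,
                       double[][] u, double[][] v, double lambda) {
        int[] nU = new int[u.length]; // n_{u_i}: number of items user i has rated
        int[] nV = new int[v.length]; // n_{v_j}: number of times item j has been rated
        double squaredError = 0.0;
        for (int k = 0; k < ratings.length; k++) {
            int i = users[k];
            int j = items[k];
            nU[i]++;
            nV[j]++;
            double residual = ratings[k] - dot(u[i], v[j]);
            squaredError += residual * residual;
        }
        double penalty = 0.0;
        for (int i = 0; i < u.length; i++) {
            penalty += nU[i] * dot(u[i], u[i]);
        }
        for (int j = 0; j < v.length; j++) {
            penalty += nV[j] * dot(v[j], v[j]);
        }
        return squaredError + lambda * penalty;
    }

    /** Inner product of two vectors of equal length. */
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            sum += a[k] * b[k];
        }
        return sum;
    }
}
```

Because each norm is weighted by its rating count, users and items with many ratings are penalized more strongly than those with few.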

By fixing one of the matrices U or V, one obtains a quadratic form that can be solved directly. The solution of the modified problem is guaranteed to decrease the overall cost function. By applying this step alternately to the matrices U and V, we can iteratively improve the matrix factorization.
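
As an illustration of one such step, with V fixed, setting the gradient of the objective with respect to a single u_i to zero gives the linear system (sum_j v_j v_j^T + lambda n_{u_i} I) u_i = sum_j r_{i,j} v_j, where j ranges over the items rated by user i. The sketch below solves that system with Gaussian elimination. It is a single-machine sketch under these assumptions, not the blocked implementation of this class, and the names `AlsHalfStep`, `updateUser`, and `solve` are hypothetical.

```java
final class AlsHalfStep {
    /**
     * Recomputes one user vector while the item vectors stay fixed.
     * ratedItemVectors[j] is the vector of the j-th item rated by this user
     * and ratings[j] is the corresponding rating; at least one rating is required.
     */
    static double[] updateUser(double[][] ratedItemVectors, double[] ratings, double lambda) {
        int factors = ratedItemVectors[0].length;
        int n = ratings.length; // n_{u_i}: number of items this user has rated
        double[][] a = new double[factors][factors];
        double[] b = new double[factors];
        for (int j = 0; j < n; j++) {
            double[] v = ratedItemVectors[j];
            for (int p = 0; p < factors; p++) {
                b[p] += ratings[j] * v[p];      // accumulates sum_j r_{i,j} v_j
                for (int q = 0; q < factors; q++) {
                    a[p][q] += v[p] * v[q];     // accumulates sum_j v_j v_j^T
                }
            }
        }
        for (int p = 0; p < factors; p++) {
            a[p][p] += lambda * n;              // weighted-lambda regularization term
        }
        return solve(a, b);
    }

    /** Solves a x = b by Gaussian elimination with partial pivoting; a and b are overwritten. */
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int row = col + 1; row < n; row++) {
                if (Math.abs(a[row][col]) > Math.abs(a[pivot][col])) {
                    pivot = row;
                }
            }
            double[] tmpRow = a[col]; a[col] = a[pivot]; a[pivot] = tmpRow;
            double tmp = b[col]; b[col] = b[pivot]; b[pivot] = tmp;
            for (int row = col + 1; row < n; row++) {
                double factor = a[row][col] / a[col][col];
                b[row] -= factor * b[col];
                for (int k = col; k < n; k++) {
                    a[row][k] -= factor * a[col][k];
                }
            }
        }
        double[] x = new double[n];
        for (int row = n - 1; row >= 0; row--) {
            double sum = b[row];
            for (int k = row + 1; k < n; k++) {
                sum -= a[row][k] * x[k];
            }
            x[row] = sum / a[row][row];
        }
        return x;
    }
}
```

The same routine updates an item vector when the roles of users and items are swapped, which is what makes the alternation possible.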

The matrix R is given in its sparse representation as tuples (i, j, r), where i is the row index, j is the column index and r is the matrix value at position (i,j).
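
For illustration, the sketch below converts a small dense matrix into (i, j, r) entries, skipping zero values, which the objective above treats as unobserved. The class `SparseRatings` and its nested `Entry` type are hypothetical helpers and not the types used by this class.

```java
import java.util.ArrayList;
import java.util.List;

final class SparseRatings {
    /** One observed matrix entry: row index i, column index j, value r. */
    static final class Entry {
        final int i;
        final int j;
        final double r;

        Entry(int i, int j, double r) {
            this.i = i;
            this.j = j;
            this.r = r;
        }
    }

    /** Lists all non-zero entries of a dense matrix in row-major order. */
    static List<Entry> fromDense(double[][] dense) {
        List<Entry> entries = new ArrayList<>();
        for (int i = 0; i < dense.length; i++) {
            for (int j = 0; j < dense[i].length; j++) {
                if (dense[i][j] != 0.0) {
                    entries.add(new Entry(i, j, dense[i][j]));
                }
            }
        }
        return entries;
    }
}
```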

• ### Nested Class Summary

Nested Classes
Modifier and Type Class and Description
static class  ALS.BlockedFactorization
static class  ALS.BlockedFactorization$
static class  ALS.BlockIDGenerator
static class  ALS.BlockIDPartitioner
static class  ALS.BlockRating
static class  ALS.BlockRating$
static class  ALS.Blocks$
static class  ALS.Factorization
static class  ALS.Factorization$
static class  ALS.Factors
Latent factor model vector
static class  ALS.Factors$
static class  ALS.InBlockInformation
static class  ALS.InBlockInformation$
static class  ALS.Iterations$
static class  ALS.Lambda$
static class  ALS.NumFactors$
static class  ALS.OutBlockInformation
static class  ALS.OutBlockInformation$
static class  ALS.OutLinks
static class  ALS.Rating
Representation of a user-item rating
static class  ALS.Rating$
static class  ALS.Seed$
static class  ALS.TemporaryPath$
• ### Constructor Summary

Constructors
Constructor and Description
ALS()
• ### Method Summary

All Methods
Modifier and Type Method and Description
static ALS apply()
static scala.Tuple2<DataSet<scala.Tuple2<Object,ALS.InBlockInformation>>,DataSet<scala.Tuple2<Object,ALS.OutBlockInformation>>> createBlockInformation(int userBlocks, int itemBlocks, DataSet<scala.Tuple2<Object,ALS.Rating>> ratings, ALS.BlockIDPartitioner blockIDPartitioner)
Creates the meta information needed to route the item and user vectors to the respective user and item blocks.
static DataSet<scala.Tuple2<Object,ALS.InBlockInformation>> createInBlockInformation(DataSet<scala.Tuple2<Object,ALS.Rating>> ratings, DataSet<scala.Tuple2<Object,int[]>> usersPerBlock, ALS.BlockIDGenerator blockIDGenerator)
Creates the incoming block information
static DataSet<scala.Tuple2<Object,ALS.OutBlockInformation>> createOutBlockInformation(DataSet<scala.Tuple2<Object,ALS.Rating>> ratings, DataSet<scala.Tuple2<Object,int[]>> usersPerBlock, int itemBlocks, ALS.BlockIDGenerator blockIDGenerator)
Creates the outgoing block information
static DataSet<scala.Tuple2<Object,int[]>> createUsersPerBlock(DataSet<scala.Tuple2<Object,ALS.Rating>> ratings)
Calculates, for each user block, its user IDs in ascending order
DataSet<Object> empiricalRisk(DataSet<scala.Tuple3<Object,Object,Object>> labeledData, ParameterMap riskParameters)
Empirical risk of the trained model (matrix factorization).
scala.Option<scala.Tuple2<DataSet<ALS.Factors>,DataSet<ALS.Factors>>> factorsOption()
static Object fitALS()
Calculates the matrix factorization for the given ratings.
static void generateFullMatrix(double[] triangularMatrix, double[] fullMatrix, int factors)
static DataSet<ALS.Factors> generateRandomMatrix(DataSet<Object> users, int factors, long seed)
static String ITEM_FACTORS_FILE()
static void outerProduct(double[] vector, double[] matrix, int factors)
static Object predictRating()
Predict operation which calculates the matrix entry for the given indices
static double[] randomFactors(int factors, scala.util.Random random)
ALS setBlocks(int blocks)
Sets the number of blocks into which the user and item matrix shall be partitioned
ALS setIterations(int iterations)
Sets the number of iterations of the ALS algorithm
ALS setLambda(double lambda)
Sets the regularization coefficient lambda
ALS setNumFactors(int numFactors)
Sets the number of latent factors/row dimension of the latent model
ALS setSeed(long seed)
Sets the random seed for the initialization of the item matrix
ALS setTemporaryPath(String temporaryPath)
Sets the temporary path into which intermediate results are written in order to increase performance.
static DataSet<ALS.Factors> unblock(DataSet<scala.Tuple2<Object,double[][]>> users, DataSet<scala.Tuple2<Object,ALS.OutBlockInformation>> outInfo, ALS.BlockIDPartitioner blockIDPartitioner)
Unblocks the blocked user and item matrix representation so that it is a DataSet of column vectors.
static DataSet<scala.Tuple2<Object,double[][]>> updateFactors(int numUserBlocks, DataSet<scala.Tuple2<Object,double[][]>> items, DataSet<scala.Tuple2<Object,ALS.OutBlockInformation>> itemOut, DataSet<scala.Tuple2<Object,ALS.InBlockInformation>> userIn, int factors, double lambda, Partitioner<Object> blockIDPartitioner)
Calculates a single half step of the ALS optimization.
static String USER_FACTORS_FILE()
• ### Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
• ### Methods inherited from interface org.apache.flink.ml.pipeline.Predictor

evaluate, predict
• ### Methods inherited from interface org.apache.flink.ml.pipeline.Estimator

fit
• ### Methods inherited from interface org.apache.flink.ml.common.WithParameters

parameters
• ### Constructor Detail

• #### ALS

public ALS()
• ### Method Detail

• #### USER_FACTORS_FILE

public static String USER_FACTORS_FILE()
• #### ITEM_FACTORS_FILE

public static String ITEM_FACTORS_FILE()
• #### apply

public static ALS apply()
• #### predictRating

public static Object predictRating()
Predict operation which calculates the matrix entry for the given indices
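
Since R is approximated by U^T V, the predicted entry at position (i, j) is the inner product of the user vector u_i and the item vector v_j. A minimal sketch of that calculation follows; the class `PredictSketch` and its method are hypothetical and do not reproduce the signature of the operation above.

```java
final class PredictSketch {
    /** Returns u_i^T v_j for two factor vectors of equal length. */
    static double predict(double[] userFactors, double[] itemFactors) {
        double sum = 0.0;
        for (int k = 0; k < userFactors.length; k++) {
            sum += userFactors[k] * itemFactors[k];
        }
        return sum;
    }
}
```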
• #### fitALS

public static Object fitALS()
Calculates the matrix factorization for the given ratings. A rating is defined as a tuple of user ID, item ID and the corresponding rating.

Returns:
Factorization containing the user and item matrix
• #### updateFactors

public static DataSet<scala.Tuple2<Object,double[][]>> updateFactors(int numUserBlocks,
DataSet<scala.Tuple2<Object,double[][]>> items,
DataSet<scala.Tuple2<Object,ALS.OutBlockInformation>> itemOut,
DataSet<scala.Tuple2<Object,ALS.InBlockInformation>> userIn,
int factors,
double lambda,
Partitioner<Object> blockIDPartitioner)
Calculates a single half step of the ALS optimization. The result is the new value for either the user or item matrix, depending on which matrix the method was called with.

Parameters:
numUserBlocks - Number of blocks in the respective dimension
items - Fixed matrix value for the half step
itemOut - Out information to know where to send the vectors
userIn - In information for the cogroup step
factors - Number of latent factors
lambda - Regularization constant
blockIDPartitioner - Custom Flink partitioner
Returns:
New value for the optimized matrix (either user or item)
• #### createBlockInformation

public static scala.Tuple2<DataSet<scala.Tuple2<Object,ALS.InBlockInformation>>,DataSet<scala.Tuple2<Object,ALS.OutBlockInformation>>> createBlockInformation(int userBlocks,
int itemBlocks,
DataSet<scala.Tuple2<Object,ALS.Rating>> ratings,
ALS.BlockIDPartitioner blockIDPartitioner)
Creates the meta information needed to route the item and user vectors to the respective user and item blocks.

Parameters:
userBlocks -
itemBlocks -
ratings -
blockIDPartitioner -
Returns:
• #### createUsersPerBlock

public static DataSet<scala.Tuple2<Object,int[]>> createUsersPerBlock(DataSet<scala.Tuple2<Object,ALS.Rating>> ratings)
Calculates, for each user block, its user IDs in ascending order

Parameters:
ratings -
Returns:
• #### createOutBlockInformation

public static DataSet<scala.Tuple2<Object,ALS.OutBlockInformation>> createOutBlockInformation(DataSet<scala.Tuple2<Object,ALS.Rating>> ratings,
DataSet<scala.Tuple2<Object,int[]>> usersPerBlock,
int itemBlocks,
ALS.BlockIDGenerator blockIDGenerator)
Creates the outgoing block information

Creates for every user block the outgoing block information. The out block information contains for every item block a BitSet which indicates which user vector has to be sent to this block. If a vector v has to be sent to a block b, then bitsets(b)'s bit v is set to 1, otherwise 0. Additionally the user IDs are replaced by the user vector's index value.
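
The bit layout described above can be sketched for a single user block as follows. This is only an illustration: the class `OutBlockSketch` is hypothetical, and the rule that assigns an item to block `itemId % itemBlocks` is an assumption standing in for whatever the block ID generator actually does. Item IDs are assumed to be non-negative.

```java
import java.util.Arrays;
import java.util.BitSet;

final class OutBlockSketch {
    /**
     * Builds one BitSet per item block for a single user block.
     * sortedUserIds holds the user IDs of the block in ascending order;
     * each element of userItemPairs is {userId, itemId} for one rating.
     */
    static BitSet[] outLinks(int[] sortedUserIds, int[][] userItemPairs, int itemBlocks) {
        BitSet[] bitsets = new BitSet[itemBlocks];
        for (int b = 0; b < itemBlocks; b++) {
            bitsets[b] = new BitSet(sortedUserIds.length);
        }
        for (int[] pair : userItemPairs) {
            // Replace the user ID by the index of the user vector within its block.
            int userIndex = Arrays.binarySearch(sortedUserIds, pair[0]);
            if (userIndex < 0) {
                continue; // the user belongs to a different user block
            }
            int itemBlock = pair[1] % itemBlocks; // ASSUMPTION: modulo block assignment
            bitsets[itemBlock].set(userIndex);    // vector userIndex must be sent to itemBlock
        }
        return bitsets;
    }
}
```

With this layout, a user vector is sent to an item block at most once, no matter how many items of that block the user has rated.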

Parameters:
ratings -
usersPerBlock -
itemBlocks -
blockIDGenerator -
Returns:
• #### createInBlockInformation

public static DataSet<scala.Tuple2<Object,ALS.InBlockInformation>> createInBlockInformation(DataSet<scala.Tuple2<Object,ALS.Rating>> ratings,
DataSet<scala.Tuple2<Object,int[]>> usersPerBlock,
ALS.BlockIDGenerator blockIDGenerator)
Creates the incoming block information

Creates for every user block the incoming block information. The incoming block information contains the userIDs of the users in the respective block and for every item block a BlockRating instance. The BlockRating instance describes for every incoming set of item vectors of an item block, which user rated these items and what the rating was. For that purpose it contains for every incoming item vector a tuple of an id array us and a rating array rs. The array us contains the indices of the users having rated the respective item vector with the ratings in rs.
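
The us and rs arrays can be illustrated for a single user block with the sketch below, which groups ratings by item and stores user indices instead of user IDs. The class `InBlockSketch`, its nested `ItemRatings` type, and the use of a map keyed by item ID are assumptions for illustration; the grouping by item block that the real block information performs is omitted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class InBlockSketch {
    /** For one item: indices of the users who rated it (us) and their ratings (rs). */
    static final class ItemRatings {
        final int[] us;
        final double[] rs;

        ItemRatings(int[] us, double[] rs) {
            this.us = us;
            this.rs = rs;
        }
    }

    /**
     * Groups the ratings of one user block by item ID.
     * sortedUserIds holds the user IDs of the block in ascending order;
     * the k-th rating is (users[k], items[k], ratings[k]).
     */
    static Map<Integer, ItemRatings> group(int[] sortedUserIds, int[] users,
                                           int[] items, double[] ratings) {
        Map<Integer, List<Integer>> usByItem = new TreeMap<>();
        Map<Integer, List<Double>> rsByItem = new TreeMap<>();
        for (int k = 0; k < ratings.length; k++) {
            int userIndex = Arrays.binarySearch(sortedUserIds, users[k]);
            if (userIndex < 0) {
                continue; // rating belongs to a user outside this block
            }
            usByItem.computeIfAbsent(items[k], key -> new ArrayList<>()).add(userIndex);
            rsByItem.computeIfAbsent(items[k], key -> new ArrayList<>()).add(ratings[k]);
        }
        Map<Integer, ItemRatings> result = new TreeMap<>();
        for (Map.Entry<Integer, List<Integer>> entry : usByItem.entrySet()) {
            int[] us = entry.getValue().stream().mapToInt(Integer::intValue).toArray();
            double[] rs = rsByItem.get(entry.getKey()).stream()
                    .mapToDouble(Double::doubleValue).toArray();
            result.put(entry.getKey(), new ItemRatings(us, rs));
        }
        return result;
    }
}
```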

Parameters:
ratings -
usersPerBlock -
blockIDGenerator -
Returns:
• #### unblock

public static DataSet<ALS.Factors> unblock(DataSet<scala.Tuple2<Object,double[][]>> users,
DataSet<scala.Tuple2<Object,ALS.OutBlockInformation>> outInfo,
ALS.BlockIDPartitioner blockIDPartitioner)
Unblocks the blocked user and item matrix representation so that it is a DataSet of column vectors.

Parameters:
users -
outInfo -
blockIDPartitioner -
Returns:
• #### outerProduct

public static void outerProduct(double[] vector,
double[] matrix,
int factors)
• #### generateFullMatrix

public static void generateFullMatrix(double[] triangularMatrix,
double[] fullMatrix,
int factors)
• #### generateRandomMatrix

public static DataSet<ALS.Factors> generateRandomMatrix(DataSet<Object> users,
int factors,
long seed)
• #### randomFactors

public static double[] randomFactors(int factors,
scala.util.Random random)
• #### factorsOption

public scala.Option<scala.Tuple2<DataSet<ALS.Factors>,DataSet<ALS.Factors>>> factorsOption()
• #### setNumFactors

public ALS setNumFactors(int numFactors)
Sets the number of latent factors/row dimension of the latent model

Parameters:
numFactors -
Returns:
• #### setLambda

public ALS setLambda(double lambda)
Sets the regularization coefficient lambda

Parameters:
lambda -
Returns:
• #### setIterations

public ALS setIterations(int iterations)
Sets the number of iterations of the ALS algorithm

Parameters:
iterations -
Returns:
• #### setBlocks

public ALS setBlocks(int blocks)
Sets the number of blocks into which the user and item matrix shall be partitioned

Parameters:
blocks -
Returns:
• #### setSeed

public ALS setSeed(long seed)
Sets the random seed for the initialization of the item matrix

Parameters:
seed -
Returns:
• #### setTemporaryPath

public ALS setTemporaryPath(String temporaryPath)
Sets the temporary path into which intermediate results are written in order to increase performance.

Parameters:
temporaryPath -
Returns:
• #### empiricalRisk

public DataSet<Object> empiricalRisk(DataSet<scala.Tuple3<Object,Object,Object>> labeledData,
ParameterMap riskParameters)
Empirical risk of the trained model (matrix factorization).

Parameters:
labeledData - Reference data
riskParameters - Additional parameters for the empirical risk calculation
Returns: