Summary of Recommender Systems (Alternating Least Squares, LightFM, Matrix Factorization with Neural Networks, and Neural Collaborative Filtering)
Social media networks hold a wealth of semi-structured data. This task's dataset was gathered from Flickr, an online photo-sharing social media network. Flickr allows users to share photos and communicate with one another (friends). The goal is to use the platform's extensive data to recommend a list of objects (pictures) to each user. The training dataset contains a set of interactions between users and items (photos) and is used to build the recommendation system, while the validation data, which contains the ground truth of the rating, is used to decide on the final model. The remaining datasets, except for the test data, are not used for analysis.
Explicit feedback consists of ratings given on a specific scale, most commonly 1 to 5. When users rate one item higher than another, they express a preference directly. Implicit feedback, by contrast, is inferred from user behavior and is collected in many forms, e.g., browsing, screen time, clicks, reactions, and views. The underlying idea is that if a user frequently clicks on, views, spends time on, or reacts to an item, it is a sign of their interest.
Explicit feedback is simple to interpret because a user's ratings can be read directly as preferences, which makes the system's predictions more relevant. It is, however, far less common than implicit feedback because it is inconvenient for users to score everything they interact with (and they rarely do). Implicit feedback is easy to collect in large quantities. In this case, the rating column can be treated as implicit feedback since it represents whether the user interacted with the item. The training data consists only of 1s, meaning that every recorded user-item pair is an observed interaction and there are no explicit negative examples. Hence, we use an implicit-feedback recommendation system as the initial model.
2.1 Matrix Factorization
Matrix factorization breaks down a large matrix into two smaller matrices whose product approximates the original one. Alternating Least Squares is a type of matrix factorization that reduces the dimensions of the user-item matrix to a considerably smaller number of latent (hidden) features, and it does so in a computationally efficient manner.
2.1.1 Alternating Least Squares (ALS)
Alternating Least Squares is an iterative optimization process in which each iteration aims to produce a better factorized representation of the original data. Consider a matrix R of size u x i, where u is the number of users and i is the number of items. The idea is to generate a matrix U of size u x f, where f is the number of hidden features, and a matrix V of size f x i. Matrices U and V hold weights describing how each user and each item relate to each feature. As discussed above, the goal is to calculate U and V such that R ≈ U x V (Victor, 2018).
In plain least squares, the values in U and V are initialized randomly and then refined over several iterations until the weights give the best approximation of R. Alternating Least Squares uses a similar idea, but each iteration optimizes U while keeping V fixed, and then V while keeping U fixed. For implicit data, the solution is to combine a preference p for an item with a confidence c. The preference of user u for item i is defined as pᵤᵢ = 1 if rᵤᵢ > 0 and pᵤᵢ = 0 if rᵤᵢ = 0, where rᵤᵢ is the observed value for user u and item i (Hu et al., 2008).
These preferences are associated with varying confidence levels. Zero values of pᵤᵢ are associated with low confidence, because a user not taking a positive action on an item can be attributed to many factors other than not liking it. Similarly, if an item is consumed, it doesn't necessarily indicate that the user prefers it.
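The alternating updates described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used for the results in this article: the function name `implicit_als` and its defaults are hypothetical, and the confidence is assumed to follow the linear form cᵤᵢ = 1 + α·rᵤᵢ from Hu et al. (2008), which is where the alpha hyperparameter comes from.

```python
import numpy as np

def implicit_als(R, factors=2, reg=0.1, alpha=20.0, iterations=10, seed=0):
    """Approximate the preference matrix derived from R as U @ V.

    U has shape (users, factors) and V has shape (factors, items),
    matching the u x f and f x i shapes used in the text.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = (R > 0).astype(float)   # preference p_ui: 1 if observed, else 0
    C = 1.0 + alpha * R         # assumed confidence c_ui = 1 + alpha * r_ui
    U = rng.normal(scale=0.1, size=(n_users, factors))
    V = rng.normal(scale=0.1, size=(factors, n_items))
    ridge = reg * np.eye(factors)

    for _ in range(iterations):
        # Keep V fixed: each user row is a confidence-weighted ridge regression.
        Y = V.T                                   # (items, factors)
        for u in range(n_users):
            A = Y.T @ (C[u][:, None] * Y) + ridge
            b = Y.T @ (C[u] * P[u])
            U[u] = np.linalg.solve(A, b)
        # Keep U fixed: solve the same kind of problem for each item column.
        for i in range(n_items):
            A = U.T @ (C[:, i][:, None] * U) + ridge
            b = U.T @ (C[:, i] * P[:, i])
            V[:, i] = np.linalg.solve(A, b)
    return U, V
```

Calling `U, V = implicit_als(R)` on a small dense interaction matrix and ranking the entries of `U @ V` per user gives the recommendation scores. Production libraries avoid the dense per-user products used here, so this sketch is only practical for toy data.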
2.1.2 Model Evaluation Metric
Normalized Discounted Cumulative Gain (NDCG): measures the quality of a recommendation list using the following assertions (Mhaskar, 2015):
- Highly relevant results are more useful than somewhat relevant results, and the score is normalized so that it is comparable across users
- Relevant results are more useful if they appear higher in the recommended list
Discounted Cumulative Gain (DCG): relevant items appearing lower in the recommendation list are penalized, as the graded relevance value is reduced logarithmically in proportion to the position of the result (Mhaskar, 2015).
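As a rough illustration of the two metrics, the sketch below computes DCG and NDCG for one user's ranked list. It assumes the common linear-gain form, where the relevance at position k (1-based) is divided by log2(k + 1); the function names are hypothetical. With binary relevance, as in this dataset, the linear and exponential gain variants give the same value.

```python
import math

def dcg_at_k(relevances, k):
    """Sum of relevance grades, each discounted by log2 of its position."""
    # rank is 0-based, so the first item is divided by log2(2) = 1.
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Here `relevances` is the list of ground-truth grades in the order the model ranked the items, so `ndcg_at_k([0, 1, 0], 3)` scores a list whose only relevant item was placed second.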
Validation items classified correctly in the top 15 recommendations: the idea is to check whether the top 15 items recommended by the ALS model (trained on the training data) for each user in the validation data contain an item the user actually interacted with, i.e., the ground truth from the validation data. In the validation data, every user has at least one interaction with an item. So, for every user id, we take the intersection of the recommended item ids and the item ids with ground truth = 1, and then count the users for whom the intersection is not empty. The model with the higher number of correctly predicted interactions on the validation data can be considered the best model.
2.1.3 Hyperparameter Tuning and Output from ALS
Different combinations of factors (5, 10, 12, 15, 20), regularization (0.1), and alpha (10, 15, 20, and 25) were used to train the model, which was then tested on validation data to analyze NDCG and the number of correctly classified items. The results are provided in Figure 1 and Figure 2 below. Twenty-four different combinations of hyperparameters were tested (e.g., the x-axis in the graph below represents factors-regularization-alpha).
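The intersection-and-count step can be written compactly. The sketch below assumes the recommendations and the ground truth are already available as plain dictionaries keyed by user id; the function name and data layout are illustrative, not taken from the original analysis.

```python
def users_with_hit(recommended, ground_truth, k=15):
    """Count users whose top-k recommendations contain a validation item.

    recommended:  dict mapping user id -> ranked list of item ids
    ground_truth: dict mapping user id -> set of item ids with rating 1
    """
    hits = 0
    for user, truth in ground_truth.items():
        top_k = set(recommended.get(user, [])[:k])
        if top_k & truth:   # non-empty intersection counts as a hit
            hits += 1
    return hits
```

Dividing the returned count by `len(ground_truth)` gives a hit rate, which is easier to compare across validation sets of different sizes.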
2.2 LightFM
LightFM is an open-source recommender system for both implicit and explicit feedback. Embeddings created using LightFM can encode useful semantic information about features, which can be used for recommendation tasks. The structure of the LightFM model takes the following factors into consideration:
- The model is capable of learning user and item representations from interaction data. If users like more than one item, e.g., chips and coke, the model must learn that these two items are similar
- The model can compute recommendations for new users and items (Maciej & Lyst, 2015)
Four loss functions are available for LightFM: logistic loss, BPR (Bayesian Personalised Ranking) pairwise loss, WARP (Weighted Approximate-Rank Pairwise) loss, and k-OS WARP (k-th order statistic) loss.
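To make the difference between a pointwise and a pairwise loss concrete, the sketch below writes out the logistic loss and the BPR loss for a single training example. These are the standard textbook definitions, not LightFM's internal code, and the function names are hypothetical. WARP and k-OS WARP also work on (positive, negative) pairs but additionally depend on how negatives are sampled, so they are left out here.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logistic_loss(score, label):
    """Pointwise loss: treats each (user, item) pair as a 0/1 classification."""
    p = sigmoid(score)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

def bpr_loss(pos_score, neg_score):
    """Pairwise loss: small when the observed item outscores the sampled negative."""
    return -math.log(sigmoid(pos_score - neg_score))
```

The pairwise form only cares about the gap between the two scores, which is why it is usually a better match for ranking metrics such as NDCG than a pointwise loss.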
The output of the LightFM model on training data is given in Table 1 below (Kula, 2016).
2.3 Matrix Factorization with Neural Networks
Matrix factorization represents users and items as vectors of latent features. These features are projected into a shared feature space and are inferred from item rating patterns. High correspondence between item and user factors leads to a recommendation. Each item i is associated with a vector qᵢ ∈ Rᶠ and each user u with a vector pᵤ ∈ Rᶠ. For a given item i, the elements of qᵢ measure the extent to which the item possesses those factors. For a given user u, the elements of pᵤ measure the extent of the user's interest in items that are high on the corresponding factors. The dot product qᵢᵀpᵤ captures the interaction between user and item, i.e., the user's overall interest in the item's characteristics. User u's rating of item i, denoted by rᵤᵢ, is approximated by qᵢᵀpᵤ. To learn the factor vectors (pᵤ and qᵢ), the regularized squared error on the set of known ratings is minimized (Koren et al., 2009).
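One simple way to minimize the regularized squared error is stochastic gradient descent over the known ratings, as described by Koren et al. (2009). The sketch below is a dependency-free illustration under assumed settings: the function names, learning rate, regularization strength, and epoch count are placeholders, not the values used for the PyTorch models in this article.

```python
import random

def train_mf(ratings, n_users, n_items, factors=2, lr=0.05, reg=0.02,
             epochs=200, seed=0):
    """Learn user vectors p and item vectors q from (user, item, rating) triples."""
    rng = random.Random(seed)
    p = [[rng.gauss(0, 0.1) for _ in range(factors)] for _ in range(n_users)]
    q = [[rng.gauss(0, 0.1) for _ in range(factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(p, q, u, i)   # e_ui = r_ui - q_i^T p_u
            for f in range(factors):
                pu, qi = p[u][f], q[i][f]
                # Gradient step on the squared error plus an L2 penalty.
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return p, q

def predict(p, q, u, i):
    """Predicted rating: dot product of the item and user factor vectors."""
    return sum(a * b for a, b in zip(q[i], p[u]))
```

After training on a handful of triples such as `(0, 0, 5)` and `(0, 1, 1)`, `predict(p, q, 0, 0)` should score higher than `predict(p, q, 0, 1)`. Adding per-user and per-item bias terms to the prediction is a common extension of this form.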
2.4 Neural Collaborative Filtering
Neural Collaborative Filtering (NCF) captures the interaction between users and items. NCF uses a multi-layer model to learn the user-item interaction function. The input consists of sparse feature vectors for user u (vᵤ) and item i (vᵢ), and the output is represented as yᵤᵢ = f(pᵤ, qᵢ). The input vectors can include categorical variables such as attributes or context, not just the user or item identity. The embedding layer maps these inputs to the user and item latent vectors. The neural collaborative filtering layers then concatenate the user and item latent vectors and pass them through multiple hidden layers, i.e., a multilayer perceptron with ReLU as the activation function, followed by the output layer. The multilayer perceptron uses nonlinear functions to learn interactions in which the latent factors are not independent, so the interaction function is learned from the data and has better representational ability.
For the neural network models built with PyTorch, different values of weights, learning rates, epochs, and weight decay were tried to check for improvement in the model's performance.
Table 2 below provides a synopsis of the different models and their recommendation accuracy.
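The layer stack described above can be illustrated with a forward pass in NumPy. This is a sketch of the architecture only: the class name `TinyNCF`, the layer sizes, and the sigmoid on the output are assumptions made for the example, the weights are random and untrained, and the models in this article were built with PyTorch rather than with this code.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class TinyNCF:
    """Embedding lookup -> concatenation -> ReLU hidden layers -> sigmoid output."""

    def __init__(self, n_users, n_items, emb_dim=8, hidden=(16, 8), seed=0):
        rng = np.random.default_rng(seed)
        self.P = rng.normal(scale=0.1, size=(n_users, emb_dim))  # user latent vectors
        self.Q = rng.normal(scale=0.1, size=(n_items, emb_dim))  # item latent vectors
        sizes = [2 * emb_dim, *hidden]
        self.layers = [(rng.normal(scale=0.1, size=(a, b)), np.zeros(b))
                       for a, b in zip(sizes[:-1], sizes[1:])]
        self.w_out = rng.normal(scale=0.1, size=hidden[-1])

    def predict(self, users, items):
        # Row lookup is equivalent to multiplying a one-hot input by the embedding matrix.
        x = np.concatenate([self.P[users], self.Q[items]], axis=1)
        for W, b in self.layers:
            x = relu(x @ W + b)
        logits = x @ self.w_out
        return 1.0 / (1.0 + np.exp(-logits))   # interaction score in (0, 1)
```

Calling `TinyNCF(3, 4).predict(np.array([0, 1]), np.array([2, 3]))` returns one score per (user, item) pair. Training would fit the embeddings and layer weights by minimizing a loss such as binary cross-entropy on observed and sampled unobserved interactions.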
The following observations can be drawn from the analysis:
- The implicit ALS model with factors = 10, alpha = 20, and regularization = 0.1 produces the best NDCG on the validation and test data
- The output is considered poor for Matrix Factorization (MF) using Neural Networks (NN) and Neural Collaborative Filtering
- NN models are sensitive to initial weights, regularization, learning rates, and weight decay, and require more fine-tuning to obtain better output
- Adding bias to MF using NN improves model performance. This can be attributed to the better generalization ability of the model allowing it to capture observed signals. MF with bias can benefit from insights into user behavior
- Choice of inner product function for MF can limit the expressiveness of models and higher latent factors can reduce generalization capabilities
- He et al. (2017) suggest that deeper network models can produce good performance if more nonlinear layers are stacked. However, optimization difficulties can diminish the improvement when too many layers are added
- The Neural Collaborative Filtering model shows degraded performance because of the stacking of linear layers
The implicit ALS model with the highest NDCG score on validation was retrained on the training and validation data together: the users and their interacted items from the validation data were added back to the training data. Then the ALS model with the parameters mentioned above (implicit ALS: factors = 10, regularization = 0.1, alpha = 20) was used to produce the final recommendations. More training data generally leads to better performance on the test set.
1. Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix Factorization Techniques for Recommender Systems. Computer, 42(8), 30–37. https://doi.org/10.1109/mc.2009.263
2. Hu, Y., Koren, Y., & Volinsky, C. (2008). Collaborative Filtering for Implicit Feedback Datasets. 2008 Eighth IEEE International Conference on Data Mining. https://doi.org/10.1109/icdm.2008.22
3. Victor. (2018, July 10). ALS Implicit Collaborative Filtering. Medium. https://medium.com/radon-dev/als-implicit-collaborative-filtering-5ed653ba39fe
4. Kula, M. (2016). Model evaluation. LightFM 1.15 documentation. Making.lyst.com. https://making.lyst.com/lightfm/docs/lightfm.evaluation.html
5. Maciej, K., & Lyst. (2015). Metadata Embeddings for User and Item Cold-start Recommendations. https://arxiv.org/pdf/1507.08439.pdf
6. He, X., Liao, L., Zhang, H., Nie, L., Hu, X., & Chua, T.-S. (2017). Neural Collaborative Filtering. ArXiv:1708.05031 [Cs]. https://arxiv.org/abs/1708.05031
Republished from Towards Data Science: Exploring Recommendation Systems: Review of Matrix Factorization & Deep Learning Models. https://towardsdatascience.com/exploring-recommendation-systems-review-of-matrix-factorization-deep-learning-models-74d51a3b4f20