<a href="https://colab.research.google.com/github/tobias-hoepfl/Digital-Organizations-SE/blob/main/learning_portfolio/6_recommendation_system_theory_Hoepfl.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Learning Portfolio: Recommendation systems

**Theoretical understanding:**

Answers to selected questions from the fast.ai chapter on collaborative filtering (https://www.kaggle.com/code/jhoward/collaborative-filtering-deep-dive/notebook)

<br>

**Practical understanding:**

A small toy example following the YouTube video on collaborative filtering. See also the homework and code comments I wrote for the corresponding assignment (assignment 7).


## Theoretical understanding

**What problem does collaborative filtering solve?**

It recommends the items or products that a specific user is most likely to be interested in.

<br>

**How does it solve it?**

It analyzes the behaviour of other users and, based on how similar they are to the target user, predicts ratings for items or products that this user has not yet rated.

One way to do this is with cosine similarity, as shown in the recommended YouTube video (https://www.youtube.com/watch?v=Fmtorg_dmM0&ab_channel=ritvikmath).

The approach used in the fast.ai course instead relies on the intuition that hidden (latent) features determine what a user will most probably like.
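The latent-factor intuition can be sketched with a dot product. This is a minimal illustration, assuming two hand-picked factors ("likes action", "likes romance"); in a real model the factors are learned and have no predefined meaning, and all values here are hypothetical.

```python
import torch

# Hypothetical latent factors: [affinity for action, affinity for romance]
user_factors = torch.tensor([0.9, 0.1])    # a user who mostly likes action
action_movie = torch.tensor([0.8, 0.2])    # a movie that is mostly action
romance_movie = torch.tensor([0.1, 0.9])   # a movie that is mostly romance

# The predicted preference is the dot product of user and item factors
score_action = (user_factors * action_movie).sum()
score_romance = (user_factors * romance_movie).sum()

print("Action movie score:", score_action.item())
print("Romance movie score:", score_romance.item())
```

The better the user's factors match a movie's factors, the larger the dot product, so the action movie scores higher for this user.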

<br>

**Why might a collaborative filtering predictive model fail to be a very useful recommendation system?**

- Sparsity of data: most users rate only a small fraction of the items
- A lot of data is needed before the model can make useful predictions, and this may not be available for many items
- Some highly active users might bias the recommendations
- For less popular items in particular, there may be hardly any data

<br>

**What does a crosstab representation of collaborative filtering data look like?**

It looks like a matrix in which the rows represent the users, the columns represent the items, and each cell holds the rating (empty if the user has not rated the item).
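A small sketch of how such a crosstab could be built with pandas from long-format rating data. The user names, item names and ratings are made up for illustration.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, item, rating)
ratings = pd.DataFrame({
    "user": ["u1", "u1", "u2", "u2", "u3"],
    "item": ["A", "B", "A", "C", "B"],
    "rating": [5, 3, 4, 2, 1],
})

# Rows = users, columns = items; combinations without a rating become NaN
crosstab = ratings.pivot(index="user", columns="item", values="rating")
print(crosstab)
```

The NaN cells are exactly the entries a collaborative filtering model tries to predict.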

<br>

**What is a latent factor? Why is it "latent"?**

A latent factor is not directly visible in the data but describes its underlying structure (e.g. a tendency to like horror movies). It is "latent" because it is never given explicitly; the model has to learn it.

<br>

**What is an embedding matrix?**

It is a look-up table with one row per user (or per item) that holds the latent factors of that user (or item). It is much denser than the sparse user-item matrix described before.

<br>

**Why do we need Embedding if we could use one-hot-encoded vectors for the same thing?**

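One-hot encoding would turn every level into its own column, producing very large and sparse vectors, which we don't want. Multiplying a one-hot vector by the embedding matrix only selects one row, so an embedding does the same look-up directly by index, which saves memory and computation. The selected row holds continuous, learned values.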
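The equivalence between one-hot encoding and an index look-up can be checked directly. This is a minimal sketch, assuming a hypothetical embedding matrix for 5 users with 3 factors each, filled with random values.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_users, n_factors = 5, 3
user_factors = torch.randn(n_users, n_factors)  # hypothetical embedding matrix

user_idx = torch.tensor([2])  # look up the third user

# Variant 1: one-hot encode the index and multiply by the matrix
one_hot = F.one_hot(user_idx, num_classes=n_users).float()  # shape (1, 5)
via_matmul = one_hot @ user_factors                          # shape (1, 3)

# Variant 2: index into the matrix directly (what an embedding layer does)
via_lookup = user_factors[user_idx]                          # shape (1, 3)

print(torch.allclose(via_matmul, via_lookup))
```

Both variants return the same row, but the look-up never materializes the one-hot vector or performs the multiplication.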

<br>

**What does an embedding contain before we start training (assuming we're not using a pretrained model)?**

It contains random initializations for the factors.
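As an illustration, a freshly created `torch.nn.Embedding` layer already holds random, trainable weights. The sizes below (4 users, 3 factors) are arbitrary, and fastai's own embedding layer may use a different initialization scheme than plain PyTorch.

```python
import torch
from torch import nn

torch.manual_seed(42)

# 4 users with 3 latent factors each (sizes chosen only for illustration)
emb = nn.Embedding(num_embeddings=4, embedding_dim=3)

print(emb.weight)                  # random values that carry no meaning yet
print(emb(torch.tensor([0, 2])))   # factors of users 0 and 2, looked up by index
```

Training then adjusts these values so that they start to capture the users' preferences.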

<br>

**What is the use of bias in a dot product model?**

It helps us to account for the fact that some users inherently tend to give higher ratings or some items are inherently more popular.
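A simplified plain-PyTorch sketch of a dot product model with bias terms, loosely modelled on the idea from the chapter. The class name and sizes are chosen for illustration, and scaling the output to the rating range is omitted.

```python
import torch
from torch import nn

class DotProductBias(nn.Module):
    """Dot product of user and item factors plus one bias per user and per item."""

    def __init__(self, n_users, n_items, n_factors):
        super().__init__()
        self.user_factors = nn.Embedding(n_users, n_factors)
        self.item_factors = nn.Embedding(n_items, n_factors)
        self.user_bias = nn.Embedding(n_users, 1)
        self.item_bias = nn.Embedding(n_items, 1)

    def forward(self, users, items):
        # Interaction term: how well the user's factors match the item's factors
        dot = (self.user_factors(users) * self.item_factors(items)).sum(dim=1)
        # Bias terms: generous or strict users, popular or unpopular items
        return dot + self.user_bias(users).squeeze(1) + self.item_bias(items).squeeze(1)

model = DotProductBias(n_users=3, n_items=5, n_factors=2)
preds = model(torch.tensor([0, 2]), torch.tensor([1, 4]))
print(preds)
```

Without the bias terms, the model could only express "this user matches this item", not "this item is rated highly by almost everyone".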

<br>

**What is another name for weight decay?**

L2 regularization.

It helps to prevent overfitting.

<br>

**Write the equation for weight decay.**

`loss_with_wd = loss + wd * (parameters**2).sum()`

The normal loss is increased by a penalty term that grows with the squared size of the parameters, scaled by the weight decay factor `wd`.
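The formula can be tried out on a tiny example. This is a sketch with made-up parameter values and a constant stand-in for the ordinary loss; it also shows that the penalty adds `2 * wd * parameters` to the gradient.

```python
import torch

parameters = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)
wd = 0.1
loss = torch.tensor(0.5)  # stand-in for the ordinary loss (hypothetical value)

# Weight decay: add the scaled sum of squared parameters to the loss
loss_with_wd = loss + wd * (parameters**2).sum()
loss_with_wd.backward()

print(loss_with_wd.item())   # 0.5 + 0.1 * (1 + 4 + 9) = 1.9
print(parameters.grad)       # 2 * wd * parameters
```

Large parameters produce a large penalty gradient, which pushes them back towards zero during training.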

<br>

**What is the "bootstrapping problem" in collaborative filtering?**

It is the problem of making recommendations to new users, or for new items, for which no ratings exist yet (also known as the cold-start problem).

<br>

**How could you deal with the bootstrapping problem for new users? For new movies?**

- Give new users the recommendations for an average user (e.g. overall popular movies)
- Use other data about the new user or the new movie (demographics, etc.) if available; this is called metadata
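The first option can be sketched as a simple popularity ranking: a new user gets the items with the highest average rating among the users who rated them. The rating matrix is hypothetical, and 0 is assumed to mean "not rated".

```python
import torch

# Hypothetical ratings (rows = users, columns = items), 0 = not rated
ratings = torch.tensor([[5., 0., 1.],
                        [4., 2., 0.],
                        [0., 3., 1.]])

# Average each item's rating over the users who actually rated it
rated = ratings != 0
mean_rating = ratings.sum(dim=0) / rated.sum(dim=0)

# Recommend items to a new user in order of average rating
ranking = torch.argsort(mean_rating, descending=True)
print("Average rating per item:", mean_rating)
print("Item order for a new user:", ranking.tolist())
```

This ignores everything specific to the new user, so it is only a fallback until some ratings or metadata are available.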

<br>

**What kind of model should we use if we want to add metadata about users and items, or information such as date and time, to a collaborative filtering model?**

For example, a hybrid recommender system that combines collaborative filtering with content-based filtering, or a neural network (tabular) model that takes the metadata as additional inputs alongside the embeddings.

## Practical understanding

I built a small toy example using the same intuition as in the provided YouTube video (https://www.youtube.com/watch?v=Fmtorg_dmM0), but with different numbers.

We consider a matrix of three users and five products. Ratings are given for some user-product combinations, but not for all; a 0 marks a missing rating. Ratings range from 1 to 9.

The goal is to find out what to recommend to user 3 next.

In [None]:
import torch

# Rows are users, columns are products; 0 means "not rated yet"
user_product_matrix = torch.tensor([
    [1., 2., 0., 8., 9.],  # user 1
    [8., 9., 8., 2., 1.],  # user 2
    [0., 8., 7., 1., 0.],  # user 3
])

print(user_product_matrix)

tensor([[1., 2., 0., 8., 9.],
        [8., 9., 8., 2., 1.],
        [0., 8., 7., 1., 0.]])


Observation:

- User 1 and user 2 are very different: user 1 likes products 4 and 5, while user 2 does not. User 2 likes products 1, 2 and 3, while user 1 dislikes products 1 and 2 (and has not rated product 3).
- User 3 likes products 2 and 3 and dislikes product 4. Based on this information, user 3 is more similar to user 2 than to user 1.
- Intuitively, product 1 should therefore be recommended to user 3 next (because it is liked by the similar user 2 and disliked by the dissimilar user 1).

In [None]:
# As a measure of similarity we use cosine similarity
from sklearn.metrics.pairwise import cosine_similarity

# cosine_similarity expects 2D inputs of shape (n_samples, n_features),
# so each user's ratings are reshaped into a single row
ratings_user_1 = user_product_matrix[0, :].reshape(1, -1)
ratings_user_2 = user_product_matrix[1, :].reshape(1, -1)
ratings_user_3 = user_product_matrix[2, :].reshape(1, -1)

# We are only interested in user 3, so we calculate the similarity
# between user 3 and each of the other users.
# Only products rated by both users (non-zero in both rows) are compared.
rated_by_3_and_1 = torch.logical_and(ratings_user_3 != 0, ratings_user_1 != 0)
rated_by_3_and_2 = torch.logical_and(ratings_user_3 != 0, ratings_user_2 != 0)

ratings_user_3_for_1 = ratings_user_3[rated_by_3_and_1].reshape(1, -1)
ratings_user_1_for_3 = ratings_user_1[rated_by_3_and_1].reshape(1, -1)
ratings_user_3_for_2 = ratings_user_3[rated_by_3_and_2].reshape(1, -1)
ratings_user_2_for_3 = ratings_user_2[rated_by_3_and_2].reshape(1, -1)

cos_sim_3_1 = cosine_similarity(ratings_user_3_for_1, ratings_user_1_for_3)
cos_sim_3_2 = cosine_similarity(ratings_user_3_for_2, ratings_user_2_for_3)

print('Similarity between user 3 and user 1:', cos_sim_3_1)
print('Similarity between user 3 and user 2:', cos_sim_3_2)

Similarity between user 3 and user 1: [[0.3609941]]
Similarity between user 3 and user 2: [[0.9974653]]


As expected, user 3 is much more similar to user 2 than to user 1.
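The similarity between user 3 and user 2 can be double-checked by hand. The sketch below applies the cosine similarity formula (dot product divided by the product of the vector lengths) to the three products both users have rated (products 2, 3 and 4).

```python
import torch

# Ratings for products 2, 3 and 4, which both users have rated
user_3 = torch.tensor([8., 7., 1.])
user_2 = torch.tensor([9., 8., 2.])

# Cosine similarity = dot product / (length of a * length of b)
cos_sim = (user_3 @ user_2) / (user_3.norm() * user_2.norm())
print(cos_sim.item())
```

The result matches the value returned by `cosine_similarity` above, up to rounding.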

In [None]:
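# Predicted score = similarity-weighted average of the ratings given by users 1 and 2
# (column index 0 is product 1, column index 4 is product 5)
rec_score_prod_1 = (cos_sim_3_1 * ratings_user_1[0][0].item() + cos_sim_3_2 * ratings_user_2[0][0].item())/(cos_sim_3_1 + cos_sim_3_2)
rec_score_prod_5 = (cos_sim_3_1 * ratings_user_1[0][4].item() + cos_sim_3_2 * ratings_user_2[0][4].item())/(cos_sim_3_1 + cos_sim_3_2)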

print('Product 1:', rec_score_prod_1)
print('Product 5:', rec_score_prod_5)

Product 1: [[6.1398344]]
Product 5: [[3.125903]]


As expected, the recommendation score for user 3 is higher for product 1 than for product 5. Therefore, we will recommend product 1 to user 3 next.
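The steps above can be condensed into a loop that scores every product user 3 has not rated yet. This is a sketch under the same assumptions as the toy example (0 means "not rated"); the helper `masked_cosine` is a name introduced here for illustration, it uses PyTorch's `cosine_similarity` instead of scikit-learn, and it assumes that the other users have rated every product being scored, which holds for this matrix.

```python
import torch
import torch.nn.functional as F

ratings = torch.tensor([[1., 2., 0., 8., 9.],   # user 1
                        [8., 9., 8., 2., 1.],   # user 2
                        [0., 8., 7., 1., 0.]])  # user 3

def masked_cosine(a, b):
    """Cosine similarity over the products that both users have rated."""
    both = (a != 0) & (b != 0)
    return F.cosine_similarity(a[both], b[both], dim=0)

target = 2        # row index of user 3
others = [0, 1]   # row indices of users 1 and 2
sims = torch.stack([masked_cosine(ratings[target], ratings[u]) for u in others])

# Score each product user 3 has not rated (keys are 1-based product numbers)
scores = {}
for product in torch.nonzero(ratings[target] == 0).flatten().tolist():
    neighbour_ratings = ratings[others, product]
    scores[product + 1] = ((sims * neighbour_ratings).sum() / sims.sum()).item()

best = max(scores, key=scores.get)
print(scores)
print("Recommend product", best)
```

With more users, the same loop would need to skip neighbours who have not rated the product in question.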