# Representation learning & recommender systems

In this practical session, we investigate two classical matrix-factorization models and their neural network implementation.


In [None]:
#! pip install torch torchvision pytorch-lightning --upgrade
#! pip install matplotlib --upgrade

In [2]:
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Data used : [smallest movie-lens dataset](https://grouplens.org/datasets/movielens/)

Let's start with a very common dataset describing users, movies & interactions (ratings):

![image reco](media/Facto-mat.png)

# 1)  Load & Prepare Data

To embed the data easily, we remap the user and item ids to contiguous integer ranges: users to [0, N_users - 1] and items to [0, N_items - 1].

In [16]:
from random import shuffle

## Load
#ratings = pd.read_csv("data/ratings.csv")
ratings = pd.read_csv("data/ml-100k/u.data", sep="\t",dtype=int, names=["userId","movieId", "rating", "timestamp"])
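# NOTE: astype returns a new DataFrame; the result is not assigned here, so the ratings stay integers
ratings.astype({'rating': 'float'},copy=False)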
ratings.head(5)


Unnamed: 0,userId,movieId,rating,timestamp
0,196,242,3,881250949
1,186,302,3,891717742
2,22,377,1,878887116
3,244,51,2,880606923
4,166,346,1,886397596


In [17]:

## Prepare Data
user_map = {user:num for num,user in enumerate(ratings["userId"].unique())}
item_map = {item:num for num,item in enumerate(ratings["movieId"].unique())}

## Number of users & items
num_users = len(user_map)
num_items = len(item_map)

ratings["userId"] = ratings["userId"].map(user_map)
ratings["movieId"] = ratings["movieId"].map(item_map)

ratings.head(5)


Unnamed: 0,userId,movieId,rating,timestamp
0,0,0,3,881250949
1,1,1,3,891717742
2,2,2,1,878887116
3,3,3,2,880606923
4,4,4,1,886397596


In [18]:

# Creating Test/Train as before

train_indexes,val_indexes,test_indexes = [],[],[]

for index in range(len(ratings)):
    if index%5 == 0: # 20% of the data
        test_indexes.append(index)
    else:
        train_indexes.append(index)

        
shuffle(train_indexes)
num_val = int(len(train_indexes)/100*20)
val_indexes = train_indexes[:num_val]
train_indexes = train_indexes[num_val:]

train_ratings = ratings.iloc[train_indexes].copy() # separate data
val_ratings = ratings.iloc[val_indexes].copy()
test_ratings = ratings.iloc[test_indexes].copy()


print(f" #train:{len(train_ratings)}, #val:{len(val_ratings)} ,#test:{len(test_ratings)}" )



 #train:64000, #val:16000 ,#test:20000


In [19]:
# USAGE
# In what follows, we will browse the tuple this way:
cpt = 0
for index, uid, mid, r, ts in train_ratings.itertuples():
    print(index,uid, mid,r) # remember that indexes were shuffled
    cpt+=1
    if cpt > 5:
        break

7048 35 1076 3
23124 318 229 5
76987 620 25 3
48022 507 1449 4
96463 940 483 2
59973 674 158 4


## Reproduce the baseline model with pytorch's vanilla autograd

Your goal now is to reproduce the following (strong) baseline model from the Surprise library:

 $$\hat{r}_{ui} = b_{ui} = \mu + b_u + b_i$$

No matrix factorization is involved here: a prediction for the pair $(u,i)$ only combines 3 scalars. <BR>
Although $\mu$ could be computed directly from the train set, we learn it during the optimization process like the other parameters.

## First, let's define the parameters

The model has three groups of parameters, all 1-dimensional:
- **mu:** the global mean $\mu$ (1,)
- **bu:** the user biases $b_u$ (n_users,)
- **bi:** the item biases $b_i$ (n_items,)

In [20]:
mu = torch.tensor([3.5],requires_grad=True) # activate gradient to be able to learn something
bu = [torch.tensor([0.1],requires_grad=True) for _ in range(num_users)]
bi = [torch.tensor([0.1],requires_grad=True) for _ in range(num_items)]


Then, we define two functions: 

- `predict(u,i)` : Will return the prediction given the (user,item) pair
- `error(pred,real)` : Will return the MSE error of prediction

#### (TODO) Predict Function
This function should implement this: $\hat{r}_{ui} = b_{ui} = \mu + b_u + b_i$

In [21]:
def predict(u,i):
    # build a (simple) prediction from the above-mentioned parameters
    ##  TODO 

### (TODO) error function
We want to use the MSE

In [22]:
def error(pred,real):
    # define simple MSE
    ##  TODO 
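
One possible way to complete the two TODO cells above is sketched here. It is a minimal sketch and not the reference solution: the sizes `num_users = 3` and `num_items = 4` are illustrative stand-ins so that the cell runs on its own, and `error` returns the squared error of a single sample (summing it over samples and dividing by their number gives the MSE).

```python
import torch

# Illustrative sizes only; in the notebook they come from the remapped ratings.
num_users, num_items = 3, 4

mu = torch.tensor([3.5], requires_grad=True)
bu = [torch.tensor([0.1], requires_grad=True) for _ in range(num_users)]
bi = [torch.tensor([0.1], requires_grad=True) for _ in range(num_items)]

def predict(u, i):
    # r_hat_ui = mu + b_u + b_i: three scalars, no factorization
    return mu + bu[u] + bi[i]

def error(pred, real):
    # squared error of one prediction
    return (pred - real) ** 2

pred = predict(0, 2)
err = error(pred, 3)
print(pred.item(), err.item())
```

Both results keep their autograd graph, so calling `err.backward()` would fill `mu.grad`, `bu[0].grad` and `bi[2].grad`.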

#### The evaluation loop, without any optimization for now

In [23]:
train_e = 0
for index, uid, mid, r, ts in train_ratings.itertuples(): # elegant way to browse tuples (from pandas structure)
    result = predict(uid,mid)
    train_e += error(result,r).item()

# define the same command for validation, test
# display the errors    
# The 3 errors are likely to be close
##  TODO 

final train error :  1.297749952620943
final val error :  1.2980999510441906
final test error :  1.2890999528929592


## Let's optimize the parameters (with SGD)  by slightly modifying the previous loop

### (TODO)


In [24]:
# parameters' values
lr = 0.01
batch_size = 32
n_epochs = 5

for epoch in range(n_epochs):
    
    # loop on the training samples
    #   prediction
    #   error
    #   backward (accumulation)
    #   update
    #   zero_grad

    #  TODO 

    # Evaluation on the validation set + test set
    val_e = 0
    for index, uid, mid, r, ts in val_ratings.itertuples():
        result = predict(uid,mid)
        val_e += error(result,r).item()

    print(f"epoch {epoch} val error : ", val_e/len(val_ratings))

    test_e = 0
    for index, uid, mid, r, ts in test_ratings.itertuples():
        result = predict(uid,mid)
        test_e += error(result,r).item()

    print(f"epoch {epoch} test error : ", test_e/len(test_ratings))
    print("-----")

epoch 0 train error :  1.115951048684718
epoch 0 val error :  1.0075381769452092
epoch 0 test error :  1.0171096366554184
-----
epoch 1 train error :  0.983901562118671
epoch 1 val error :  0.9854860890130736
epoch 1 test error :  1.0000026043019827
-----
epoch 2 train error :  0.9378580559998576
epoch 2 val error :  0.9336627769944015
epoch 2 test error :  0.9482141569386194
-----
epoch 3 train error :  0.9197446114585628
epoch 3 val error :  0.9279990462217967
epoch 3 test error :  0.9425050851423508
-----
epoch 4 train error :  0.9064893474218316
epoch 4 val error :  0.9654338803511199
epoch 4 test error :  0.9749095367468712
-----
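
The steps listed in the loop comments (prediction, error, backward, update, zero_grad) can be sketched as follows. This is a hedged illustration on a made-up toy dataset, not the notebook's solution: the `samples` list, the sizes, and the choice of dividing each loss by `batch_size` before accumulating gradients are all assumptions.

```python
import torch

# Toy stand-in for train_ratings: (user, item, rating) triples (made-up values).
samples = [(0, 0, 4.0), (0, 1, 3.0), (1, 0, 5.0),
           (1, 2, 2.0), (2, 1, 3.0), (2, 2, 1.0)]
num_users, num_items = 3, 3

mu = torch.tensor([3.5], requires_grad=True)
bu = [torch.tensor([0.1], requires_grad=True) for _ in range(num_users)]
bi = [torch.tensor([0.1], requires_grad=True) for _ in range(num_items)]
params = [mu] + bu + bi

def predict(u, i):
    return mu + bu[u] + bi[i]

def error(pred, real):
    return (pred - real) ** 2

def mean_error():
    # average squared error over the toy samples, without tracking gradients
    with torch.no_grad():
        return sum(error(predict(u, i), r).item() for u, i, r in samples) / len(samples)

lr, batch_size, n_epochs = 0.01, 2, 50
initial_error = mean_error()

for epoch in range(n_epochs):
    for n, (u, i, r) in enumerate(samples, start=1):
        loss = error(predict(u, i), r) / batch_size
        loss.backward()                      # gradients accumulate in .grad
        if n % batch_size == 0:              # end of a mini-batch: update
            with torch.no_grad():
                for p in params:
                    if p.grad is not None:   # parameters never used have no gradient
                        p -= lr * p.grad
                        p.grad.zero_()

final_error = mean_error()
print(initial_error, final_error)
```

Accumulating gradients over `batch_size` samples before each update is what turns per-sample SGD into mini-batch SGD here; the update itself is wrapped in `torch.no_grad()` so that it is not recorded by autograd.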


# Embedding module

To build a matrix of vector representations of dimension $Z$, for instance to describe the users, we use the `torch.nn.Embedding` module:
$$ U = \begin{pmatrix}\mathbf u_1, \ldots, \mathbf u_n\end{pmatrix}, \quad \mathbf u_k \in \mathbb R^Z $$ 

Call it with an index to get a $Z$-dimensional representation:

In [25]:
latent_size = 10
nb_users = 100
users = torch.nn.Embedding(nb_users, latent_size) # random init

# get representation of user 5:
print("User 5:", users(torch.tensor(5))) # WARNING: pass a tensor (not an int)

# get representation of user 5 & 7:
print("User 5 & 7:", users(torch.tensor([5,7])))

User 5: tensor([-0.5676,  0.2476, -0.8242,  0.1961,  1.6378, -0.4224, -1.1606, -0.2568,
        -1.0188, -1.2670], grad_fn=<EmbeddingBackward0>)
User 5 & 7: tensor([[-0.5676,  0.2476, -0.8242,  0.1961,  1.6378, -0.4224, -1.1606, -0.2568,
         -1.0188, -1.2670],
        [-0.3021, -0.4939, -1.2852, -1.2867, -0.3640,  0.8740,  0.0207, -0.8172,
         -0.0129,  1.0974]], grad_fn=<EmbeddingBackward0>)


In [26]:
# Initialize the embedding with smaller values:

torch.nn.init.normal_(users.weight,0,0.01) # apply on the weights

# get representation of user 5:
print("User 5:", users(torch.tensor(5))) # WARNING: pass a tensor (not an int)


User 5: tensor([-0.0060, -0.0039,  0.0131,  0.0117, -0.0164, -0.0036,  0.0154,  0.0065,
         0.0116,  0.0018], grad_fn=<EmbeddingBackward0>)
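
To make the role of the embedding module concrete, the short sketch below checks that a lookup is just row indexing into the weight matrix, and that gradients only reach the rows that were looked up. The sizes and the seed are arbitrary choices for the illustration.

```python
import torch

torch.manual_seed(0)
users = torch.nn.Embedding(100, 10)

idx = torch.tensor([5, 7])
looked_up = users(idx)          # shape (2, 10)
indexed = users.weight[idx]     # same rows, taken directly from the weight matrix
same = torch.equal(looked_up, indexed)

# Backpropagate a dummy scalar: only rows 5 and 7 receive a non-zero gradient.
looked_up.sum().backward()
touched_rows = users.weight.grad.abs().sum(dim=1).nonzero().flatten().tolist()

print(looked_up.shape, same, touched_rows)
```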



##  Classic matrix factorisation (called SVD in RecSys) (with mean)

To see how it works, we implement a simple SVD:
### $$ \min\limits_{U,I}\sum\limits_{(u,i)} \underbrace{(r_{ui} -  (I_i^TU_u + \mu))^2}_\text{minimization} + \underbrace{\lambda(||U_u||^2+||I_i||^2 + \mu^2) }_\text{regularization} $$

where the prediction is computed in the following way:
### $$\hat{r}_{ui} = \mu + U_u \cdot I_i $$

where $\mu$ is the global mean,  $U_u$ a user embedding and $I_i$ an item embedding

### STEPS:
 To implement such a model in PyTorch, we need four things:
 
 - (1) model definition
 - (2) loss function
 - (3) evaluation
 - (4) training/eval loop




#### (1) Model definition

A model class typically extends `nn.Module`, the Neural network module. It is a convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.

One should define two functions: `__init__` and `forward`.

- `__init__` is used to initialize the model parameters
- `forward` is the transformation from input to output. It is the method that runs when you call a module instance, e.g. `model(input)`.

##### (a) Initialization

Our model has three kinds of weights:

- the user profiles (also called user embeddings) $U$
- the item profiles (also called item embeddings) $I$
- the mean bias $\mu$


##### (b) input to output operation
Technically, the prediction as defined earlier is just a dot product between two embeddings $U_u$ and $I_i$, plus the mean rating:

- `torch.sum(embed_u*embed_i,1) + self.mean` is equivalent to $\hat{r}_{ui} = \mu + U_u \cdot I_i$
- `.squeeze(1)` is a shape operation that removes dimension 1 (indexing starts at 0) when it has size 1, i.e. it reshapes a tensor from `(batch_size,1,latent_size)` to `(batch_size,latent_size)`
- for reference, the inverse operation is `.unsqueeze(1)`
- `forward` also returns the weights it used, so that the loss can regularize them
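
The operations in the list above can be checked on a tiny hand-made batch. The values below are arbitrary and chosen so that the dot products are easy to verify by hand.

```python
import torch

embed_u = torch.tensor([[1.0, 2.0], [0.0, 1.0]])   # (batch_size=2, latent_size=2)
embed_i = torch.tensor([[3.0, 1.0], [4.0, 2.0]])
mean = torch.tensor([3.0])

# Row-wise dot product plus the mean: (1*3 + 2*1) + 3 = 8 and (0*4 + 1*2) + 3 = 5
pred = torch.sum(embed_u * embed_i, 1) + mean
print(pred)                      # tensor([8., 5.])

x = torch.zeros(2, 1, 2)         # (batch_size, 1, latent_size)
squeezed = x.squeeze(1)          # (batch_size, latent_size)
restored = squeezed.unsqueeze(1) # back to (batch_size, 1, latent_size)
print(squeezed.shape, restored.shape)
```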


### (TODO) Just to make sure you were following: complete the following `__init__` and `forward` methods

In [27]:


# The model is defined as a class, inheriting from nn.Module
class ClassicMF(torch.nn.Module):
    
    #(a) Init
    def __init__(self,nb_users,nb_items,latent_size):
        super(ClassicMF, self).__init__()
        # define the embeddings
        #   note: to define an attribute: self.users = ...
        # initialize with std = 0.01
        # define mu & initialize = 3

        #  TODO 
    
    # (b) How we compute the prediction (from input to output)
    def forward(self, user, item): ## method called when doing model(user,item) on a ClassicMF instance
        # pay attention to the arguments: we have to give indexes
        # from the indexes, compute the output
        # WARNING: print self.users(user) once to understand which dimension to squeeze
        # WARNING (2): return the embeddings on top of the output to compute the regularization term => 4 outputs expected

        #  TODO 
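
For reference, one way the two methods could be completed is sketched below. It is only a sketch under assumptions: the class is named `ClassicMFSketch` to avoid clashing with your own `ClassicMF`, the attribute names `users`, `items` and `mean` are choices made here, and it assumes 1-D index tensors, in which case the embeddings already have shape `(batch_size, latent_size)` and no `squeeze` is needed.

```python
import torch

class ClassicMFSketch(torch.nn.Module):
    def __init__(self, nb_users, nb_items, latent_size):
        super().__init__()
        self.users = torch.nn.Embedding(nb_users, latent_size)
        self.items = torch.nn.Embedding(nb_items, latent_size)
        # small initial values, std = 0.01
        torch.nn.init.normal_(self.users.weight, 0, 0.01)
        torch.nn.init.normal_(self.items.weight, 0, 0.01)
        # learnable global mean, initialized at 3
        self.mean = torch.nn.Parameter(torch.tensor([3.0]))

    def forward(self, user, item):
        embed_u = self.users(user)   # (batch_size, latent_size) for 1-D indexes
        embed_i = self.items(item)
        pred = torch.sum(embed_u * embed_i, 1) + self.mean
        # 4 outputs: the prediction, then everything the loss should regularize
        return pred, embed_u, embed_i, self.mean

sketch_model = ClassicMFSketch(nb_users=5, nb_items=7, latent_size=4)
pred, *params = sketch_model(torch.tensor([0, 3]), torch.tensor([1, 6]))
print(pred.shape, len(params))
```

With such a small initialization, the first predictions are all very close to the initial mean of 3.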
    

#### (2-4) full train loop

The train loop is organized around the [Dataloader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) class, which combines a dataset and a sampler, and provides single- or multi-process iterators over the dataset.

We just redefine a collate function

> collate_fn (callable, optional) – merges a list of samples to form a mini-batch.


**NOTE:** The dataset argument can be a list instead of a "Dataset" instance (works by duck typing)
    

##### The train loop sequence is the following:
    
Dataset ==Dataloader==> Batch (not prepared) ==collate_fn==> Batch (prepared) ==Model.forward==> Prediction ==loss_fn (vs. truth)==> Loss

1] PREDICT
- (a) The dataloader samples training examples from the dataset (which is a list)
- (b) The collate_fn prepares the minibatch of training examples
- (c) The prediction is made by feeding the minibatch to the model
- (d) The loss is computed on the prediction via a loss function

2] OPTIMIZE
- (e) Gradients are computed by automatic backward propagation
- (f) Parameters are updated using the computed gradients
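
Steps (a) and (b) can be tried in isolation on a toy list. In this sketch, `toy_data` and the collate function name `to_tensors` are hypothetical, introduced only for the illustration.

```python
import torch
from torch.utils.data import DataLoader

# A plain list of (user, item, rating) tuples works as a dataset (made-up values).
toy_data = [(0, 1, 4.0), (1, 2, 3.0), (2, 0, 5.0), (3, 1, 2.0), (4, 2, 1.0)]

def to_tensors(batch):
    # batch is a list of tuples; turn it into three tensors
    users, items, ratings = zip(*batch)
    return torch.tensor(users), torch.tensor(items), torch.tensor(ratings)

loader = DataLoader(toy_data, batch_size=2, shuffle=False, collate_fn=to_tensors)
batches = list(loader)
for users_t, items_t, ratings_t in batches:
    print(users_t, items_t, ratings_t)
```

With 5 samples and `batch_size=2`, the loader yields three batches, the last one holding a single sample; indexes come out as integer tensors and ratings as float tensors.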

In [28]:
#  Let's create the datasets (any object with __getitem__(index) and __len__(), i.e. lists ;)
prep_train = [(tp.userId,tp.movieId,tp.rating) for tp in train_ratings.itertuples()]
prep_val   = [(tp.userId,tp.movieId,tp.rating) for tp in val_ratings.itertuples()]
prep_test  = [(tp.userId,tp.movieId,tp.rating) for tp in test_ratings.itertuples()]

In [29]:
a,b,c = zip(*prep_train[:10])
print(a, b, c)

(35, 318, 620, 507, 940, 674, 912, 428, 145, 14) (1076, 229, 25, 1449, 483, 158, 211, 1114, 356, 516) (3, 5, 3, 4, 2, 4, 4, 5, 4, 2)


In [30]:
from torch.utils.data import DataLoader
import torch.nn.functional as F


# HyperParameters
n_epochs = 3
batch_size = 16
num_feat = 25
lr = 0.01
reg = 0.001


#(b) Collate function => Creates tensor batches to feed model during training
# It can be removed if data is already tensors (torch or numpy ;)
def tuple_batch(l):
    '''
    input l: list of (user, item, rating) tuples
    output: formatted batch (three torch tensors)

    takes n-tuples and creates a batch
    '''
    users, items, ratings = zip(*l) 
    users_t = torch.LongTensor(users)
    items_t = torch.LongTensor(items)
    ratings_t = torch.FloatTensor(ratings)
    
    return users_t, items_t, ratings_t
    


#(d) Loss function => Combines MSE and L2
def loss_func(pred,ratings_t,reg,*params): # specific syntax (cf details in the next box)
    '''
    mse loss combined with l2 regularization.
    params assumed 2-dimension
    '''
    mse = F.mse_loss(pred,ratings_t,reduction='sum')
    l2 = 0
    for p in params: # ranging on all parameters
        l2 += torch.mean(p.norm(2,-1))
        
    return (mse/pred.size(0)) + reg*l2 , mse
    
#
# Training script starts here
#    

# (a) dataloader will sample data from datasets using collate_fn tuple_batch
dataloader_train = DataLoader(prep_train, batch_size=batch_size, shuffle=True, num_workers=0, collate_fn=tuple_batch)
dataloader_val = DataLoader(prep_val, batch_size=batch_size, shuffle=True, num_workers=0, collate_fn=tuple_batch)
dataloader_test = DataLoader(prep_test, batch_size=batch_size, shuffle=False, num_workers=0, collate_fn=tuple_batch)


In [31]:
# Define model & optimizer

model = ClassicMF(num_users,num_items,num_feat)
optimizer = torch.optim.Adam(model.parameters())

In [32]:
## INTERMEDIATE BOX for in depth understanding

# inference & parameter retrieving (if your forward is defined as expected)
users_t,items_t,ratings_t = next(iter(dataloader_train)) # retrieve first batch
# check dim
print(users_t.size()) # batch
print(users_t)

# output of the forward step:
pred, embed_u, embed_i, mu = model(users_t,items_t)
print(pred.size(), embed_u.size()) # batch
print(pred) # Current predictions for the batch

# alternative advanced syntax
pred, *params = model(users_t,items_t) # param is a list !!
print(len(params)) 
print(params[0].size()) # params[0] corresponds to embed_u

# idea: retrieve the list of parameters, then pass it to loss_func with the * syntax
print(loss_func(pred,ratings_t,reg,*params))    # yhat, y, lambda_reg, all_params
                                                # returns (mse + regul, mse), the second one being a sum, not a mean

# we can apply backward on what we want...
                                                


torch.Size([16])
tensor([451, 869, 224, 545, 835, 837, 776, 608, 523, 100, 310, 758, 364,  92,
        688,  54])
torch.Size([16]) torch.Size([16, 25])
tensor([3.0001, 3.0002, 2.9993, 2.9998, 3.0002, 2.9998, 3.0006, 2.9993, 3.0001,
        3.0001, 2.9997, 2.9994, 2.9998, 2.9993, 2.9998, 2.9997],
       grad_fn=<AddBackward0>)
3
torch.Size([16, 25])
(tensor(1.6909, grad_fn=<AddBackward0>), tensor(27.0042, grad_fn=<MseLossBackward0>))


In [33]:
#
# Train loop (epoch)
#   loop over the dataloader
#       forward (+get the parameters)
#       loss
#       backward
#       optim
#   compute mse on validation & test
#   display losses for epoch e
#

## TODO 
    

-------------------------
epoch 0 mse (train/val/test) 1.194 / 1.024 / 1.019
-------------------------
epoch 1 mse (train/val/test) 0.91 / 0.91 / 0.904
-------------------------
epoch 2 mse (train/val/test) 0.774 / 0.876 / 0.874
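
The outline in the comments of the train-loop cell can be sketched as below. This is a self-contained illustration and not the notebook's solution: `TinyMF` is a hypothetical stand-in model, the ratings are random, and the hyperparameters (`lr=0.05`, 20 epochs) are chosen only so that the toy run converges quickly. Validation and test errors would be computed with the same inner loop, inside `torch.no_grad()` and without the optimizer calls.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

torch.manual_seed(0)

class TinyMF(torch.nn.Module):
    # hypothetical stand-in for the model completed earlier
    def __init__(self, nb_users, nb_items, latent_size):
        super().__init__()
        self.users = torch.nn.Embedding(nb_users, latent_size)
        self.items = torch.nn.Embedding(nb_items, latent_size)
        torch.nn.init.normal_(self.users.weight, 0, 0.1)
        torch.nn.init.normal_(self.items.weight, 0, 0.1)
        self.mean = torch.nn.Parameter(torch.tensor([3.0]))

    def forward(self, user, item):
        embed_u, embed_i = self.users(user), self.items(item)
        return torch.sum(embed_u * embed_i, 1) + self.mean, embed_u, embed_i

def collate(batch):
    users, items, ratings = zip(*batch)
    return torch.tensor(users), torch.tensor(items), torch.tensor(ratings)

def loss_func(pred, ratings_t, reg, *params):
    mse = F.mse_loss(pred, ratings_t, reduction="sum")
    l2 = sum(torch.mean(p.norm(2, -1)) for p in params)
    return mse / pred.size(0) + reg * l2, mse

# Random ratings in {1, ..., 5}, for illustration only
users_idx = torch.randint(0, 20, (200,)).tolist()
items_idx = torch.randint(0, 30, (200,)).tolist()
ratings = torch.randint(1, 6, (200,)).float().tolist()
data = list(zip(users_idx, items_idx, ratings))
loader = DataLoader(data, batch_size=16, shuffle=True, collate_fn=collate)

toy_model = TinyMF(20, 30, 5)
toy_optimizer = torch.optim.Adam(toy_model.parameters(), lr=0.05)

history = []
for epoch in range(20):
    total = 0.0
    for users_t, items_t, ratings_t in loader:
        toy_optimizer.zero_grad()                        # reset gradients
        pred, *params = toy_model(users_t, items_t)      # forward (+ parameters)
        loss, mse = loss_func(pred, ratings_t, 0.001, *params)
        loss.backward()                                  # backward
        toy_optimizer.step()                             # update
        total += mse.item()                              # mse is a sum over the batch
    history.append(total / len(data))
    print(f"epoch {epoch} train mse {history[-1]:.3f}")
```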


## (Your turn from scratch) Koren 2009 model:

This model simply adds a bias for each user and for each item:

### $$ \min\limits_{U,I}\sum\limits_{(u,i)} \underbrace{(r_{ui} -  (I_i^TU_u + \mu+ \mu_i+\mu_u))^2}_\text{minimization} + \underbrace{\lambda(||U_u||^2+||I_i||^2 + \mu^2 + \mu_i^2+\mu_u^2) }_\text{regularization} $$


### $$\hat{r}_{ui} = \mu + \mu_i + \mu_u + U_u \cdot I_i $$

### TODO:

- (a) complete the model initialization
- (b) complete the forward method

In [None]:

#  TODO 
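
If you get stuck, here is one possible shape for such a model. It is a sketch under assumptions, not the expected answer: the class name `KorenMFSketch`, the attribute names, and the choice of storing the biases as embeddings of size 1 are all decisions made here for illustration.

```python
import torch

class KorenMFSketch(torch.nn.Module):
    def __init__(self, nb_users, nb_items, latent_size):
        super().__init__()
        self.users = torch.nn.Embedding(nb_users, latent_size)
        self.items = torch.nn.Embedding(nb_items, latent_size)
        self.user_bias = torch.nn.Embedding(nb_users, 1)   # mu_u
        self.item_bias = torch.nn.Embedding(nb_items, 1)   # mu_i
        for emb in (self.users, self.items, self.user_bias, self.item_bias):
            torch.nn.init.normal_(emb.weight, 0, 0.01)
        self.mean = torch.nn.Parameter(torch.tensor([3.0]))  # mu

    def forward(self, user, item):
        embed_u, embed_i = self.users(user), self.items(item)
        b_u, b_i = self.user_bias(user), self.item_bias(item)  # (batch_size, 1)
        # r_hat = mu + mu_i + mu_u + U_u . I_i
        pred = (torch.sum(embed_u * embed_i, 1)
                + b_u.squeeze(1) + b_i.squeeze(1) + self.mean)
        return pred, embed_u, embed_i, b_u, b_i, self.mean

koren_sketch = KorenMFSketch(nb_users=5, nb_items=7, latent_size=4)
pred, *params = koren_sketch(torch.tensor([0, 3, 4]), torch.tensor([1, 6, 2]))
print(pred.shape, len(params))
```

Because `forward` returns the prediction first and the tensors to regularize afterwards, a loss written with the `*params` syntax accepts this output unchanged.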

### (TODO) Here, train loop stays the same, you only have to change the model

In [None]:
from torch.utils.data import DataLoader
import torch.nn.functional as F

n_epochs = 10
batch_size = 16
num_feat = 25
lr = 0.01
reg = 0.001

# note: previous loss function should be robust to the new model thanks to advanced syntax :)

model =  KorenMF(num_users,num_items,num_feat)
optimizer = torch.optim.Adam(model.parameters())

# same loop as before
#  TODO 
    

# [Optional part] How to complete this series of experiments

### Visualization

Use t-SNE to display the embeddings:
* could be done with sklearn [link](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html)
* often done with tensorboard in deep applications
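
With scikit-learn, the projection step could look like the sketch below. The embedding here is a randomly initialized stand-in for a trained one, and the `perplexity` value is an arbitrary choice that must stay below the number of points.

```python
import torch
from sklearn.manifold import TSNE

torch.manual_seed(0)
items = torch.nn.Embedding(50, 10)   # stand-in for a trained item embedding

# Detach from autograd and convert to a NumPy array of shape (50, 10)
vectors = items.weight.detach().numpy()

coords = TSNE(n_components=2, perplexity=10, init="random",
              random_state=0).fit_transform(vectors)
print(coords.shape)   # (50, 2)
```

The two columns of `coords` can then be passed to `plt.scatter(coords[:, 0], coords[:, 1])`, optionally colored by a side attribute such as the movie genre.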


### Regularization

Exploit side information to regularize the profiles:
* Users from the same age category are expected to have close representations, and so are movies from the same genre, etc.
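
One simple way to express this idea, among many, is a penalty that pulls each profile toward the centroid of its group. The sketch below is an assumption about how such a term could be written, not a method prescribed by this session: `group_penalty` and the `age_cat` tensor are hypothetical names and values.

```python
import torch

torch.manual_seed(0)
users = torch.nn.Embedding(6, 4)
# hypothetical age category of each of the 6 users
age_cat = torch.tensor([0, 0, 1, 1, 1, 2])

def group_penalty(weight, groups):
    # mean squared distance between each row and the centroid of its group
    penalty = weight.new_zeros(())
    for g in groups.unique():
        members = weight[groups == g]
        centroid = members.mean(0, keepdim=True)
        penalty = penalty + ((members - centroid) ** 2).sum()
    return penalty / weight.size(0)

penalty = group_penalty(users.weight, age_cat)
penalty.backward()
print(penalty.item())
```

Such a term would be added to the training loss with its own weight, next to the L2 regularization. A user who is alone in a category (user 5 here) receives no gradient from it.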


In [40]:
# load side information
uinfo = pd.read_csv("data/ml-100k/u.user", sep="|", names=["userId","age", "genre", "prof","zip"])
uinfo.head(5)

# WARNING: we changed the definition of ids => make ids consistent
# (map returns a new Series, so the result is assigned back to the column)
uinfo["userId"] = uinfo["userId"].map(user_map) # using the same dictionary
genre_map = {g:num for num,g in enumerate(uinfo["genre"].unique())}
uinfo["genre"] = uinfo["genre"].map(genre_map)
prof_map = {p:num for num,p in enumerate(uinfo["prof"].unique())}
uinfo["prof"] = uinfo["prof"].map(prof_map)
uinfo["prof"] # display the remapped column
# age cat

0       0
1       1
2       2
3       0
4       1
       ..
938     5
939     4
940     5
941    11
942     5
Name: prof, Length: 943, dtype: int64
