# Matrix factorization with PyTorch

In this notebook we will write a matrix factorization model in PyTorch to solve a recommendation problem.

The MovieLens dataset (ml-latest-small) describes 5-star rating and free-text tagging activity from MovieLens, a movie recommendation service. It contains 100004 ratings and 1296 tag applications across 9125 movies. See https://grouplens.org/datasets/movielens/ for details. To get the data:

`wget http://files.grouplens.org/datasets/movielens/ml-latest-small.zip`

## MovieLens dataset

In [1]:
from pathlib import Path
import pandas as pd
import numpy as np

In [2]:
PATH = Path("ml-latest-small")
list(PATH.iterdir())

[PosixPath('ml-latest-small/links.csv'),
 PosixPath('ml-latest-small/tags.csv'),
 PosixPath('ml-latest-small/ratings.csv'),
 PosixPath('ml-latest-small/README.txt'),
 PosixPath('ml-latest-small/movies.csv')]

In [3]:
! head ml-latest-small/ratings.csv

userId,movieId,rating,timestamp
1,1,4.0,964982703
1,3,4.0,964981247
1,6,4.0,964982224
1,47,5.0,964983815
1,50,5.0,964982931
1,70,3.0,964982400
1,101,5.0,964980868
1,110,4.0,964982176
1,151,5.0,964984041


In [4]:
# reading a csv into pandas
data = pd.read_csv(PATH/"ratings.csv")

In [5]:
data.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


### Encoding data
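We encode the data to have contiguous ids for users and movies. You can think of this as a categorical encoding of our two categorical variables, userId and movieId. We first split the ratings by time, keeping the oldest 80% for training and the most recent 20% for validation, so that the encoding is built from training ids only.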

In [6]:
time_80 = np.quantile(data.timestamp.values, 0.8)
time_80

1458635171.0

In [7]:
train = data[data["timestamp"] < time_80].copy()
val = data[data["timestamp"] >= time_80].copy()

In [8]:
val.head()

Unnamed: 0,userId,movieId,rating,timestamp
1434,15,1,2.5,1510577970
1436,15,47,3.5,1510571970
1440,15,260,5.0,1510571946
1441,15,293,3.0,1510571962
1442,15,296,4.0,1510571877


In [9]:
# encoding movie and user ids with contiguous ids

train_user_ids = np.sort(np.unique(train.userId.values))
train_user_ids[:15]

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [10]:
# number of unique ids
num_users = len(train_user_ids)
num_users

522

In [11]:
userid2idx = {o:i for i,o in enumerate(train_user_ids)}
#userid2idx

In [12]:
train["userId"] = train["userId"].apply(lambda x: userid2idx[x])
train.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,0,1,4.0,964982703
1,0,3,4.0,964981247
2,0,6,4.0,964982224
3,0,47,5.0,964983815
4,0,50,5.0,964982931


In [13]:
val["userId"] = val["userId"].apply(lambda x: userid2idx.get(x, -1)) # -1 for users not in training
val.head()

Unnamed: 0,userId,movieId,rating,timestamp
1434,14,1,2.5,1510577970
1436,14,47,3.5,1510571970
1440,14,260,5.0,1510571946
1441,14,293,3.0,1510571962
1442,14,296,4.0,1510571877


In [14]:
val = val[val["userId"] >= 0].copy()
val.head()

Unnamed: 0,userId,movieId,rating,timestamp
1434,14,1,2.5,1510577970
1436,14,47,3.5,1510571970
1440,14,260,5.0,1510571946
1441,14,293,3.0,1510571962
1442,14,296,4.0,1510571877


In [15]:
# now encoding movieId
train_movie_ids = np.sort(np.unique(train.movieId.values))
num_items = len(train_movie_ids)
print(num_items)
train_movie_ids[:15]

7867


array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [16]:
movieid2idx = {o:i for i,o in enumerate(train_movie_ids)}
train["movieId"] = train["movieId"].apply(lambda x: movieid2idx[x])
val["movieId"] = val["movieId"].apply(lambda x: movieid2idx.get(x, -1))

In [17]:
val = val[val["movieId"] >= 0].copy()
val.head()

Unnamed: 0,userId,movieId,rating,timestamp
1434,14,0,2.5,1510577970
1436,14,43,3.5,1510571970
1440,14,224,5.0,1510571946
1441,14,254,3.0,1510571962
1442,14,257,4.0,1510571877


In [18]:
val.shape

(1311, 4)
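
The encoding steps above follow the same pattern for both columns, so they can be folded into one helper. The sketch below is only an illustration and is not part of the original notebook: the function name `encode_column` and the toy frames are hypothetical.

```python
import numpy as np
import pandas as pd

def encode_column(train_col, val_col):
    """Map raw ids to contiguous indices learned from the training column.

    Ids that appear only in the validation column are mapped to -1.
    """
    ids = np.sort(train_col.unique())
    id2idx = {o: i for i, o in enumerate(ids)}
    train_enc = train_col.map(id2idx)
    val_enc = val_col.map(id2idx).fillna(-1).astype(int)
    return id2idx, train_enc, val_enc

# toy data (hypothetical), only to show the behaviour
toy_train = pd.DataFrame({"userId": [10, 10, 42, 7]})
toy_val = pd.DataFrame({"userId": [42, 99, 7]})

id2idx, train_enc, val_enc = encode_column(toy_train["userId"], toy_val["userId"])
toy_val = toy_val.assign(userId=val_enc)
toy_val = toy_val[toy_val["userId"] >= 0]  # drop users unseen in training

print(train_enc.tolist())
print(toy_val["userId"].tolist())
```

User 99 never appears in the toy training data, so it is encoded as -1 and then filtered out, which mirrors what the notebook does for `val`.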

## Embedding layer

An embedding layer lets us encode users and items as vectors. Every user and every item gets its own vector. These vectors are parameters of the model and are learned during optimization. Ideally, the embeddings capture properties of the data by placing similar users (or items) close together in the embedding space.

In [19]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [20]:
# an Embedding module holding 10 embeddings (users or items), each of size 3
# the embeddings are initialized at random
embed = nn.Embedding(10, 3)
embed.weight

Parameter containing:
tensor([[-0.3466, -0.1950, -0.1154],
        [ 0.0362,  0.2757,  1.3792],
        [ 0.3929,  1.0744,  0.5927],
        [ 0.4669, -0.0147,  0.3422],
        [ 0.4851, -0.5330,  0.5651],
        [ 1.3804, -0.1979,  0.7237],
        [-0.6133,  0.5874,  1.4174],
        [ 1.3398,  0.1105, -0.0781],
        [ 0.7891,  0.0903, -1.0264],
        [-0.6265,  0.7423, -1.2897]], requires_grad=True)

In [21]:
# given a list of ids we can "look up" the embedding corresponding to each id
# can you see that some vectors are the same?
a = torch.LongTensor([[1,0,1,4,5,1]])
embed(a)

tensor([[[ 0.0362,  0.2757,  1.3792],
         [-0.3466, -0.1950, -0.1154],
         [ 0.0362,  0.2757,  1.3792],
         [ 0.4851, -0.5330,  0.5651],
         [ 1.3804, -0.1979,  0.7237],
         [ 0.0362,  0.2757,  1.3792]]], grad_fn=<EmbeddingBackward>)
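
One way to read the lookup above is as row selection from the weight matrix. The sketch below checks that reading on a fresh, randomly initialized layer: it compares the lookup with plain indexing and with a one-hot matrix product. The variable names are illustrative, and the one-hot version is shown only for intuition, not as something you would use in practice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
embed = nn.Embedding(10, 3)
ids = torch.LongTensor([1, 0, 1, 4, 5, 1])

looked_up = embed(ids)                             # shape (6, 3)
indexed = embed.weight[ids]                        # plain row indexing
one_hot = F.one_hot(ids, num_classes=10).float()   # shape (6, 10)
via_matmul = one_hot @ embed.weight                # one-hot rows times weight matrix

print(torch.equal(looked_up, indexed))
print(torch.allclose(looked_up, via_matmul))
```

Rows 0, 2 and 5 of `looked_up` are identical because they all correspond to id 1.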

## Matrix factorization model

In [22]:
class MF(nn.Module):
    def __init__(self, num_users, num_items, emb_size=100):
        super(MF, self).__init__()
        self.user_emb = nn.Embedding(num_users, emb_size)
        self.item_emb = nn.Embedding(num_items, emb_size)
        # initializing weights with small positive values
        self.user_emb.weight.data.uniform_(0, 0.05)
        self.item_emb.weight.data.uniform_(0, 0.05)

    def forward(self, u, v):
        u = self.user_emb(u)
        v = self.item_emb(v)
        # dot product of each user vector with its item vector
        return (u*v).sum(1)
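
In equation form, the model above predicts a rating as the dot product of a user vector and an item vector, and the training code later in the notebook minimizes the mean squared error over the observed ratings. The symbols $p_u$, $q_i$, $K$ and $\mathcal{D}$ are notation introduced here for illustration; they do not appear in the code. $p_u$ is a row of `user_emb`, $q_i$ a row of `item_emb`, $K$ is `emb_size`, and $\mathcal{D}$ is the set of observed (user, item) pairs.

```latex
\hat{r}_{ui} = p_u \cdot q_i = \sum_{k=1}^{K} p_{uk}\, q_{ik},
\qquad
\mathcal{L} = \frac{1}{|\mathcal{D}|} \sum_{(u,i) \in \mathcal{D}} \left( r_{ui} - \hat{r}_{ui} \right)^2
```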

## Debugging MF model

In [23]:
df = pd.DataFrame({"userId": [0, 0, 1, 1, 3, 4], "movieId": [0, 1, 2, 1, 3, 0], "rating": [4, 5, 3, 1, 3, 4]})
df

Unnamed: 0,userId,movieId,rating
0,0,0,4
1,0,1,5
2,1,2,3
3,1,1,1
4,3,3,3
5,4,0,4


In [24]:
users = torch.LongTensor(df.userId.values)
users

tensor([0, 0, 1, 1, 3, 4])

In [25]:
items = torch.LongTensor(df.movieId.values)
items

tensor([0, 1, 2, 1, 3, 0])

In [26]:
num_users = 5
num_items = 4
emb_size = 3

user_emb = nn.Embedding(num_users, emb_size)
item_emb = nn.Embedding(num_items, emb_size)
users = torch.LongTensor(df.userId.values)
items = torch.LongTensor(df.movieId.values)

In [27]:
U = user_emb(users)
V = item_emb(items)

In [28]:
U

tensor([[ 0.2868,  0.3426, -1.6501],
        [ 0.2868,  0.3426, -1.6501],
        [-0.8100, -0.4015, -0.1291],
        [-0.8100, -0.4015, -0.1291],
        [ 1.6980, -0.6988, -0.0613],
        [-0.7899,  0.9728,  0.3171]], grad_fn=<EmbeddingBackward>)

In [29]:
V

tensor([[ 1.1784, -0.8637,  0.5198],
        [-0.2520,  1.8606,  0.1429],
        [-0.4986, -0.6751,  0.7771],
        [-0.2520,  1.8606,  0.1429],
        [ 1.1954, -1.3335, -1.5388],
        [ 1.1784, -0.8637,  0.5198]], grad_fn=<EmbeddingBackward>)

In [30]:
# element-wise multiplication
U*V

tensor([[ 0.3380, -0.2959, -0.8577],
        [-0.0723,  0.6375, -0.2358],
        [ 0.4038,  0.2711, -0.1003],
        [ 0.2041, -0.7471, -0.0184],
        [ 2.0299,  0.9319,  0.0943],
        [-0.9308, -0.8401,  0.1648]], grad_fn=<MulBackward0>)

In [31]:
# what we want is a dot product per row
(U*V).sum(1) 

tensor([-0.8156,  0.3293,  0.5746, -0.5614,  3.0560, -1.6062],
       grad_fn=<SumBackward1>)
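
To check that `(U*V).sum(1)` really is a per-row dot product, the sketch below compares it with an explicit loop and with `torch.einsum`. It uses fresh random tensors rather than the embeddings above, so the numbers will differ from the outputs shown in this notebook.

```python
import torch

torch.manual_seed(0)
U = torch.randn(6, 3)
V = torch.randn(6, 3)

row_dots = (U * V).sum(1)
# explicit dot product, one row at a time
loop_dots = torch.stack([torch.dot(U[i], V[i]) for i in range(U.shape[0])])
# the same contraction written with einsum
einsum_dots = torch.einsum("ij,ij->i", U, V)

print(row_dots.shape)
print(torch.allclose(row_dots, loop_dots, atol=1e-5))
print(torch.allclose(row_dots, einsum_dots, atol=1e-5))
```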

## Training MF model

In [32]:
num_users = len(train.userId.unique())
num_items = len(train.movieId.unique())
print(num_users, num_items) 

522 7867


In [33]:
# here we are not using data loaders because our data fits well in memory
def train_epocs(model, epochs=10, lr=0.01, wd=0.0):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr,
                                 weight_decay=wd)
    # the full training set is used as a single batch, so build the tensors once
    users = torch.LongTensor(train.userId.values)  #.cuda()
    items = torch.LongTensor(train.movieId.values) #.cuda()
    ratings = torch.FloatTensor(train.rating.values)  #.cuda()
    for i in range(epochs):
        model.train()  # valid_loss switches the model to eval mode
        y_hat = model(users, items)
        loss = F.mse_loss(y_hat, ratings)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        testloss = valid_loss(model)
        print("train loss %.3f valid loss %.3f" % (loss.item(), testloss))

In [34]:
def valid_loss(model):
    model.eval()
    users = torch.LongTensor(val.userId.values) # .cuda()
    items = torch.LongTensor(val.movieId.values) #.cuda()
    ratings = torch.FloatTensor(val.rating.values) #.cuda()
    # no gradients are needed for evaluation
    with torch.no_grad():
        y_hat = model(users, items)
        loss = F.mse_loss(y_hat, ratings)
    return loss.item()
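
The training loop above feeds the whole training set as one batch per epoch. For data that does not fit in memory, a mini-batch variant using `DataLoader` might look like the sketch below. This is not the notebook's method: `TinyMF`, `train_minibatch` and the synthetic tensors are hypothetical stand-ins so the block can run on its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# synthetic stand-in for the encoded training frame (hypothetical data)
n_users, n_items, n_ratings = 20, 30, 500
users = torch.randint(0, n_users, (n_ratings,))
items = torch.randint(0, n_items, (n_ratings,))
ratings = torch.randint(1, 6, (n_ratings,)).float()

class TinyMF(nn.Module):
    # same structure as the MF class, repeated so this block runs alone
    def __init__(self, num_users, num_items, emb_size=8):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, emb_size)
        self.item_emb = nn.Embedding(num_items, emb_size)

    def forward(self, u, v):
        return (self.user_emb(u) * self.item_emb(v)).sum(1)

def train_minibatch(model, loader, epochs=3, lr=0.01, wd=0.0):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=wd)
    epoch_losses = []
    for _ in range(epochs):
        model.train()
        total, count = 0.0, 0
        for u, v, r in loader:
            loss = F.mse_loss(model(u, v), r)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # accumulate a size-weighted average of the batch losses
            total += loss.item() * len(r)
            count += len(r)
        epoch_losses.append(total / count)
    return epoch_losses

loader = DataLoader(TensorDataset(users, items, ratings), batch_size=64, shuffle=True)
model = TinyMF(n_users, n_items)
losses = train_minibatch(model, loader)
print(losses)
```

With mini-batches the optimizer takes several steps per epoch instead of one, so the learning rates used in this notebook would probably need retuning.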

In [35]:
num_users = len(train_user_ids)
num_users

522

In [36]:
num_items = len(train_movie_ids)
num_items

7867

In [37]:
model = MF(num_users, num_items, emb_size=100)  # if you have a GPU .cuda()

In [38]:
train_epocs(model, epochs=20, lr=0.1, wd=1e-5)

train loss 12.944 valid loss 5.116
train loss 4.875 valid loss 2.662
train loss 2.524 valid loss 4.554
train loss 2.998 valid loss 1.475
train loss 0.854 valid loss 1.816
train loss 1.873 valid loss 2.656
train loss 2.687 valid loss 2.354
train loss 2.132 valid loss 1.421
train loss 1.071 valid loss 1.146
train loss 0.971 valid loss 1.749
train loss 1.625 valid loss 1.729
train loss 1.311 valid loss 1.159
train loss 0.774 valid loss 1.000
train loss 0.963 valid loss 1.188
train loss 1.343 valid loss 1.256
train loss 1.309 valid loss 1.108
train loss 0.918 valid loss 1.011
train loss 0.674 valid loss 1.154
train loss 0.869 valid loss 1.258
train loss 1.021 valid loss 1.090


In [39]:
train_epocs(model, epochs=15, lr=0.01, wd=1e-5)

train loss 0.800 valid loss 0.921
train loss 0.630 valid loss 0.903
train loss 0.641 valid loss 0.909
train loss 0.663 valid loss 0.911
train loss 0.649 valid loss 0.917
train loss 0.623 valid loss 0.923
train loss 0.606 valid loss 0.922
train loss 0.601 valid loss 0.908
train loss 0.598 valid loss 0.887
train loss 0.591 valid loss 0.866
train loss 0.578 valid loss 0.850
train loss 0.568 valid loss 0.843
train loss 0.562 valid loss 0.841
train loss 0.559 valid loss 0.842
train loss 0.556 valid loss 0.845


In [40]:
train_epocs(model, epochs=15, lr=0.001, wd=1e-5)

train loss 0.549 valid loss 0.844
train loss 0.541 valid loss 0.843
train loss 0.536 valid loss 0.844
train loss 0.531 valid loss 0.844
train loss 0.528 valid loss 0.845
train loss 0.525 valid loss 0.846
train loss 0.523 valid loss 0.847
train loss 0.521 valid loss 0.847
train loss 0.518 valid loss 0.848
train loss 0.516 valid loss 0.848
train loss 0.514 valid loss 0.849
train loss 0.512 valid loss 0.849
train loss 0.510 valid loss 0.849
train loss 0.508 valid loss 0.849
train loss 0.506 valid loss 0.849


## MF with bias

In [41]:
class MF_bias(nn.Module):
    def __init__(self, num_users, num_items, emb_size=100):
        super(MF_bias, self).__init__()
        self.user_emb = nn.Embedding(num_users, emb_size)
        self.user_bias = nn.Embedding(num_users, 1)
        self.item_emb = nn.Embedding(num_items, emb_size)
        self.item_bias = nn.Embedding(num_items, 1)
        # init 
        self.user_emb.weight.data.uniform_(0,0.05)
        self.item_emb.weight.data.uniform_(0,0.05)
        self.user_bias.weight.data.uniform_(-0.01,0.01)
        self.item_bias.weight.data.uniform_(-0.01,0.01)
        
    def forward(self, u, v):
        U = self.user_emb(u)
        V = self.item_emb(v)
        # squeeze only the last dimension so a batch of size 1 keeps its shape
        b_u = self.user_bias(u).squeeze(1)
        b_v = self.item_bias(v).squeeze(1)
        return (U*V).sum(1) + b_u + b_v
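
Written as an equation, the bias model adds one learned scalar per user and one per item to the dot product. The symbols below are notation introduced here for illustration, not names from the code: $p_u$ and $q_i$ are the user and item embedding vectors, and $b_u$ and $b_i$ correspond to rows of `user_bias` and `item_bias`.

```latex
\hat{r}_{ui} = p_u \cdot q_i + b_u + b_i
```

The bias terms let the model absorb effects that do not depend on the pairing, such as a user who rates everything highly or a movie that is broadly liked.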

In [42]:
model = MF_bias(num_users, num_items, emb_size=100) #.cuda()

In [43]:
train_epocs(model, epochs=15, lr=0.1, wd=1e-5)

train loss 12.942 valid loss 4.346
train loss 4.130 valid loss 3.888
train loss 3.743 valid loss 3.625
train loss 2.268 valid loss 1.229
train loss 0.769 valid loss 1.787
train loss 1.846 valid loss 2.487
train loss 2.502 valid loss 2.274
train loss 2.106 valid loss 1.554
train loss 1.280 valid loss 1.159
train loss 0.949 valid loss 1.427
train loss 1.274 valid loss 1.579
train loss 1.276 valid loss 1.278
train loss 0.897 valid loss 1.041
train loss 0.819 valid loss 1.070
train loss 1.044 valid loss 1.151


In [44]:
train_epocs(model, epochs=10, lr=0.01, wd=1e-5)

train loss 1.190 valid loss 0.980
train loss 0.883 valid loss 0.940
train loss 0.712 valid loss 0.967
train loss 0.663 valid loss 1.001
train loss 0.681 valid loss 1.005
train loss 0.704 valid loss 0.981
train loss 0.703 valid loss 0.944
train loss 0.684 valid loss 0.910
train loss 0.660 valid loss 0.887
train loss 0.644 valid loss 0.877


In [45]:
train_epocs(model, epochs=10, lr=0.001, wd=1e-5)

train loss 0.638 valid loss 0.873
train loss 0.631 valid loss 0.870
train loss 0.624 valid loss 0.867
train loss 0.618 valid loss 0.865
train loss 0.613 valid loss 0.864
train loss 0.609 valid loss 0.863
train loss 0.605 valid loss 0.863
train loss 0.602 valid loss 0.862
train loss 0.599 valid loss 0.862
train loss 0.596 valid loss 0.862


In [46]:
train_epocs(model, epochs=10, lr=0.001, wd=1e-5)

train loss 0.594 valid loss 0.859
train loss 0.591 valid loss 0.857
train loss 0.588 valid loss 0.856
train loss 0.586 valid loss 0.855
train loss 0.584 valid loss 0.855
train loss 0.582 valid loss 0.855
train loss 0.580 valid loss 0.855
train loss 0.578 valid loss 0.855
train loss 0.576 valid loss 0.855
train loss 0.574 valid loss 0.855


In [47]:
train_epocs(model, epochs=10, lr=0.001, wd=1e-5)

train loss 0.571 valid loss 0.856
train loss 0.569 valid loss 0.857
train loss 0.567 valid loss 0.858
train loss 0.565 valid loss 0.858
train loss 0.563 valid loss 0.859
train loss 0.561 valid loss 0.860
train loss 0.558 valid loss 0.860
train loss 0.556 valid loss 0.860
train loss 0.554 valid loss 0.861
train loss 0.551 valid loss 0.861


Note that these models are sensitive to weight initialization, the choice of optimization algorithm, and regularization.

## Lab
* Can we change the first model to predict numbers in a particular range? Hint: a sigmoid produces numbers between 0 and 1. Would this improve the model?
* Would a different loss function improve results? What about the absolute error instead of `F.mse_loss`?
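
As a starting point for the lab, the sketch below shows one possible way to squash unbounded scores into a rating range and to compute an absolute-error loss. The helper `scale_to_range` is hypothetical, and the 0.5 to 5.0 range is an assumption about the rating scale; whether either change improves validation loss is left for you to test.

```python
import torch
import torch.nn.functional as F

def scale_to_range(raw, low=0.5, high=5.0):
    """Squash unbounded scores into (low, high) with a sigmoid."""
    return torch.sigmoid(raw) * (high - low) + low

raw = torch.tensor([-10.0, 0.0, 10.0])     # stand-in for dot-product outputs
pred = scale_to_range(raw)
target = torch.tensor([1.0, 3.0, 5.0])

mse = F.mse_loss(pred, target)   # mean squared error, as used in this notebook
mae = F.l1_loss(pred, target)    # mean absolute error
print(pred, mse.item(), mae.item())
```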

# References
* This notebook is based on [lesson 5 of Jeremy Howard's Deep Learning Course](https://github.com/fastai/fastai/blob/master/courses/dl1/lesson5-movielens.ipynb)