# Matrix factorization with PyTorch

In this notebook we write a matrix factorization model in PyTorch to solve a recommendation problem.

The MovieLens dataset (ml-latest-small) describes 5-star rating and free-text tagging activity from MovieLens, a movie recommendation service. It contains 100004 ratings and 1296 tag applications across 9125 movies. See https://grouplens.org/datasets/movielens/ for details. To get the data:

`wget http://files.grouplens.org/datasets/movielens/ml-latest-small.zip`

## MovieLens dataset

In [1]:
from pathlib import Path
import pandas as pd
import numpy as np

In [3]:
PATH = Path("../data/ml-latest-small/")
list(PATH.iterdir())

[PosixPath('../data/ml-latest-small/links.csv'),
 PosixPath('../data/ml-latest-small/tags.csv'),
 PosixPath('../data/ml-latest-small/ratings.csv'),
 PosixPath('../data/ml-latest-small/README.txt'),
 PosixPath('../data/ml-latest-small/movies.csv')]

In [5]:
! head ../data/ml-latest-small/ratings.csv

userId,movieId,rating,timestamp
1,31,2.5,1260759144
1,1029,3.0,1260759179
1,1061,3.0,1260759182
1,1129,2.0,1260759185
1,1172,4.0,1260759205
1,1263,2.0,1260759151
1,1287,2.0,1260759187
1,1293,2.0,1260759148
1,1339,3.5,1260759125


In [6]:
# reading a csv into pandas
data = pd.read_csv(PATH/"ratings.csv")

In [7]:
data.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,31,2.5,1260759144
1,1,1029,3.0,1260759179
2,1,1061,3.0,1260759182
3,1,1129,2.0,1260759185
4,1,1172,4.0,1260759205


### Encoding data
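We first split the ratings by time: the earliest 80% (by timestamp) go to training and the rest to validation. We then encode the data to have contiguous ids for users and movies, fitting the mapping on the training set only. You can think about this as a categorical encoding of our two categorical variables, userId and movieId. Validation rows whose user or movie never appears in training are dropped.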

In [72]:
time_80 = np.quantile(data.timestamp.values, 0.8)
time_80

1339227110.6

In [73]:
train = data[data["timestamp"] < time_80].copy()
val = data[data["timestamp"] >= time_80].copy()

In [74]:
val.head()

Unnamed: 0,userId,movieId,rating,timestamp
790,11,50,5.0,1391658537
791,11,70,1.0,1391656827
792,11,126,4.0,1391657561
793,11,169,3.0,1391657297
794,11,296,5.0,1391658423


In [75]:
# encoding movie and user ids with contiguous ids

train_user_ids = np.sort(np.unique(train.userId.values))
train_user_ids[:15]

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 12, 13, 14, 15, 16])

In [76]:
# number of unique ids
num_users = len(train_user_ids)
num_users

547

In [77]:
userid2idx = {o:i for i,o in enumerate(train_user_ids)}
#userid2idx

In [78]:
train["userId"] = train["userId"].apply(lambda x: userid2idx[x])
train.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,0,31,2.5,1260759144
1,0,1029,3.0,1260759179
2,0,1061,3.0,1260759182
3,0,1129,2.0,1260759185
4,0,1172,4.0,1260759205


In [79]:
val["userId"] = val["userId"].apply(lambda x: userid2idx.get(x, -1)) # -1 for users not in training
val.head()

Unnamed: 0,userId,movieId,rating,timestamp
790,-1,50,5.0,1391658537
791,-1,70,1.0,1391656827
792,-1,126,4.0,1391657561
793,-1,169,3.0,1391657297
794,-1,296,5.0,1391658423


In [80]:
val = val[val["userId"] >= 0].copy()
val.head()

Unnamed: 0,userId,movieId,rating,timestamp
1019,13,216,1.0,1349622582
1057,13,367,2.0,1374637824
1059,13,371,2.0,1443385370
1060,13,372,3.0,1465793100
1065,13,429,1.0,1465955324


In [91]:
# now encoding movieId
train_movie_ids = np.sort(np.unique(train.movieId.values))
num_items = len(train_movie_ids)
print(num_items)
train_movie_ids[:15]

7356


array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [84]:
movieid2idx = {o:i for i,o in enumerate(train_movie_ids)}
train["movieId"] = train["movieId"].apply(lambda x: movieid2idx[x])
val["movieId"] = val["movieId"].apply(lambda x: movieid2idx.get(x, -1))

In [85]:
val = val[val["movieId"] >= 0].copy()
val.head()

Unnamed: 0,userId,movieId,rating,timestamp
1019,13,189,1.0,1349622582
1057,13,330,2.0,1374637824
1059,13,334,2.0,1443385370
1060,13,335,3.0,1465793100
1065,13,378,1.0,1465955324


In [86]:
val.shape

(1861, 4)
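
The encoding steps above repeat the same pattern for `userId` and `movieId`, so they could be folded into one helper. The following is a minimal sketch, not part of the original notebook: `encode_column` and the toy frames are hypothetical names introduced here for illustration.

```python
import numpy as np
import pandas as pd

def encode_column(train_col, val_col):
    """Map raw ids to contiguous indices fitted on the training column.

    Ids that appear only in the validation column are mapped to -1,
    so the caller can filter them out afterwards.
    """
    ids = np.sort(train_col.unique())
    id2idx = {raw: idx for idx, raw in enumerate(ids)}
    train_enc = train_col.map(id2idx)
    val_enc = val_col.map(id2idx).fillna(-1).astype(int)
    return train_enc, val_enc, len(ids)

# toy data (hypothetical), only to show the behavior
toy_train = pd.DataFrame({"userId": [1, 1, 7, 42], "movieId": [31, 1029, 31, 50]})
toy_val = pd.DataFrame({"userId": [7, 99], "movieId": [50, 31]})

train_users, val_users, n_users = encode_column(toy_train["userId"], toy_val["userId"])
print(train_users.tolist())  # raw ids 1, 7, 42 become 0, 1, 2
print(val_users.tolist())    # user 99 is unseen in training, so it becomes -1
print(n_users)
```

As in the cells above, validation rows with a `-1` would then be dropped with a filter such as `val[val["userId"] >= 0]`.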

## Embedding layer

In [92]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [93]:
# an Embedding module holding 10 embeddings (for users or items), each of size 3
# the embeddings are initialized at random
embed = nn.Embedding(10, 3)
embed.weight

Parameter containing:
tensor([[ 0.3723,  0.2959, -0.4945],
        [ 0.4870,  0.3067, -1.0038],
        [ 1.0794, -0.8265,  0.9611],
        [-1.0542, -0.2109,  0.9022],
        [ 1.0893,  0.6853,  0.3724],
        [-1.2803, -0.0501,  0.6630],
        [-0.9828, -1.1450,  1.6664],
        [ 0.3043,  0.6056, -0.7636],
        [ 0.6878, -1.1852, -0.7511],
        [ 0.6856, -0.3101,  1.0519]], requires_grad=True)

In [94]:
# given a list of ids we can "look up" the embedding corresponding to each id
# can you see that some vectors are the same?
a = torch.LongTensor([[1,0,1,4,5,1]])
embed(a)

tensor([[[ 0.4870,  0.3067, -1.0038],
         [ 0.3723,  0.2959, -0.4945],
         [ 0.4870,  0.3067, -1.0038],
         [ 1.0893,  0.6853,  0.3724],
         [-1.2803, -0.0501,  0.6630],
         [ 0.4870,  0.3067, -1.0038]]], grad_fn=<EmbeddingBackward>)
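
One way to read the lookup above: the embedding layer returns rows of its weight matrix, so repeated ids return identical vectors. A small sketch to check this, assuming a fresh randomly initialized layer (the seed and variable names are choices made here, not from the original notebook):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for repeatability
embed = nn.Embedding(10, 3)
ids = torch.LongTensor([[1, 0, 1, 4, 5, 1]])

looked_up = embed(ids)       # shape (1, 6, 3)
indexed = embed.weight[ids]  # plain row indexing into the weight matrix

# the lookup and the indexing give the same values
same = torch.equal(looked_up, indexed)
# positions 0 and 2 both hold id 1, so their vectors match
repeat_rows_match = torch.equal(looked_up[0, 0], looked_up[0, 2])
print(same, repeat_rows_match)
```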

## Matrix factorization model

In [95]:
class MF(nn.Module):
    def __init__(self, num_users, num_items, emb_size=100):
        super(MF, self).__init__()
        self.user_emb = nn.Embedding(num_users, emb_size)
        self.item_emb = nn.Embedding(num_items, emb_size)
        # initializing weights
        self.user_emb.weight.data.uniform_(0,0.05)
        self.item_emb.weight.data.uniform_(0,0.05)
        
    def forward(self, u, v):
        u = self.user_emb(u)
        v = self.item_emb(v)
        return (u*v).sum(1)   

## Debugging MF model

In [97]:
df = pd.DataFrame({"userId": [0, 0, 1, 1, 3, 4], "movieId": [0, 1, 2, 1, 3, 0], "rating": [4, 5, 3, 1, 3, 4]})
df

Unnamed: 0,userId,movieId,rating
0,0,0,4
1,0,1,5
2,1,2,3
3,1,1,1
4,3,3,3
5,4,0,4


In [98]:
num_users = 5
num_items = 4
emb_size = 3

user_emb = nn.Embedding(num_users, emb_size)
item_emb = nn.Embedding(num_items, emb_size)
users = torch.LongTensor(df.userId.values)
items = torch.LongTensor(df.movieId.values)

In [99]:
U = user_emb(users)
V = item_emb(items)

In [103]:
users

tensor([0, 0, 1, 1, 3, 4])

In [100]:
U

tensor([[-0.1588,  0.6016, -0.0554],
        [-0.1588,  0.6016, -0.0554],
        [ 0.0773, -0.5758,  2.1042],
        [ 0.0773, -0.5758,  2.1042],
        [ 1.2791, -0.4015,  0.1770],
        [ 0.5436,  1.0807,  0.8438]], grad_fn=<EmbeddingBackward>)

In [101]:
# element-wise multiplication
U*V 

tensor([[-0.0540, -0.6477,  0.0094],
        [-0.3160, -0.7770, -0.0057],
        [ 0.1396,  0.5970, -3.4700],
        [ 0.1539,  0.7438,  0.2179],
        [-0.7230,  0.1517,  0.2027],
        [ 0.1850, -1.1636, -0.1432]], grad_fn=<MulBackward0>)

In [102]:
# what we want is a dot product per row
(U*V).sum(1) 

tensor([-0.6924, -1.0988, -2.7334,  1.1156, -0.3686, -1.1218],
       grad_fn=<SumBackward1>)
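
To confirm that `(U*V).sum(1)` really is a dot product per row, it can be compared against an explicit loop. This is a sketch on random tensors (the tensors and seed here are stand-ins, not the `U` and `V` from the cells above):

```python
import torch

torch.manual_seed(0)  # arbitrary seed
U = torch.randn(6, 3)
V = torch.randn(6, 3)

# vectorized: multiply element-wise, then sum over the embedding dimension
row_dots = (U * V).sum(1)

# explicit: one dot product per (user, item) pair
loop_dots = torch.stack([torch.dot(U[i], V[i]) for i in range(U.shape[0])])

# an equivalent einsum formulation
einsum_dots = torch.einsum("ij,ij->i", U, V)

print(torch.allclose(row_dots, loop_dots), torch.allclose(row_dots, einsum_dots))
```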

## Training MF model

In [104]:
num_users = len(train.userId.unique())
num_items = len(train.movieId.unique())
print(num_users, num_items) 

547 7356


In [113]:
# here we are not using data loaders because our data fits well in memory
def train_epocs(model, epochs=10, lr=0.01, wd=0.0):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=wd)
    for i in range(epochs):
        model.train()
        users = torch.LongTensor(train.userId.values)  #.cuda()
        items = torch.LongTensor(train.movieId.values) #.cuda()
        ratings = torch.FloatTensor(train.rating.values)  #.cuda()
    
        y_hat = model(users, items)
        loss = F.mse_loss(y_hat, ratings)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        testloss = valid_loss(model)
        print("train loss %.3f valid loss %.3f" % (loss.item(), testloss)) 

In [114]:
def valid_loss(model):
    model.eval()
    users = torch.LongTensor(val.userId.values) # .cuda()
    items = torch.LongTensor(val.movieId.values) #.cuda()
    ratings = torch.FloatTensor(val.rating.values) #.cuda()
    with torch.no_grad():  # no gradients are needed for evaluation
        y_hat = model(users, items)
        loss = F.mse_loss(y_hat, ratings)
    return loss.item()
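
The training loop above does one full-batch update per epoch because the data fits in memory. For larger datasets, a mini-batch variant with a data loader is a common alternative. The following is a sketch under assumptions: `TinyMF`, `train_minibatch`, the batch size, and the synthetic ratings are all hypothetical and introduced only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

class TinyMF(nn.Module):
    """Small stand-in for the MF model, so the sketch is self-contained."""
    def __init__(self, num_users, num_items, emb_size=8):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, emb_size)
        self.item_emb = nn.Embedding(num_items, emb_size)

    def forward(self, u, v):
        return (self.user_emb(u) * self.item_emb(v)).sum(1)

def train_minibatch(model, users, items, ratings, epochs=5, lr=0.01, batch_size=64):
    """Train with shuffled mini-batches; return the mean training loss per epoch."""
    loader = DataLoader(TensorDataset(users, items, ratings),
                        batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    history = []
    for _ in range(epochs):
        model.train()
        total, count = 0.0, 0
        for u, v, r in loader:
            loss = F.mse_loss(model(u, v), r)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item() * len(r)  # weight by batch size
            count += len(r)
        history.append(total / count)
    return history

# synthetic data (hypothetical): 20 users, 30 items, 500 ratings in [0.5, 5.0)
torch.manual_seed(0)
users = torch.randint(0, 20, (500,))
items = torch.randint(0, 30, (500,))
ratings = torch.rand(500) * 4.5 + 0.5
history = train_minibatch(TinyMF(20, 30), users, items, ratings)
print(history)
```

With mini-batches the optimizer takes many steps per epoch, so learning rates that work for the full-batch loop may need to be lowered.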

In [115]:
model = MF(num_users, num_items, emb_size=100)  # if you have a GPU .cuda()

In [116]:
train_epocs(model, epochs=20, lr=0.1, wd=1e-5)

train loss 13.252 valid loss 5.265
train loss 5.123 valid loss 2.296
train loss 2.341 valid loss 3.656
train loss 3.302 valid loss 1.400
train loss 0.891 valid loss 2.083
train loss 1.860 valid loss 2.783
train loss 2.763 valid loss 2.293
train loss 2.240 valid loss 1.395
train loss 1.110 valid loss 1.432
train loss 0.928 valid loss 2.101
train loss 1.697 valid loss 1.676
train loss 1.462 valid loss 1.023
train loss 0.817 valid loss 1.201
train loss 0.937 valid loss 1.625
train loss 1.360 valid loss 1.623
train loss 1.389 valid loss 1.233
train loss 1.006 valid loss 0.992
train loss 0.705 valid loss 1.271
train loss 0.867 valid loss 1.556
train loss 1.075 valid loss 1.317


In [117]:
train_epocs(model, epochs=15, lr=0.01, wd=1e-5)

train loss 0.873 valid loss 0.992
train loss 0.661 valid loss 0.921
train loss 0.669 valid loss 0.930
train loss 0.710 valid loss 0.930
train loss 0.704 valid loss 0.918
train loss 0.671 valid loss 0.913
train loss 0.641 valid loss 0.923
train loss 0.631 valid loss 0.939
train loss 0.633 valid loss 0.948
train loss 0.634 valid loss 0.943
train loss 0.624 valid loss 0.928
train loss 0.609 valid loss 0.912
train loss 0.596 valid loss 0.903
train loss 0.590 valid loss 0.901
train loss 0.590 valid loss 0.903


In [118]:
train_epocs(model, epochs=15, lr=0.001, wd=1e-5)

train loss 0.589 valid loss 0.902
train loss 0.581 valid loss 0.902
train loss 0.575 valid loss 0.903
train loss 0.571 valid loss 0.905
train loss 0.567 valid loss 0.907
train loss 0.564 valid loss 0.909
train loss 0.562 valid loss 0.910
train loss 0.559 valid loss 0.912
train loss 0.557 valid loss 0.912
train loss 0.555 valid loss 0.913
train loss 0.554 valid loss 0.913
train loss 0.552 valid loss 0.912
train loss 0.550 valid loss 0.912
train loss 0.548 valid loss 0.911
train loss 0.546 valid loss 0.910


In [119]:
train_epocs(model, epochs=15, lr=0.001, wd=1e-5)

train loss 0.543 valid loss 0.909
train loss 0.541 valid loss 0.908
train loss 0.538 valid loss 0.907
train loss 0.536 valid loss 0.907
train loss 0.533 valid loss 0.908
train loss 0.531 valid loss 0.908
train loss 0.528 valid loss 0.909
train loss 0.526 valid loss 0.910
train loss 0.523 valid loss 0.911
train loss 0.520 valid loss 0.912
train loss 0.518 valid loss 0.913
train loss 0.515 valid loss 0.914
train loss 0.513 valid loss 0.915
train loss 0.510 valid loss 0.916
train loss 0.507 valid loss 0.916


## MF with bias

In [120]:
class MF_bias(nn.Module):
    def __init__(self, num_users, num_items, emb_size=100):
        super(MF_bias, self).__init__()
        self.user_emb = nn.Embedding(num_users, emb_size)
        self.user_bias = nn.Embedding(num_users, 1)
        self.item_emb = nn.Embedding(num_items, emb_size)
        self.item_bias = nn.Embedding(num_items, 1)
        # init 
        self.user_emb.weight.data.uniform_(0,0.05)
        self.item_emb.weight.data.uniform_(0,0.05)
        self.user_bias.weight.data.uniform_(-0.01,0.01)
        self.item_bias.weight.data.uniform_(-0.01,0.01)
        
    def forward(self, u, v):
        U = self.user_emb(u)
        V = self.item_emb(v)
        b_u = self.user_bias(u).squeeze()
        b_v = self.item_bias(v).squeeze()
        return (U*V).sum(1) +  b_u  + b_v
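
Written as a formula, the forward pass of `MF_bias` computes the following prediction for user $u$ and item $i$. The symbols are notation introduced here: $\mathbf{p}_u$ and $\mathbf{q}_i$ are the rows of `user_emb` and `item_emb`, $b_u$ and $b_i$ are the entries of `user_bias` and `item_bias`, and $K$ is `emb_size`.

$$
\hat{r}_{ui} = \mathbf{p}_u \cdot \mathbf{q}_i + b_u + b_i = \sum_{k=1}^{K} p_{uk}\, q_{ik} + b_u + b_i
$$

The plain `MF` model is the same expression without the two bias terms.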

In [121]:
model = MF_bias(num_users, num_items, emb_size=100) #.cuda()

In [122]:
train_epocs(model, epochs=15, lr=0.1, wd=1e-5)

train loss 13.257 valid loss 4.438
train loss 4.361 valid loss 3.416
train loss 3.483 valid loss 2.749
train loss 2.452 valid loss 1.156
train loss 0.789 valid loss 2.013
train loss 1.821 valid loss 2.616
train loss 2.506 valid loss 2.296
train loss 2.102 valid loss 1.641
train loss 1.236 valid loss 1.508
train loss 0.900 valid loss 1.899
train loss 1.310 valid loss 1.741
train loss 1.359 valid loss 1.151
train loss 0.936 valid loss 0.986
train loss 0.828 valid loss 1.220
train loss 1.064 valid loss 1.394


In [123]:
train_epocs(model, epochs=10, lr=0.01, wd=1e-5)

train loss 1.222 valid loss 1.105
train loss 0.908 valid loss 0.937
train loss 0.739 valid loss 0.888
train loss 0.696 valid loss 0.912
train loss 0.720 valid loss 0.954
train loss 0.743 valid loss 0.980
train loss 0.737 valid loss 0.988
train loss 0.712 valid loss 0.987
train loss 0.685 valid loss 0.984
train loss 0.668 valid loss 0.983


In [124]:
train_epocs(model, epochs=10, lr=0.001, wd=1e-5)

train loss 0.663 valid loss 0.972
train loss 0.656 valid loss 0.962
train loss 0.650 valid loss 0.952
train loss 0.645 valid loss 0.943
train loss 0.640 valid loss 0.935
train loss 0.636 valid loss 0.928
train loss 0.632 valid loss 0.921
train loss 0.629 valid loss 0.915
train loss 0.626 valid loss 0.910
train loss 0.623 valid loss 0.905


In [125]:
train_epocs(model, epochs=10, lr=0.001, wd=1e-5)

train loss 0.621 valid loss 0.902
train loss 0.618 valid loss 0.899
train loss 0.615 valid loss 0.896
train loss 0.613 valid loss 0.894
train loss 0.611 valid loss 0.892
train loss 0.609 valid loss 0.890
train loss 0.607 valid loss 0.889
train loss 0.605 valid loss 0.887
train loss 0.602 valid loss 0.886
train loss 0.600 valid loss 0.885


In [126]:
train_epocs(model, epochs=10, lr=0.001, wd=1e-5)

train loss 0.598 valid loss 0.885
train loss 0.596 valid loss 0.885
train loss 0.594 valid loss 0.884
train loss 0.592 valid loss 0.884
train loss 0.589 valid loss 0.884
train loss 0.587 valid loss 0.885
train loss 0.585 valid loss 0.885
train loss 0.583 valid loss 0.885
train loss 0.580 valid loss 0.885
train loss 0.578 valid loss 0.885


Note that these models are sensitive to weight initialization, the choice of optimization algorithm, and regularization.
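
Because results depend on the random initial weights, fixing the seed before building a model is one way to make runs comparable. A minimal sketch, assuming the same `uniform_(0, 0.05)` initialization used in the models above (`make_embedding` is a hypothetical helper):

```python
import torch
import torch.nn as nn

def make_embedding(seed, num_rows=5, emb_size=3):
    """Create an embedding whose initial weights are determined by the seed."""
    torch.manual_seed(seed)
    emb = nn.Embedding(num_rows, emb_size)
    emb.weight.data.uniform_(0, 0.05)
    return emb

a = make_embedding(0)
b = make_embedding(0)
c = make_embedding(1)

same_seed_equal = torch.equal(a.weight, b.weight)  # same seed, same weights
diff_seed_equal = torch.equal(a.weight, c.weight)  # different seed, different weights
print(same_seed_equal, diff_seed_equal)
```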

# References
* This notebook is based on [lesson 5 of Jeremy Howard's Deep Learning Course](https://github.com/fastai/fastai/blob/master/courses/dl1/lesson5-movielens.ipynb)