# Matrix factorization with PyTorch

In this notebook we will write a matrix factorization model in PyTorch to solve a recommendation problem.

The MovieLens dataset (ml-latest-small) describes 5-star rating and free-text tagging activity from [MovieLens](https://grouplens.org/datasets/movielens/), a movie recommendation service. It contains 100004 ratings and 1296 tag applications across 9125 movies. To get the data:

`wget http://files.grouplens.org/datasets/movielens/ml-latest-small.zip && unzip ml-latest-small.zip`

## MovieLens dataset

In [1]:
from pathlib import Path
import pandas as pd
import numpy as np

In [2]:
PATH = Path("ml-latest-small")
list(PATH.iterdir())

In [None]:
! head ml-latest-small/ratings.csv

In [3]:
# reading a csv into pandas
data = pd.read_csv(PATH/"ratings.csv")

In [52]:
data.head()

| | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 1 | 4.0 | 964982703 |
| 1 | 1 | 3 | 4.0 | 964981247 |
| 2 | 1 | 6 | 4.0 | 964982224 |
| 3 | 1 | 47 | 5.0 | 964983815 |
| 4 | 1 | 50 | 5.0 | 964982931 |


### Encoding data
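We encode the data to have contiguous ids for users and movies. You can think of this as a categorical encoding of our two categorical variables, userId and movieId. We first split the ratings by time (the earliest 80% for training, the rest for validation) and learn the id mappings from the training split only; validation rows whose user or movie never appears in training are dropped.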

In [8]:
time_80 = np.quantile(data.timestamp.values, 0.8)
time_80

1458635171.0

In [53]:
train = data[data["timestamp"] < time_80].copy()
val = data[data["timestamp"] >= time_80].copy()

In [54]:
val.head()

| | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 1434 | 15 | 1 | 2.5 | 1510577970 |
| 1436 | 15 | 47 | 3.5 | 1510571970 |
| 1440 | 15 | 260 | 5.0 | 1510571946 |
| 1441 | 15 | 293 | 3.0 | 1510571962 |
| 1442 | 15 | 296 | 4.0 | 1510571877 |


In [55]:
# encoding user and movie ids as contiguous ids

train_user_ids = np.sort(np.unique(train.userId.values))
train_user_ids[:15]

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [56]:
# number of unique ids
num_users = len(train_user_ids)
num_users

522

In [58]:
userid2idx = {o:i for i,o in enumerate(train_user_ids)}
#userid2idx

In [59]:
train["userId"] = train["userId"].apply(lambda x: userid2idx[x])
train.head()

| | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 0 | 0 | 1 | 4.0 | 964982703 |
| 1 | 0 | 3 | 4.0 | 964981247 |
| 2 | 0 | 6 | 4.0 | 964982224 |
| 3 | 0 | 47 | 5.0 | 964983815 |
| 4 | 0 | 50 | 5.0 | 964982931 |


In [60]:
val["userId"] = val["userId"].apply(lambda x: userid2idx.get(x, -1)) # -1 for users not in training
val.head()

| | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 1434 | 14 | 1 | 2.5 | 1510577970 |
| 1436 | 14 | 47 | 3.5 | 1510571970 |
| 1440 | 14 | 260 | 5.0 | 1510571946 |
| 1441 | 14 | 293 | 3.0 | 1510571962 |
| 1442 | 14 | 296 | 4.0 | 1510571877 |


In [61]:
val = val[val["userId"] >= 0].copy()
val.head()

| | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 1434 | 14 | 1 | 2.5 | 1510577970 |
| 1436 | 14 | 47 | 3.5 | 1510571970 |
| 1440 | 14 | 260 | 5.0 | 1510571946 |
| 1441 | 14 | 293 | 3.0 | 1510571962 |
| 1442 | 14 | 296 | 4.0 | 1510571877 |


In [62]:
# now encoding movieId
train_movie_ids = np.sort(np.unique(train.movieId.values))
num_items = len(train_movie_ids)
print(num_items)
train_movie_ids[:15]

7867


array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [63]:
movieid2idx = {o:i for i,o in enumerate(train_movie_ids)}
train["movieId"] = train["movieId"].apply(lambda x: movieid2idx[x])
val["movieId"] = val["movieId"].apply(lambda x: movieid2idx.get(x, -1))

In [64]:
val = val[val["movieId"] >= 0].copy()
val.head()

| | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 1434 | 14 | 0 | 2.5 | 1510577970 |
| 1436 | 14 | 43 | 3.5 | 1510571970 |
| 1440 | 14 | 224 | 5.0 | 1510571946 |
| 1441 | 14 | 254 | 3.0 | 1510571962 |
| 1442 | 14 | 257 | 4.0 | 1510571877 |


In [65]:
val.shape

(1311, 4)
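
The encoding steps above follow the same pattern for both columns: build a mapping from the training ids, apply it to the training split, map unseen validation ids to -1, and drop those rows. A minimal sketch of that pattern as a single helper is shown below. The function name `encode_column` and the toy data frames are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def encode_column(train_df, val_df, col):
    """Map the ids in `col` to contiguous integers learned from train_df.

    Hypothetical helper: rows of val_df whose id never appears in
    train_df are dropped, mirroring the -1 filtering used above.
    """
    ids = np.sort(train_df[col].unique())
    id2idx = {o: i for i, o in enumerate(ids)}
    train_df = train_df.copy()
    val_df = val_df.copy()
    train_df[col] = train_df[col].map(id2idx)
    # ids unseen in training become -1 and are filtered out
    val_df[col] = val_df[col].map(lambda x: id2idx.get(x, -1))
    val_df = val_df[val_df[col] >= 0].copy()
    return train_df, val_df, len(ids)


# toy data, made up for illustration
toy_train = pd.DataFrame({"userId": [10, 10, 42, 7], "rating": [4.0, 3.0, 5.0, 2.0]})
toy_val = pd.DataFrame({"userId": [42, 99], "rating": [1.0, 5.0]})
toy_train_enc, toy_val_enc, n_users = encode_column(toy_train, toy_val, "userId")
```

In this toy case the three training ids 7, 10 and 42 become 0, 1 and 2, and the validation row for user 99 is dropped because that user has no learned embedding.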

## Embedding layer

An embedding layer enables us to encode users and items as vectors. Every user and every item gets its own vector. These vectors are parameters of the model and are learned during optimization. Ideally, the embeddings capture properties of the data by placing similar users (or items) close together in the embedding space.

In [66]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [68]:
# an Embedding module holding 10 embeddings (for users or items), each of size 3
# the embeddings are initialized at random
embed = nn.Embedding(10, 3)
embed.weight

Parameter containing:
tensor([[ 0.1872,  0.6973,  0.4979],
        [ 0.1799, -0.6118, -0.0166],
        [ 0.3276, -1.0983, -0.6598],
        [ 0.1900,  0.7435,  0.7514],
        [-0.2626,  1.5552, -0.0989],
        [ 0.4269,  0.5635, -0.3911],
        [ 0.9961, -0.0173, -0.6139],
        [ 1.7920, -0.4305, -0.2768],
        [ 0.1814, -0.2246, -1.0103],
        [-1.1223, -1.9114, -0.6136]], requires_grad=True)

In [69]:
# given a list of ids we can "look up" the embedding corresponding to each id
# can you see that some vectors are the same?
a = torch.LongTensor([[1,0,1,4,5,1]])
embed(a)

tensor([[[ 0.1799, -0.6118, -0.0166],
         [ 0.1872,  0.6973,  0.4979],
         [ 0.1799, -0.6118, -0.0166],
         [-0.2626,  1.5552, -0.0989],
         [ 0.4269,  0.5635, -0.3911],
         [ 0.1799, -0.6118, -0.0166]]], grad_fn=<EmbeddingBackward>)
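
An embedding lookup is row indexing into the weight matrix, or equivalently a one-hot vector multiplied by that matrix. The sketch below checks both equivalences on a fresh `nn.Embedding(10, 3)`; the variable names and the seed are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # only to make the random weights repeatable
embed = nn.Embedding(10, 3)
ids = torch.LongTensor([1, 0, 1, 4, 5, 1])

looked_up = embed(ids)            # shape (6, 3)
indexed = embed.weight[ids]       # plain row indexing, same rows

# one-hot rows times the weight matrix select the same vectors
one_hot = F.one_hot(ids, num_classes=10).float()   # shape (6, 10)
via_matmul = one_hot @ embed.weight                 # shape (6, 3)

same_as_indexing = torch.equal(looked_up, indexed)
same_as_matmul = torch.allclose(looked_up, via_matmul)
```

The lookup form avoids building the one-hot matrix, which matters once the number of users or items is large.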

## Matrix factorization model

In [26]:
class MF(nn.Module):
    def __init__(self, num_users, num_items, emb_size=100):
        super(MF, self).__init__()
        self.user_emb = nn.Embedding(num_users, emb_size)
        self.item_emb = nn.Embedding(num_items, emb_size)
        # initializing weights
        self.user_emb.weight.data.uniform_(0,0.05)
        self.item_emb.weight.data.uniform_(0,0.05)
        
    def forward(self, u, v):
        u = self.user_emb(u)
        v = self.item_emb(v)
        return (u*v).sum(1)   
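
In equation form, the `forward` method computes one dot product per (user, item) pair. The notation is introduced here for illustration: $p_u$ is the embedding of user $u$, $q_i$ the embedding of item $i$, $k$ the embedding size (`emb_size`), and $\mathcal{D}$ the set of observed ratings.

$$\hat{r}_{ui} = p_u \cdot q_i = \sum_{f=1}^{k} p_{uf}\, q_{if}$$

Stacking the user vectors into a matrix $P$ and the item vectors into a matrix $Q$ gives the approximation $R \approx P Q^{\top}$ on the observed entries, which is where the name matrix factorization comes from. If the model is fit with mean squared error, the objective is

$$L = \frac{1}{|\mathcal{D}|} \sum_{(u,i) \in \mathcal{D}} \left(r_{ui} - \hat{r}_{ui}\right)^2$$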

## Debugging MF model

In [70]:
df = pd.DataFrame({"userId": [0, 0, 1, 1, 3, 4], "movieId": [0, 1, 2, 1, 3, 0], "rating": [4, 5, 3, 1, 3, 4]})
df

| | userId | movieId | rating |
|---|---|---|---|
| 0 | 0 | 0 | 4 |
| 1 | 0 | 1 | 5 |
| 2 | 1 | 2 | 3 |
| 3 | 1 | 1 | 1 |
| 4 | 3 | 3 | 3 |
| 5 | 4 | 0 | 4 |


In [71]:
users = torch.LongTensor(df.userId.values)
users

tensor([0, 0, 1, 1, 3, 4])

In [72]:
items = torch.LongTensor(df.movieId.values)
items

tensor([0, 1, 2, 1, 3, 0])

In [73]:
num_users = 5
num_items = 4
emb_size = 3

user_emb = nn.Embedding(num_users, emb_size)
item_emb = nn.Embedding(num_items, emb_size)
users = torch.LongTensor(df.userId.values)
items = torch.LongTensor(df.movieId.values)

In [74]:
U = user_emb(users)
V = item_emb(items)

In [75]:
U

tensor([[ 1.1280, -2.1681, -0.7832],
        [ 1.1280, -2.1681, -0.7832],
        [-0.0556, -1.5602, -0.4952],
        [-0.0556, -1.5602, -0.4952],
        [ 0.3165,  0.0443, -1.9584],
        [ 2.2026, -0.4359, -0.3264]], grad_fn=<EmbeddingBackward>)

In [77]:
V

tensor([[-0.7875, -1.5126, -0.4474],
        [-1.6173,  0.9080,  1.6341],
        [-0.1885,  0.7055, -0.6119],
        [-1.6173,  0.9080,  1.6341],
        [-0.6647,  0.5264, -0.3495],
        [-0.7875, -1.5126, -0.4474]], grad_fn=<EmbeddingBackward>)

In [76]:
# element-wise multiplication
U*V 

tensor([[-0.8883,  3.2795,  0.3504],
        [-1.8244, -1.9688, -1.2798],
        [ 0.0105, -1.1007,  0.3030],
        [ 0.0899, -1.4168, -0.8092],
        [-0.2104,  0.0233,  0.6844],
        [-1.7344,  0.6594,  0.1460]], grad_fn=<MulBackward0>)

In [78]:
# what we want is a dot product per row
(U*V).sum(1) 

tensor([ 2.7417, -5.0729, -0.7872, -2.1360,  0.4973, -0.9290],
       grad_fn=<SumBackward1>)
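
The row-wise dot product can be written in a few equivalent ways. The sketch below, on random tensors made up for illustration, compares `(U*V).sum(1)` with `torch.einsum` and with the diagonal of the full product `U @ V.T`. The last form computes every pairwise product and is shown only to make the link to the matrix product explicit.

```python
import torch

torch.manual_seed(0)  # only to make the random values repeatable
U = torch.randn(6, 3)
V = torch.randn(6, 3)

# multiply element-wise, then sum over the embedding dimension
row_dots = (U * V).sum(1)
# same computation expressed as an einsum over matching rows
row_dots_einsum = torch.einsum("ij,ij->i", U, V)
# diagonal of the full 6x6 product: entry (n, n) is row n of U dot row n of V
row_dots_diag = torch.diagonal(U @ V.T)
```

All three results have shape `(6,)`; the first two do work proportional to the batch size, while the third does work proportional to its square.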

## Training MF model

In [34]:
num_users = len(train.userId.unique())
num_items = len(train.movieId.unique())
print(num_users, num_items) 

522 7867


In [79]:
# here we are not using data loaders because our data fits well in memory
def train_epocs(model, epochs=10, lr=0.01, wd=0.0):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=wd)
    for i in range(epochs):
        model.train()
        users = torch.LongTensor(train.userId.values)  #.cuda()
        items = torch.LongTensor(train.movieId.values) #.cuda()
        ratings = torch.FloatTensor(train.rating.values)  #.cuda()
    
        y_hat = model(users, items)
        loss = F.mse_loss(y_hat, ratings)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        testloss = valid_loss(model)
        print("train loss %.3f valid loss %.3f" % (loss.item(), testloss)) 

In [80]:
def valid_loss(model):
    model.eval()
    users = torch.LongTensor(val.userId.values) # .cuda()
    items = torch.LongTensor(val.movieId.values) #.cuda()
    ratings = torch.FloatTensor(val.rating.values) #.cuda()
    # gradients are not needed for evaluation
    with torch.no_grad():
        y_hat = model(users, items)
        loss = F.mse_loss(y_hat, ratings)
    return loss.item()
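
The training loop above takes one optimizer step per epoch on the full dataset. When the data does not fit in memory, or more frequent updates are wanted, a mini-batch variant is one option. The sketch below is an assumption about how that could look, using `TensorDataset` and `DataLoader` from `torch.utils.data`. The names `TinyMF` and `train_minibatch`, the toy tensors, and the hyperparameters are all illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


class TinyMF(nn.Module):
    """Same dot-product model as MF above, redefined so the sketch runs alone."""
    def __init__(self, num_users, num_items, emb_size=8):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, emb_size)
        self.item_emb = nn.Embedding(num_items, emb_size)

    def forward(self, u, v):
        return (self.user_emb(u) * self.item_emb(v)).sum(1)


def train_minibatch(model, users, items, ratings, epochs=2, lr=0.01, batch_size=4):
    """Hypothetical mini-batch variant: one optimizer step per batch."""
    loader = DataLoader(TensorDataset(users, items, ratings),
                        batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    epoch_losses = []
    for _ in range(epochs):
        model.train()
        total, n = 0.0, 0
        for u, v, r in loader:
            loss = F.mse_loss(model(u, v), r)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # weight each batch loss by its size to get a per-rating average
            total += loss.item() * len(r)
            n += len(r)
        epoch_losses.append(total / n)
    return epoch_losses


# toy data, made up for illustration
users = torch.LongTensor([0, 0, 1, 1, 3, 4])
items = torch.LongTensor([0, 1, 2, 1, 3, 0])
ratings = torch.FloatTensor([4, 5, 3, 1, 3, 4])
losses = train_minibatch(TinyMF(5, 4), users, items, ratings)
```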

In [84]:
num_users = len(train_user_ids)
num_users

522

In [85]:
num_items = len(train_movie_ids)
num_items

7867

In [97]:
model = MF(num_users, num_items, emb_size=100)  # if you have a GPU .cuda()

In [98]:
train_epocs(model, epochs=20, lr=0.1, wd=1e-5)

train loss 12.945 valid loss 5.075
train loss 4.878 valid loss 2.663
train loss 2.521 valid loss 4.552
train loss 3.000 valid loss 1.462
train loss 0.852 valid loss 1.817
train loss 1.872 valid loss 2.664
train loss 2.687 valid loss 2.359
train loss 2.133 valid loss 1.419
train loss 1.073 valid loss 1.139
train loss 0.972 valid loss 1.744
train loss 1.623 valid loss 1.731
train loss 1.310 valid loss 1.163
train loss 0.776 valid loss 1.003
train loss 0.965 valid loss 1.190
train loss 1.343 valid loss 1.255
train loss 1.308 valid loss 1.104
train loss 0.917 valid loss 1.010
train loss 0.675 valid loss 1.161
train loss 0.870 valid loss 1.267
train loss 1.021 valid loss 1.094


In [99]:
train_epocs(model, epochs=15, lr=0.01, wd=1e-5)

train loss 0.799 valid loss 0.924
train loss 0.630 valid loss 0.905
train loss 0.642 valid loss 0.909
train loss 0.663 valid loss 0.910
train loss 0.649 valid loss 0.914
train loss 0.623 valid loss 0.919
train loss 0.605 valid loss 0.917
train loss 0.601 valid loss 0.904
train loss 0.598 valid loss 0.885
train loss 0.591 valid loss 0.865
train loss 0.579 valid loss 0.851
train loss 0.568 valid loss 0.845
train loss 0.562 valid loss 0.845
train loss 0.560 valid loss 0.847
train loss 0.556 valid loss 0.852


In [100]:
train_epocs(model, epochs=15, lr=0.001, wd=1e-5)

train loss 0.549 valid loss 0.850
train loss 0.541 valid loss 0.849
train loss 0.536 valid loss 0.849
train loss 0.531 valid loss 0.849
train loss 0.528 valid loss 0.850
train loss 0.525 valid loss 0.851
train loss 0.523 valid loss 0.851
train loss 0.521 valid loss 0.852
train loss 0.518 valid loss 0.852
train loss 0.516 valid loss 0.852
train loss 0.514 valid loss 0.852
train loss 0.512 valid loss 0.852
train loss 0.510 valid loss 0.851
train loss 0.508 valid loss 0.851
train loss 0.506 valid loss 0.851


## MF with bias

In [101]:
class MF_bias(nn.Module):
    def __init__(self, num_users, num_items, emb_size=100):
        super(MF_bias, self).__init__()
        self.user_emb = nn.Embedding(num_users, emb_size)
        self.user_bias = nn.Embedding(num_users, 1)
        self.item_emb = nn.Embedding(num_items, emb_size)
        self.item_bias = nn.Embedding(num_items, 1)
        # init 
        self.user_emb.weight.data.uniform_(0,0.05)
        self.item_emb.weight.data.uniform_(0,0.05)
        self.user_bias.weight.data.uniform_(-0.01,0.01)
        self.item_bias.weight.data.uniform_(-0.01,0.01)
        
    def forward(self, u, v):
        U = self.user_emb(u)
        V = self.item_emb(v)
        # squeeze only the last dimension so a batch of size 1 keeps its batch dimension
        b_u = self.user_bias(u).squeeze(1)
        b_v = self.item_bias(v).squeeze(1)
        return (U*V).sum(1) +  b_u  + b_v
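
In equation form this model adds a per-user and a per-item offset to the dot product: $\hat{r}_{ui} = p_u \cdot q_i + b_u + b_i$, where $p_u$ and $q_i$ are the user and item embeddings and $b_u$ and $b_i$ are the scalar biases (notation introduced here for illustration). The bias embeddings return tensors of shape `(batch, 1)`, so they have to be flattened to `(batch,)` before being added to the dot products. The sketch below shows the shapes involved, and why naming the dimension in `squeeze(1)` is the safer choice when a batch may contain a single row.

```python
import torch
import torch.nn as nn

bias = nn.Embedding(4, 1)
dots = torch.zeros(3)                        # stand-in for (U*V).sum(1), shape (3,)

batch = torch.LongTensor([0, 2, 3])
raw = bias(batch)                            # shape (3, 1)
wrong = dots + raw                           # broadcasts to shape (3, 3)
right = dots + raw.squeeze(1)                # shape (3,)

single = torch.LongTensor([2])
all_dims = bias(single).squeeze()            # shape (): batch dim is dropped too
one_dim = bias(single).squeeze(1)            # shape (1,): batch dim is kept
```

Adding the unsqueezed bias does not raise an error; it silently broadcasts to a square matrix, which is why checking shapes on a small example is worthwhile.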

In [102]:
model = MF_bias(num_users, num_items, emb_size=100) #.cuda()

In [103]:
train_epocs(model, epochs=15, lr=0.1, wd=1e-5)

train loss 12.947 valid loss 4.301
train loss 4.133 valid loss 3.932
train loss 3.735 valid loss 3.594
train loss 2.270 valid loss 1.215
train loss 0.769 valid loss 1.794
train loss 1.845 valid loss 2.494
train loss 2.501 valid loss 2.274
train loss 2.105 valid loss 1.550
train loss 1.278 valid loss 1.161
train loss 0.947 valid loss 1.439
train loss 1.274 valid loss 1.591
train loss 1.277 valid loss 1.280
train loss 0.897 valid loss 1.039
train loss 0.818 valid loss 1.071
train loss 1.043 valid loss 1.155


In [104]:
train_epocs(model, epochs=10, lr=0.01, wd=1e-5)

train loss 1.190 valid loss 0.982
train loss 0.883 valid loss 0.941
train loss 0.713 valid loss 0.971
train loss 0.664 valid loss 1.005
train loss 0.682 valid loss 1.007
train loss 0.703 valid loss 0.981
train loss 0.702 valid loss 0.944
train loss 0.682 valid loss 0.912
train loss 0.659 valid loss 0.891
train loss 0.644 valid loss 0.881


In [105]:
train_epocs(model, epochs=10, lr=0.001, wd=1e-5)

train loss 0.639 valid loss 0.877
train loss 0.631 valid loss 0.873
train loss 0.625 valid loss 0.870
train loss 0.619 valid loss 0.868
train loss 0.614 valid loss 0.867
train loss 0.609 valid loss 0.865
train loss 0.606 valid loss 0.864
train loss 0.602 valid loss 0.864
train loss 0.599 valid loss 0.863
train loss 0.596 valid loss 0.863


In [106]:
train_epocs(model, epochs=10, lr=0.001, wd=1e-5)

train loss 0.594 valid loss 0.859
train loss 0.591 valid loss 0.857
train loss 0.588 valid loss 0.855
train loss 0.586 valid loss 0.855
train loss 0.584 valid loss 0.854
train loss 0.582 valid loss 0.854
train loss 0.580 valid loss 0.855
train loss 0.577 valid loss 0.855
train loss 0.575 valid loss 0.856
train loss 0.573 valid loss 0.856


In [107]:
train_epocs(model, epochs=10, lr=0.001, wd=1e-5)

train loss 0.571 valid loss 0.857
train loss 0.569 valid loss 0.858
train loss 0.567 valid loss 0.858
train loss 0.565 valid loss 0.859
train loss 0.563 valid loss 0.860
train loss 0.560 valid loss 0.860
train loss 0.558 valid loss 0.861
train loss 0.556 valid loss 0.861
train loss 0.553 valid loss 0.861
train loss 0.551 valid loss 0.861


Note that these models are sensitive to weight initialization, the choice of optimization algorithm, and regularization.

In [6]:
def scale(maxAllowed,minAllowed,unscaledNum,maxActual,minActual):
    return (maxAllowed - minAllowed) * (unscaledNum - minActual) / (maxActual - minActual) + minAllowed

In [None]:
scale(5,1,.2,1,0)

In [13]:
0.2*4+1

1.8
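
The `scale` function above is a linear map from the interval `[minActual, maxActual]` to `[minAllowed, maxAllowed]`, so `scale(5,1,.2,1,0)` reduces to `0.2*4+1`. The notebook does not say how `scale` is meant to be used. One possibility, stated here as an assumption, is to squash raw model outputs with a sigmoid and rescale them into the range used in the example call, 1 to 5:

```python
import torch


def scale(maxAllowed, minAllowed, unscaledNum, maxActual, minActual):
    # linear map from [minActual, maxActual] to [minAllowed, maxAllowed]
    return (maxAllowed - minAllowed) * (unscaledNum - minActual) / (maxActual - minActual) + minAllowed


raw = torch.tensor([-3.0, 0.0, 3.0])      # made-up unbounded model outputs
squashed = torch.sigmoid(raw)             # values strictly between 0 and 1
scaled = scale(5, 1, squashed, 1, 0)      # values strictly between 1 and 5
```

Because `scale` uses only arithmetic operators, it works unchanged on tensors as well as on plain numbers.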

## References
* This notebook is based on [lesson 5 of Jeremy Howard's Deep Learning Course](https://github.com/fastai/fastai/blob/master/courses/dl1/lesson5-movielens.ipynb)