# Matrix factorization with PyTorch

This notebook implements a matrix factorization model in PyTorch to solve a recommendation problem.

The MovieLens dataset (ml-latest-small) describes 5-star rating and free-text tagging activity from MovieLens, a movie recommendation service. It contains 100004 ratings and 1296 tag applications across 9125 movies (see https://grouplens.org/datasets/movielens/). To get the data:

`wget http://files.grouplens.org/datasets/movielens/ml-latest-small.zip`

## MovieLens dataset

In [1]:
from pathlib import Path
import pandas as pd
import numpy as np

In [2]:
PATH = Path("ml-latest-small")
list(PATH.iterdir())

[PosixPath('ml-latest-small/links.csv'),
 PosixPath('ml-latest-small/tags.csv'),
 PosixPath('ml-latest-small/ratings.csv'),
 PosixPath('ml-latest-small/README.txt'),
 PosixPath('ml-latest-small/movies.csv')]

In [3]:
! head ml-latest-small/ratings.csv

userId,movieId,rating,timestamp
1,1,4.0,964982703
1,3,4.0,964981247
1,6,4.0,964982224
1,47,5.0,964983815
1,50,5.0,964982931
1,70,3.0,964982400
1,101,5.0,964980868
1,110,4.0,964982176
1,151,5.0,964984041

In [4]:
# reading a csv into pandas
data = pd.read_csv(PATH/"ratings.csv")

In [5]:
data.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


### Encoding data
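> Data is encoded to have contiguous ids for users and movies. You can think of this as a categorical encoding of our two categorical variables, userId and movieId. The encoding is fit on the training set only (the oldest 80% of ratings by timestamp); validation rows whose user or movie does not appear in training are dropped.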
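
As a minimal sketch of the idea on toy data (the ids below are made up for illustration, not taken from MovieLens), contiguous encoding maps each distinct raw id to its position in the sorted list of training ids, and ids unseen in training fall back to -1:

```python
# Toy raw ids (hypothetical values, not from the dataset)
train_ids = [10, 42, 10, 7, 42]
val_ids = [7, 99, 42]

# Map each distinct training id to a contiguous index 0..n-1
id2idx = {raw: idx for idx, raw in enumerate(sorted(set(train_ids)))}

train_encoded = [id2idx[raw] for raw in train_ids]
# Ids never seen in training get -1 so they can be filtered out later
val_encoded = [id2idx.get(raw, -1) for raw in val_ids]

print(id2idx)
print(train_encoded)
print(val_encoded)
```

The contiguous indices can then be used directly as row numbers into an embedding matrix.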

In [6]:
time_80 = np.quantile(data.timestamp.values, 0.8)
time_80

1458635171.0

In [7]:
train = data[data["timestamp"] < time_80].copy()
val = data[data["timestamp"] >= time_80].copy()

In [8]:
val.head()

Unnamed: 0,userId,movieId,rating,timestamp
1434,15,1,2.5,1510577970
1436,15,47,3.5,1510571970
1440,15,260,5.0,1510571946
1441,15,293,3.0,1510571962
1442,15,296,4.0,1510571877


In [9]:
# encoding movie and user ids with contiguous ids

train_user_ids = np.sort(np.unique(train.userId.values))
train_user_ids[:15]

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [10]:
# number of unique ids
num_users = len(train_user_ids)
num_users

522

In [11]:
userid2idx = {o:i for i,o in enumerate(train_user_ids)}
#userid2idx

In [12]:
train["userId"] = train["userId"].apply(lambda x: userid2idx[x])
train.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,0,1,4.0,964982703
1,0,3,4.0,964981247
2,0,6,4.0,964982224
3,0,47,5.0,964983815
4,0,50,5.0,964982931


In [13]:
val["userId"] = val["userId"].apply(lambda x: userid2idx.get(x, -1)) # -1 for users not in training
val.head()

Unnamed: 0,userId,movieId,rating,timestamp
1434,14,1,2.5,1510577970
1436,14,47,3.5,1510571970
1440,14,260,5.0,1510571946
1441,14,293,3.0,1510571962
1442,14,296,4.0,1510571877


In [14]:
val = val[val["userId"] >= 0].copy()
val.head()

Unnamed: 0,userId,movieId,rating,timestamp
1434,14,1,2.5,1510577970
1436,14,47,3.5,1510571970
1440,14,260,5.0,1510571946
1441,14,293,3.0,1510571962
1442,14,296,4.0,1510571877


In [15]:
# now encoding movieId
train_movie_ids = np.sort(np.unique(train.movieId.values))
num_items = len(train_movie_ids)
print(num_items)
train_movie_ids[:15]

7867


array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [16]:
movieid2idx = {o:i for i,o in enumerate(train_movie_ids)}
train["movieId"] = train["movieId"].apply(lambda x: movieid2idx[x])
val["movieId"] = val["movieId"].apply(lambda x: movieid2idx.get(x, -1))

In [17]:
val = val[val["movieId"] >= 0].copy()
val.head()

Unnamed: 0,userId,movieId,rating,timestamp
1434,14,0,2.5,1510577970
1436,14,43,3.5,1510571970
1440,14,224,5.0,1510571946
1441,14,254,3.0,1510571962
1442,14,257,4.0,1510571877


In [18]:
val.shape

(1311, 4)

## Embedding layer

An embedding layer enables us to encode users and items into vectors. Every user and every item gets its own vector. These vectors are parameters of the model and are learned during optimization. Ideally, the embeddings capture properties of the data by placing similar users (or similar items) close together in the embedding space.

In [19]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [20]:
# an Embedding module holding 10 embeddings (users or items), each of size 3
# the embeddings are initialized at random
embed = nn.Embedding(10, 3)
embed.weight

Parameter containing:
tensor([[-0.4850,  0.2407, -0.5395],
        [-1.1403,  0.2775, -0.4381],
        [ 0.4341,  0.3196,  1.8334],
        [-0.2300,  0.6329, -0.1284],
        [-0.6864,  1.6903,  1.1327],
        [-1.1610, -0.1672,  0.4097],
        [ 1.0293,  0.2364, -0.5465],
        [-1.0860,  0.6264, -0.4775],
        [-0.4069,  1.8001, -0.4225],
        [ 0.6725, -0.6831, -0.2104]], requires_grad=True)

In [21]:
# given a list of ids we can "look up" the embedding corresponding to each id

a = torch.LongTensor([[1,0,1,4,5,1]])
embed(a)

tensor([[[-1.1403,  0.2775, -0.4381],
         [-0.4850,  0.2407, -0.5395],
         [-1.1403,  0.2775, -0.4381],
         [-0.6864,  1.6903,  1.1327],
         [-1.1610, -0.1672,  0.4097],
         [-1.1403,  0.2775, -0.4381]]], grad_fn=<EmbeddingBackward0>)
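
To make the lookup concrete, the sketch below checks that calling an embedding layer on a tensor of ids returns the same rows as indexing its weight matrix directly. The variable names (`emb`, `ids`) are chosen here for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the random init is repeatable

emb = nn.Embedding(10, 3)             # 10 ids, each mapped to a vector of size 3
ids = torch.LongTensor([1, 0, 1, 4])  # id 1 appears twice

looked_up = emb(ids)       # shape (4, 3)
indexed = emb.weight[ids]  # same rows, taken straight from the weight matrix

print(looked_up.shape)
print(torch.equal(looked_up, indexed))
```

Repeated ids (here id 1) return the same row each time, which is why all ratings by one user share a single learned vector.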

## Matrix factorization model

In [34]:
class MF(nn.Module):
    def __init__(self, num_users, num_items, emb_size=100):
        super(MF, self).__init__()
        self.user_emb = nn.Embedding(num_users, emb_size)
        self.item_emb = nn.Embedding(num_items, emb_size)
        # initializing weights
        self.user_emb.weight.data.uniform_(0,0.05)
        self.item_emb.weight.data.uniform_(0,0.05)
        
    def forward(self, u, v):
        # defining model 
        u = self.user_emb(u)
        v = self.item_emb(v)
        return (u*v).sum(1)   
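
In equation form, using notation introduced here for illustration ($u_i$ for the embedding of user $i$, $v_j$ for the embedding of movie $j$, $K$ for `emb_size`), the forward pass computes one dot product per (user, movie) pair. The training loop in this notebook fits it with `F.mse_loss`, that is, the mean squared error over the $N$ observed ratings $y_{ij}$ (any weight decay is applied separately by the optimizer):

```latex
\hat{y}_{ij} = u_i \cdot v_j = \sum_{k=1}^{K} u_{ik} \, v_{jk},
\qquad
L = \frac{1}{N} \sum_{(i,j)\ \text{observed}} \left( y_{ij} - \hat{y}_{ij} \right)^2
```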

## Debugging MF model

In [35]:
df = pd.DataFrame({"userId": [0, 0, 1, 1, 3, 4], "movieId": [0, 1, 2, 1, 3, 0], "rating": [4, 5, 3, 1, 3, 4]})
df

Unnamed: 0,userId,movieId,rating
0,0,0,4
1,0,1,5
2,1,2,3
3,1,1,1
4,3,3,3
5,4,0,4


In [36]:
users = torch.LongTensor(df.userId.values)
users

tensor([0, 0, 1, 1, 3, 4])

In [37]:
items = torch.LongTensor(df.movieId.values)
items

tensor([0, 1, 2, 1, 3, 0])

In [38]:
num_users = 5
num_items = 4
emb_size = 3

user_emb = nn.Embedding(num_users, emb_size)
item_emb = nn.Embedding(num_items, emb_size)
users = torch.LongTensor(df.userId.values)
items = torch.LongTensor(df.movieId.values)

In [39]:
U = user_emb(users)
V = item_emb(items)

In [40]:
U

tensor([[-0.4118, -1.2165,  0.4644],
        [-0.4118, -1.2165,  0.4644],
        [-0.5039, -0.4894, -1.9914],
        [-0.5039, -0.4894, -1.9914],
        [-0.6204,  0.8954,  0.2892],
        [ 0.7556, -1.7064, -0.1516]], grad_fn=<EmbeddingBackward0>)

In [41]:
V

tensor([[ 2.1467,  0.7966,  0.2096],
        [-0.0240, -0.5843,  0.5201],
        [-1.1252,  0.3946,  0.3069],
        [-0.0240, -0.5843,  0.5201],
        [-0.3879, -1.4092,  0.2818],
        [ 2.1467,  0.7966,  0.2096]], grad_fn=<EmbeddingBackward0>)

In [42]:
# element-wise multiplication
U*V 

tensor([[-0.8840, -0.9690,  0.0973],
        [ 0.0099,  0.7108,  0.2416],
        [ 0.5670, -0.1931, -0.6112],
        [ 0.0121,  0.2860, -1.0357],
        [ 0.2406, -1.2618,  0.0815],
        [ 1.6219, -1.3593, -0.0318]], grad_fn=<MulBackward0>)

In [43]:
# what we want is a dot product per row
(U*V).sum(1) 

tensor([-1.7557,  0.9622, -0.2374, -0.7376, -0.9396,  0.2308],
       grad_fn=<SumBackward1>)
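
A quick sanity check of the row-wise dot product, as a sketch on small hand-written tensors (values chosen here for easy mental arithmetic): `(U*V).sum(1)` matches both `torch.einsum` and the diagonal of the full product `U @ V.T`, without computing the unneeded off-diagonal entries.

```python
import torch

U = torch.tensor([[1.0, 2.0, 3.0],
                  [0.0, 1.0, 0.0]])
V = torch.tensor([[4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0]])

rowwise = (U * V).sum(1)                      # one dot product per row
via_einsum = torch.einsum("ij,ij->i", U, V)   # same computation, written as einsum
via_matmul = torch.diag(U @ V.T)              # diagonal of all pairwise dot products

print(rowwise.tolist())
print(torch.allclose(rowwise, via_einsum), torch.allclose(rowwise, via_matmul))
```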

## Training MF model

In [44]:
num_users = len(train.userId.unique())
num_items = len(train.movieId.unique())
print(num_users, num_items) 

522 7867


In [45]:

def train_epochs(model, epochs=10, lr=0.01, wd=0.0):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr,
                                 weight_decay=wd)
    for i in range(epochs):
        model.train()
        users = torch.LongTensor(train.userId.values)  
        items = torch.LongTensor(train.movieId.values) 
        ratings = torch.FloatTensor(train.rating.values)  
    
        y_hat = model(users, items)
        loss = F.mse_loss(y_hat, ratings)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        testloss = valid_loss(model)
        print("train loss %.3f valid loss %.3f" % (loss.item(), testloss)) 

In [46]:
def valid_loss(model):
    model.eval()
    users = torch.LongTensor(val.userId.values)
    items = torch.LongTensor(val.movieId.values)
    ratings = torch.FloatTensor(val.rating.values)
    # no gradients are needed for evaluation
    with torch.no_grad():
        y_hat = model(users, items)
        loss = F.mse_loss(y_hat, ratings)
    return loss.item()

In [47]:
num_users = len(train_user_ids)
num_users

522

In [48]:
num_items = len(train_movie_ids)
num_items

7867

In [49]:
model = MF(num_users, num_items, emb_size=100)  

In [50]:
train_epochs(model, epochs=20, lr=0.1, wd=1e-5)

train loss 12.944 valid loss 5.108
train loss 4.873 valid loss 2.657
train loss 2.525 valid loss 3.783
train loss 2.999 valid loss 1.789
train loss 0.854 valid loss 2.860
train loss 1.874 valid loss 3.775
train loss 2.688 valid loss 3.344
train loss 2.133 valid loss 2.257
train loss 1.072 valid loss 1.848
train loss 0.972 valid loss 2.211
train loss 1.626 valid loss 1.864
train loss 1.312 valid loss 1.190
train loss 0.774 valid loss 1.126
train loss 0.963 valid loss 1.390
train loss 1.343 valid loss 1.464
train loss 1.310 valid loss 1.336
train loss 0.920 valid loss 1.358
train loss 0.675 valid loss 1.684
train loss 0.869 valid loss 1.865
train loss 1.023 valid loss 1.661


In [51]:
train_epochs(model, epochs=15, lr=0.01, wd=1e-5)

train loss 0.802 valid loss 1.345
train loss 0.631 valid loss 1.226
train loss 0.642 valid loss 1.175
train loss 0.665 valid loss 1.161
train loss 0.651 valid loss 1.185
train loss 0.624 valid loss 1.235
train loss 0.607 valid loss 1.290
train loss 0.602 valid loss 1.327
train loss 0.600 valid loss 1.334
train loss 0.592 valid loss 1.315
train loss 0.580 valid loss 1.286
train loss 0.569 valid loss 1.256
train loss 0.564 valid loss 1.230
train loss 0.561 valid loss 1.211
train loss 0.558 valid loss 1.199


In [52]:
train_epochs(model, epochs=15, lr=0.001, wd=1e-5)

train loss 0.551 valid loss 1.205
train loss 0.544 valid loss 1.213
train loss 0.538 valid loss 1.220
train loss 0.534 valid loss 1.228
train loss 0.530 valid loss 1.234
train loss 0.527 valid loss 1.240
train loss 0.525 valid loss 1.244
train loss 0.523 valid loss 1.247
train loss 0.521 valid loss 1.249
train loss 0.519 valid loss 1.250
train loss 0.517 valid loss 1.250
train loss 0.514 valid loss 1.250
train loss 0.512 valid loss 1.248
train loss 0.510 valid loss 1.247
train loss 0.508 valid loss 1.245


## MF with bias

In [53]:
class MF_bias(nn.Module):
    def __init__(self, num_users, num_items, emb_size=100):
        super(MF_bias, self).__init__()
        self.user_emb = nn.Embedding(num_users, emb_size)
        self.user_bias = nn.Embedding(num_users, 1)
        self.item_emb = nn.Embedding(num_items, emb_size)
        self.item_bias = nn.Embedding(num_items, 1)
        # init 
        self.user_emb.weight.data.uniform_(0,0.05)
        self.item_emb.weight.data.uniform_(0,0.05)
        self.user_bias.weight.data.uniform_(-0.01,0.01)
        self.item_bias.weight.data.uniform_(-0.01,0.01)
        
    def forward(self, u, v):
        U = self.user_emb(u)
        V = self.item_emb(v)
        b_u = self.user_bias(u).squeeze(1)  # (batch, 1) -> (batch,)
        b_v = self.item_bias(v).squeeze(1)  # (batch, 1) -> (batch,)
        return (U*V).sum(1) +  b_u  + b_v
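
The bias terms are stored as embeddings of size 1, so a lookup returns shape `(batch, 1)` and has to be flattened to `(batch,)` before it is added to the dot products. The sketch below, using small made-up sizes, shows the shapes involved. It also shows that `squeeze()` without a dimension argument collapses a batch of one to a 0-d tensor, whereas `squeeze(1)` keeps the batch dimension.

```python
import torch
import torch.nn as nn

bias = nn.Embedding(4, 1)           # one scalar bias per id (4 ids, illustrative)
batch = torch.LongTensor([0, 2, 3])
single = torch.LongTensor([1])      # a batch containing a single id

print(bias(batch).shape)              # shape (3, 1): one column per lookup
print(bias(batch).squeeze(1).shape)   # shape (3,): ready to add to the dot products
print(bias(single).squeeze().shape)   # shape (): the batch dimension is lost
print(bias(single).squeeze(1).shape)  # shape (1,): the batch dimension is kept
```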

In [54]:
model = MF_bias(num_users, num_items, emb_size=100) #.cuda()

In [55]:
train_epochs(model, epochs=15, lr=0.1, wd=1e-5)

train loss 12.942 valid loss 4.296
train loss 4.124 valid loss 3.932
train loss 3.755 valid loss 2.926
train loss 2.265 valid loss 1.520
train loss 0.770 valid loss 2.723
train loss 1.848 valid loss 3.572
train loss 2.505 valid loss 3.383
train loss 2.110 valid loss 2.682
train loss 1.283 valid loss 2.253
train loss 0.950 valid loss 2.286
train loss 1.273 valid loss 2.023
train loss 1.277 valid loss 1.432
train loss 0.899 valid loss 1.125
train loss 0.820 valid loss 1.163
train loss 1.044 valid loss 1.240


In [56]:
train_epochs(model, epochs=10, lr=0.01, wd=1e-5)

train loss 1.190 valid loss 1.142
train loss 0.884 valid loss 1.156
train loss 0.714 valid loss 1.244
train loss 0.664 valid loss 1.357
train loss 0.681 valid loss 1.451
train loss 0.703 valid loss 1.505
train loss 0.703 valid loss 1.521
train loss 0.683 valid loss 1.509
train loss 0.661 valid loss 1.483
train loss 0.645 valid loss 1.451


In [57]:
train_epochs(model, epochs=10, lr=0.001, wd=1e-5)

train loss 0.639 valid loss 1.432
train loss 0.632 valid loss 1.414
train loss 0.625 valid loss 1.397
train loss 0.619 valid loss 1.380
train loss 0.614 valid loss 1.365
train loss 0.610 valid loss 1.351
train loss 0.606 valid loss 1.338
train loss 0.603 valid loss 1.326
train loss 0.600 valid loss 1.315
train loss 0.597 valid loss 1.305


In [58]:
train_epochs(model, epochs=10, lr=0.001, wd=1e-5)

train loss 0.594 valid loss 1.300
train loss 0.592 valid loss 1.294
train loss 0.589 valid loss 1.289
train loss 0.587 valid loss 1.283
train loss 0.584 valid loss 1.277
train loss 0.582 valid loss 1.272
train loss 0.580 valid loss 1.267
train loss 0.578 valid loss 1.262
train loss 0.576 valid loss 1.258
train loss 0.574 valid loss 1.253


In [59]:
train_epochs(model, epochs=10, lr=0.001, wd=1e-5)

train loss 0.572 valid loss 1.250
train loss 0.570 valid loss 1.247
train loss 0.568 valid loss 1.244
train loss 0.566 valid loss 1.242
train loss 0.563 valid loss 1.240
train loss 0.561 valid loss 1.239
train loss 0.559 valid loss 1.238
train loss 0.557 valid loss 1.238
train loss 0.554 valid loss 1.238
train loss 0.552 valid loss 1.238


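Note that these models are sensitive to weight initialization, the choice of optimization algorithm, and regularization. In the runs above, training loss keeps falling while validation loss levels off around 1.24, so the learning rate schedule and weight decay deserve tuning.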
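
Because of this sensitivity, it can help to fix the random seed before building a model, so that runs with different hyperparameters start from the same initial weights. A minimal sketch, assuming the same `uniform_(0, 0.05)` initialization used in this notebook; the helper name `make_embedding` is hypothetical:

```python
import torch
import torch.nn as nn

def make_embedding(num_rows, emb_size, seed):
    """Build an embedding with a repeatable uniform(0, 0.05) initialization."""
    torch.manual_seed(seed)  # seed PyTorch's global RNG
    emb = nn.Embedding(num_rows, emb_size)
    emb.weight.data.uniform_(0, 0.05)
    return emb

a = make_embedding(5, 3, seed=42)
b = make_embedding(5, 3, seed=42)
c = make_embedding(5, 3, seed=7)

print(torch.equal(a.weight, b.weight))  # compares two inits from the same seed
print(torch.equal(a.weight, c.weight))  # compares inits from different seeds
```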