# Collaborative Filtering with Neural Networks

In this notebook we write a matrix factorization model in PyTorch to solve a recommendation problem. We then write a more general neural model for the same problem.

The MovieLens dataset (ml-latest-small) describes 5-star rating and free-text tagging activity from [MovieLens](https://grouplens.org/datasets/movielens/), a movie recommendation service. It contains 100004 ratings and 1296 tag applications across 9125 movies. To get the data:

`wget http://files.grouplens.org/datasets/movielens/ml-latest-small.zip`

## MovieLens dataset

In [1]:
from pathlib import Path
import pandas as pd
import numpy as np

In [2]:
PATH = Path("/data2/yinterian/ml-latest-small/")
list(PATH.iterdir())

[PosixPath('/data2/yinterian/ml-latest-small/ratings.csv'),
 PosixPath('/data2/yinterian/ml-latest-small/tags.csv'),
 PosixPath('/data2/yinterian/ml-latest-small/tiny_training2.csv'),
 PosixPath('/data2/yinterian/ml-latest-small/links.csv'),
 PosixPath('/data2/yinterian/ml-latest-small/tiny_val2.csv'),
 PosixPath('/data2/yinterian/ml-latest-small/README.txt'),
 PosixPath('/data2/yinterian/ml-latest-small/movies.csv')]

In [3]:
! head /data2/yinterian/ml-latest-small/ratings.csv

userId,movieId,rating,timestamp
1,31,2.5,1260759144
1,1029,3.0,1260759179
1,1061,3.0,1260759182
1,1129,2.0,1260759185
1,1172,4.0,1260759205
1,1263,2.0,1260759151
1,1287,2.0,1260759187
1,1293,2.0,1260759148
1,1339,3.5,1260759125


In [4]:
data = pd.read_csv(PATH/"ratings.csv")

In [6]:
data.head()

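|   | userId | movieId | rating | timestamp |
|---|--------|---------|--------|------------|
| 0 | 1 | 31 | 2.5 | 1260759144 |
| 1 | 1 | 1029 | 3.0 | 1260759179 |
| 2 | 1 | 1061 | 3.0 | 1260759182 |
| 3 | 1 | 1129 | 2.0 | 1260759185 |
| 4 | 1 | 1172 | 4.0 | 1260759205 |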


### Encoding data
This is similar to what you did for hw1 in ML-2. We encode the data so that users and movies have contiguous ids. You can think of this as a categorical encoding of our two categorical variables, `userId` and `movieId`.

In [7]:
# split train and validation before encoding
np.random.seed(3)
msk = np.random.rand(len(data)) < 0.8
train = data[msk].copy()
val = data[~msk].copy()

In [8]:
# here is a handy function modified from fast.ai
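def proc_col(col, train_col=None):
    """Encodes a pandas column with contiguous ids.

    If train_col is given, the mapping is built from train_col and
    values of col that do not appear in it are encoded as -1.
    """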
    if train_col is not None:
        uniq = train_col.unique()
    else:
        uniq = col.unique()
    name2idx = {o:i for i,o in enumerate(uniq)}
    return name2idx, np.array([name2idx.get(x, -1) for x in col]), len(uniq)

In [9]:
def encode_data(df, train=None):
    """Encodes rating data with contiguous user and movie ids.

    If train is provided, encodes df with the same encoding as train
    and drops rows whose user or movie does not appear in train.
    """
    df = df.copy()
    for col_name in ["userId", "movieId"]:
        train_col = None
        if train is not None:
            train_col = train[col_name]
        _,col,_ = proc_col(df[col_name], train_col)
        df[col_name] = col
        df = df[df[col_name] >= 0]
    return df
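
To make the encoding concrete, here is a minimal sketch on made-up data. The frames `toy_train` and `toy_val` are hypothetical and not part of the dataset. It follows the same idea as `proc_col`: ids seen in training get contiguous indices, and ids that appear only in validation map to -1 and are dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: raw ids are sparse and non-contiguous.
toy_train = pd.DataFrame({"userId": [11, 11, 42, 97], "rating": [4.0, 3.0, 5.0, 2.0]})
toy_val = pd.DataFrame({"userId": [42, 500, 11], "rating": [3.5, 1.0, 4.5]})

# Build the id -> index mapping from the training column only.
name2idx = {raw_id: idx for idx, raw_id in enumerate(toy_train["userId"].unique())}

# Encode validation ids with the training mapping; unseen ids become -1.
encoded = np.array([name2idx.get(x, -1) for x in toy_val["userId"]])

# Keep only rows whose user was seen during training.
toy_val_enc = toy_val.assign(userId=encoded)
toy_val_enc = toy_val_enc[toy_val_enc["userId"] >= 0]

print(name2idx)
print(encoded)
print(len(toy_val_enc))
```

User 500 never appears in `toy_train`, so its row is removed. This is why the validation set can end up slightly smaller after encoding.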

In [10]:
# to check my new implementation
df_t = pd.read_csv(PATH/"tiny_training2.csv")
df_v = pd.read_csv(PATH/"tiny_val2.csv")
df_t_e = encode_data(df_t)
df_v_e = encode_data(df_v, df_t)
#df_v_e
#df_t_e

In [11]:
df_train = encode_data(train)
df_val = encode_data(val, train)

## Embedding layer

In [14]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [15]:
# an Embedding module holding 10 embeddings (one per user or item), each of size 3
# the embeddings are initialized at random
embed = nn.Embedding(10, 3)

In [16]:
# given a list of ids we can "look up" the embedding corresponding to each id
a = torch.LongTensor([[1,2,0,4,5,1]])
embed(a)

tensor([[[ 0.9053,  1.1727,  0.3294],
         [-1.4227, -2.2962,  0.5080],
         [-0.9733, -0.6481, -0.3051],
         [-0.6444, -0.6796,  0.8563],
         [ 0.8981,  0.0364,  0.2106],
         [ 0.9053,  1.1727,  0.3294]]])
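
The lookup above is just row selection from a weight matrix. The following sketch (fresh random weights, so the numbers differ from the output above) checks that calling the embedding, indexing its `weight` directly, and multiplying one-hot vectors by `weight` all give the same result.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embed = nn.Embedding(10, 3)          # 10 rows, each a 3-dimensional vector
ids = torch.LongTensor([[1, 2, 0, 4, 5, 1]])

looked_up = embed(ids)               # shape: (1, 6, 3)
indexed = embed.weight[ids]          # plain row indexing into the weight matrix

# The same lookup written as one-hot vectors times the weight matrix.
one_hot = nn.functional.one_hot(ids, num_classes=10).float()
via_matmul = one_hot @ embed.weight

print(looked_up.shape)
print(torch.equal(looked_up, indexed))
print(torch.allclose(looked_up, via_matmul))
```

Note that id 1 appears twice in `ids`, so the first and last rows of the result are identical, as in the output above.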

## Matrix factorization model

In [17]:
class MF(nn.Module):
    def __init__(self, num_users, num_items, emb_size=100):
        super(MF, self).__init__()
        self.user_emb = nn.Embedding(num_users, emb_size)
        self.item_emb = nn.Embedding(num_items, emb_size)
        self.user_emb.weight.data.uniform_(0,0.05)
        self.item_emb.weight.data.uniform_(0,0.05)
        
    def forward(self, u, v):
        u = self.user_emb(u)
        v = self.item_emb(v)
        return (u*v).sum(1)   

## Debugging MF model

In [18]:
df_t_e

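|   | userId | movieId | rating |
|---|--------|---------|--------|
| 0 | 0 | 0 | 4 |
| 1 | 0 | 1 | 5 |
| 2 | 1 | 1 | 5 |
| 3 | 1 | 2 | 3 |
| 4 | 2 | 0 | 4 |
| 5 | 2 | 1 | 4 |
| 6 | 3 | 0 | 5 |
| 7 | 3 | 3 | 2 |
| 8 | 4 | 0 | 1 |
| 9 | 4 | 3 | 4 |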


In [19]:
num_users = 7
num_items = 4
emb_size = 3

user_emb = nn.Embedding(num_users, emb_size)
item_emb = nn.Embedding(num_items, emb_size)
users = torch.LongTensor(df_t_e.userId.values)
items = torch.LongTensor(df_t_e.movieId.values)

In [20]:
U = user_emb(users)
V = item_emb(items)

In [21]:
U

tensor([[ 0.0314,  0.2520, -2.4494],
        [ 0.0314,  0.2520, -2.4494],
        [-0.2523,  0.2228,  1.0559],
        [-0.2523,  0.2228,  1.0559],
        [-0.5906, -0.2694,  0.3983],
        [-0.5906, -0.2694,  0.3983],
        [-1.4998, -0.1576, -1.0433],
        [-1.4998, -0.1576, -1.0433],
        [ 0.0612,  1.6143, -0.8069],
        [ 0.0612,  1.6143, -0.8069],
        [ 0.7100,  0.7157,  0.5734],
        [ 0.2706,  0.4795, -1.1281],
        [ 0.2706,  0.4795, -1.1281]])

In [22]:
# element-wise multiplication
U*V 

tensor([[ 0.0568, -0.1576, -0.2046],
        [ 0.0198, -0.1048,  1.5903],
        [-0.1594, -0.0927, -0.6856],
        [-0.2706, -0.0293,  1.5575],
        [-1.0709,  0.1685,  0.0333],
        [-0.3731,  0.1120, -0.2586],
        [-2.7193,  0.0986, -0.0871],
        [-0.7194,  0.1423,  0.5283],
        [ 0.1110, -1.0094, -0.0674],
        [ 0.0294, -1.4579,  0.4086],
        [ 0.3406, -0.6463, -0.2904],
        [ 0.1710, -0.1994,  0.7324],
        [ 0.1298, -0.4330,  0.5713]])

In [23]:
# what we want is a dot product per row
(U*V).sum(1) 

tensor([-0.3053,  1.5053, -0.9376,  1.2575, -0.8691, -0.5197, -2.7079,
        -0.0488, -0.9659, -1.0199, -0.5962,  0.7040,  0.2681])
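
As a sanity check on the row-wise dot product, the sketch below uses random stand-in tensors rather than the embeddings above. It compares `(U*V).sum(1)` with two equivalent formulations: an explicit `einsum`, and the diagonal of the full user-by-item score matrix.

```python
import torch

torch.manual_seed(0)
U = torch.randn(5, 3)   # 5 user vectors of size 3 (random stand-ins)
V = torch.randn(5, 3)   # 5 item vectors, paired row by row with U

rowwise = (U * V).sum(1)                  # the form used in the notebook
einsum = torch.einsum("ij,ij->i", U, V)   # same computation, written explicitly
diag = torch.diag(U @ V.t())              # diagonal of the full 5x5 score matrix

print(torch.allclose(rowwise, einsum))
print(torch.allclose(rowwise, diag))
```

The diagonal version computes every user-item score and discards most of them, so the element-wise form is the one to prefer when only the paired scores are needed.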

## Training MF model

In [24]:
num_users = len(df_train.userId.unique())
num_items = len(df_train.movieId.unique())
print(num_users, num_items) 

671 8442


In [71]:
model = MF(num_users, num_items, emb_size=100).cuda()

In [90]:
def train_epocs(model, epochs=10, lr=0.01, wd=0.0, unsqueeze=False):
    parameters = filter(lambda p: p.requires_grad, model.parameters())
    optimizer = torch.optim.Adam(parameters, lr=lr, weight_decay=wd)
    # build the training tensors once rather than on every epoch
    users = torch.LongTensor(df_train.userId.values).cuda()
    items = torch.LongTensor(df_train.movieId.values).cuda()
    ratings = torch.FloatTensor(df_train.rating.values).cuda()
    if unsqueeze:
        ratings = ratings.unsqueeze(1)
    model.train()
    for i in range(epochs):
        y_hat = model(users, items)
        loss = F.mse_loss(y_hat, ratings)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print(loss.item()) # used to be loss.data[0]
    test_loss(model, unsqueeze)

In [73]:
# Here is what unsqueeze does
ratings = torch.FloatTensor(df_train.rating.values).unsqueeze(1).cuda()
ratings.shape

torch.Size([79799, 1])
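
The `unsqueeze` flag matters because a model whose last layer is `nn.Linear(n_hidden, 1)` returns predictions of shape `(N, 1)`, while the ratings tensor has shape `(N,)`. A small sketch with made-up numbers shows what goes wrong if the shapes are left mismatched: the subtraction broadcasts to an `(N, N)` matrix and the loss is computed over all pairs.

```python
import torch

y_hat = torch.tensor([[3.0], [4.0], [2.0]])   # model output, shape (3, 1)
ratings = torch.tensor([3.0, 5.0, 2.0])       # targets, shape (3,)

diff_bad = y_hat - ratings                    # broadcasts to shape (3, 3)
diff_ok = y_hat - ratings.unsqueeze(1)        # shape (3, 1), element by element

loss_bad = (diff_bad ** 2).mean()
loss_ok = (diff_ok ** 2).mean()

print(diff_bad.shape, diff_ok.shape)
print(loss_bad.item(), loss_ok.item())
```

Only `loss_ok` compares each prediction with its own target.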

In [91]:
def test_loss(model, unsqueeze=False):
    model.eval()
    users = torch.LongTensor(df_val.userId.values).cuda()
    items = torch.LongTensor(df_val.movieId.values).cuda()
    ratings = torch.FloatTensor(df_val.rating.values).cuda()
    if unsqueeze:
        ratings = ratings.unsqueeze(1)
    # gradients are not needed for evaluation
    with torch.no_grad():
        y_hat = model(users, items)
        loss = F.mse_loss(y_hat, ratings)
    print("test loss %.3f " % loss.item())
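
The cells above call `.cuda()` directly, so they need a GPU. A device-agnostic variant might look like the sketch below. `TinyMF`, the synthetic ratings, and the hyperparameters are all made up for illustration, and this is not the training loop used for the results in this notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class TinyMF(nn.Module):
    """Hypothetical stand-in for the MF model above, kept small for the demo."""
    def __init__(self, num_users, num_items, emb_size=8):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, emb_size)
        self.item_emb = nn.Embedding(num_items, emb_size)

    def forward(self, u, v):
        return (self.user_emb(u) * self.item_emb(v)).sum(1)

torch.manual_seed(0)
# Synthetic ratings, made up for illustration only.
users = torch.randint(0, 20, (200,), device=device)
items = torch.randint(0, 30, (200,), device=device)
ratings = torch.randint(1, 6, (200,), device=device).float()

model = TinyMF(20, 30).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

losses = []
model.train()
for _ in range(30):
    loss = F.mse_loss(model(users, items), ratings)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# Evaluation does not need gradients, so disable autograd bookkeeping.
model.eval()
with torch.no_grad():
    final_loss = F.mse_loss(model(users, items), ratings).item()

print(losses[0], final_loss)
```

Creating the tensors with `device=device` and moving the model with `.to(device)` keeps the same code runnable on machines with or without CUDA.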

In [78]:
train_epocs(model, epochs=10, lr=0.1)

13.2308988571167
5.118462562561035
2.381594181060791
3.4458632469177246
0.9089943170547485
1.8091411590576172
2.748492956161499
2.277597427368164
1.1571255922317505
0.9239360094070435
test loss 1.948 


In [79]:
train_epocs(model, epochs=15, lr=0.01)

1.7040565013885498
1.0518261194229126
0.7497240900993347
0.6944956183433533
0.7591447830200195
0.839592695236206
0.8820865750312805
0.8760321140289307
0.8342975378036499
0.7776389718055725
0.7254546880722046
0.690631628036499
0.6771858334541321
0.6807346343994141
0.6917547583580017
test loss 0.893 


In [80]:
train_epocs(model, epochs=15, lr=0.01)

0.700359582901001
0.6624599695205688
0.6683605313301086
0.6454388499259949
0.6381526589393616
0.6450813412666321
0.6406697630882263
0.6255775094032288
0.6146196722984314
0.6135033369064331
0.614220380783081
0.608429491519928
0.5970779657363892
0.5863724946975708
0.5796968936920166
test loss 0.822 


## MF with bias

In [81]:
class MF_bias(nn.Module):
    def __init__(self, num_users, num_items, emb_size=100):
        super(MF_bias, self).__init__()
        self.user_emb = nn.Embedding(num_users, emb_size)
        self.user_bias = nn.Embedding(num_users, 1)
        self.item_emb = nn.Embedding(num_items, emb_size)
        self.item_bias = nn.Embedding(num_items, 1)
        self.user_emb.weight.data.uniform_(0,0.05)
        self.item_emb.weight.data.uniform_(0,0.05)
        self.user_bias.weight.data.uniform_(-0.01,0.01)
        self.item_bias.weight.data.uniform_(-0.01,0.01)
        
    def forward(self, u, v):
        U = self.user_emb(u)
        V = self.item_emb(v)
        b_u = self.user_bias(u).squeeze()
        b_v = self.item_bias(v).squeeze()
        return (U*V).sum(1) +  b_u  + b_v
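
In symbols (the notation is introduced here for illustration and does not appear in the code): let $p_u$ be the embedding of user $u$, $q_i$ the embedding of item $i$, and $b_u$, $b_i$ the corresponding scalar biases. The `forward` method of `MF_bias` then computes

$$
\hat{r}_{ui} = \sum_{k=1}^{K} p_{uk}\, q_{ik} + b_u + b_i ,
$$

where $K$ is `emb_size`. Dropping the two bias terms recovers the plain `MF` model. Training with `F.mse_loss` minimizes the mean squared error over the set $\mathcal{D}$ of observed ratings:

$$
\mathcal{L} = \frac{1}{|\mathcal{D}|} \sum_{(u,i) \in \mathcal{D}} \left( \hat{r}_{ui} - r_{ui} \right)^2 .
$$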

In [82]:
model = MF_bias(num_users, num_items, emb_size=100).cuda()

In [83]:
train_epocs(model, epochs=10, lr=0.1, wd=1e-5)

13.23005485534668
4.37229061126709
3.489661455154419
2.469104051589966
0.7883444428443909
1.8147776126861572
2.5224339962005615
2.1426548957824707
1.2771697044372559
0.90544193983078
test loss 1.537 


In [84]:
train_epocs(model, epochs=10, lr=0.01, wd=1e-5)

1.2818944454193115
0.8581875562667847
0.6948232650756836
0.6957782506942749
0.7547910213470459
0.7999427914619446
0.8066312670707703
0.7803653478622437
0.7378935217857361
0.6963903307914734
test loss 0.824 


In [85]:
train_epocs(model, epochs=10, lr=0.001, wd=1e-5)

0.667972981929779
0.6598981022834778
0.6531437039375305
0.6476556658744812
0.6432989835739136
0.6398909091949463
0.6372292041778564
0.6351196765899658
0.6333951354026794
0.6319261193275452
test loss 0.810 


In [86]:
train_epocs(model, epochs=10, lr=0.001, wd=1e-5)

0.6306198239326477
0.6286225318908691
0.6270754933357239
0.6256601810455322
0.624238908290863
0.6227835416793823
0.6213079690933228
0.6198315024375916
0.6183648109436035
0.6169077754020691
test loss 0.811 


Note that these models are sensitive to the weight initialization, the optimization algorithm, and the regularization.
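
One way to see the effect of initialization is to look at what the model predicts before any training. The sketch below simulates the `uniform_(0, 0.05)` initialization used in `MF` with `emb_size=100`. The number of simulated pairs `n` is arbitrary.

```python
import torch

torch.manual_seed(0)
emb_size = 100
n = 10_000  # number of simulated (user, item) pairs, chosen arbitrarily

# Same initialization as in MF: every weight drawn from U(0, 0.05).
U = torch.empty(n, emb_size).uniform_(0, 0.05)
V = torch.empty(n, emb_size).uniform_(0, 0.05)

preds = (U * V).sum(1)

# Each product has expectation 0.025 * 0.025, summed over emb_size terms.
expected = emb_size * 0.025 * 0.025
print(preds.mean().item(), expected)
```

Initial predictions therefore sit near 0.06, far below ratings on a 0.5 to 5 scale. This is consistent with the first loss of about 13 in the training logs above. A different initialization range would change that starting point.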

## Neural Network Model

In [87]:
# Note that there is no dot product between the user and item embeddings here,
# so the two embeddings could potentially have different sizes.
# We could probably get better results by continuing to tune the regularization.
    
class CollabFNet(nn.Module):
    def __init__(self, num_users, num_items, emb_size=100, n_hidden=10):
        super(CollabFNet, self).__init__()
        self.user_emb = nn.Embedding(num_users, emb_size)
        self.item_emb = nn.Embedding(num_items, emb_size)
        self.lin1 = nn.Linear(emb_size*2, n_hidden)
        self.lin2 = nn.Linear(n_hidden, 1)
        self.drop1 = nn.Dropout(0.1)
        self.drop2 = nn.Dropout(0.0)
        
    def forward(self, u, v):
        U = self.user_emb(u)
        V = self.item_emb(v)
        x = F.relu(torch.cat([U, V], dim=1))
        x = self.drop1(x)
        x = F.relu(self.lin1(x))
        x = self.drop2(x)
        x = self.lin2(x)
        return x
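
Since the user and item embeddings are only concatenated, they do not have to share a size. A hypothetical variant with separate sizes could look like this. The class name `CollabFNetAsym` and the parameters `user_emb_size` and `item_emb_size` are made up for this sketch, and it has not been trained or tuned.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CollabFNetAsym(nn.Module):
    """Hypothetical variant of CollabFNet with separate embedding sizes."""
    def __init__(self, num_users, num_items, user_emb_size=50, item_emb_size=20, n_hidden=10):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, user_emb_size)
        self.item_emb = nn.Embedding(num_items, item_emb_size)
        # The first linear layer only sees the concatenation, so its input
        # width is the sum of the two embedding sizes.
        self.lin1 = nn.Linear(user_emb_size + item_emb_size, n_hidden)
        self.lin2 = nn.Linear(n_hidden, 1)
        self.drop = nn.Dropout(0.1)

    def forward(self, u, v):
        x = torch.cat([self.user_emb(u), self.item_emb(v)], dim=1)
        x = self.drop(F.relu(x))
        x = F.relu(self.lin1(x))
        return self.lin2(x)

model = CollabFNetAsym(num_users=7, num_items=4)
u = torch.LongTensor([0, 1, 6])
v = torch.LongTensor([3, 0, 2])
out = model(u, v)
print(out.shape)
```

The output has shape `(N, 1)`, which is why this kind of model is trained with `unsqueeze=True`.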

In [88]:
model = CollabFNet(num_users, num_items, emb_size=100).cuda()

In [92]:
train_epocs(model, epochs=20, lr=0.01, wd=1e-5, unsqueeze=True) 

12.414087295532227
8.912494659423828
5.733146667480469
3.1898252964019775
1.6557295322418213
1.3019455671310425
1.9354536533355713
2.7928056716918945
3.1729886531829834
2.9997243881225586
2.494485855102539
1.918487787246704
1.4587417840957642
1.2117539644241333
1.1610827445983887
1.2513060569763184
1.4043771028518677
1.5479263067245483
1.6353124380111694
1.6534740924835205
test loss 1.601 


In [93]:
train_epocs(model, epochs=20, lr=0.01, wd=1e-6, unsqueeze=True)

1.5990087985992432
1.0634599924087524
1.3196831941604614
1.271923303604126
1.0663458108901978
0.9884128570556641
1.050430417060852
1.0983532667160034
1.0568546056747437
0.9677290916442871
0.904969334602356
0.9057765603065491
0.940522313117981
0.9427617788314819
0.9031198620796204
0.8513400554656982
0.8315029144287109
0.8408262729644775
0.8549620509147644
0.8462169170379639
test loss 0.858 


In [94]:
train_epocs(model, epochs=10, lr=0.001, wd=1e-6, unsqueeze=True)

0.8195101618766785
0.8015755414962769
0.7948648929595947
0.7984756827354431
0.7993371486663818
0.7992619276046753
0.7974718809127808
0.793518602848053
0.7894296050071716
0.7882153987884521
test loss 0.829 


In [95]:
train_epocs(model, epochs=20, lr=0.001, wd=1e-6, unsqueeze=True)

0.7880702614784241
0.7873403429985046
0.7867722511291504
0.7839091420173645
0.7838999032974243
0.7837383151054382
0.782160222530365
0.7795288562774658
0.7787625193595886
0.7775472402572632
0.7760753631591797
0.7758849859237671
0.7750654816627502
0.7734890580177307
0.7722294330596924
0.7731775045394897
0.7680322527885437
0.767857551574707
0.7685062289237976
0.7683979272842407
test loss 0.817 


## TODO
* Use t-SNE to visualize the embeddings
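
A possible starting point for this item, assuming scikit-learn is installed. The embedding below is a random stand-in for a trained `item_emb`, so the projected points carry no meaning. The sketch only shows the mechanics of getting the weights into `TSNE`.

```python
import torch
import torch.nn as nn
from sklearn.manifold import TSNE

torch.manual_seed(0)
# Stand-in for a trained model's item embeddings (random here, for illustration).
item_emb = nn.Embedding(200, 100)

# Detach from autograd and move to NumPy before handing off to scikit-learn.
weights = item_emb.weight.detach().cpu().numpy()

# perplexity must be smaller than the number of points.
tsne = TSNE(n_components=2, perplexity=30, init="random", random_state=0)
coords = tsne.fit_transform(weights)
print(coords.shape)
```

The two columns of `coords` can then be passed to a scatter plot, for example coloring the points by genre from `movies.csv`.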

# Lab
* Work with the largest dataset http://files.grouplens.org/datasets/movielens/ml-latest.zip
* Can you use `tags.csv` and `timestamp` to improve your predictions?
* Play with the hyperparameters
* Look at the fastai version of this network and try the transformation it uses: https://github.com/fastai/fastai/blob/master/courses/dl1/lesson5-movielens.ipynb
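
For the last item, one plausible reading of "transformation" is squashing the network output into the valid rating range with a sigmoid. This is an assumption about the linked notebook, so check it there. The helper name `scale_to_rating_range` is made up for this sketch.

```python
import torch

min_rating, max_rating = 0.5, 5.0   # assumed rating scale for MovieLens

def scale_to_rating_range(raw):
    """Map unbounded scores into [min_rating, max_rating] with a sigmoid."""
    return torch.sigmoid(raw) * (max_rating - min_rating) + min_rating

raw = torch.tensor([-10.0, 0.0, 10.0])
scaled = scale_to_rating_range(raw)
print(scaled)
```

Applied to the last layer's output, this keeps every prediction inside the rating scale, so the network does not have to learn the range on its own.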

# References
* This notebook is based on [lesson 5 of Jeremy Howard's Deep Learning Course](https://github.com/fastai/fastai/blob/master/courses/dl1/lesson5-movielens.ipynb)