# Bibliography
- [Collaborative Filtering for Movie Recommendations](https://keras.io/examples/structured_data/collaborative_filtering_movielens/)
- [Getting Started with a Movie Recommendation System](https://www.kaggle.com/ibtesama/getting-started-with-a-movie-recommendation-system#Content-Based-Filtering)
- [Collaborative Filtering in Pytorch](https://spiyer99.github.io/Recommendation-System-in-Pytorch/)

In [1]:
cd ..

/home/xavier/projects/movielens-recommender


In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
import pandas as pd
from tqdm import tqdm
import torch
from torch.utils.data import DataLoader
import pytorch_lightning as pl

from sklearn.preprocessing import MinMaxScaler
from src.dataset import MovielensDataset
from src.model import CollaborativeFiltering, LightningCollaborativeFiltering

# Data

In [4]:
ratings = pd.read_parquet("./data/processed/ratings.parquet")

In [5]:
ratings.head()

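|   | UserID | MovieID | Rating | Timestamp |
|---|--------|---------|--------|-----------|
| 0 | 1      | 1193    | 5      | 978300760 |
| 1 | 1      | 661     | 3      | 978302109 |
| 2 | 1      | 914     | 3      | 978301968 |
| 3 | 1      | 3408    | 4      | 978300275 |
| 4 | 1      | 2355    | 5      | 978824291 |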


# Train/Val split
Since the recommender is meant to predict future ratings, we split by timestamp rather than at random: ratings before the 80th-percentile timestamp go to training, and the rest go to validation.

In [6]:
q_timestamp = ratings["Timestamp"].quantile(0.8)

train_idx = ratings["Timestamp"] < q_timestamp
val_idx = ratings["Timestamp"] >= q_timestamp

In [7]:
train_ratings = ratings[train_idx].copy()
val_ratings = ratings[val_idx].copy()

In [8]:
train_ratings.shape

(800164, 4)

In [9]:
val_ratings.shape

(200045, 4)
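
Because the split is chronological, some users and movies in the validation window may never appear in training, and an embedding model has no learned vector for them. Below is a minimal sketch of how one might measure this. It uses a tiny made-up frame with the same column names as `ratings`, and the helper name `unseen_fraction` is hypothetical.

```python
import pandas as pd


def unseen_fraction(train: pd.DataFrame, val: pd.DataFrame, column: str) -> float:
    """Fraction of validation rows whose `column` value never occurs in train."""
    seen = set(train[column].unique())
    return float((~val[column].isin(seen)).mean())


# Toy data for illustration only, not the real MovieLens ratings.
toy = pd.DataFrame(
    {
        "UserID": [1, 1, 2, 2, 3],
        "MovieID": [10, 11, 10, 12, 13],
        "Rating": [5, 3, 4, 2, 1],
        "Timestamp": [100, 200, 300, 400, 500],
    }
)

# Same splitting rule as the notebook: everything before the 0.8 quantile is train.
cutoff = toy["Timestamp"].quantile(0.8)
toy_train = toy[toy["Timestamp"] < cutoff]
toy_val = toy[toy["Timestamp"] >= cutoff]

user_cold = unseen_fraction(toy_train, toy_val, "UserID")
movie_cold = unseen_fraction(toy_train, toy_val, "MovieID")
print(f"unseen users: {user_cold:.0%}, unseen movies: {movie_cold:.0%}")
```

Running the same helper on `train_ratings` and `val_ratings` would show how much of the validation loss comes from cold-start rows.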

# Scaling

In [10]:
scaler = MinMaxScaler()
scaler.fit(train_ratings[["Rating"]])

MinMaxScaler()

In [11]:
train_ratings["ScaledRating"] = scaler.transform(train_ratings[["Rating"]]).flatten()
val_ratings["ScaledRating"] = scaler.transform(val_ratings[["Rating"]]).flatten()
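
If the training ratings span the full 1 to 5 range, `MinMaxScaler` maps each rating r to (r - 1) / 4, and `inverse_transform` maps a prediction on the scaled range back to stars. Here is a small sketch on made-up values; the names `toy_scaler` and `preds` are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy ratings covering the full 1..5 range (an assumption about the data).
toy_train = pd.DataFrame({"Rating": [1, 2, 3, 4, 5]})

toy_scaler = MinMaxScaler()
toy_scaler.fit(toy_train[["Rating"]])

# Forward direction: stars -> [0, 1]
scaled = toy_scaler.transform(toy_train[["Rating"]]).flatten()

# Backward direction: pretend these are model outputs on the scaled range.
preds = pd.DataFrame({"Rating": [0.1, 0.9]})
stars = toy_scaler.inverse_transform(preds).flatten()

print(scaled)
print(stars)
```

Fitting the scaler on the training split only, as the notebook does, keeps validation data out of the preprocessing step.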

# Dtypes

In [12]:
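# Convert dtypes to Int32 (pandas nullable integer).
# Note: this reassigns `ratings` only; `train_ratings` and `val_ratings` are
# copies made earlier, so the datasets below are not affected by this cast.
ratings = ratings.astype("Int32")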
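
A cast applied to a frame after `.copy()` has been taken from it does not propagate to the copy. The sketch below shows this on a toy frame, then casts only the ID columns of the copy. Whether the split frames need such a cast is an assumption, since `MovielensDataset` is not shown here.

```python
import pandas as pd

toy = pd.DataFrame({"UserID": [1, 2], "MovieID": [10, 20], "Rating": [5, 3]})
toy_train = toy.iloc[:1].copy()

# Casting the original frame leaves the earlier copy untouched.
toy = toy.astype("Int32")
dtype_after_parent_cast = str(toy_train["UserID"].dtype)

# To change the copy, cast it explicitly (here only the ID columns).
toy_train = toy_train.astype({"UserID": "int32", "MovieID": "int32"})

print(toy.dtypes["UserID"], dtype_after_parent_cast, toy_train["UserID"].dtype)
```

Note that `"Int32"` (capitalised) is the pandas nullable integer type, while `"int32"` is the plain NumPy dtype.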

# Datasets

In [13]:
BATCH_SIZE = 128
EMBEDDING_DIM = 20

In [14]:
# train_ratings / val_ratings were already split by timestamp above.

train_dataset = MovielensDataset(data=train_ratings)
val_dataset = MovielensDataset(data=val_ratings)

train_dataloader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=4, pin_memory=True)
val_dataloader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=4, pin_memory=True)
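
`MovielensDataset` is imported from `src.dataset` and its code is not shown in this notebook. As a rough idea of what such a dataset could look like, here is a hypothetical stand-in (`ToyRatingsDataset` is not the repository's class). It maps raw IDs to contiguous indices with `pd.factorize` and exposes `num_users` and `num_movies`.

```python
import pandas as pd
import torch
from torch.utils.data import Dataset


class ToyRatingsDataset(Dataset):
    """Hypothetical sketch; the real MovielensDataset may differ."""

    def __init__(self, data: pd.DataFrame):
        # factorize maps raw IDs to contiguous codes 0..n-1, as nn.Embedding requires.
        user_codes, user_ids = pd.factorize(data["UserID"])
        movie_codes, movie_ids = pd.factorize(data["MovieID"])
        self.users = torch.as_tensor(user_codes, dtype=torch.long)
        self.movies = torch.as_tensor(movie_codes, dtype=torch.long)
        self.targets = torch.as_tensor(
            data["ScaledRating"].to_numpy(), dtype=torch.float32
        )
        self.num_users = len(user_ids)
        self.num_movies = len(movie_ids)

    def __len__(self) -> int:
        return len(self.targets)

    def __getitem__(self, idx: int):
        return self.users[idx], self.movies[idx], self.targets[idx]


# Toy rows for illustration only.
toy = pd.DataFrame(
    {
        "UserID": [1, 1, 7],
        "MovieID": [1193, 661, 1193],
        "ScaledRating": [1.0, 0.5, 0.75],
    }
)
toy_dataset = ToyRatingsDataset(toy)
user, movie, target = toy_dataset[2]
```

One caveat of this sketch: factorizing the train and validation frames separately would give the same raw ID different codes in each, so a real implementation would need to share a single ID mapping across both splits.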

In [15]:
model = CollaborativeFiltering(num_users=train_dataset.num_users, num_movies=train_dataset.num_movies, embedding_dim=EMBEDDING_DIM)
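
The `CollaborativeFiltering` class comes from `src.model` and its definition is not shown here. As a rough illustration of the general technique, the following is a minimal embedding-based sketch: a dot product of user and movie embeddings, squashed by a sigmoid to match the [0, 1] range of `ScaledRating`. `DotProductCF` is a hypothetical stand-in and may differ from the repository's model.

```python
import torch
from torch import nn


class DotProductCF(nn.Module):
    """Hypothetical stand-in, not the repository's CollaborativeFiltering."""

    def __init__(self, num_users: int, num_movies: int, embedding_dim: int):
        super().__init__()
        self.user_embedding = nn.Embedding(num_users, embedding_dim)
        self.movie_embedding = nn.Embedding(num_movies, embedding_dim)

    def forward(self, user_idx: torch.Tensor, movie_idx: torch.Tensor) -> torch.Tensor:
        users = self.user_embedding(user_idx)     # (batch, dim)
        movies = self.movie_embedding(movie_idx)  # (batch, dim)
        scores = (users * movies).sum(dim=1)      # (batch,)
        return torch.sigmoid(scores)              # values in (0, 1)


sketch = DotProductCF(num_users=5, num_movies=7, embedding_dim=20)
out = sketch(torch.tensor([0, 1, 4]), torch.tensor([6, 0, 3]))
n_params = sum(p.numel() for p in sketch.parameters())
print(out.shape, n_params)
```

In a model of this shape the parameter count is (num_users + num_movies) * embedding_dim, and the indices passed in must lie in `0..num_users-1` and `0..num_movies-1`. The output here has shape `(batch,)`, so the target passed to the loss should have the same shape to avoid unintended broadcasting.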

In [16]:
trainer = pl.Trainer(max_epochs=10, gpus=1)
pl_model = LightningCollaborativeFiltering(model)
trainer.fit(pl_model, train_dataloaders=train_dataloader, val_dataloaders=val_dataloader)

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name  | Type                   | Params
-------------------------------------------------
0 | model | CollaborativeFiltering | 190 K 
1 | loss  | MSELoss                | 0     
-------------------------------------------------
190 K     Trainable params
0         Non-trainable params
190 K     Total params
0.761     Total estimated model params size (MB)


                                                                                                         

  return F.mse_loss(input, target, reduction=self.reduction)


Epoch 0:  80%|█████████████████████████▌      | 6252/7815 [00:33<00:08, 185.94it/s, loss=0.0935, v_num=0]



Epoch 1:  43%|██████████████▏                  | 3351/7815 [00:19<00:25, 175.32it/s, loss=0.074, v_num=0]

  rank_zero_warn("Detected KeyboardInterrupt, attempting graceful shutdown...")
Exception in thread Thread-13:
Traceback (most recent call last):
  File "/home/xavier/miniconda3/envs/py3.9/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/home/xavier/miniconda3/envs/py3.9/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/home/xavier/.cache/pypoetry/virtualenvs/imdb-recommender-SIHN6QTU-py3.9/lib/python3.9/site-packages/torch/utils/data/_utils/pin_memory.py", line 28, in _pin_memory_loop
    r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
  File "/home/xavier/miniconda3/envs/py3.9/lib/python3.9/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
  File "/home/xavier/.cache/pypoetry/virtualenvs/imdb-recommender-SIHN6QTU-py3.9/lib/python3.9/site-packages/torch/multiprocessing/reductions.py", line 289, in rebuild_storage_fd
    fd = df.detach()
  File "/home/xavier/minicond

Epoch 1:  43%|██████████████▏                  | 3351/7815 [00:32<00:43, 103.01it/s, loss=0.074, v_num=0]
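
The `loss` in the progress bar is an MSE on the scaled ratings. To read it in stars, take the square root and multiply by the rating range. The sketch below assumes ratings span 1 to 5, and the helper name `scaled_mse_to_star_rmse` is hypothetical.

```python
import math


def scaled_mse_to_star_rmse(
    scaled_mse: float, rating_min: float = 1.0, rating_max: float = 5.0
) -> float:
    """Convert an MSE measured on min-max scaled ratings to an RMSE in stars."""
    return math.sqrt(scaled_mse) * (rating_max - rating_min)


# Running training losses reported in the progress bars above.
rmse_epoch0 = scaled_mse_to_star_rmse(0.0935)
rmse_epoch1 = scaled_mse_to_star_rmse(0.074)
print(f"epoch 0: {rmse_epoch0:.2f} stars, epoch 1: {rmse_epoch1:.2f} stars")
```

These figures are running training losses from an interrupted run, so they are only a rough indication and not a validation metric.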