# Collaborative recommender system

In this notebook we build a purely collaborative recommender system for Fantlab using the `LightFM` library.

In [None]:
! git clone git@github.com:yupopov/fantlab-recommender-system.git
import os
os.chdir('fantlab-recommender-system')

Cloning into 'fantlab-recommender-system'...
remote: Enumerating objects: 501, done.
remote: Counting objects: 100% (98/98), done.
remote: Compressing objects: 100% (68/68), done.
remote: Total 501 (delta 55), reused 69 (delta 30), pack-reused 403
Receiving objects: 100% (501/501), 545.33 MiB | 25.36 MiB/s, done.
Resolving deltas: 100% (250/250), done.


In [None]:
%%capture
%load_ext autoreload
%autoreload 2

import gzip
import json
from tqdm.auto import tqdm

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.sparse import coo_matrix, csr_matrix, csc_matrix, load_npz

! pip install lightfm
from lightfm import LightFM
from lightfm.evaluation import precision_at_k, auc_score
from lightfm.data import Dataset

from src.preprocessing.datasets import FMDatasetMaker
from src.models.lightfm import fit_lightfm
from src.models.get_top_k_predictions_with_label import get_top_k_predictions_with_labels
from src.models.LinearRecommender import LinearRecommender

### Dataset

The dataset for a collaborative model is a user-item preference matrix $R$ whose rows correspond to users $u \in U$, whose columns correspond to items $i \in I$, and whose entries $r_{u,i} = \rho(u,i)$ measure how much user $u$ liked item $i$. A collaborative model tries to approximate $R$ by a product of two matrices $P_U\, (n_{users} \times n_{features})$ and $Q_I \,(n_{features} \times {n_{items}})$, where $n_{features}$ is a relatively small number.

There are essentially two ways to quantify $\rho(u, i)$:
1. Via *explicit user feedback*, that is, through the rating a user left on an item, like/dislike buttons, and so on;
2. Via *implicit feedback*, that is, adding an item to a watchlist, measuring the time a user spent consuming an item (on content providers), and so on.

In both of these situations $R$ is a very sparse matrix, since each user interacts with only a small fraction of all items.
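To make the shapes concrete, here is a minimal sketch of a sparse preference matrix and a low-rank product of the kind described above. The toy triples, the sizes, and the random factors are made up for illustration; they are not taken from the Fantlab data.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical toy data: (user index, item index, rating) triples.
users = np.array([0, 0, 1, 2, 3])
items = np.array([1, 4, 0, 4, 2])
ratings = np.array([8.0, 10.0, 6.0, 9.0, 3.0])

n_users, n_items, n_features = 4, 6, 2

# Only the observed entries are stored; everything else is implicitly zero.
R = coo_matrix((ratings, (users, items)), shape=(n_users, n_items))
density = R.nnz / (n_users * n_items)

# Random factors standing in for learned ones: P_U is (n_users x n_features),
# Q_I is (n_features x n_items), so their product has the shape of R.
rng = np.random.default_rng(0)
P_U = rng.normal(size=(n_users, n_features))
Q_I = rng.normal(size=(n_features, n_items))
R_hat = P_U @ Q_I

print(f"stored entries: {R.nnz}, density: {density:.3f}, R_hat shape: {R_hat.shape}")
```

The factors hold $n_{features}(n_{users} + n_{items})$ numbers, which is far fewer than the $n_{users} \cdot n_{items}$ cells of $R$ when $n_{features}$ is small.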


The file used to build the dataset is located at `data/raw/marks.json.gz`. It contains all marks, that is, user ratings (from the date of site creation until 2022-05-08), for works with at least 50 marks.

### Training
The two kinds of feedback correspond to two different approaches to training a collaborative model:
- Given a matrix of explicit feedback, one tries to minimize the $L_2$ distance between $R$ and $P_U Q_I$ over the known ratings. Concretely, the objective is the regularized mean squared error (equivalently, RMSE up to the square root):

$\frac{1}{|T|}\sum_{(u, i) \in T} \Big[(r_{u,i} - \langle p_u, q_i \rangle)^2 + \lambda\big(||p_u||^2 + ||q_i||^2\big)\Big] \to \min,$

where $T$ is the set of index pairs of the nonzero (observed) entries of $R$, $p_u$ is the row of $P_U$ for user $u$, and $q_i$ is the column of $Q_I$ for item $i$. With this approach, we essentially try to predict the ratings each user would give to items they have not rated. One can argue, however, that this does not preserve the relative order of items for each user: it is easy to construct a solution with a low RMSE that nevertheless ruins the relative ranks. **We think it is important for a model to preserve this order, so we do not follow this path in this project.**

- With implicit feedback, the elements of $R$ are usually binary, indicating whether user $u$ interacted with item $i$ (items that $u$ interacted with are called *positive* for $u$, and all others *negative*). At first glance, this ignores the degree of interaction entirely: there is no obvious way to tell the model how much a user liked an item. However, one can justify the approach by noting that the data in interaction matrices is likely not missing at random: the absence of an interaction generally signals implicit dislike. From this perspective, it is meaningful to treat approximating $R$ as a ranking problem, aiming to rank, for each user, all positive items above the negative ones.
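The remark in the first bullet about squared error and relative ranks can be illustrated with a small made-up example. The ratings and both prediction vectors below are hypothetical. The point is only that a lower mean squared error does not guarantee a better ordering.

```python
import numpy as np

# One user's true ratings for three items (hypothetical numbers).
true = np.array([7.0, 8.0, 9.0])

# Prediction A is off by one everywhere but keeps the order of the items.
pred_a = np.array([6.0, 7.0, 8.0])
# Prediction B is closer on average but swaps the first two items.
pred_b = np.array([8.0, 7.5, 8.5])


def mse(y, y_hat):
    """Mean squared error between true and predicted ratings."""
    return float(np.mean((y - y_hat) ** 2))


def same_order(y, y_hat):
    """True if sorting by prediction gives the same item order as sorting by truth."""
    return bool(np.array_equal(np.argsort(y), np.argsort(y_hat)))


for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(name, "MSE:", mse(true, pred), "order preserved:", same_order(true, pred))
```

Prediction B wins on squared error yet would recommend the items in the wrong order, which is the behaviour a ranking loss is meant to penalize.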


A suitable loss function for this type of task is the WARP (Weighted Approximate-Rank Pairwise) loss. Given a user $u$ and a positive item $i$, it repeatedly samples negative items until it finds a negative item $i'$ that violates the margin, that is, $\langle p_u, q_i \rangle \le \langle p_u, q_{i'} \rangle + 1$ (in effect, we are trying to maximize the margin between positive and negative items). Having found such a triplet $(u, i, i'),$ we perform a gradient update to minimize 
$\log\Big(\Big\lfloor\frac{n_{items} - 1}{n}\Big\rfloor\Big)\,\big(\langle p_u, q_{i'} \rangle + 1 - \langle p_u, q_i \rangle\big),$

where $n$ is the number of negative items we had to sample before finding the rank-violating item $i'$. In particular, if a violating negative item is found quickly, we assume that the model is not yet well calibrated and perform a large gradient update; if it takes many samples, the update is small. More information on WARP loss can be found in [this](https://building-babylon.net/2016/03/18/warp-loss-for-implicit-feedback-recommendation/) post, or in the references of the project description.
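The sampling procedure described above can be sketched in NumPy for a single user and positive item. This is an illustration under stated assumptions, not `LightFM`'s implementation: the function name `warp_step`, the `max_sampled` cap, and the `max(1, ...)` guard that keeps the logarithm non-negative are choices made for this sketch. Item vectors are stored as rows of `Q` here, and only the loss value is computed, not the gradient update.

```python
import numpy as np


def warp_step(p_u, Q, pos_item, positives, rng, max_sampled=10):
    """Sample negatives for one (user, positive item) pair.

    Returns (negative item, number of draws, loss) for the first margin
    violation found, or None if none is found within max_sampled draws.
    """
    n_items = Q.shape[0]
    pos_score = p_u @ Q[pos_item]
    for n in range(1, max_sampled + 1):
        neg_item = int(rng.integers(n_items))
        if neg_item in positives:
            continue  # not a negative for this user; still counted as a draw here
        neg_score = p_u @ Q[neg_item]
        if pos_score <= neg_score + 1.0:  # margin violated
            # Few draws needed -> large weight; many draws -> small weight.
            weight = np.log(max(1.0, (n_items - 1) // n))
            loss = weight * (neg_score + 1.0 - pos_score)
            return neg_item, n, float(loss)
    return None


# Toy usage with random factors (hypothetical sizes).
rng = np.random.default_rng(0)
Q = rng.normal(size=(100, 8))   # 100 items, 8 latent features
p_u = rng.normal(size=8)        # one user's latent vector
print(warp_step(p_u, Q, pos_item=3, positives={3, 17}, rng=rng))
```

When no violation is found within the cap, the pair is simply skipped, which matches the intuition that a well-ranked positive item needs no update.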

During this project, we follow the implicit feedback and WARP loss approach, because it gives [better](https://github.com/maciejkula/explicit-vs-implicit) results for $p@K$ (precision at $K$), the metric of our choice (see below).
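For reference, $p@K$ for a single user is the fraction of the $K$ highest-scored items that the user actually interacted with. Below is a minimal NumPy sketch. The function name and the toy scores are hypothetical, and unlike a full evaluation it does not exclude items already seen in the train set.

```python
import numpy as np


def precision_at_k_single_user(scores, relevant, k):
    """Fraction of the k highest-scored items that are in the relevant set."""
    top_k = np.argsort(-scores)[:k]  # indices of the k largest scores
    return float(np.mean(np.isin(top_k, list(relevant))))


# Toy usage: five items, three of which the user liked in the test period.
scores = np.array([0.9, 0.1, 0.8, 0.3, 0.7])
relevant = {0, 3, 4}
print(precision_at_k_single_user(scores, relevant, k=3))
```

The reported metric is then the mean of this quantity over users.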


We use the `LightFM` library to build the sparse-matrix dataset and train the models. However, `LightFM` only allows for binary interaction matrices, whereas in our case we have explicit feedback that we want to pass off as implicit. A simple solution would be to binarize the ratings, but that would deprive the model of valuable information (ratings of 1 and 10 clearly do not mean the same thing). Luckily, `LightFM` offers a workaround for incorporating the rating data into the model: interaction weights.
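As a rough illustration of the idea, the sketch below keeps a binary interaction matrix and builds a second sparse matrix with the same sparsity pattern that holds one weight per interaction. The sigmoid transform with `center=7.0` and `scale=1.0` is an assumption made for this example; the transform actually used by `FMDatasetMaker` may differ.

```python
import numpy as np
from scipy.sparse import coo_matrix


def sigmoid_weights(marks, center=7.0, scale=1.0):
    """Map ratings to weights in (0, 1); center and scale are illustrative values."""
    return 1.0 / (1.0 + np.exp(-(marks - center) / scale))


# Hypothetical toy marks on a 1-10 scale.
users = np.array([0, 0, 1, 2])
items = np.array([1, 3, 0, 3])
marks = np.array([10.0, 1.0, 7.0, 9.0])
shape = (3, 4)

# Binary interactions: every rated item counts as an interaction.
interactions = coo_matrix(
    (np.ones_like(marks, dtype=np.float32), (users, items)), shape=shape
)
# Weights share the row/column indices of the interactions.
weights = coo_matrix(
    (sigmoid_weights(marks).astype(np.float32), (users, items)), shape=shape
)

# With LightFM installed, both matrices would be passed to training as
#   model.fit(interactions, sample_weight=weights, epochs=...)
print(weights.toarray().round(3))
```

With weights like these, a mark of 10 pulls the model much harder than a mark of 1, even though both appear as the same entry in the binary matrix.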





In [None]:
# LightFM experiment function
def run_experiment(dataset_config: dict, model_config: dict):
    """Build the dataset, fit a LightFM model on it and return the result of `fit_lightfm`."""
    # Build the train/test data from the raw marks.
    dataset_maker = FMDatasetMaker(**dataset_config['FMDatasetMaker_params'])
    fm_dataset = dataset_maker.make_train_test_data(**dataset_config['make_train_test_data_params'])
    # Purely collaborative model: item features are not used.
    model = LightFM(**model_config['LightFM_params'])
    fitted_model = fit_lightfm(model, fm_dataset, use_item_features=False,
                               fit_params=model_config['fit_params'],
                               precision_params=model_config['precision_params'])

    return fitted_model

In [None]:
dataset_config = {'FMDatasetMaker_params': {'n_last_years': 10},  # keep only recent marks
                  'make_train_test_data_params': {
                                      'time_q': 0.8,  # date quantile of the train/test split
                                      'min_marks_user_train': 5,  # drop users with fewer train marks
                                      'min_marks_work': 40,  # drop works with fewer marks
                                      'marks_transform': 'sigmoid',
                                      'time_weights_params': {
                                            'min_max_normalize': 'user',
                                            'eps_time': 0.2,
                                            'power': 1}}}

model_config = {'LightFM_params': {'learning_rate': 0.05,
                                    'loss': 'warp',
                                    'item_alpha': 1e-5,
                                    'user_alpha': 1e-5,
                                    'no_components': 30,
                                    'random_state': 17},
                'fit_params': {'epochs': 5,
                               'num_threads': 1,
                               'verbose': True},
                # Key spelled with Latin letters only, matching the lookup in run_experiment.
                'precision_params': {
                               'batch_size': 50,
                               'k': 20}}

In [None]:
fitted_model = run_experiment(dataset_config, model_config)

Loading marks...
Loading item features...
Done.
Stats before filtering by date:
Marks: 7368410	Unique titles: 23867	Unique users: 61739

Deleting marks dated before 2012...
Stats after filtering by date:
Marks: 5056872	Unique titles: 23867	Unique users: 47578

Splitting the marks by 2019-08-04 00:00:00...
Train set stats:
Marks: 4045529	Unique titles: 23425	Unique users: 40450

Test set stats:
Marks: 1011343	Unique titles: 23851	Unique users: 18839

Dropping users with less than 5 marks in the train set...
Stats after filtering:
Marks: 4027038	Unique titles: 23425	Unique users: 30407

Dropping users from the test set with no marks in the train set...
Stats after filtering:
Marks: 630448	Unique titles: 23758	Unique users: 11004

Dropping works with less than 40 marks in the test set...
Stats after filtering:
Marks: 375136	Unique titles: 4249	Unique users: 10589

Computing train mark weights...
Computing date weights...
Constructing train dataset...
Adding item features...
Constructing t

Epoch: 100%|██████████| 5/5 [00:50<00:00, 10.01s/it]


Computing top k recommendations for each user...
Train set:

Test set:

Train precision: 0.3386
Test precision: 0.0695
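
The log above lists several filtering steps: a split by date, a minimum number of train marks per user, and the removal of test users who are absent from the train set. The following pandas sketch mimics those three steps on toy data. It is not the `FMDatasetMaker` code; the function `time_split`, the column names, and the toy frame are hypothetical.

```python
import pandas as pd

# Hypothetical toy marks: one mark per day.
marks = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 1, 2, 4, 1],
    "work_id": [10, 11, 12, 10, 13, 11, 14, 12, 10, 13],
    "date": pd.date_range("2020-01-01", periods=10, freq="D"),
})


def time_split(marks, time_q=0.8, min_marks_user_train=2):
    """Split marks by a date quantile, then filter users as in the log."""
    split_date = marks["date"].quantile(time_q)
    train = marks[marks["date"] < split_date]
    test = marks[marks["date"] >= split_date]

    # Drop users with too few marks in the train set.
    counts = train["user_id"].value_counts()
    keep = counts[counts >= min_marks_user_train].index
    train = train[train["user_id"].isin(keep)]

    # Drop test users that have no marks left in the train set.
    test = test[test["user_id"].isin(train["user_id"])]
    return train, test, split_date


train, test, split_date = time_split(marks)
print(split_date, len(train), len(test))
```

Splitting by time rather than at random means the model is evaluated only on marks made after everything it was trained on.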
