# Assignment 2: Movie Recommender System
## Notebook 2.0: Advanced technique: GLocal-K
> Practical Machine Learning & Deep Learning course, Fall 2023
#### Author of the notebook: Vladislav Urhzumov, BS21-AI-01

---




This notebook tests the hypothesis that the model described in the [paper](https://arxiv.org/pdf/2108.12184.pdf), named GLocal-K (**G**lobal-**Local** **K**ernel-based matrix completion framework), predicts ratings more accurately than the baseline.

Optimized code for this solution can be found in the [github repository of ETH Zürich Computational Intelligence Lab](https://github.innominds.com/gsaltintas/CIL-CollaborativeFiltering) made by Team Meowtrix Purrdiction. :)


Please refer to that source for more accurate documentation.


The code provided by the ETH Zürich Computational Intelligence Lab team is a PyTorch adaptation of the official [GLocal-K implementation using tensorflow](https://github.com/usydnlp/Glocal_K/blob/main/GLocal_K.ipynb). The official implementation is obsolete: it targets TensorFlow v1.x, which is no longer supported by the TF developers, and it has not been ported to TensorFlow v2.x.

Thus, the code blocks in this exploratory notebook are the official authors' implementation of the proposed model, adapted to PyTorch with additional efficiency improvements and adjusted to my own ideas and dataset preprocessing.

### Libraries import

In [None]:
! pip install optuna pytorch_lightning


In [None]:
import numpy as np
import pandas as pd
import torch
import torch.nn.functional as F
from typing import Optional
import optuna
import pytorch_lightning as pl
import time
from pathlib import Path
from shutil import copy
from pytorch_lightning.loggers import TensorBoardLogger
from sklearn.metrics import mean_absolute_error, mean_squared_error
from math import sqrt

### Constant setting

In [None]:
names = ['user_id', 'movie_id', 'rating', 'timestamp']
user_col = names[0]
movie_col = names[1]

### Data preprocessing (to get torch.DataLoader)

In [None]:
def extract_users_items_predictions(data_pd):
    print(data_pd)
    # Pull whole columns at once instead of row-by-row iloc lookups
    users = data_pd[user_col].tolist()
    movies = data_pd[movie_col].tolist()
    predictions = data_pd.rating.values
    return users, movies, predictions


def load_data_cil(path="/content/", file="u.data", frac=0.1):
    data_pd = pd.read_csv(path + file, sep='\t', names=names)
    users, movies, predictions = extract_users_items_predictions(data_pd)
    data = pd.DataFrame.from_dict(
        {user_col: users, movie_col: movies, "rating": predictions}
    )

    indices_u, indices_m = np.unique(data[user_col]), np.unique(data[movie_col])
    n_u = indices_u.size
    n_m = indices_m.size
    n_r = data.shape[0]
    indices_u = list(indices_u)
    indices_m = list(indices_m)

    udict = {}
    for i, u in enumerate(np.unique(data[user_col]).tolist()):
        udict[u] = i
    mdict = {}
    for i, m in enumerate(np.unique(data[movie_col]).tolist()):
        mdict[m] = i

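    # Shuffle the rating indices so the train/test split is random
    idx = np.arange(n_r)
    np.random.shuffle(idx)
    n_test = int(frac * n_r)

    train_r = np.zeros((n_m, n_u), dtype="float32")
    test_r = np.zeros((n_m, n_u), dtype="float32")

    # Map raw ids to matrix positions through the lookup dicts built above
    rows = data[movie_col].map(mdict).to_numpy()[idx]
    cols = data[user_col].map(udict).to_numpy()[idx]
    vals = data["rating"].to_numpy()[idx]

    # The first n_test shuffled ratings go to the test matrix, the rest to train
    test_r[rows[:n_test], cols[:n_test]] = vals[:n_test]
    train_r[rows[n_test:], cols[n_test:]] = vals[n_test:]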

    train_m = np.greater(train_r, 1e-12).astype("float32")
    test_m = np.greater(test_r, 1e-12).astype("float32")

    print("data matrix loaded")
    print("num of users: {}".format(n_u))
    print("num of movies: {}".format(n_m))
    print("num of training ratings: {}".format(n_r - int(frac * n_r)))
    print("num of test ratings: {}".format(int(frac * n_r)))

    return n_m, n_u, train_r, train_m, test_r, test_m


class CILDataset(torch.utils.data.Dataset):
    def __init__(self, data_path, file):
        self.data = load_data_cil(data_path, file)

    def __len__(self):
        return 1

    def __getitem__(self, _):
        return self.data


class CILDataLoader(torch.utils.data.DataLoader):
    def __init__(self, file="u.data", data_path="/content/", num_workers=8):
        super().__init__(
            CILDataset(data_path, file), batch_size=None, num_workers=num_workers
        )
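The preprocessing above turns a list of (user, movie, rating) triples into an item-major rating matrix plus a binary mask of observed entries. A minimal sketch of the same idea on toy data, assuming NumPy only; the triples and the variable names `triples`, `ratings`, and `mask` are illustrative and not part of the notebook:

```python
import numpy as np

# Toy (user_id, movie_id, rating) triples; the ids are deliberately non-contiguous
triples = np.array([
    [10, 7, 5],
    [10, 9, 3],
    [42, 7, 4],
])

users = np.unique(triples[:, 0])   # sorted unique user ids -> column positions
movies = np.unique(triples[:, 1])  # sorted unique movie ids -> row positions
cols = np.searchsorted(users, triples[:, 0])
rows = np.searchsorted(movies, triples[:, 1])

# Item-major matrix: one row per movie, one column per user, 0 means "no rating"
ratings = np.zeros((movies.size, users.size), dtype="float32")
ratings[rows, cols] = triples[:, 2]

# Binary mask of observed entries, built the same way as train_m / test_m
mask = np.greater(ratings, 1e-12).astype("float32")

print(ratings)
print(mask)
```

Storing movies as rows matches the `(n_m, n_u)` shape returned by `load_data_cil`, which is what the item-based autoencoder below expects.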

### Kernel layers
These classes and the `global_conv` function are the building blocks for the more complex models in the following code cells. Each one is either a single torch layer or a small set of layers (a kernel) from which the GLocal-K models are assembled.

In [None]:
class LocalKernelLayer(torch.nn.Module):
    def __init__(self, n_in, n_hid, n_dim, activation, lambda_s, lambda_2):
        super(LocalKernelLayer, self).__init__()
        self.activation = activation

        self.W = torch.nn.parameter.Parameter(
            torch.rand(size=(n_in, n_hid)) * 2 * np.sqrt(6 / (n_in + n_hid))
            - np.sqrt(6 / (n_in + n_hid)),
            requires_grad=True,
        )
        self.u = torch.nn.parameter.Parameter(
            torch.normal(0, 1e-3, size=(n_in, 1, n_dim)), requires_grad=True
        )
        self.v = torch.nn.parameter.Parameter(
            torch.normal(0, 1e-3, size=(1, n_hid, n_dim)), requires_grad=True
        )
        self.b = torch.nn.parameter.Parameter(
            torch.rand(size=(n_hid,)), requires_grad=True
        )

        self.lambda_s = lambda_s
        self.lambda_2 = lambda_2

    def forward(self, z):
        x, reg_loss = z

        dist = torch.linalg.norm(self.u - self.v, ord=2, dim=2)
        # clamp keeps the result on the same device as the parameters
        w_hat = torch.clamp(1.0 - dist**2, min=0.0)

        sparse_reg_term = self.lambda_s * (torch.sum(w_hat**2) / 2)
        l2_reg_term = self.lambda_2 * (torch.sum(self.W**2) / 2)

        W_eff = self.W * w_hat
        y = torch.matmul(x, W_eff) + self.b
        y = self.activation(y)

        return y, reg_loss + sparse_reg_term + l2_reg_term


class LocalKernel(torch.nn.Module):
    def __init__(self, n_layers, n_u, n_hid, n_dim, activation, lambda_s, lambda_2):
        super(LocalKernel, self).__init__()

        self.hidden_layers = torch.nn.Sequential(
            *[LocalKernelLayer(n_u, n_hid, n_dim, activation, lambda_s, lambda_2)]
            + [
                LocalKernelLayer(n_hid, n_hid, n_dim, activation, lambda_s, lambda_2)
                for _ in range(n_layers - 1)
            ]
        )
        self.out_layer = LocalKernelLayer(
            n_hid, n_u, n_dim, lambda x: x, lambda_s, lambda_2
        )

    def forward(self, x):
        reg_loss = 0
        y, reg_loss = self.hidden_layers((x, reg_loss))
        y, reg_loss = self.out_layer((y, reg_loss))

        return y, reg_loss


class GlobalKernel(torch.nn.Module):
    def __init__(self, n_kernel, gk_size, dot_scale):
        super(GlobalKernel, self).__init__()
        self.gk_size = gk_size
        self.dot_scale = dot_scale

        conv_kernel_ = torch.empty(size=(n_kernel, self.gk_size**2))
        torch.nn.init.trunc_normal_(conv_kernel_, 0, 0.1)

        self.conv_kernel = torch.nn.parameter.Parameter(
            conv_kernel_, requires_grad=True
        )

    def forward(self, x):
        # Item (dim=1) based average pooling
        avg_pooling = torch.mean(x, dim=1)
        avg_pooling = torch.reshape(avg_pooling, (1, -1))

        gk = (
            torch.matmul(avg_pooling, self.conv_kernel) * self.dot_scale
        )  # Scaled dot product
        gk = torch.reshape(gk, (1, 1, self.gk_size, self.gk_size))

        return gk


def global_conv(x, W):
    x = torch.reshape(x, [1, 1, x.shape[0], x.shape[1]])

    conv2d = F.relu(F.conv2d(x, W, stride=1, padding="same"))

    return torch.reshape(conv2d, (conv2d.shape[2], conv2d.shape[3]))
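The local kernel in `LocalKernelLayer.forward` reweights each dense connection by `max(0, 1 - d^2)`, where `d` is the distance between the embeddings of the two units it connects. A minimal NumPy sketch of that reweighting in isolation; the sizes are arbitrary, and the embeddings are drawn with standard deviation 1.0 (an assumption made here so that some connections are visibly switched off, whereas the notebook initialises them with 1e-3):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_dim = 4, 3, 2

# Embeddings for input units (u) and hidden units (v), broadcast against each other
u = rng.normal(0.0, 1.0, size=(n_in, 1, n_dim))
v = rng.normal(0.0, 1.0, size=(1, n_hid, n_dim))

# Euclidean distance between every (input, hidden) pair -> shape (n_in, n_hid)
dist = np.linalg.norm(u - v, axis=2)

# Finite-support kernel: 1 - d^2 inside the unit ball, exactly 0 outside it
w_hat = np.maximum(0.0, 1.0 - dist**2)

# Element-wise reweighting of a dense weight matrix, as in W_eff = W * w_hat
W = rng.uniform(-1.0, 1.0, size=(n_in, n_hid))
W_eff = W * w_hat

print(w_hat.round(3))
print("fraction of connections switched off:", np.mean(w_hat == 0.0))
```

Connections whose embeddings are more than one unit apart receive a weight of exactly zero, which is what makes the effective weight matrix sparse.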

### Class for GLocal Kernel Pretraining

The purpose of pre-training is described in the original paper (linked at the top of this notebook). In short, this stage trains an item-based autoencoder adapted for collaborative filtering.

In [None]:
class GLocalKPre(pl.LightningModule):
    def __init__(
        self,
        n_hid,
        n_dim,
        n_layers,
        lambda_2,  # l2 regularisation
        lambda_s,
        iter_p,  # optimisation
        n_u,
        lr: float = 0.1,
        trial: Optional[optuna.trial.Trial] = None,
        optim: Optional[str] = "lbfgs",
        scheduler: Optional[str] = "none",
        **kwargs,
    ):
        super().__init__()
        self.save_hyperparameters()

        self.iter_p = iter_p

        self.local_kernel = LocalKernel(
            n_layers, n_u, n_hid, n_dim, torch.sigmoid, lambda_s, lambda_2
        )
        self.lr = lr
        self.trial = trial
        self.optim = optim
        self.scheduler = scheduler

    def forward(self, x):
        return self.local_kernel(x)

    def training_step(self, batch, batch_idx):
        _, _, train_r, train_m, _, _ = batch

        pred_p, reg_losses = self(train_r)

        # L2 loss
        diff = train_m * (train_r - pred_p)
        sqE = torch.sum(diff**2) / 2
        loss_p = sqE + reg_losses

        return loss_p

    def validation_step(self, batch, batch_idx):
        _, _, train_r, train_m, test_r, test_m = batch

        pred_p, _ = self(train_r)

        error_train = (
            train_m * (torch.clip(pred_p, 1.0, 5.0) - train_r) ** 2
        ).sum() / torch.sum(train_m)
        train_rmse = torch.sqrt(error_train)

        error = (
            test_m * (torch.clip(pred_p, 1.0, 5.0) - test_r) ** 2
        ).sum() / torch.sum(test_m)
        test_rmse = torch.sqrt(error)

        self.log("pre_train_rmse", train_rmse)
        self.log("pre_test_rmse", test_rmse)
        if self.trial is not None:
            self.trial.report(test_rmse.item(), step=self.global_step)

    def configure_optimizers(self):
        if self.optim == "adam":
            optimizer = torch.optim.AdamW(self.local_kernel.parameters(), lr=self.lr)
        elif self.optim == "lbfgs":
            optimizer = torch.optim.LBFGS(
                self.local_kernel.parameters(),
                max_iter=self.iter_p,
                history_size=10,
                lr=self.lr,
            )
        elif self.optim == "sgd":
            optimizer = torch.optim.SGD(self.local_kernel.parameters(), lr=self.lr)
        else:
            raise ValueError(
                "Only adam, lbfgs, and sgd options are possible for optimizer."
            )

        if self.scheduler == "exponential":
            scheduler = torch.optim.lr_scheduler.ExponentialLR(
                optimizer=optimizer, gamma=0.995
            )
        elif self.scheduler == "reducelronplateau":
            scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
                optimizer, patience=4, factor=0.5, min_lr=1e-3
            )
        elif self.scheduler == "none":
            return optimizer
        else:
            raise ValueError(f"Unknown lr scheduler: {self.scheduler}")

        return {
            "optimizer": optimizer,
            "lr_scheduler": {"scheduler": scheduler, "monitor": "pre_test_rmse"},
        }
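Both the training loss and the validation RMSE in `GLocalKPre` only count entries where a rating was actually observed. A minimal NumPy sketch of the two quantities on toy data (the matrices are made up for illustration):

```python
import numpy as np

# Toy item-major ratings; 0 marks an unobserved entry
ratings = np.array([[5.0, 0.0, 3.0],
                    [0.0, 4.0, 1.0]], dtype="float32")
mask = (ratings > 0).astype("float32")

# Toy model output, including values outside the valid 1..5 range
pred = np.array([[4.5, 2.0, 3.5],
                 [1.0, 5.5, 0.5]], dtype="float32")

# Training objective: half the sum of squared errors over observed entries only
sq_err = np.sum((mask * (ratings - pred)) ** 2) / 2

# Validation metric: clip to the valid rating range, then take the masked RMSE
clipped = np.clip(pred, 1.0, 5.0)
rmse = np.sqrt(np.sum(mask * (clipped - ratings) ** 2) / np.sum(mask))

print(sq_err, rmse)
```

Predictions at unobserved positions (here 2.0 and 1.0) do not affect either value, because the mask zeroes them out.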

### Class for GLocal Kernel Fine-tuning

Fine-tuning is the stage that follows pre-training; its purpose is also described in the paper linked at the top of this notebook. In short, the pre-trained autoencoder is fine-tuned on the rating matrix produced by the global convolutional kernel.

In [None]:
class GLocalKFine(pl.LightningModule):
    def __init__(
        self,
        gk_size,
        iter_f,
        dot_scale,
        n_m,
        local_kernel_checkpoint,
        lr: float = 0.1,
        trial: Optional[optuna.trial.Trial] = None,
        optim: Optional[str] = "lbfgs",
        scheduler: Optional[str] = "none",
        *args,
        **kwargs,
    ):
        super().__init__()
        self.save_hyperparameters()

        self.iter_f = iter_f

        self.local_kernel = GLocalKPre.load_from_checkpoint(local_kernel_checkpoint)
        self.local_kernel.mode = "train"

        self.global_kernel = GlobalKernel(n_m, gk_size, dot_scale)
        self.lr = lr
        self.trial = trial
        self.optim = optim
        self.scheduler = scheduler

    def forward(self, x):
        y_dash, _ = self.local_kernel(x)

        gk = self.global_kernel(y_dash)
        y_hat = global_conv(x, gk)

        y, _ = self.local_kernel(y_hat)

        return y

    def training_step(self, batch, batch_idx):
        _, _, train_r, train_m, _, _ = batch

        y_dash, _ = self.local_kernel(train_r)

        gk = self.global_kernel(y_dash)  # Global kernel
        y_hat = global_conv(train_r, gk)  # Global kernel-based rating matrix

        pred_f, reg_losses = self.local_kernel(y_hat)

        # L2 loss
        diff = train_m * (train_r - pred_f)
        sqE = torch.sum(diff**2) / 2
        loss_f = sqE + reg_losses

        return loss_f

    def validation_step(self, batch, batch_idx):
        _, _, train_r, train_m, test_r, test_m = batch

        pred_f = self(train_r)

        error_train = (
            train_m * (torch.clip(pred_f, 1.0, 5.0) - train_r) ** 2
        ).sum() / torch.sum(train_m)
        train_rmse = torch.sqrt(error_train)

        error = (
            test_m * (torch.clip(pred_f, 1.0, 5.0) - test_r) ** 2
        ).sum() / torch.sum(test_m)
        test_rmse = torch.sqrt(error)

        self.log("train_rmse", train_rmse)
        self.log("test_rmse", test_rmse)
        self.log("fine_train_rmse", train_rmse)
        self.log("fine_test_rmse", test_rmse)
        if self.trial is not None:
            self.trial.report(test_rmse.item(), step=self.global_step)

    def configure_optimizers(self):
        if self.optim == "adam":
            optimizer = torch.optim.AdamW(self.parameters(), lr=self.lr)
        elif self.optim == "lbfgs":
            optimizer = torch.optim.LBFGS(
                self.parameters(), max_iter=self.iter_f, history_size=10, lr=self.lr
            )
        elif self.optim == "sgd":
            optimizer = torch.optim.SGD(self.parameters(), lr=self.lr)
        else:
            raise ValueError(
                "Only adam, lbfgs, and sgd options are possible for optimizer."
            )
        if self.scheduler == "exponential":
            scheduler = torch.optim.lr_scheduler.ExponentialLR(
                optimizer=optimizer, gamma=0.995
            )
        elif self.scheduler == "reducelronplateau":
            scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
                optimizer, patience=4, factor=0.5, min_lr=1e-3
            )
        elif self.scheduler == "none":
            return optimizer
        else:
            raise ValueError(f"Unknown lr scheduler: {self.scheduler}")

        return {
            "optimizer": optimizer,
            "lr_scheduler": {"scheduler": scheduler, "monitor": "test_rmse"},
        }
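The fine-tuning forward pass builds one global convolution kernel from the pre-trained model's output and then convolves the rating matrix with it. A minimal sketch of that step with toy shapes, assuming random tensors in place of real ratings and of the pre-trained output (`x`, `y_dash`, and `conv_kernel` are stand-ins, not the trained values):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_m, n_u, gk_size = 6, 4, 3

x = torch.rand(n_m, n_u)       # stand-in for the rating matrix
y_dash = torch.rand(n_m, n_u)  # stand-in for the pre-trained model's output
conv_kernel = torch.randn(n_m, gk_size**2) * 0.1  # one small kernel per item

# Item-wise average pooling gives one weight per candidate kernel
weights = y_dash.mean(dim=1).reshape(1, -1)  # shape (1, n_m)

# Weighted sum of the kernels, reshaped into a single conv2d filter
gk = (weights @ conv_kernel).reshape(1, 1, gk_size, gk_size)

# Convolve the whole rating matrix; "same" padding keeps its shape
x4d = x.reshape(1, 1, n_m, n_u)
y_hat = F.relu(F.conv2d(x4d, gk, stride=1, padding="same")).reshape(n_m, n_u)

print(gk.shape, y_hat.shape)
```

Because the output keeps the `(n_m, n_u)` shape, it can be fed straight back into the local kernel, which is what `GLocalKFine.forward` does.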

### Train loop with configuration of parameters

For simplicity and compactness, I have removed the configuration through command-line arguments and dedicated config classes. Instead, the hyperparameters are defined inside the train function and can be changed in place.

The best checkpoints are saved to local or cloud storage. Running this notebook in Google Colab is recommended, because the paths are hard-coded to `/content/`.
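One of the in-place settings, `seed`, is declared in `train_glocal_k` but is not passed to any random number generator in the code shown here, so the shuffle in `load_data_cil` can differ between runs. A minimal sketch of how the split could be made reproducible, assuming NumPy's global RNG is seeded before the shuffle; the helper `shuffled_indices` is hypothetical and not part of the notebook:

```python
import numpy as np


def shuffled_indices(n, seed):
    """Hypothetical helper: seed NumPy's global RNG, then shuffle 0..n-1."""
    np.random.seed(seed)
    idx = np.arange(n)
    np.random.shuffle(idx)
    return idx


first = shuffled_indices(10, seed=42)
second = shuffled_indices(10, seed=42)

# The same seed yields the same permutation, hence the same train/test split
print(np.array_equal(first, second))
```

The same effect could be obtained by calling `np.random.seed(seed)` once before the data is loaded.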

In [None]:
def train_glocal_k():
    experiment_dir = "/content/experiment"
    seed = 42
    train_size = 0.8

    # glocal config
    NUM_WORKERS = 2
    n_hid = 1000
    n_dim = 5
    n_layers = 3
    gk_size = 5
    lambda_2 = 20.  # l2 regularisation
    lambda_s = 0.006
    iter_p = 5  # optimization
    iter_f = 5
    epoch_p = 30
    epoch_f = 80
    dot_scale = 1.0  # scaled dot product
    lr_pre = 0.1
    lr_fine = 1.0
    optimizer = 'lbfgs'  # lbfgs, adam
    lr_scheduler = 'none'
    weight_decay = 0.

    # setup model directory
    model_pre = f"nhid-{n_hid}-ndim--{n_dim}-layers-{n_layers}-lambda2-{lambda_2}-lambdas-{lambda_s}-iterp-{iter_p}-iterf-{iter_f}-gk-{gk_size}-epochp-{epoch_p}-epochf-{epoch_f}-dots-{dot_scale}_"
    model_dir = Path(experiment_dir, f"{model_pre}/{time.time():.0f}")
    model_dir.mkdir(exist_ok=True, parents=True)
    model_dir.joinpath("results").mkdir()
    model_dir = model_dir.as_posix()

    print(f"Starting model training with following configuration: {model_pre}")
    cil_dataloader = CILDataLoader(file="u.data", num_workers=NUM_WORKERS)
    n_m, n_u, train_r, train_m, test_r, test_m = next(iter(cil_dataloader))
    logger = TensorBoardLogger(save_dir=experiment_dir, log_graph=True)

    glocal_k_pre = GLocalKPre(
        n_hid,
        n_dim,
        n_layers,
        lambda_2,
        lambda_s,
        iter_p,
        n_u,
        lr=lr_pre,
        optim=optimizer,
    )
    pretraining_checkpoint = pl.callbacks.ModelCheckpoint(
        dirpath=f"{experiment_dir}/checkpoints",
        filename="pretraining-{epoch}-{pre_train_rmse:.4f}-{pre_test_rmse:.4f}",
        monitor="pre_test_rmse",
        save_top_k=2,
        mode="min",
        save_last=True,
    )
    lr_monitor = pl.callbacks.LearningRateMonitor(logging_interval="epoch")
    pretraining_trainer = pl.Trainer(
        callbacks=[pretraining_checkpoint, lr_monitor],
        max_epochs=epoch_p,
        log_every_n_steps=1,
        logger=logger,
    )
    pretraining_trainer.fit(glocal_k_pre, cil_dataloader, cil_dataloader)
    pre_ckpt = f"{experiment_dir}/checkpoints/pre_last.ckpt"
    copy(pretraining_checkpoint.last_model_path, pre_ckpt)
    pre_ckpt = f"{experiment_dir}/checkpoints/pre_best.ckpt"
    copy(pretraining_checkpoint.best_model_path, pre_ckpt)

    glocal_k_fine = GLocalKFine(
        gk_size,
        iter_f,
        dot_scale,
        n_m,
        pre_ckpt,
        lr=lr_fine,
        optim=optimizer,
    )
    finetuning_checkpoint = pl.callbacks.ModelCheckpoint(
        dirpath=f"{experiment_dir}/checkpoints",
        filename="finetuning-{epoch}-{fine_train_rmse:.4f}-{fine_test_rmse:.4f}",
        monitor="fine_test_rmse",
        save_top_k=2,
        mode="min",
        save_last=True,
    )
    finetuning_trainer = pl.Trainer(
        callbacks=[finetuning_checkpoint, lr_monitor],
        max_epochs=epoch_f,
        log_every_n_steps=1,
        logger=logger,
    )
    finetuning_trainer.fit(glocal_k_fine, cil_dataloader, cil_dataloader)
    glocal_k_fine = GLocalKFine.load_from_checkpoint(
        finetuning_checkpoint.best_model_path
    )
    print(finetuning_checkpoint.best_model_path)
    glocal_k_fine.eval()
    pred = glocal_k_fine(train_r)
    return pred.detach().numpy()


if __name__ == "__main__":
    pred = train_glocal_k()
    print(pred)

Starting model training with following configuration: nhid-1000-ndim--5-layers-3-lambda2-20.0-lambdas-0.006-iterp-5-iterf-5-gk-5-epochp-30-epochf-80-dots-1.0_
       user_id  movie_id  rating  timestamp
0          196       242       3  881250949
1          186       302       3  891717742
2           22       377       1  878887116
3          244        51       2  880606923
4          166       346       1  886397596
...        ...       ...     ...        ...
99995      880       476       3  880175444
99996      716       204       5  879795543
99997      276      1090       1  874795795
99998       13       225       2  882399156
99999       12       203       3  879959583

[100000 rows x 4 columns]
data matrix loaded
num of users: 943
num of movies: 1682
num of training ratings: 90000
num of test ratings: 10000


INFO:pytorch_lightning.utilities.rank_zero:GPU available: False, used: False
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/callbacks/model_checkpoint.py:639: Checkpoint directory /content/experiment/checkpoints exists and is not empty.
INFO:pytorch_lightning.callbacks.model_summary:
  | Name         | Type        | Params
---------------------------------------------
0 | local_kernel | LocalKernel | 3.9 M 
---------------------------------------------
3.9 M     Trainable params
0         Non-trainable params
3.9 M     Total params
15.717    Total estimated model params size (MB)
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loggers/tensorboard.py:187: Could not log computational graph to TensorBoard: The `model.example_input

INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=30` reached.
INFO:pytorch_lightning.utilities.rank_zero:GPU available: False, used: False
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.callbacks.model_summary:
  | Name          | Type         | Params
-----------------------------------------------
0 | local_kernel  | GLocalKPre   | 3.9 M 
1 | global_kernel | GlobalKernel | 42.1 K
-----------------------------------------------
4.0 M     Trainable params
0         Non-trainable params
4.0 M     Total params
15.886    Total estimated model params size (MB)


INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=80` reached.


/content/experiment/checkpoints/finetuning-epoch=45-fine_train_rmse=0.8519-fine_test_rmse=0.9008.ckpt
[[4.0059905 4.2568555 3.1340024 ... 4.424846  4.502741  4.0456924]
 [3.0861082 3.2083066 2.820822  ... 3.6217697 4.054215  3.2734098]
 [3.2402241 2.8710153 2.6994681 ... 3.3428125 3.7055407 3.0599027]
 ...
 [2.6468992 2.440421  2.440467  ... 3.0072274 3.4871047 2.7200255]
 [3.260184  3.3465788 2.81361   ... 3.7196105 4.0412617 3.3759413]
 [3.2114627 2.8043063 2.5871005 ... 3.2775493 3.5686898 3.008305 ]]


The prediction matrix has shape (movies, users), so it must be transposed to (users, movies) before it enters the evaluation loop.

In [None]:
pred_df = pd.DataFrame(pred.T)
pred_df = pred_df.rename(columns={i: i+1 for i in range(len(pred_df.columns))})
pred_df

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,...,1673,1674,1675,1676,1677,1678,1679,1680,1681,1682
0,4.005991,3.086108,3.240224,3.980501,2.993020,4.269479,4.207281,4.291829,4.418986,4.443035,...,2.967778,4.282014,1.808277,1.781896,3.767418,2.430785,2.870395,2.646899,3.260184,3.211463
1,4.256855,3.208307,2.871015,3.807362,3.077477,3.658958,4.145213,4.438689,4.251235,3.970783,...,2.754118,3.997877,2.157640,2.117827,3.348250,2.283081,2.604122,2.440421,3.346579,2.804306
2,3.134002,2.820822,2.699468,3.008370,2.801006,2.954331,3.085762,3.131143,3.088783,3.023752,...,2.569132,3.004251,2.029599,2.011292,2.791868,2.333390,2.545782,2.440467,2.813610,2.587101
3,4.422172,4.128034,4.267406,4.478387,4.094547,4.651963,4.547891,4.557761,4.649403,4.698830,...,4.184725,4.624035,3.801483,3.794150,4.477344,4.011186,4.150034,4.079263,4.214275,4.293045
4,3.553265,2.627291,2.646999,3.377126,2.489323,3.538113,3.680927,3.913038,3.912798,3.800975,...,2.614701,3.758718,2.322354,2.301349,3.243538,2.280483,2.461011,2.364912,2.875470,2.776297
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
938,5.171717,4.692492,4.261760,4.781067,4.667677,4.468040,4.926148,5.064450,4.850947,4.614782,...,4.093287,4.661759,3.383661,3.348345,4.295417,3.755552,4.060174,3.910864,4.615919,4.012155
939,3.897785,3.152974,2.524166,3.269189,3.062154,2.799926,3.551462,3.872854,3.503666,3.087819,...,2.504571,3.236823,2.310861,2.270751,2.712972,2.206992,2.393566,2.300482,3.160379,2.369771
940,4.424846,3.621770,3.342813,4.063867,3.523465,3.929672,4.324199,4.552517,4.397397,4.170792,...,3.250216,4.196704,2.766967,2.734453,3.695185,2.881234,3.136908,3.007227,3.719610,3.277549
941,4.502741,4.054215,3.705541,4.157747,4.009626,3.900180,4.307186,4.462893,4.269136,4.050022,...,3.632825,4.112394,3.280588,3.253531,3.799841,3.393480,3.578744,3.487105,4.041262,3.568690


Evaluation is performed on the u1.test file, as in the baseline, so that the metrics are directly comparable.

In [None]:
test = pd.read_csv('/content/u1.test', sep='\t', names=names)

We again use MAE and RMSE to measure how far the predictions are from the ground truth.

This lets us compare the results with the baseline.

In [None]:
results = {}
preds = []
for _, row in test.iterrows():
    user = int(row[user_col])
    item = int(row[movie_col])
    if item in pred_df.columns:
        preds.append(pred_df.iloc[user - 1][item])
    else:
        preds.append(np.mean(pred_df.iloc[user - 1]))
mae = mean_absolute_error(test['rating'], preds)
rmse = sqrt(mean_squared_error(test['rating'], preds))
results = {'mae': mae, 'rmse': rmse}
results

{'mae': 0.6731397234663368, 'rmse': 0.8607996970441505}
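The row-by-row lookup above can also be written with array indexing. A minimal sketch on toy data, assuming user and movie ids are contiguous and 1-based so that `id - 1` is a valid matrix position (the arrays are made up for illustration):

```python
import numpy as np

# Toy prediction matrix of shape (users, movies), i.e. already transposed
pred_um = np.array([[4.0, 3.0, 2.0],
                    [1.5, 5.0, 3.5]])

# Toy test triples with 1-based ids
user_ids = np.array([1, 2, 2])
movie_ids = np.array([3, 1, 2])
true = np.array([3.0, 1.0, 4.0])

# Integer array indexing picks one prediction per (user, movie) pair
preds = pred_um[user_ids - 1, movie_ids - 1]

# MAE is the mean absolute error; RMSE is the square root of the mean squared error
mae = np.mean(np.abs(true - preds))
rmse = np.sqrt(np.mean((true - preds) ** 2))

print(preds, mae, rmse)
```

RMSE penalises large errors more heavily than MAE, which is why it is never smaller than MAE on the same predictions.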

### The hypothesis appears to be confirmed!

The baseline solution reached RMSE = 0.99326 on the test set.

GLocal-K predicts the test-set ratings more accurately than the baseline, reaching

{'mae': 0.6731397234663368, 'rmse': 0.8607996970441505}

Thus, GLocal-K is an excellent approach for recommender systems.



---

## Thank you for your attention!
and for the amazing assignment