# DSAIT4335 Recommender Systems
# Final Project

In this project, you will work to build different recommendation models and evaluate the effectiveness of these models through offline experiments. The dataset used for the experiments is **MovieLens100K**, a movie recommendation dataset collected by GroupLens: https://grouplens.org/datasets/movielens/100k/. For more details, check the project description on Brightspace.

# Instruction

The MovieLens100K dataset is already split into an 80% training set and a 20% test set. Along with the training and test sets, movie metadata is provided as content information.

**Expected file structure** for this assignment:   
   
   ```
   RecSysProject/
   ├── training.txt
   ├── test.txt
   ├── movies.txt
   └── codes.ipynb
   ```

**Note:** Be sure to run all cells in each section sequentially, so that intermediate variables and packages are properly carried over to subsequent cells.

**Note:** Be sure to run all cells such that the submitted file contains the output of each cell.

**Note:** Feel free to add cells if you need more for answering a question.

**Submission:** Answer all the questions in this Jupyter notebook file. Submit this Jupyter notebook file (your answers included) to Brightspace. Rename the file to your group number; for example, group10 -> 10.ipynb.

# Setup

In [None]:
!pip install transformers torch  # For BERT
!pip install -r requirements.txt
# You can refer to https://huggingface.co/docs/transformers/en/model_doc/bert for the available versions of the pre-trained BERT model

In [None]:
# For BERT embeddings (install: pip install transformers torch)
print("Check the status of BERT installation:")

try:
    from transformers import AutoTokenizer, AutoModel
    import torch
    BERT_AVAILABLE = True
    print("BERT libraries loaded successfully!")
    # is_available must be called; the bare function object is always truthy
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"Using device: {device}")
except ImportError:
    BERT_AVAILABLE = False
    print("BERT libraries not available. Install with: pip install transformers torch")

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.sparse import csr_matrix
from scipy.spatial.distance import cosine, correlation
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances
from sklearn.preprocessing import StandardScaler, MultiLabelBinarizer
import re
import time, math
from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

print("Libraries imported successfully!")

# Load dataset

In [None]:
# loading the training set and test set
columns_name=['user_id','item_id','rating','timestamp']
train_data = pd.read_csv('data/training.txt', sep='\t', names=columns_name)
test_data = pd.read_csv('data/test.txt', sep='\t', names=columns_name)

print(f'The training data:')
display(train_data[['user_id','item_id','rating']].head())
print(f'The shape of the training data: {train_data.shape}')
print('--------------------------------')
print(f'The test data:')
display(test_data[['user_id','item_id','rating']].head())
print(f'The shape of the test data: {test_data.shape}')

In [None]:
movies = pd.read_csv('data/movies.txt',names=['item_id','title','genres','description'],sep='\t')
movies.head()

# Task 1) Implementation of different recommendation models as well as a hybrid model combining those recommendation models

<h3>Abstract Recommender</h3>

In [None]:
# TODO insert Abstract Recommender

To facilitate the implementation of the hybrid recommender system, we created an abstract recommender class. Each recommendation algorithm implemented in this task extends this class and implements two methods: one to train the algorithm and one to predict a score for a user/item pair. The class also provides functionality to save predictions to and load them from a CSV file, which simplifies evaluation.
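As a rough illustration of the structure described above, the sketch below shows one way such a base class could look. It is not the project's actual implementation: the class names, method signatures, the `prediction` column name, and the toy `GlobalMeanRecommender` subclass are all assumptions made for this example.

```python
from abc import ABC, abstractmethod

import pandas as pd


class AbstractRecommender(ABC):
    """Illustrative base class; all names here are assumptions."""

    def __init__(self, train_data: pd.DataFrame):
        # Expects the columns user_id, item_id and rating.
        self.train_data = train_data

    @abstractmethod
    def train(self) -> None:
        """Fit the model on self.train_data."""

    @abstractmethod
    def predict_score(self, user_id: int, item_id: int) -> float:
        """Return the predicted score for one user/item pair."""

    def save_predictions(self, pairs: pd.DataFrame, path: str) -> None:
        """Score every (user_id, item_id) row in `pairs` and write a CSV."""
        out = pairs[['user_id', 'item_id']].copy()
        out['prediction'] = [
            self.predict_score(u, i)
            for u, i in zip(out['user_id'], out['item_id'])
        ]
        out.to_csv(path, index=False)

    @staticmethod
    def load_predictions(path: str) -> pd.DataFrame:
        """Read back predictions written by save_predictions."""
        return pd.read_csv(path)


class GlobalMeanRecommender(AbstractRecommender):
    """Toy subclass that predicts the global mean rating for every pair."""

    def train(self) -> None:
        self.mean_ = float(self.train_data['rating'].mean())

    def predict_score(self, user_id: int, item_id: int) -> float:
        return self.mean_
```

Caching predictions in a file means the evaluation code can be rerun without retraining each model, which is the main reason to place the save/load logic in the shared base class.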

<h3>Content-Based Recommender</h3>

In [None]:
# TODO Insert Content-Based recommender

In [None]:
# TODO instantiate and give example of rating prediction (prediction, actual)

<i>Describe implementation and hyper parameters</i>

<h3>UserKNN</h3>

In [None]:
# TODO insert User-KNN

In [None]:
# TODO instantiate and give example of rating prediction (prediction, actual)

<i>Describe implementation and hyper parameters</i>

<h3>ItemKNN</h3>

In [None]:
# TODO insert Item-KNN

In [None]:
# TODO instantiate and give example of rating prediction (prediction, actual)

<i>Describe implementation and hyper parameters</i>

<h3>Matrix Factorization</h3>

In [None]:
# TODO insert Matrix-Factorization

In [None]:
# TODO instantiate and give example of rating prediction (prediction, actual)

<i>Describe implementation and hyper parameters</i>

<h3>Bayesian Personalized Ranking (BPR)</h3>

In [None]:
# TODO insert BPR

In [None]:
# TODO instantiate and give example of ranking prediction (prediction, actual)

<i>Describe implementation and hyper parameters</i>

<h3>Hybrid Model</h3>

In [None]:
# TODO insert hybrid model class

The hybrid model merges the predictions of the models implemented above into a single prediction using a weighted sum. For the rating prediction task, the weights are found by minimizing an objective function, in our case the mean squared error (MSE). <i>TODO: defend this choice. Minimizing MSE is equivalent to minimizing RMSE, because the square root is monotonically increasing.</i>
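A minimal sketch of how such weights could be obtained is shown below, assuming an ordinary least-squares fit without an intercept or any constraint on the weights. The function names and the toy numbers are hypothetical and are not taken from the project's `HybridRecommender`.

```python
import numpy as np


def fit_hybrid_weights(base_predictions: np.ndarray, ratings: np.ndarray) -> np.ndarray:
    """Return the weights w that minimize the MSE of base_predictions @ w.

    base_predictions has one row per (user, item) pair and one column per
    base model; ratings holds the observed rating for each row.
    """
    weights, *_ = np.linalg.lstsq(base_predictions, ratings, rcond=None)
    return weights


def hybrid_predict(base_predictions: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted sum of the base-model predictions."""
    return base_predictions @ weights


# Toy data: two base models, four user/item pairs (made-up values).
P = np.array([[4.0, 3.0], [2.0, 3.0], [5.0, 4.0], [1.0, 2.0]])
r = 0.7 * P[:, 0] + 0.3 * P[:, 1]

w = fit_hybrid_weights(P, r)
mse = float(np.mean((hybrid_predict(P, w) - r) ** 2))
```

Because the toy ratings are an exact weighted sum of the two columns, the fit recovers weights close to 0.7 and 0.3 with an MSE near zero. On real data the weights should be fitted on training predictions only.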

For the ranking task we use a slightly different approach:
1. Assume we want a recommendation list of size K.
2. Each recommendation algorithm predicts such a list of item_ids and ratings.
3. Each rating for an item is multiplied by the algorithm's associated (predefined) weight to obtain new ratings for each item.
4. In the case that an item is recommended by multiple algorithms, the weighted ratings are summed together.
5. Finally, items are re-ranked by their new predicted rating and the top-K is taken as the new ranking.

As mentioned in the steps above, the weights for the ranking task are predefined, unlike those for the rating prediction task. This is because, as mentioned in the lectures, ranking evaluation metrics such as NDCG and AP are non-smooth functions, so they cannot be optimized directly with gradient-based methods. Smooth approximations of these functions exist, but they are not always accurate. We therefore opted to search manually for near-optimal weights based on the evaluation metrics, which is done in the next task.
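The five steps of the ranking procedure can be sketched as follows. This is only an illustration under stated assumptions: the function name `fuse_rankings`, the dictionary-based inputs, the model names, and the example weights are hypothetical.

```python
from collections import defaultdict


def fuse_rankings(ranked_lists, weights, k):
    """Fuse per-model top-K lists into one top-K list by weighted score sum.

    ranked_lists: {model_name: [(item_id, predicted_rating), ...]}
    weights:      {model_name: predefined weight}
    """
    fused = defaultdict(float)
    for model, items in ranked_lists.items():
        # Steps 2-3: take each model's top-K and weight its ratings.
        for item_id, score in items[:k]:
            # Step 4: items recommended by several models accumulate.
            fused[item_id] += weights[model] * score
    # Step 5: re-rank by the fused score and keep the top-K.
    ranked = sorted(fused.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up example with two models and K = 2.
lists = {
    'user_knn': [(10, 4.5), (20, 4.0)],
    'mf': [(20, 4.8), (30, 4.2)],
}
model_weights = {'user_knn': 0.4, 'mf': 0.6}
top_k = fuse_rankings(lists, model_weights, k=2)
```

In this example item 20 appears in both lists, so its weighted ratings are summed and it moves to the top of the fused ranking, ahead of items recommended by only one model.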

In [None]:
# TODO instantiate recommender model and give example usage (rating and ranking)

In [None]:
from recommendation_algorithms.hybrid_recommender import HybridRecommender

# Example usage
training_path = 'data/training.txt'
hybrid_recommender = HybridRecommender(training_path, True)

In [None]:
user_id = 1
item_id = 2
predicted_score = hybrid_recommender.predict_score(user_id, item_id)
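# Look up the known rating; the pair may be absent from the training set
match = train_data.loc[(train_data['user_id'] == user_id) & (train_data['item_id'] == item_id), 'rating']
actual_score = match.iloc[0] if not match.empty else None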
print(f'Predicted score {predicted_score} for user {user_id} and item {item_id}, actual score: {actual_score}.')

# Task 2) Experiments for both rating prediction and ranking tasks, and conducting offline evaluation

In task 2 we evaluate all individual models and the hybrid model for both rating prediction and ranking tasks by calculating evaluation metrics (implemented below) on the test set.

In [None]:
# TODO insert implementation of accuracy metrics for rating prediction and ranking

Below we evaluate how well each model performs by calculating the RMSE and discussing observations:

In [None]:
# TODO insert RMSE evaluation for all models

<i>Discuss observations</i>

Before we evaluate all models for the ranking task, we manually search for suitable ranking weights for the hybrid model by attempting to maximize the F1-score (the harmonic mean of Precision and Recall) and NDCG on the training set.
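For reference, a minimal sketch of these two metrics for a single user is given below. It assumes binary relevance (an item is either relevant or not) and a cut-off K; the function names are hypothetical, and the project may define relevance differently, for instance through a rating threshold.

```python
import math


def precision_recall_f1_at_k(recommended, relevant, k):
    """Precision@K, Recall@K and their harmonic mean for one user."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def ndcg_at_k(recommended, relevant, k):
    """NDCG@K with binary gains and a log2 position discount."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    # The ideal ranking places all relevant items (at most K) first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


# Made-up example: 2 of the 4 recommended items are relevant.
p, r, f1 = precision_recall_f1_at_k([1, 2, 3, 4], {2, 4, 9}, k=4)
ndcg = ndcg_at_k([1, 2, 3, 4], {2, 4, 9}, k=4)
```

Both metrics are computed per user and would then be averaged over all users when comparing candidate weight settings.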

In [None]:
# TODO implement F1-score

In [None]:
# TODO try different ranking weights to optimize F1-score and NDCG on TRAINING SET as much as possible

Having found the weights the hybrid model should use for ranking, we now evaluate all models in terms of Precision, Recall, and NDCG and discuss our observations:

In [None]:
# TODO insert Precision, Recall, and NDCG evaluation for all models

<i>Discuss observations</i>

# Task 3) Implement baselines for both rating prediction and ranking tasks, and perform experiments with those baselines

<h3>Rating Baselines</h3>

In [None]:
# TODO insert average rater implementation

In [None]:
# TODO insert mean hybrid rater

In [None]:
# TODO evaluate all in terms of RMSE

<i>Discuss observations for rating baselines vs. single models and hybrid model</i>

<h3>Ranking Baselines</h3>

In [None]:
# TODO insert random recommender

In [None]:
# TODO insert most popular recommender

In [None]:
# TODO insert mean hybrid ranker

In [None]:
# TODO evaluate all in terms of Precision, Recall, NDCG

<i>Discuss observations for ranking baselines vs. single models and hybrid model</i>

# Task 4) Analysis of recommendation models. Analyzing the coefficients of the hybrid model and the success of recommendation models for different user groups.

<i>Analyze the coefficients of the regression model (hybrid model) for both rating prediction and ranking tasks -> Which models contribute the most to the prediction?</i>

<i>Where is each recommendation model successful in delivering accurate recommendations? -> For which user groups does each recommendation model achieve the highest accuracy?</i>

# Task 5) Evaluation beyond accuracy

In addition to evaluating the models on accuracy metrics, we also look at the following non-accuracy metrics:
- Diversity (intra-list diversity)
- Novelty (surprisal)
- Calibration
- Fairness metrics

<i>Make list concrete with fairness metrics, maybe also discuss implementations</i>
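As a hedged illustration of the first two metrics in the list, the sketch below computes intra-list diversity as the mean pairwise cosine distance between item feature vectors, and novelty as the mean surprisal of the recommended items. These definitions, the helper names, and the inputs (feature vectors such as genre encodings, and popularity as the share of users who rated an item) are assumptions for this example; the final implementation may differ.

```python
import math
from itertools import combinations


def _cosine(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def intra_list_diversity(item_vectors):
    """Mean pairwise cosine distance within one recommendation list."""
    pairs = list(combinations(item_vectors, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - _cosine(a, b) for a, b in pairs) / len(pairs)


def novelty(recommended, item_popularity):
    """Mean surprisal -log2 p(i) over a non-empty recommendation list.

    item_popularity maps item_id to p(i), assumed here to be the share
    of users who rated item i (so 0 < p(i) <= 1).
    """
    return sum(-math.log2(item_popularity[i]) for i in recommended) / len(recommended)


# Made-up examples.
diverse = intra_list_diversity([[1, 0], [0, 1]])   # orthogonal items
uniform = intra_list_diversity([[1, 0], [2, 0]])   # same direction
nov = novelty([1, 2], {1: 0.5, 2: 0.25})
```

Higher values indicate lists whose items are less similar to one another (diversity) or less popular overall (novelty).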

In [None]:
# TODO add non-accuracy implementations

In [None]:
# TODO evaluate all models (single, hybrid, baselines) in terms of non-accuracy metrics

<i>Discuss observations, final remarks</i>