<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>

<i>Licensed under the MIT License.</i>

# Benchmark with Movielens dataset

This notebook gives an illustrative comparison of collaborative filtering algorithms available in this repository, such as Spark ALS, Surprise SVD, and SAR, using the Movielens dataset. These algorithms can be used in a variety of recommendation tasks, including product or news recommendations.

The main purpose of this notebook is not to produce comprehensive benchmarking results on multiple datasets. Rather, it is intended to illustrate how one could evaluate different recommender algorithms using the tools in this repository.

## Experimentation setup

* Objective
  * To compare how each collaborative filtering algorithm performs in predicting ratings and recommending relevant items.

* Environment
  * The comparison is run on an [Azure Data Science Virtual Machine](https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/).
  * The virtual machine size is Standard NC6 (6 vCPUs, 55 GB memory, 1 K80 GPU).
  * Note that a single-node DSVM is not intended for scalable benchmarking analysis. Scaling the computing instances up or out is necessary to run the benchmark in a run-time-efficient way without memory issues.
  * **NOTE ABOUT THE DEPENDENCIES TO INSTALL**: This notebook uses CPU, GPU and PySpark algorithms, so make sure you install the `full environment` as detailed in the [SETUP.md](../../SETUP.md). 
  
* Datasets
  * [Movielens 100K](https://grouplens.org/datasets/movielens/100k/).
  * [Movielens 1M](https://grouplens.org/datasets/movielens/1m/).

* Data split
  * The data is split into train and test sets.
  * The split ratios are 75-25 for train and test datasets.
  * The splitting is stratified based on items. 

* Model training
  * A recommendation model is trained by using each of the collaborative filtering algorithms. 
  * Empirical parameter values reported [here](http://mymedialite.net/examples/datasets.html) are used in this notebook. More exhaustive hyperparameter tuning would be required to further optimize results.

* Evaluation metrics
  * Ranking metrics:
    * Precision@k.
    * Recall@k.
    * Normalized discounted cumulative gain@k (NDCG@k).
    * Mean average precision (MAP).
    * In the evaluation metrics above, k = 10. 
  * Rating metrics:
    * Root mean squared error (RMSE).
    * Mean absolute error (MAE).
    * R squared.
    * Explained variance.
  * Run time performance:
    * Elapsed time for training a model and for using it to predict ratings or recommend k items.
    * The time may vary across different machines. 
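
The metrics listed above can be illustrated on toy data. The following is a minimal pure-Python sketch, not the implementation used by the evaluation utilities in this repository. The helper names (`precision_at_k`, `recall_at_k`, `ndcg_at_k`, `rmse`, `mae`) are hypothetical, and the ranking metrics assume binary relevance (an item is either relevant to the user or not).

```python
import math


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k):
    """Fraction of the relevant items that appear in the top-k list."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0


def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG: DCG of the list divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ratings."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy example for one user: 4 recommended items, 3 truly relevant items.
recommended = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
precision = precision_at_k(recommended, relevant, k=4)
recall = recall_at_k(recommended, relevant, k=4)
ndcg = ndcg_at_k(recommended, relevant, k=4)

# Toy example for rating prediction.
rating_rmse = rmse([4, 3, 5], [3, 3, 4])
rating_mae = mae([4, 3, 5], [3, 3, 4])
```

In a full evaluation these per-user ranking values would be averaged over all users in the test set.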

## Global settings

In [1]:
import warnings
warnings.filterwarnings("ignore")

import sys
import os
import json
import pandas as pd
import numpy as np
import seaborn as sns
import pyspark
import torch
import fastai
import tensorflow as tf
tf.get_logger().setLevel('ERROR') # only show error messages
import surprise
import cornac  # imported explicitly because cornac.__version__ is printed below

from recommenders.utils.general_utils import get_number_processors
from recommenders.utils.gpu_utils import get_cuda_version, get_cudnn_version
from recommenders.datasets import movielens
from recommenders.datasets.python_splitters import python_stratified_split
from recommenders.models.fastai.fastai_utils import hide_fastai_progress_bar

from benchmark_utils import * 

print("System version: {}".format(sys.version))
print("Pandas version: {}".format(pd.__version__))
print("PySpark version: {}".format(pyspark.__version__))
print("Surprise version: {}".format(surprise.__version__))
print("PyTorch version: {}".format(torch.__version__))
print("Fast AI version: {}".format(fastai.__version__))
print("Cornac version: {}".format(cornac.__version__))
print("Tensorflow version: {}".format(tf.__version__))
print("CUDA version: {}".format(get_cuda_version()))
print("CuDNN version: {}".format(get_cudnn_version()))
n_cores = get_number_processors()
print("Number of cores: {}".format(n_cores))

%load_ext autoreload
%autoreload 2

System version: 3.8.13 (default, Mar 28 2022, 11:38:47) 
[GCC 7.5.0]
Pandas version: 1.5.0
PySpark version: 3.3.0
Surprise version: 1.1.2
PyTorch version: 1.12.1+cu102
Fast AI version: 1.0.61
Cornac version: 1.14.2
Tensorflow version: 2.7.4
CUDA version: No CUDA in this machine
CuDNN version: 7605
Number of cores: 6


## Parameters

In [2]:
# Hide fastai progress bar
hide_fastai_progress_bar()

In [3]:
# Fix random seeds to make sure our runs are reproducible
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)

In [4]:
environments = {
    "als": "pyspark",
    "sar": "python_cpu",
    "svd": "python_cpu",
    "fastai": "python_gpu",
    "ncf": "python_gpu",
    "bpr": "python_cpu",
    "bivae": "python_gpu",
    "lightgcn": "python_gpu",
}

metrics = {
    "als": ["rating", "ranking"],
    "sar": ["ranking"],
    "svd": ["rating", "ranking"],
    "fastai": ["rating", "ranking"],
    "ncf": ["ranking"],
    "bpr": ["ranking"],
    "bivae": ["ranking"],
    "lightgcn": ["ranking"]
}

Algorithm parameters

In [5]:
als_params = {
    "rank": 10,
    "maxIter": 15,
    "implicitPrefs": False,
    "alpha": 0.1,
    "regParam": 0.05,
    "coldStartStrategy": "drop",
    "nonnegative": False,
    "userCol": DEFAULT_USER_COL,
    "itemCol": DEFAULT_ITEM_COL,
    "ratingCol": DEFAULT_RATING_COL,
}

sar_params = {
    "similarity_type": "jaccard",
    "time_decay_coefficient": 30,
    "time_now": None,
    "timedecay_formula": True,
    "col_user": DEFAULT_USER_COL,
    "col_item": DEFAULT_ITEM_COL,
    "col_rating": DEFAULT_RATING_COL,
    "col_timestamp": DEFAULT_TIMESTAMP_COL,
}

svd_params = {
    "n_factors": 150,
    "n_epochs": 15,
    "lr_all": 0.005,
    "reg_all": 0.02,
    "random_state": SEED,
    "verbose": False
}

fastai_params = {
    "n_factors": 40, 
    "y_range": [0,5.5], 
    "wd": 1e-1,
    "max_lr": 5e-3,
    "epochs": 15
}

ncf_params = {
    "model_type": "NeuMF",
    "n_factors": 4,
    "layer_sizes": [16, 8, 4],
    "n_epochs": 15,
    "batch_size": 1024,
    "learning_rate": 1e-3,
    "verbose": 10
}

bpr_params = {
    "k": 200,
    "max_iter": 200,
    "learning_rate": 0.01,
    "lambda_reg": 1e-3,
    "seed": SEED,
    "verbose": False
}

bivae_params = {
    "k": 100,
    "encoder_structure": [200],
    "act_fn": "tanh",
    "likelihood": "pois",
    "n_epochs": 500,
    "batch_size": 1024,
    "learning_rate": 0.001,
    "seed": SEED,
    "use_gpu": True,
    "verbose": False
}

lightgcn_param = {
    "yaml_file": os.path.join("..","..","recommenders", "models", "deeprec", "config", "lightgcn.yaml"),
    "n_layers": 3,
    "batch_size": 1024,
    "epochs": 15,
    "learning_rate": 0.005,
    "eval_epoch": 5,
    "top_k": DEFAULT_K,
}

params = {
    "als": als_params,
    "sar": sar_params,
    "svd": svd_params,
    "fastai": fastai_params,
    "ncf": ncf_params,
    "bpr": bpr_params,
    "bivae": bivae_params,
    "lightgcn": lightgcn_param,
}

In [6]:
prepare_training_data = {
    "als": prepare_training_als,
    "sar": prepare_training_sar,
    "svd": prepare_training_svd,
    "fastai": prepare_training_fastai,
    "ncf": prepare_training_ncf,
    "bpr": prepare_training_cornac,
    "bivae": prepare_training_cornac,
    "lightgcn": prepare_training_lightgcn,
}

In [7]:
prepare_metrics_data = {
    "als": lambda train, test: prepare_metrics_als(train, test),
    "fastai": lambda train, test: prepare_metrics_fastai(train, test),    
}

In [8]:
trainer = {
    "als": lambda params, data: train_als(params, data),
    "svd": lambda params, data: train_svd(params, data),
    "sar": lambda params, data: train_sar(params, data), 
    "fastai": lambda params, data: train_fastai(params, data),
    "ncf": lambda params, data: train_ncf(params, data),
    "bpr": lambda params, data: train_bpr(params, data),
    "bivae": lambda params, data: train_bivae(params, data),
    "lightgcn": lambda params, data: train_lightgcn(params, data),
}

In [9]:
rating_predictor = {
    "als": lambda model, test: predict_als(model, test),
    "svd": lambda model, test: predict_svd(model, test),
    "fastai": lambda model, test: predict_fastai(model, test),
}

In [10]:
ranking_predictor = {
    "als": lambda model, test, train: recommend_k_als(model, test, train),
    "sar": lambda model, test, train: recommend_k_sar(model, test, train),
    "svd": lambda model, test, train: recommend_k_svd(model, test, train),
    "fastai": lambda model, test, train: recommend_k_fastai(model, test, train),
    "ncf": lambda model, test, train: recommend_k_ncf(model, test, train),
    "bpr": lambda model, test, train: recommend_k_cornac(model, test, train),
    "bivae": lambda model, test, train: recommend_k_cornac(model, test, train),
    "lightgcn": lambda model, test, train: recommend_k_lightgcn(model, test, train),
}

In [11]:
rating_evaluator = {
    "als": lambda test, predictions: rating_metrics_pyspark(test, predictions),
    "svd": lambda test, predictions: rating_metrics_python(test, predictions),
    "fastai": lambda test, predictions: rating_metrics_python(test, predictions)
}
    
    
ranking_evaluator = {
    "als": lambda test, predictions, k: ranking_metrics_pyspark(test, predictions, k),
    "sar": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
    "svd": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
    "fastai": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
    "ncf": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
    "bpr": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
    "bivae": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
    "lightgcn": lambda test, predictions, k: ranking_metrics_python(test, predictions, k),
}

In [12]:
def generate_summary(data, algo, k, train_time, time_rating, rating_metrics, time_ranking, ranking_metrics):
    summary = {"Data": data, "Algo": algo, "K": k, "Train time (s)": train_time, "Predicting time (s)": time_rating, "Recommending time (s)": time_ranking}
    if rating_metrics is None:
        rating_metrics = {
            "RMSE": np.nan,
            "MAE": np.nan,
            "R2": np.nan,
            "Explained Variance": np.nan,
        }
    if ranking_metrics is None:
        ranking_metrics = {
            "MAP": np.nan,
            "nDCG@k": np.nan,
            "Precision@k": np.nan,
            "Recall@k": np.nan,
        }
    summary.update(rating_metrics)
    summary.update(ranking_metrics)
    return summary
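
The dictionaries above act as dispatch tables: the benchmark loop looks up each step by algorithm name and falls back to a pass-through function when an algorithm has no entry, as is the case for most algorithms in `prepare_metrics_data`. The following is a minimal sketch of that pattern. The names `prepare_double`, `prepare_step`, `metric_types`, and `run_step` are hypothetical stand-ins, not helpers from `benchmark_utils`.

```python
def prepare_double(train, test):
    """Hypothetical preparation step: doubles every value."""
    return [2 * x for x in train], [2 * x for x in test]


# Only "algo_a" needs a custom preparation step.
prepare_step = {"algo_a": prepare_double}

# Which families of metrics each algorithm supports.
metric_types = {"algo_a": ["rating", "ranking"], "algo_b": ["ranking"]}


def run_step(algo, train, test):
    # Look up the step by name; default to a pass-through when absent.
    prepare = prepare_step.get(algo, lambda x, y: (x, y))
    prepared_train, prepared_test = prepare(train, test)
    # Only evaluate the metric families registered for this algorithm.
    evaluated = [m for m in ("rating", "ranking") if m in metric_types[algo]]
    return prepared_train, prepared_test, evaluated


result_a = run_step("algo_a", [1, 2], [3])
result_b = run_step("algo_b", [1, 2], [3])
```

With this layout, adding an algorithm means adding entries to the dictionaries rather than changing the loop itself.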

## Benchmark loop

In [13]:
data_sizes = ["100k", "1m"] # Movielens data size: 100k, 1m, 10m, or 20m
algorithms = ["als", "svd", "sar", "ncf", "fastai", "bpr", "bivae", "lightgcn"]

In [14]:
%%time

# For each data size and each algorithm, a recommender is evaluated. 
cols = ["Data", "Algo", "K", "Train time (s)", "Predicting time (s)", "RMSE", "MAE", "R2", "Explained Variance", "Recommending time (s)", "MAP", "nDCG@k", "Precision@k", "Recall@k"]
df_results = pd.DataFrame(columns=cols)

for data_size in data_sizes:
    # Load the dataset
    df = movielens.load_pandas_df(
        size=data_size,
        header=[DEFAULT_USER_COL, DEFAULT_ITEM_COL, DEFAULT_RATING_COL, DEFAULT_TIMESTAMP_COL]
    )
    print("Size of Movielens {}: {}".format(data_size, df.shape))
    
    # Split the dataset
    df_train, df_test = python_stratified_split(df,
                                                ratio=0.75, 
                                                min_rating=1, 
                                                filter_by="item", 
                                                col_user=DEFAULT_USER_COL, 
                                                col_item=DEFAULT_ITEM_COL
                                                )
   
    # Loop through the algos
    for algo in algorithms:
        print(f"\nComputing {algo} algorithm on Movielens {data_size}")
          
        # Data prep for training set
        train = prepare_training_data.get(algo, lambda x,y:(x,y))(df_train, df_test)
        
        # Get model parameters
        model_params = params[algo]
          
        # Train the model
        model, time_train = trainer[algo](model_params, train)
        print(f"Training time: {time_train}s")
                
        # Predict and evaluate
        train, test = prepare_metrics_data.get(algo, lambda x,y:(x,y))(df_train, df_test)
        
        if "rating" in metrics[algo]:   
            # Predict for rating
            preds, time_rating = rating_predictor[algo](model, test)
            print(f"Rating prediction time: {time_rating}s")
            
            # Evaluate for rating
            ratings = rating_evaluator[algo](test, preds)
        else:
            ratings = None
            time_rating = np.nan
        
        if "ranking" in metrics[algo]:
            # Predict for ranking
            top_k_scores, time_ranking = ranking_predictor[algo](model, test, train)
            print(f"Ranking prediction time: {time_ranking}s")
            
            # Evaluate for ranking
            rankings = ranking_evaluator[algo](test, top_k_scores, DEFAULT_K)
        else:
            rankings = None
            time_ranking = np.nan
            
        # Record results
        summary = generate_summary(data_size, algo, DEFAULT_K, time_train, time_rating, ratings, time_ranking, rankings)
        df_results.loc[df_results.shape[0] + 1] = summary
        
print("\nComputation finished")
os.remove('./df_train.csv')
os.remove('./df_test.csv')

INFO:recommenders.datasets.download_utils:Downloading https://files.grouplens.org/datasets/movielens/ml-100k.zip

Size of Movielens 100k: (100000, 4)

Computing als algorithm on Movielens 100k


22/07/17 01:38:47 WARN Utils: Your hostname, t-scheguru-vm resolves to a loopback address: 127.0.0.1; using 10.0.0.4 instead (on interface eth0)
22/07/17 01:38:47 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/07/17 01:38:48 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
                                                                                                                                                                                                                                                                                                                                                                                                                              

Training time: 8.1634s
Rating prediction time: 0.1024s


22/07/17 01:39:09 WARN Column: Constructing trivially true equals predicate, 'userID#45 = userID#45'. Perhaps you need to use aliases.


Ranking prediction time: 0.1098s


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        


Computing svd algorithm on Movielens 100k
Training time: 4.9635s
Rating prediction time: 0.5603s
Ranking prediction time: 16.0013s


INFO:root:Collecting user affinity matrix
INFO:root:Calculating time-decayed affinities
INFO:root:Creating index columns
INFO:root:Building user affinity sparse matrix
INFO:root:Calculating item co-occurrence



Computing sar algorithm on Movielens 100k


INFO:root:Calculating item similarity
INFO:root:Using jaccard based similarity
INFO:root:Done training
INFO:root:Calculating recommendation scores
INFO:root:Removing seen items


Training time: 0.3957s
Ranking prediction time: 0.0959s

Computing ncf algorithm on Movielens 100k


INFO:recommenders.models.ncf.dataset:Indexing ./df_train.csv ...
INFO:recommenders.models.ncf.ncf_singlenode:Epoch 10 [4.36s]: train_loss = 0.300866 


Training time: 66.6714s
Ranking prediction time: 3.8576s

Computing fastai algorithm on Movielens 100k
Training time: 100.5388s
Rating prediction time: 0.0613s
Ranking prediction time: 3.7081s

Computing bpr algorithm on Movielens 100k
Training time: 6.7326s
Ranking prediction time: 3.8942s

Computing bivae algorithm on Movielens 100k
Training time: 25.5283s
Ranking prediction time: 1.7233s

Computing lightgcn algorithm on Movielens 100k
Already create adjacency matrix.
Already normalize adjacency matrix.
Using xavier initialization.
Epoch 1 (train)5.0s: train loss = 0.47409 = (mf)0.47385 + (embed)0.00024
Epoch 2 (train)4.8s: train loss = 0.29424 = (mf)0.29361 + (embed)0.00063
Epoch 3 (train)4.9s: train loss = 0.25435 = (mf)0.25355 + (embed)0.00080
Epoch 4 (train)4.9s: train loss = 0.23667 = (mf)0.23570 + (embed)0.00097
Epoch 5 (train)4.8s + (eval)0.3s: train loss = 0.22515 = (mf)0.22404 + (embed)0.00111, recall = 0.16311, ndcg = 0.35442, precision = 0.30668, map = 0.09457
Epoch 6 (train)4.9s: train loss = 0.22019 = (mf)0.21897 + (embed)0.00122
Epoch 7 (train)4.8s: train loss = 0.21037 

INFO:recommenders.datasets.download_utils:Downloading https://files.grouplens.org/datasets/movielens/ml-1m.zip

Size of Movielens 1m: (1000209, 4)

Computing als algorithm on Movielens 1m


22/07/17 01:45:42 WARN TaskSetManager: Stage 516 contains a task of very large size (2759 KiB). The maximum recommended task size is 1000 KiB.
22/07/17 01:45:42 WARN TaskSetManager: Stage 517 contains a task of very large size (2759 KiB). The maximum recommended task size is 1000 KiB.


Training time: 6.2432s
Rating prediction time: 0.0246s


22/07/17 01:46:02 WARN Column: Constructing trivially true equals predicate, 'userID#803 = userID#803'. Perhaps you need to use aliases.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                

Ranking prediction time: 0.0658s


22/07/17 01:46:02 WARN TaskSetManager: Stage 917 contains a task of very large size (2759 KiB). The maximum recommended task size is 1000 KiB.
22/07/17 01:46:02 WARN TaskSetManager: Stage 918 contains a task of very large size (2759 KiB). The maximum recommended task size is 1000 KiB.
22/07/17 01:46:03 WARN TaskSetManager: Stage 954 contains a task of very large size (2759 KiB). The maximum recommended task size is 1000 KiB.


Computing svd algorithm on Movielens 1m
Training time: 53.6244s
Rating prediction time: 3.4890s
Ranking prediction time: 218.9173s


INFO:root:Collecting user affinity matrix
INFO:root:Calculating time-decayed affinities



Computing sar algorithm on Movielens 1m


INFO:root:Creating index columns
INFO:root:Building user affinity sparse matrix
INFO:root:Calculating item co-occurrence
INFO:root:Calculating item similarity
INFO:root:Using jaccard based similarity
INFO:root:Done training
INFO:root:Calculating recommendation scores


Training time: 3.6958s


INFO:root:Removing seen items


Ranking prediction time: 2.8434s

Computing ncf algorithm on Movielens 1m


INFO:recommenders.models.ncf.dataset:Indexing ./df_train.csv ...
INFO:recommenders.models.ncf.ncf_singlenode:Epoch 10 [55.84s]: train_loss = 0.295724 


Training time: 835.6567s
Ranking prediction time: 55.1714s

Computing fastai algorithm on Movielens 1m
Training time: 746.2365s
Rating prediction time: 0.4255s
Ranking prediction time: 53.6986s

Computing bpr algorithm on Movielens 1m
Training time: 70.9279s
Ranking prediction time: 32.6486s

Computing bivae algorithm on Movielens 1m
Training time: 211.2357s
Ranking prediction time: 29.6397s

Computing lightgcn algorithm on Movielens 1m
Already create adjacency matrix.
Already normalize adjacency matrix.
Using xavier initialization.
Epoch 1 (train)445.9s: train loss = 0.34214 = (mf)0.34152 + (embed)0.00062
Epoch 2 (train)446.1s: train loss = 0.26896 = (mf)0.26753 + (embed)0.00143
Epoch 3 (train)446.1s: train loss = 0.22642 = (mf)0.22413 + (embed)0.00229
Epoch 4 (train)445.4s: train loss = 0.20359 = (mf)0.20059 + (embed)0.00300
Epoch 5 (train)445.8s + (eval)4.5s: train loss = 0.18634 = (mf)0.18267 + (embed)0.00367, recall = 0.12275, ndcg = 0.37516, precision = 0.33851, map = 0.07356
Epoch 6 (train)452.9s: train loss = 0.17314 = (mf)0.16883 + (embed)0.00431
Epoch 7 (train)451.9s: train lo

## Results

In [15]:
df_results

| | Data | Algo | K | Train time (s) | Predicting time (s) | RMSE | MAE | R2 | Explained Variance | Recommending time (s) | MAP | nDCG@k | Precision@k | Recall@k |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 100k | als | 10 | 8.1634 | 0.1024 | 0.959619 | 0.748871 | 0.264865 | 0.260029 | 0.1098 | 0.004015 | 0.040796 | 0.044857 | 0.015244 |
| 2 | 100k | svd | 10 | 4.9635 | 0.5603 | 0.938681 | 0.74269 | 0.291967 | 0.291971 | 16.0013 | 0.012873 | 0.09593 | 0.091198 | 0.032783 |
| 3 | 100k | sar | 10 | 0.3957 | NaN | NaN | NaN | NaN | NaN | 0.0959 | 0.113028 | 0.388321 | 0.333828 | 0.183179 |
| 4 | 100k | ncf | 10 | 66.6714 | NaN | NaN | NaN | NaN | NaN | 3.8576 | 0.106871 | 0.395879 | 0.349205 | 0.183191 |
| 5 | 100k | fastai | 10 | 100.5388 | 0.0613 | 0.941035 | 0.742262 | 0.288411 | 0.290805 | 3.7081 | 0.025521 | 0.147204 | 0.130753 | 0.054545 |
| 6 | 100k | bpr | 10 | 6.7326 | NaN | NaN | NaN | NaN | NaN | 3.8942 | 0.129946 | 0.437411 | 0.383669 | 0.209318 |
| 7 | 100k | bivae | 10 | 25.5283 | NaN | NaN | NaN | NaN | NaN | 1.7233 | 0.147895 | 0.47887 | 0.415164 | 0.221764 |
| 8 | 100k | lightgcn | 10 | 73.531 | NaN | NaN | NaN | NaN | NaN | 0.091 | 0.119846 | 0.414118 | 0.358324 | 0.192785 |
| 9 | 1m | als | 10 | 6.2432 | 0.0246 | 0.861484 | 0.680317 | 0.41098 | 0.405023 | 0.0658 | 0.001999 | 0.023934 | 0.030247 | 0.009943 |
| 10 | 1m | svd | 10 | 53.6244 | 3.489 | 0.883017 | 0.695366 | 0.37491 | 0.374911 | 218.9173 | 0.008828 | 0.08932 | 0.082856 | 0.021582 |
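
One way to read these results is to rank the algorithms on a single metric. The following is a minimal sketch in plain Python that uses the nDCG@k values of the Movielens 100k rows copied from the results above; the variable names are illustrative. Because the values come from a single run with untuned parameters, the ordering should be taken as indicative only.

```python
# nDCG@k (k = 10) on Movielens 100k, copied from the results above.
ndcg_100k = {
    "als": 0.040796,
    "svd": 0.09593,
    "sar": 0.388321,
    "ncf": 0.395879,
    "fastai": 0.147204,
    "bpr": 0.437411,
    "bivae": 0.47887,
    "lightgcn": 0.414118,
}

# Sort algorithms from highest to lowest nDCG@k.
ranked = sorted(ndcg_100k.items(), key=lambda kv: kv[1], reverse=True)
for algo, score in ranked:
    print(f"{algo:<10}{score:.4f}")

best_algo = ranked[0][0]
```

The same approach applies to any other column, such as training time, by changing the values and the sort direction.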
