# Offline Evaluation of Recommender Systems with RecBole (v2)

**Objective:** This notebook performs an offline evaluation of several common recommendation algorithms on the MovieLens 100k dataset using the RecBole library. The evaluation covers both accuracy and beyond-accuracy metrics. Trained models will be saved for potential future use in inference. **Crucially, this notebook also ensures the processed dataset (with ID mappings) is saved for use in subsequent notebooks.**

**Dataset:** MovieLens 100k (ml-100k)

**Algorithms to Evaluate:**
1.  Random
2.  Pop (Most Popular)
3.  BPR (Bayesian Personalized Ranking)
4.  LightGCN
5.  ItemKNN (Item-based Collaborative Filtering)

**Evaluation Metrics:**
-   **Accuracy Metrics (Top-K):** Precision@K, Recall@K, NDCG@K, MRR@K (K will be set to 10)
-   **Beyond-Accuracy Metrics:**
    -   ItemCoverage@K (Coverage)
    -   GiniIndex@K (Gini coefficient of the recommended-item distribution)
    -   ShannonEntropy@K (Shannon entropy of the recommended-item distribution)
    -   AveragePopularity@K
    -   TailPercentage@K (Proportion of recommended items from the long tail)
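
To make the beyond-accuracy metrics concrete, here is a minimal pure-Python sketch of textbook versions of these statistics. It assumes items are integer IDs `0..n_items-1`; the function name `beyond_accuracy` and the `tail_ratio` argument are illustrative choices, not RecBole's API. RecBole's own implementations may normalize differently, so absolute values need not match the outputs recorded in this notebook.

```python
import math
from collections import Counter

def beyond_accuracy(rec_lists, train_popularity, n_items, tail_ratio=0.1):
    """Summarise non-empty top-K lists with simple beyond-accuracy statistics."""
    counts = Counter(item for recs in rec_lists for item in recs)
    total = sum(counts.values())

    # Item coverage: share of the catalogue that appears in at least one list.
    coverage = len(counts) / n_items

    # Gini index over recommendation counts (0 = even, close to 1 = concentrated).
    sorted_counts = sorted(counts.get(i, 0) for i in range(n_items))
    weighted = sum((2 * (rank + 1) - n_items - 1) * c
                   for rank, c in enumerate(sorted_counts))
    gini = weighted / (n_items * total)

    # Shannon entropy (in nats) of the recommended-item distribution.
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())

    # Average training popularity of the recommended items.
    avg_popularity = sum(train_popularity.get(i, 0) * c for i, c in counts.items()) / total

    # Tail percentage: share of recommendations drawn from the least popular items.
    ranked = sorted(range(n_items), key=lambda i: train_popularity.get(i, 0))
    tail_items = set(ranked[: int(n_items * tail_ratio)])
    tail_percentage = sum(c for i, c in counts.items() if i in tail_items) / total

    return {
        "coverage": coverage,
        "gini": gini,
        "entropy": entropy,
        "avg_popularity": avg_popularity,
        "tail_percentage": tail_percentage,
    }
```

Calling it with two users who both receive the two most popular items of a four-item catalogue gives half coverage and a zero tail share, which is the pattern a popularity-based recommender tends to produce.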

**Methodology:**
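-   Each algorithm will be run once using RecBole's default model hyperparameters, except for the training settings overridden in the base configuration (epochs, batch size, learning rate, early stopping).
-   A standard ranking-oriented evaluation setting will be used: random ordering with a ratio-based split (`RO_RS`) of 80/10/10 into training, validation, and test sets, with full ranking over all items.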
-   Trained models will be saved by RecBole in a specified checkpoint directory.
-   **The processed dataset will be saved to disk to allow loading its mappings in other notebooks.**
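
The random ratio split can be pictured with a small sketch: shuffle each user's interactions, then cut them by ratio. This is an illustration under assumptions, not RecBole's code. The helper name `random_ratio_split` is hypothetical, interactions are plain `(user, item)` pairs, and RecBole's handling of rounding may differ.

```python
import random

def random_ratio_split(interactions, ratios=(0.8, 0.1, 0.1), seed=2024):
    """Split each user's interactions into train/valid/test by ratio after shuffling."""
    rng = random.Random(seed)

    # Group items by user so that every user is split independently.
    by_user = {}
    for user, item in interactions:
        by_user.setdefault(user, []).append(item)

    train, valid, test = [], [], []
    for user, items in by_user.items():
        rng.shuffle(items)  # random ordering
        n_train = int(len(items) * ratios[0])
        n_valid = int(len(items) * ratios[1])
        train += [(user, i) for i in items[:n_train]]
        valid += [(user, i) for i in items[n_train:n_train + n_valid]]
        # Whatever remains after the first two cuts becomes the test portion.
        test += [(user, i) for i in items[n_train + n_valid:]]
    return train, valid, test
```

Splitting per user rather than globally keeps every user represented in the training portion, which matters for models that learn one embedding per user.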

**CRITICAL PREREQUISITES & POTENTIAL MANUAL PATCHES:**

1.  **Install Libraries:**
    - `pip install recbole pandas pyyaml matplotlib seaborn lightgbm xgboost`
    - Ensure your Python, PyTorch, NumPy, and SciPy versions are compatible with your RecBole version. Downgrading NumPy to ~1.23.5 might be needed if you encounter `np.bool8` errors with Ray (a RecBole dependency). A common working combination for older RecBole versions might involve PyTorch 1.7-1.10, NumPy 1.23.5, SciPy 1.7-1.9.

2.  **Patch for `AttributeError: 'int' object has no attribute 'cpu'` (Random, Pop, ItemKNN):**
    - This error occurs in RecBole's `EvalCollector` on CPU because it tries to call `.cpu()` on non-Tensor objects (like Python integers) returned by some metrics.
    - **You MUST manually patch the file `recbole/evaluator/collector.py` in your RecBole installation.**
    - Find the `get_data_struct` method.
    - **Replace the loop:**
      ```python
      # Original problematic loop in get_data_struct:
      # for key in self.data_struct._data_dict:
      #     self.data_struct._data_dict[key] = self.data_struct._data_dict[key].cpu()
      ```
    - **With this patched loop:**
      ```python
      # Patched loop for get_data_struct in recbole/evaluator/collector.py:
      for key in self.data_struct._data_dict:
          try:
              # Attempt to move to CPU if it's a tensor
              self.data_struct._data_dict[key] = self.data_struct._data_dict[key].cpu()
          except AttributeError:
              # If it's not a tensor (e.g., an int), just pass, leaving it as is
              pass
      ```
    - This ensures that the `.cpu()` call is only attempted on objects that actually have this method (i.e., PyTorch Tensors).

3.  **Patch for `AttributeError: 'dok_matrix' object has no attribute '_update'` (LightGCN):**
    - This is often due to SciPy version incompatibilities (typically SciPy >= 1.8.0) with how LightGCN constructs its adjacency matrix using `dok_matrix`.
    - **You may need to manually patch the file `recbole/model/general_recommender/lightgcn.py` (or similar path depending on your RecBole version).**
    - In the `get_norm_adj_mat` method (or where the adjacency matrix `A` is built), look for lines like `A._update(...)`.
    - **Replace calls like `A._update(data_dict)` with direct item assignment for `dok_matrix`.**
      For example, if `data_dict` contains `{(row, col): value, ...}`:
      ```python
      # Instead of:
      # A._update(user_item_data_dict)
      # A._update(item_user_data_dict)
      # Use:
      for (r, c), val in user_item_data_dict.items():
          A[r, c] = val
      for (r, c), val in item_user_data_dict.items(): # If there's a separate one for transpose
          A[r, c] = val
      # Or, more directly from interaction data if inter_M is a COO matrix:
      # A = sp.dok_matrix(...)
      # for i in range(inter_M.nnz):
      #     row = inter_M.row[i]
      #     col = inter_M.col[i]
      #     A[row, col + self.n_users] = 1.0
      #     A[col + self.n_users, row] = 1.0
      ```
    - *Consult RecBole GitHub issues regarding LightGCN and SciPy `dok_matrix` errors for the exact lines to modify for your RecBole version, as the internal implementation details can vary.*
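
For the LightGCN patch in item 3, another way to avoid `dok_matrix` internals is to assemble the bipartite adjacency matrix directly from COO triplets. The following is a self-contained sketch assuming `inter_M` is a SciPy sparse user-item matrix with one entry per interaction. The function name `build_norm_adj` is hypothetical, and the symmetric normalization shown is the usual LightGCN formulation rather than a copy of RecBole's code.

```python
import numpy as np
import scipy.sparse as sp

def build_norm_adj(inter_M, n_users, n_items):
    """Return D^-1/2 A D^-1/2 for the bipartite user-item graph as a COO matrix."""
    inter_M = inter_M.tocoo()

    # Upper-right block holds user->item edges, lower-left block their transpose.
    rows = np.concatenate([inter_M.row, inter_M.col + n_users])
    cols = np.concatenate([inter_M.col + n_users, inter_M.row])
    data = np.ones(rows.shape[0], dtype=np.float32)
    n = n_users + n_items
    A = sp.coo_matrix((data, (rows, cols)), shape=(n, n))

    # Symmetric normalisation; the small epsilon avoids division by zero
    # for nodes without any interaction.
    degree = np.asarray(A.sum(axis=1)).flatten() + 1e-7
    D_inv_sqrt = sp.diags(np.power(degree, -0.5))
    return (D_inv_sqrt @ A @ D_inv_sqrt).tocoo()

# Toy example: user 0 interacted with item 0, user 1 with items 0 and 1.
toy_inter = sp.coo_matrix((np.ones(3), ([0, 1, 1], [0, 0, 1])), shape=(2, 2))
toy_norm_adj = build_norm_adj(toy_inter, n_users=2, n_items=2)
```

Because the matrix is built in one vectorized step, this variant relies only on public SciPy calls and does not depend on the private `_update` method.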

**Without these patches and correct dependencies, some models are likely to fail with the errors described above.** This notebook provides the RecBole configurations; applying the library patches and ensuring a compatible environment are crucial setup steps. Consider using a dedicated virtual environment for RecBole to manage dependencies effectively.
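
For the `.cpu()` patch in item 2, an explicit capability check is a possible alternative to catching `AttributeError`. The sketch below is illustrative only: `FakeTensor` is a stand-in so that the example runs without PyTorch, and `move_values_to_cpu` is a hypothetical helper, not part of RecBole.

```python
class FakeTensor:
    """Stand-in for torch.Tensor so the sketch runs without PyTorch installed."""

    def __init__(self, device):
        self.device = device

    def cpu(self):
        return FakeTensor("cpu")


def move_values_to_cpu(data_dict):
    """Call .cpu() on values that provide it; leave plain Python values untouched."""
    for key, value in data_dict.items():
        cpu = getattr(value, "cpu", None)
        if callable(cpu):
            # Reassigning an existing key is safe while iterating over the dict.
            data_dict[key] = cpu()
    return data_dict


collected = {"rec.topk": FakeTensor("cuda:0"), "data.num_items": 1683}
move_values_to_cpu(collected)
```

Compared with a broad `try`/`except AttributeError`, the explicit check does not hide an `AttributeError` raised from inside a real `.cpu()` call.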


## 1. Imports

Import RecBole and the supporting libraries, and print the installed PyTorch, SciPy, and NumPy versions.

In [24]:
import os
import yaml
import pandas as pd
from datetime import datetime
import torch 

from recbole.config import Config
from recbole.data import create_dataset, data_preparation
from recbole.utils import init_seed, init_logger, get_model, get_trainer
from recbole.quick_start import run_recbole 

print("RecBole and other libraries imported successfully.")
print(f"PyTorch version: {torch.__version__}")
try:
    import scipy
    print(f"SciPy version: {scipy.__version__}")
except ImportError:
    print("SciPy not found, which might be an issue for some models like LightGCN.")
try:
    import numpy
    print(f"NumPy version: {numpy.__version__}")
except ImportError:
    print("NumPy not found.")

RecBole and other libraries imported successfully.
PyTorch version: 2.5.1+cu124
SciPy version: 1.14.1
NumPy version: 1.23.5


## 2. Configuration

Define paths, algorithm list, metrics, and general RecBole settings.

In [31]:
current_working_dir = os.getcwd()
print(f"Current working directory (os.getcwd()): {current_working_dir}")
if os.path.basename(current_working_dir).lower() == "notebooks":
    PROJECT_ROOT = os.path.abspath(os.path.join(current_working_dir, ".."))
else:
    PROJECT_ROOT = current_working_dir 
print(f"PROJECT_ROOT set to: {PROJECT_ROOT}")

DATA_PATH = os.path.join(PROJECT_ROOT, "recbole_data") 
SAVED_MODELS_PATH = os.path.join(PROJECT_ROOT, "recbole_saved_models")
RESULTS_PATH = os.path.join(PROJECT_ROOT, "recbole_results")

for path in [DATA_PATH, SAVED_MODELS_PATH, RESULTS_PATH]:
    if not os.path.exists(path):
        os.makedirs(path)
        print(f"Created directory: {path}")

# --- Experiment Parameters ---
DATASET_NAME = 'ml-100k'
ALGORITHMS_TO_RUN = [
    'Random', 
    'Pop', 
    'BPR',  # RecBole's model name is 'BPR', not 'BPRMF'
    'LightGCN', 
    'ItemKNN'
]
TOP_K = [10] 
METRICS = [
    "Recall", "Precision", "NDCG", "MRR", 
    "ItemCoverage", "GiniIndex", "ShannonEntropy", 
    "AveragePopularity", "TailPercentage" 
] 
VALID_METRIC = 'NDCG@10' 
GPU_ID = "-1" 
SEED = 2024

print(f"Data will be stored/looked for in: {DATA_PATH}")
print(f"Saved models will be stored in: {SAVED_MODELS_PATH}")
print(f"Evaluation results will be stored in: {RESULTS_PATH}")

Current working directory (os.getcwd()): /mnt/c/Users/tduricic/Development/workspace/conversational-reco/notebooks
PROJECT_ROOT set to: /mnt/c/Users/tduricic/Development/workspace/conversational-reco
Data will be stored/looked for in: /mnt/c/Users/tduricic/Development/workspace/conversational-reco/recbole_data
Saved models will be stored in: /mnt/c/Users/tduricic/Development/workspace/conversational-reco/recbole_saved_models
Evaluation results will be stored in: /mnt/c/Users/tduricic/Development/workspace/conversational-reco/recbole_results


## 3. Base RecBole Configuration

Define a base configuration dictionary that will be common across all algorithm runs.
This includes dataset specifics, evaluation settings, and metrics.
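
RecBole 1.x describes the evaluation protocol through a nested `eval_args` entry instead of the older `eval_setting`, `split_ratio`, and `leave_one_out` keys. The fragment below sketches how `'RO_RS,full'` with an 0.8/0.1/0.1 split would be written in that form, assuming a 1.x installation. The variable names are illustrative, and the exact keys should be checked against the documentation of the installed version.

```python
# Assumed RecBole 1.x equivalent of 'eval_setting': 'RO_RS,full' with an 8:1:1 split.
eval_args_sketch = {
    'eval_args': {
        'split': {'RS': [0.8, 0.1, 0.1]},  # ratio-based split
        'order': 'RO',                      # random ordering
        'group_by': 'user',                 # split within each user's interactions
        'mode': 'full',                     # rank against all candidate items
    }
}

# Merge into an existing config dict without mutating the original.
legacy_config = {'eval_setting': 'RO_RS,full', 'split_ratio': [0.8, 0.1, 0.1]}
merged_config = {**legacy_config, **eval_args_sketch}
```

Keeping both spellings in one dictionary is a pragmatic choice when the same notebook has to run against more than one RecBole version.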

In [30]:
base_config_dict = {
    # General
    'reproducible': True,
    'seed': SEED,
    'show_progress': True, 
    'gpu_id': GPU_ID, 
    'checkpoint_dir': SAVED_MODELS_PATH, 
    'state': 'INFO', 
    
    # Data
    'dataset': DATASET_NAME,
    'data_path': DATA_PATH,
    'save_dataset': True,  # Ensure processed dataset is saved
    'USER_ID_FIELD': 'user_id',
    'ITEM_ID_FIELD': 'item_id',
    'RATING_FIELD': 'rating',
    'TIME_FIELD': 'timestamp',
    'load_col': {
        'inter': ['user_id', 'item_id', 'rating', 'timestamp']
    },
    'threshold': {'rating': 3}, 

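    # Evaluation
    # Legacy (pre-1.0) keys. RecBole 1.x reads 'eval_args' instead; its defaults describe the same
    # setting (random ordering, 0.8/0.1/0.1 ratio split, full ranking).
    'eval_setting': 'RO_RS,full', 
    'split_ratio': [0.8, 0.1, 0.1], 
    'leave_one_out': False, 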
    'metrics': METRICS,
    'topk': TOP_K,
    'valid_metric': VALID_METRIC.upper(), 
    'eval_batch_size': 2048, 

    # Training 
    'epochs': 50, 
    'stopping_step': 10, 
    'train_batch_size': 1024, 
    'learner': 'adam',
    'learning_rate': 0.001,
    # 'train_neg_sample_args' will be set per model if needed
}

print("Base RecBole configuration dictionary defined with 'save_dataset: True'.")

Base RecBole configuration dictionary defined with 'save_dataset: True'.


## 4. Experiment Loop

Iterate through the specified algorithms, configure RecBole for each, run the evaluation,
and collect the results.
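
The per-model adjustments made inside the loop can also be expressed as a small pure function, which makes them easier to test in isolation. This is a sketch that mirrors the rules used in this notebook. The names `build_model_config`, `NON_PARAMETRIC`, and `TRAINING_ONLY_KEYS` are hypothetical.

```python
import copy

NON_PARAMETRIC = {'ItemKNN', 'Pop', 'Random'}  # no gradient-based training
TRAINING_ONLY_KEYS = ('learner', 'learning_rate', 'stopping_step')

def build_model_config(base_config, model_name, checkpoint_dir):
    """Return a per-model config without mutating the shared base dictionary."""
    # Deep copy so nested dicts such as 'load_col' are not shared between runs.
    config = copy.deepcopy(base_config)
    config['model'] = model_name
    config['checkpoint_dir'] = checkpoint_dir

    if model_name in NON_PARAMETRIC:
        config['epochs'] = 1
        for key in TRAINING_ONLY_KEYS:
            config.pop(key, None)
    elif model_name in {'BPR', 'LightGCN'}:
        config['epochs'] = 100

    if model_name == 'BPR':
        config['train_neg_sample_args'] = {'distribution': 'uniform', 'sample_num': 1}
    return config
```

A deep copy is used here because `dict.copy()` is shallow: nested dictionaries would otherwise be shared across all model runs.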

In [32]:
all_results = []
current_timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

for model_name in ALGORITHMS_TO_RUN:
    print(f"\n{'='*20} Running Experiment for: {model_name} {'='*20}")
    
    model_config_dict_for_run = base_config_dict.copy() 
    model_config_dict_for_run['model'] = model_name
    
    model_checkpoint_dir = os.path.join(SAVED_MODELS_PATH, DATASET_NAME, f"{model_name}_{current_timestamp}")
    model_config_dict_for_run['checkpoint_dir'] = model_checkpoint_dir
    if not os.path.exists(model_checkpoint_dir):
        os.makedirs(model_checkpoint_dir)
        print(f"Created checkpoint directory for {model_name}: {model_checkpoint_dir}")

    # Model-specific config adjustments
    if model_name in ['LightGCN', 'BPR']: 
        model_config_dict_for_run['epochs'] = 100 
    if model_name in ['ItemKNN', 'Pop', 'Random']: 
        model_config_dict_for_run['epochs'] = 1 
        for param_to_remove in ['learner', 'learning_rate', 'stopping_step', 'train_neg_sample_args']:
            if param_to_remove in model_config_dict_for_run:
                del model_config_dict_for_run[param_to_remove]
    
    if model_name == 'BPR':
        model_config_dict_for_run['train_neg_sample_args'] = {
            'distribution': 'uniform',
            'sample_num': 1, 
        }
        print(f"Applied specific train_neg_sample_args for BPR: {model_config_dict_for_run['train_neg_sample_args']}")
    elif 'train_neg_sample_args' in model_config_dict_for_run: 
        del model_config_dict_for_run['train_neg_sample_args']

    model_config_dict_for_run['gpu_id'] = GPU_ID 
    print(f"For {model_name}, using config with GPU ID: {model_config_dict_for_run['gpu_id']}")
    
    # Initialize RecBole Config object for init_seed and init_logger
    # This step also validates the configuration parameters.
    try:
        # We create a Config object here mainly for init_seed and potentially init_logger
        # The run_recbole function will internally create its own Config from the dict.
        config_for_initialization = Config(model=model_name, dataset=DATASET_NAME, config_dict=model_config_dict_for_run)
        init_seed(config_for_initialization['seed'], config_for_initialization['reproducible'])
        # init_logger(config_for_initialization) # Can be verbose
        save_dataset_status = config_for_initialization['save_dataset'] if 'save_dataset' in config_for_initialization else 'Not Set in Config Object'
        print(f"Config for {model_name} processed for initialization. 'save_dataset' is set to: {save_dataset_status}")
    except Exception as e:
        print(f"!!!!! Error initializing Config object for {model_name} (for seed/logger): {type(e).__name__} - {e} !!!!!")
        all_results.append({'model': model_name, 'error': f"Config Init Error (for seed/logger): {e}"})
        continue 

    try:
        print(f"Starting RecBole run for {model_name}...")
        # Pass model_name, dataset_name, and the specific config_dict for this run
        # run_recbole will handle creating its own Config object from these.
        results = run_recbole(
            model=model_name, 
            dataset=DATASET_NAME, 
            config_dict=model_config_dict_for_run 
        )
        
        print(f"\n--- Results for {model_name} ---")
        current_run_results = {'model': model_name}
        
        test_metrics_dict = results.get('test_result', results) 

        for metric_template in METRICS:
            output_key_for_df = ""
            expected_key_in_results = ""

            if metric_template not in ["ItemCoverage", "GiniIndex", "ShannonEntropy", "AveragePopularity", "TailPercentage"]:
                expected_key_in_results = f"{metric_template.lower()}@{TOP_K[0]}"
                output_key_for_df = f"{metric_template}@{TOP_K[0]}"
            else: 
                expected_key_in_results_with_k = f"{metric_template.lower()}@{TOP_K[0]}"
                expected_key_in_results_no_k = metric_template.lower()
                output_key_for_df = metric_template 

                if expected_key_in_results_with_k in test_metrics_dict:
                    expected_key_in_results = expected_key_in_results_with_k
                elif expected_key_in_results_no_k in test_metrics_dict:
                    expected_key_in_results = expected_key_in_results_no_k
                else: 
                    expected_key_in_results = expected_key_in_results_with_k 
            
            current_run_results[output_key_for_df] = test_metrics_dict.get(expected_key_in_results, pd.NA)

        print(f"Processed results for {model_name}: {current_run_results}")
        all_results.append(current_run_results)
        
        print(f"Model for {model_name} and its dataset object should be saved in: {model_config_dict_for_run['checkpoint_dir']}")
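        # Verify that the processed dataset was written to disk. The location and file name depend on the
        # RecBole version, so two candidates are checked (assumption): '<data_path>/<dataset>/<dataset>.dataset'
        # and a '<dataset>-*.pth' file inside the checkpoint directory.
        dataset_save_path = os.path.join(model_config_dict_for_run['data_path'], model_config_dict_for_run['dataset'])
        saved_dataset_file = os.path.join(dataset_save_path, f"{model_config_dict_for_run['dataset']}.dataset")
        saved_in_checkpoint_dir = [
            f for f in os.listdir(model_checkpoint_dir)
            if f.startswith(f"{DATASET_NAME}-") and f.endswith('.pth')
        ]
        if os.path.exists(saved_dataset_file):
            print(f"Verified: Processed dataset file found at {saved_dataset_file}")
        elif saved_in_checkpoint_dir:
            print(f"Verified: Processed dataset file(s) found in {model_checkpoint_dir}: {saved_in_checkpoint_dir}")
        else:
            print(f"Warning: Processed dataset file NOT found at {saved_dataset_file} or in {model_checkpoint_dir}. Check 'save_dataset' config and RecBole logs.")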
        
        print(f"Full evaluation output from RecBole for {model_name}:")
        for key, value in results.items():
            print(f"  {key}: {value}")

    except Exception as e:
        print(f"!!!!! Error running {model_name}: {type(e).__name__} - {e} !!!!!")
        import traceback
        traceback.print_exc() 
        all_results.append({'model': model_name, 'error': str(e)})

print(f"\n{'='*20} All Experiments Finished {'='*20}")


Created checkpoint directory for Random: /mnt/c/Users/tduricic/Development/workspace/conversational-reco/recbole_saved_models/ml-100k/Random_20250520_230455
For Random, using config with GPU ID: -1
Config for Random processed for initialization. 'save_dataset' is set to: True
Starting RecBole run for Random...


20 May 23:04    INFO  ['/home/tduricic/anaconda3/envs/llm-eval/lib/python3.10/site-packages/ipykernel_launcher.py', '-f', '/home/tduricic/.local/share/jupyter/runtime/kernel-589123a3-c9d4-481e-bbed-ff83ae1d1cbf.json']
20 May 23:04    INFO  
General Hyper Parameters:
gpu_id = -1
use_gpu = True
seed = 2024
state = INFO
reproducibility = True
data_path = /home/tduricic/anaconda3/envs/llm-eval/lib/python3.10/site-packages/recbole/config/../dataset_example/ml-100k
checkpoint_dir = /mnt/c/Users/tduricic/Development/workspace/conversational-reco/recbole_saved_models/ml-100k/Random_20250520_230455
show_progress = True
save_dataset = True
dataset_save_path = None
save_dataloaders = False
dataloaders_save_path = None
log_wandb = False

Training Hyper Parameters:
epochs = 1
train_batch_size = 1024
learner = adam
learning_rate = 0.001
train_neg_sample_args = {'distribution': 'uniform', 'sample_num': 1, 'alpha': 1.0, 'dynamic': False, 'candidate_num': 0}
eval_step = 1
stopping_step = 10
clip_grad_n


--- Results for Random ---
Processed results for Random: {'model': 'Random', 'Recall@10': 0.0061, 'Precision@10': 0.0067, 'NDCG@10': 0.007, 'MRR@10': 0.0156, 'ItemCoverage': 0.9982, 'GiniIndex': 0.2374, 'ShannonEntropy': 0.0044, 'AveragePopularity': 43.2182, 'TailPercentage': 0.1064}
Model for Random and its dataset object should be saved in: /mnt/c/Users/tduricic/Development/workspace/conversational-reco/recbole_saved_models/ml-100k/Random_20250520_230455
Full evaluation output from RecBole for Random:
  best_valid_score: 0.0086
  valid_score_bigger: True
  best_valid_result: OrderedDict([('recall@10', 0.0073), ('precision@10', 0.0077), ('ndcg@10', 0.0086), ('mrr@10', 0.0188), ('itemcoverage@10', 0.9923), ('giniindex@10', 0.2469), ('shannonentropy@10', 0.0044), ('averagepopularity@10', 43.2523), ('tailpercentage@10', 0.1013)])
  test_result: OrderedDict([('recall@10', 0.0061), ('precision@10', 0.0067), ('ndcg@10', 0.007), ('mrr@10', 0.0156), ('itemcoverage@10', 0.9982), ('giniindex@1

20 May 23:05    INFO  ['/home/tduricic/anaconda3/envs/llm-eval/lib/python3.10/site-packages/ipykernel_launcher.py', '-f', '/home/tduricic/.local/share/jupyter/runtime/kernel-589123a3-c9d4-481e-bbed-ff83ae1d1cbf.json']
20 May 23:05    INFO  
General Hyper Parameters:
gpu_id = -1
use_gpu = True
seed = 2024
state = INFO
reproducibility = True
data_path = /home/tduricic/anaconda3/envs/llm-eval/lib/python3.10/site-packages/recbole/config/../dataset_example/ml-100k
checkpoint_dir = /mnt/c/Users/tduricic/Development/workspace/conversational-reco/recbole_saved_models/ml-100k/Pop_20250520_230455
show_progress = True
save_dataset = True
dataset_save_path = None
save_dataloaders = False
dataloaders_save_path = None
log_wandb = False

Training Hyper Parameters:
epochs = 1
train_batch_size = 1024
learner = adam
learning_rate = 0.001
train_neg_sample_args = {'distribution': 'uniform', 'sample_num': 1, 'alpha': 1.0, 'dynamic': False, 'candidate_num': 0}
eval_step = 1
stopping_step = 10
clip_grad_norm


--- Results for Pop ---
Processed results for Pop: {'model': 'Pop', 'Recall@10': 0.1147, 'Precision@10': 0.092, 'NDCG@10': 0.1262, 'MRR@10': 0.2274, 'ItemCoverage': 0.0392, 'GiniIndex': 0.9876, 'ShannonEntropy': 0.0499, 'AveragePopularity': 346.8551, 'TailPercentage': 0.0}
Model for Pop and its dataset object should be saved in: /mnt/c/Users/tduricic/Development/workspace/conversational-reco/recbole_saved_models/ml-100k/Pop_20250520_230455
Full evaluation output from RecBole for Pop:
  best_valid_score: 0.1196
  valid_score_bigger: True
  best_valid_result: OrderedDict([('recall@10', 0.1132), ('precision@10', 0.0893), ('ndcg@10', 0.1196), ('mrr@10', 0.2145), ('itemcoverage@10', 0.0297), ('giniindex@10', 0.989), ('shannonentropy@10', 0.063), ('averagepopularity@10', 354.6784), ('tailpercentage@10', 0.0)])
  test_result: OrderedDict([('recall@10', 0.1147), ('precision@10', 0.092), ('ndcg@10', 0.1262), ('mrr@10', 0.2274), ('itemcoverage@10', 0.0392), ('giniindex@10', 0.9876), ('shannonen

20 May 23:05    INFO  ['/home/tduricic/anaconda3/envs/llm-eval/lib/python3.10/site-packages/ipykernel_launcher.py', '-f', '/home/tduricic/.local/share/jupyter/runtime/kernel-589123a3-c9d4-481e-bbed-ff83ae1d1cbf.json']
20 May 23:05    INFO  
General Hyper Parameters:
gpu_id = -1
use_gpu = True
seed = 2024
state = INFO
reproducibility = True
data_path = /home/tduricic/anaconda3/envs/llm-eval/lib/python3.10/site-packages/recbole/config/../dataset_example/ml-100k
checkpoint_dir = /mnt/c/Users/tduricic/Development/workspace/conversational-reco/recbole_saved_models/ml-100k/BPR_20250520_230455
show_progress = True
save_dataset = True
dataset_save_path = None
save_dataloaders = False
dataloaders_save_path = None
log_wandb = False

Training Hyper Parameters:
epochs = 100
train_batch_size = 1024
learner = adam
learning_rate = 0.001
train_neg_sample_args = {'distribution': 'uniform', 'sample_num': 1, 'alpha': 1.0, 'dynamic': False, 'candidate_num': 0}
eval_step = 1
stopping_step = 10
clip_grad_no


--- Results for BPR ---
Processed results for BPR: {'model': 'BPR', 'Recall@10': 0.2453, 'Precision@10': 0.1965, 'NDCG@10': 0.2849, 'MRR@10': 0.4655, 'ItemCoverage': 0.372, 'GiniIndex': 0.8851, 'ShannonEntropy': 0.0088, 'AveragePopularity': 216.3971, 'TailPercentage': 0.0}
Model for BPR and its dataset object should be saved in: /mnt/c/Users/tduricic/Development/workspace/conversational-reco/recbole_saved_models/ml-100k/BPR_20250520_230455
Full evaluation output from RecBole for BPR:
  best_valid_score: 0.2401
  valid_score_bigger: True
  best_valid_result: OrderedDict([('recall@10', 0.2226), ('precision@10', 0.1636), ('ndcg@10', 0.2401), ('mrr@10', 0.4044), ('itemcoverage@10', 0.3512), ('giniindex@10', 0.8967), ('shannonentropy@10', 0.0092), ('averagepopularity@10', 225.8088), ('tailpercentage@10', 0.0)])
  test_result: OrderedDict([('recall@10', 0.2453), ('precision@10', 0.1965), ('ndcg@10', 0.2849), ('mrr@10', 0.4655), ('itemcoverage@10', 0.372), ('giniindex@10', 0.8851), ('shannon

20 May 23:06    INFO  ['/home/tduricic/anaconda3/envs/llm-eval/lib/python3.10/site-packages/ipykernel_launcher.py', '-f', '/home/tduricic/.local/share/jupyter/runtime/kernel-589123a3-c9d4-481e-bbed-ff83ae1d1cbf.json']
20 May 23:06    INFO  
General Hyper Parameters:
gpu_id = -1
use_gpu = True
seed = 2024
state = INFO
reproducibility = True
data_path = /home/tduricic/anaconda3/envs/llm-eval/lib/python3.10/site-packages/recbole/config/../dataset_example/ml-100k
checkpoint_dir = /mnt/c/Users/tduricic/Development/workspace/conversational-reco/recbole_saved_models/ml-100k/LightGCN_20250520_230455
show_progress = True
save_dataset = True
dataset_save_path = None
save_dataloaders = False
dataloaders_save_path = None
log_wandb = False

Training Hyper Parameters:
epochs = 100
train_batch_size = 1024
learner = adam
learning_rate = 0.001
train_neg_sample_args = {'distribution': 'uniform', 'sample_num': 1, 'alpha': 1.0, 'dynamic': False, 'candidate_num': 0}
eval_step = 1
stopping_step = 10
clip_gr


--- Results for LightGCN ---
Processed results for LightGCN: {'model': 'LightGCN', 'Recall@10': 0.2465, 'Precision@10': 0.1965, 'NDCG@10': 0.2868, 'MRR@10': 0.4661, 'ItemCoverage': 0.29, 'GiniIndex': 0.9122, 'ShannonEntropy': 0.0108, 'AveragePopularity': 228.5819, 'TailPercentage': 0.0}
Model for LightGCN and its dataset object should be saved in: /mnt/c/Users/tduricic/Development/workspace/conversational-reco/recbole_saved_models/ml-100k/LightGCN_20250520_230455
Full evaluation output from RecBole for LightGCN:
  best_valid_score: 0.2374
  valid_score_bigger: True
  best_valid_result: OrderedDict([('recall@10', 0.2233), ('precision@10', 0.1633), ('ndcg@10', 0.2374), ('mrr@10', 0.3938), ('itemcoverage@10', 0.2602), ('giniindex@10', 0.9225), ('shannonentropy@10', 0.0118), ('averagepopularity@10', 238.3627), ('tailpercentage@10', 0.0)])
  test_result: OrderedDict([('recall@10', 0.2465), ('precision@10', 0.1965), ('ndcg@10', 0.2868), ('mrr@10', 0.4661), ('itemcoverage@10', 0.29), ('ginii

20 May 23:15    INFO  ['/home/tduricic/anaconda3/envs/llm-eval/lib/python3.10/site-packages/ipykernel_launcher.py', '-f', '/home/tduricic/.local/share/jupyter/runtime/kernel-589123a3-c9d4-481e-bbed-ff83ae1d1cbf.json']
20 May 23:15    INFO  
General Hyper Parameters:
gpu_id = -1
use_gpu = True
seed = 2024
state = INFO
reproducibility = True
data_path = /home/tduricic/anaconda3/envs/llm-eval/lib/python3.10/site-packages/recbole/config/../dataset_example/ml-100k
checkpoint_dir = /mnt/c/Users/tduricic/Development/workspace/conversational-reco/recbole_saved_models/ml-100k/ItemKNN_20250520_230455
show_progress = True
save_dataset = True
dataset_save_path = None
save_dataloaders = False
dataloaders_save_path = None
log_wandb = False

Training Hyper Parameters:
epochs = 1
train_batch_size = 1024
learner = adam
learning_rate = 0.001
train_neg_sample_args = {'distribution': 'uniform', 'sample_num': 1, 'alpha': 1.0, 'dynamic': False, 'candidate_num': 0}
eval_step = 1
stopping_step = 10
clip_grad_


--- Results for ItemKNN ---
Processed results for ItemKNN: {'model': 'ItemKNN', 'Recall@10': 0.2454, 'Precision@10': 0.1951, 'NDCG@10': 0.2816, 'MRR@10': 0.4614, 'ItemCoverage': 0.249, 'GiniIndex': 0.9129, 'ShannonEntropy': 0.0126, 'AveragePopularity': 215.4427, 'TailPercentage': 0.0}
Model for ItemKNN and its dataset object should be saved in: /mnt/c/Users/tduricic/Development/workspace/conversational-reco/recbole_saved_models/ml-100k/ItemKNN_20250520_230455
Full evaluation output from RecBole for ItemKNN:
  best_valid_score: 0.2262
  valid_score_bigger: True
  best_valid_result: OrderedDict([('recall@10', 0.2074), ('precision@10', 0.1587), ('ndcg@10', 0.2262), ('mrr@10', 0.3874), ('itemcoverage@10', 0.2377), ('giniindex@10', 0.9202), ('shannonentropy@10', 0.013), ('averagepopularity@10', 222.732), ('tailpercentage@10', 0.0)])
  test_result: OrderedDict([('recall@10', 0.2454), ('precision@10', 0.1951), ('ndcg@10', 0.2816), ('mrr@10', 0.4614), ('itemcoverage@10', 0.249), ('giniindex@1

## 5. Collect and Display Results

Aggregate the evaluation results from all algorithms into a Pandas DataFrame for easy comparison.
The DataFrame will then be saved to a CSV file.
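
Each row of the summary comes from a `test_result` mapping whose keys are lowercase `metric@k` strings, as seen in the run output. A minimal sketch of turning one such mapping into a row with display-name columns follows. The helper `flatten_test_result` is hypothetical, and the sample values are copied from the Pop run recorded in this notebook.

```python
def flatten_test_result(model_name, result, metrics, k):
    """Map RecBole-style lowercase 'metric@k' keys to display-name columns."""
    # Accept either the full run output or the bare test_result mapping.
    test_result = result.get('test_result', result)
    row = {'model': model_name}
    for metric in metrics:
        key = f"{metric.lower()}@{k}"
        row[f"{metric}@{k}"] = test_result.get(key)  # None marks a missing metric
    return row

pop_output = {'test_result': {'recall@10': 0.1147, 'itemcoverage@10': 0.0392}}
pop_row = flatten_test_result('Pop', pop_output, ['Recall', 'ItemCoverage', 'NDCG'], 10)
```

Unlike the cell below, this variant keeps the `@K` suffix on every column, which avoids ambiguity if more than one cutoff is evaluated later.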

In [33]:
if all_results:
    results_df = pd.DataFrame(all_results)
    
    final_column_order = ['model']
    for metric_name_template in METRICS:
        if metric_name_template not in ["ItemCoverage", "GiniIndex", "ShannonEntropy", "AveragePopularity", "TailPercentage"]:
            col_name = f"{metric_name_template}@{TOP_K[0]}"
        else: 
            col_name = metric_name_template 
        
        if col_name in results_df.columns: 
            final_column_order.append(col_name)

    for col in results_df.columns:
        if col not in final_column_order:
            final_column_order.append(col)
            
    results_df = results_df.reindex(columns=final_column_order) 

    print("\n--- Aggregated Evaluation Results ---")
    print(results_df.to_string())

    results_csv_path = os.path.join(RESULTS_PATH, f"recbole_evaluation_summary_{current_timestamp}.csv")
    try:
        results_df.to_csv(results_csv_path, index=False)
        print(f"\nAggregated results saved to: {results_csv_path}")
    except Exception as e:
        print(f"Error saving results to CSV: {e}")
else:
    print("No results were collected from the experiments.")


--- Aggregated Evaluation Results ---
      model  Recall@10  Precision@10  NDCG@10  MRR@10  ItemCoverage  GiniIndex  ShannonEntropy  AveragePopularity  TailPercentage
0    Random     0.0061        0.0067   0.0070  0.0156        0.9982     0.2374          0.0044            43.2182          0.1064
1       Pop     0.1147        0.0920   0.1262  0.2274        0.0392     0.9876          0.0499           346.8551          0.0000
2       BPR     0.2453        0.1965   0.2849  0.4655        0.3720     0.8851          0.0088           216.3971          0.0000
3  LightGCN     0.2465        0.1965   0.2868  0.4661        0.2900     0.9122          0.0108           228.5819          0.0000
4   ItemKNN     0.2454        0.1951   0.2816  0.4614        0.2490     0.9129          0.0126           215.4427          0.0000

Aggregated results saved to: /mnt/c/Users/tduricic/Development/workspace/conversational-reco/recbole_results/recbole_evaluation_summary_20250520_230455.csv
