<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>

<i>Licensed under the MIT License.</i>

# SAR Single Node on MovieLens (Python, CPU)

In this example, we walk through each step of the Simple Algorithm for Recommendation (SAR) using a Python single-node implementation.

SAR is a fast, scalable, adaptive algorithm for personalized recommendations based on user transaction history. It works by estimating the similarity between items and recommending items similar to those for which a user already has an affinity.

## 1 SAR algorithm

The following figure presents a high-level architecture of SAR. 

At a very high level, two intermediate matrices are created and used to generate a set of recommendation scores:

- An item similarity matrix $S$ estimates item-item relationships.
- An affinity matrix $A$ estimates user-item relationships.

Recommendation scores are then created by computing the matrix multiplication $A\times S$.

Optional steps (e.g. "time decay" and "remove seen items") are described in the details below.

<img src="https://recodatasets.z20.web.core.windows.net/images/sar_schema.svg?sanitize=true">

### 1.1 Compute item co-occurrence and item similarity

SAR defines similarity based on item-to-item co-occurrence data. Co-occurrence is defined as the number of times two items appear together for a given user. We can represent the co-occurrence of all items as an $m\times m$ matrix $C$, where $c_{i,j}$ is the number of times item $i$ occurred with item $j$, and $m$ is the total number of items.

The co-occurrence matrix $C$ has the following properties:

- It is symmetric, so $c_{i,j} = c_{j,i}$.
- It is nonnegative: $c_{i,j} \geq 0$.
- The occurrences are at least as large as the co-occurrences, i.e., the largest element in each row (and column) lies on the main diagonal: $\forall (i,j):\ c_{i,i}, c_{j,j} \geq c_{i,j}$.
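
To make the definition concrete, here is a minimal sketch in plain Python that counts co-occurrences for a small, made-up interaction history. The `user_items` data and the `cooccurrence` helper are illustrative assumptions and are not part of the Recommenders library; the sketch also assumes each user contributes at most one count per item pair.

```python
from itertools import product

# Hypothetical toy data: the set of item indices each user interacted with.
user_items = {
    "u1": {0, 1},
    "u2": {0, 1, 2},
    "u3": {1, 2},
    "u4": {0},
}
num_items = 3


def cooccurrence(user_items, num_items):
    """Count, for every item pair (i, j), the users who interacted with both."""
    c = [[0] * num_items for _ in range(num_items)]
    for items in user_items.values():
        # Every ordered pair of a user's items co-occurs once, including (i, i).
        for i, j in product(items, repeat=2):
            c[i][j] += 1
    return c


C = cooccurrence(user_items, num_items)
for row in C:
    print(row)
# Prints:
# [3, 2, 1]
# [2, 3, 2]
# [1, 2, 2]
```

The result is symmetric and nonnegative, and each diagonal entry (the number of users who interacted with that item) is the largest value in its row and column, matching the three properties listed above.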

Once we have a co-occurrence matrix, an item similarity matrix $S$ can be obtained by rescaling the co-occurrences according to a given metric. Options for the metric include `Jaccard`, `lift`, and `counts` (meaning no rescaling).


If $c_{ii}$ and $c_{jj}$ are the $i$th and $j$th diagonal elements of $C$, the rescaling options are:

- `Jaccard`: $s_{ij}=\frac{c_{ij}}{(c_{ii}+c_{jj}-c_{ij})}$
- `lift`: $s_{ij}=\frac{c_{ij}}{(c_{ii} \times c_{jj})}$
- `counts`: $s_{ij}=c_{ij}$

In general, using `counts` as a similarity metric favours predictability, meaning that the most popular items will be recommended most of the time. `lift` by contrast favours discoverability/serendipity: an item that is less popular overall but highly favoured by a small subset of users is more likely to be recommended. `Jaccard` is a compromise between the two.
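
The three rescaling formulas can be sketched as follows. This is a plain-Python illustration under stated assumptions: the `rescale` helper and the small matrix `C` are hypothetical, and treating a zero denominator as zero similarity is a choice made for this sketch rather than something the text specifies.

```python
def rescale(c, metric):
    """Rescale a co-occurrence matrix (list of lists) into a similarity matrix."""
    m = len(c)
    s = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if metric == "counts":
                s[i][j] = float(c[i][j])
                continue
            if metric == "jaccard":
                denom = c[i][i] + c[j][j] - c[i][j]
            elif metric == "lift":
                denom = c[i][i] * c[j][j]
            else:
                raise ValueError(f"unknown metric: {metric}")
            # Assumption for this sketch: an undefined ratio counts as zero similarity.
            s[i][j] = c[i][j] / denom if denom else 0.0
    return s


# Hypothetical co-occurrence matrix for three items.
C = [[3, 2, 1],
     [2, 3, 2],
     [1, 2, 2]]

jaccard = rescale(C, "jaccard")
lift = rescale(C, "lift")
print(jaccard[0][1])  # 2 / (3 + 3 - 2) = 0.5
print(lift[0][1])     # 2 / (3 * 3)
```

With `jaccard`, every diagonal entry becomes 1, whereas `lift` divides by the product of both items' occurrence counts, which penalizes popular items more strongly.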


### 1.2 Compute user affinity scores

The affinity matrix in SAR captures the strength of the relationship between each individual user and the items that user has already interacted with. SAR incorporates two factors that can impact users' affinities: 

- It can consider information about the **type** of user-item interaction through differential weighting of different events (e.g. it may weigh events in which a user rated a particular item more heavily than events in which a user viewed the item).
- It can consider information about **when** a user-item event occurred (e.g. it may discount the value of events that took place in the distant past).

Formalizing these factors gives us an expression for user-item affinity:

$$a_{ij}=\sum_k w_k \left(\frac{1}{2}\right)^{\frac{t_0-t_k}{T}} $$

where the affinity $a_{ij}$ for user $i$ and item $j$ is the weighted sum of all $k$ events involving user $i$ and item $j$. $w_k$ represents the weight of a particular event, $t_k$ is the time at which it occurred, and $t_0$ is the reference time. The power-of-$\frac{1}{2}$ term discounts each event according to its age. The $(\frac{1}{2})^n$ scaling factor causes the parameter $T$ to serve as a half-life: events occurring $T$ units before $t_0$ are given half the weight of those taking place at $t_0$.

Repeating this computation for all $n$ users and $m$ items results in an $n\times m$ matrix $A$. Simplifications of the above expression can be obtained by setting all the weights equal to 1 (effectively ignoring event types), or by setting the half-life parameter $T$ to infinity (ignoring transaction times).
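
As a worked example of the formula, the sketch below computes the affinity for a single user-item pair. The `affinity` helper and the event values are hypothetical and chosen only so the arithmetic is easy to follow by hand.

```python
def affinity(events, t0, half_life):
    """Sum event weights, halving each weight for every `half_life` units of age."""
    return sum(w * 0.5 ** ((t0 - t) / half_life) for w, t in events)


# Hypothetical events for one user-item pair: (weight w_k, time t_k in days).
events = [(5.0, 100.0), (3.0, 70.0), (4.0, 40.0)]

# With t0 = 100 and T = 30, the events are 0, 1, and 2 half-lives old:
# 5 * 1 + 3 * 0.5 + 4 * 0.25 = 7.5
a = affinity(events, t0=100.0, half_life=30.0)
print(a)  # 7.5

# An infinite half-life ignores transaction times, leaving the plain sum of weights.
print(affinity(events, t0=100.0, half_life=float("inf")))  # 12.0
```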

### 1.3 Remove seen items

Optionally, items that a user has already interacted with in the training set are removed from that user's recommendations, so that items the user has previously bought are not recommended again.

### 1.4 Top-k item calculation

The personalized recommendations for a set of users can then be obtained by multiplying the affinity matrix ($A$) by the similarity matrix ($S$). The result is a recommendation score matrix, where each row corresponds to a user, each column corresponds to an item, and each entry corresponds to a user / item pair. Higher scores correspond to more strongly recommended items.
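
The scoring and ranking step can be sketched in plain Python as shown below. The `matmul` and `top_k_items` helpers and the matrices `A` and `S` are hypothetical; the sketch assumes that a positive affinity marks an item as already seen and breaks ties by item index, neither of which is prescribed by the text.

```python
def matmul(a, s):
    """Multiply an n x m affinity matrix by an m x m similarity matrix."""
    return [
        [sum(a_ik * s_kj for a_ik, s_kj in zip(row, col)) for col in zip(*s)]
        for row in a
    ]


def top_k_items(a, s, k, remove_seen=True):
    """Return, for each user, the indices of the k highest-scoring items."""
    recommendations = []
    for user_affinity, user_scores in zip(a, matmul(a, s)):
        candidates = [
            (score, item)
            for item, score in enumerate(user_scores)
            # Assumption: a positive affinity means the user has seen the item.
            if not (remove_seen and user_affinity[item] > 0)
        ]
        # Sort by descending score; ties fall back to ascending item index.
        candidates.sort(key=lambda pair: (-pair[0], pair[1]))
        recommendations.append([item for _, item in candidates[:k]])
    return recommendations


# Hypothetical affinity (2 users x 3 items) and similarity (3 x 3) matrices.
A = [[5.0, 0.0, 0.0],
     [0.0, 3.0, 4.0]]
S = [[1.0, 0.5, 0.25],
     [0.5, 1.0, 0.5],
     [0.25, 0.5, 1.0]]

print(matmul(A, S))                             # [[5.0, 2.5, 1.25], [2.5, 5.0, 5.5]]
print(top_k_items(A, S, k=2))                   # [[1, 2], [0]]
print(top_k_items(A, S, k=2, remove_seen=False))  # [[0, 1], [2, 1]]
```

With seen items removed, the second user receives only one recommendation because only one unseen item remains.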

Note that the cost of producing recommendations grows with the size of the data: the SAR algorithm itself has $O(n^3)$ complexity. The single-node implementation is therefore not designed to handle large datasets in a scalable manner, and it should be run on a machine with sufficient memory.

## 2 SAR single-node implementation

The SAR implementation illustrated in this notebook is written in Python and relies primarily on `numpy`, `pandas`, and `scipy`, packages that are commonly used in data analytics and machine learning tasks. Details of the implementation can be found in [Recommenders/recommenders/models/sar/sar_singlenode.py](../../recommenders/models/sar/sar_singlenode.py).

## 3 SAR single-node based movie recommender

In [10]:
# Standard-library, third-party, and Recommenders imports
import sys

import itertools
import logging
import os

import numpy as np
import pandas as pd
import papermill as pm

from recommenders.datasets import movielens
from recommenders.datasets.python_splitters import python_stratified_split
from recommenders.evaluation.python_evaluation import map_at_k, ndcg_at_k, precision_at_k, recall_at_k, rmse, mae, logloss, rsquared
from recommenders.models.sar import SAR

print("System version: {}".format(sys.version))
print("Pandas version: {}".format(pd.__version__))

System version: 3.7.16 (default, Jan 17 2023, 22:20:44) 
[GCC 11.2.0]
Pandas version: 1.3.5


In [2]:
# top k items to recommend
TOP_K = 10

# Select MovieLens data size: 100k, 1m, 10m, or 20m
MOVIELENS_DATA_SIZE = '100k'

### 3.1 Load Data

SAR is intended to be used on interactions with the following schema:
`<User ID>, <Item ID>, <Time>`. 

Each row represents a single interaction between a user and an item. These interactions might be different types of events on an e-commerce website, such as a user clicking to view an item, adding it to a shopping basket, following a recommendation link, and so on. 

The MovieLens dataset consists of well-formatted interactions in which users rate movies (the ratings are used as the event weight). We will use it for the rest of the example.

In [3]:
data = movielens.load_pandas_df(
    size=MOVIELENS_DATA_SIZE,
    header=['UserId', 'MovieId', 'Rating', 'Timestamp'],
    title_col='Title'
)


# Convert the ratings to 32-bit floats to reduce memory consumption.
# Assigning the column directly guarantees that the new dtype is kept.
data['Rating'] = data['Rating'].astype(np.float32)

data.head()



| |UserId|MovieId|Rating|Timestamp|Title|
|---------|---------|---------|---------|---------|-------------|
|0|196|242|3.0|881250949|Kolya (1996)|
|1|63|242|3.0|875747190|Kolya (1996)|
|2|226|242|5.0|883888671|Kolya (1996)|
|3|154|242|3.0|879138235|Kolya (1996)|
|4|306|242|5.0|876503793|Kolya (1996)|


### 3.2 Split the data using the Python stratified splitter provided in utilities

We split the full dataset into `train` and `test` datasets to evaluate the performance of the algorithm against a held-out set not seen during training. Because SAR generates recommendations based on user preferences, all users in the test set must also exist in the training set. For this case, we can use the provided `python_stratified_split` function, which holds out a percentage (in this case 25%) of items from each user but ensures that all users appear in both the `train` and `test` datasets. Other options are available in the `recommenders.datasets.python_splitters` module, which provides more control over how the split occurs.
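
The idea behind a per-user (stratified) hold-out can be sketched in plain Python as follows. This is not the implementation of `python_stratified_split`; the `stratified_split_sketch` helper and its rounding rule are assumptions made purely for illustration.

```python
import random
from collections import defaultdict


def stratified_split_sketch(interactions, ratio=0.75, seed=42):
    """Split (user, item) pairs so each user keeps about `ratio` of their items in train."""
    by_user = defaultdict(list)
    for user, item in interactions:
        by_user[user].append(item)

    rng = random.Random(seed)
    train, test = [], []
    for user, items in by_user.items():
        shuffled = items[:]
        rng.shuffle(shuffled)
        # Assumption: round the per-user train size to the nearest integer.
        cut = round(len(shuffled) * ratio)
        train += [(user, item) for item in shuffled[:cut]]
        test += [(user, item) for item in shuffled[cut:]]
    return train, test


# Hypothetical data: two users with four interactions each.
interactions = [(user, item) for user in ("u1", "u2") for item in range(4)]
train_pairs, test_pairs = stratified_split_sketch(interactions)

print(len(train_pairs), len(test_pairs))  # 6 2
print({user for user, _ in test_pairs} <= {user for user, _ in train_pairs})  # True
```

Because the split is made within each user's own interactions, every user with enough interactions ends up in both partitions, which is the property SAR needs at evaluation time.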


In [4]:
header = {
    "col_user": "UserId",
    "col_item": "MovieId",
    "col_rating": "Rating",
    "col_timestamp": "Timestamp",
    "col_prediction": "Prediction",
}

In [5]:
train, test = python_stratified_split(data, ratio=0.75, col_user=header["col_user"], col_item=header["col_item"], seed=42)

For illustration, the following parameter values are used:

|Parameter|Value|Description|
|---------|---------|-------------|
|`similarity_type`|`jaccard`|Method used to calculate item similarity.|
|`time_decay_coefficient`|30|Period in days (term of $T$ shown in the formula of Section 1.2)|
|`time_now`|`None`|Time decay reference.|
|`timedecay_formula`|`True`|Whether time decay formula is used.|

In [24]:
# Set up the parameter grid for an exhaustive search.
# itertools is already imported in the first cell.
similarity_types = ["jaccard", "cosine", "lift", "cooccurrence"]
time_decay_coefficient = [10, 20, 30]
timedecay_formula = [True, False]
remove_seen = [True, False]
top_k = [10]

# Each combination is a tuple:
# (similarity_type, time_decay_coefficient, timedecay_formula, remove_seen, top_k)
params = [similarity_types, time_decay_coefficient, timedecay_formula, remove_seen, top_k]
params_combinations = list(itertools.product(*params))

In [32]:
# Log at DEBUG level and above; SAR reports its progress at INFO
logging.basicConfig(level=logging.DEBUG, 
                    format='%(asctime)s %(levelname)-8s %(message)s')


# Only MAE drives the model selection below, so it is the only running best we track
best_mae = float("inf")

selected_map = 0
selected_ndcg = 0
selected_precision = 0
selected_recall = 0
selected_rmse = 0
selected_mae = 0
selected_rsquared = 0
best_param = []
for param in params_combinations:
    print(param)
    model = SAR(
        similarity_type=param[0], 
        time_decay_coefficient=param[1], 
        time_now=None, 
        timedecay_formula=param[2], 
        **header
    )
    model.fit(train)
    top_k = model.recommend_k_items(test, top_k=param[4], remove_seen=param[3])
    # all ranking metrics have the same arguments
    args = [test, top_k]
    kwargs = dict(col_user='UserId', 
                  col_item='MovieId', 
                  col_rating='Rating', 
                  col_prediction='Prediction', 
                  relevancy_method='top_k', 
                  k=param[4])

    eval_map = map_at_k(*args, **kwargs)
    eval_ndcg = ndcg_at_k(*args, **kwargs)
    eval_precision = precision_at_k(*args, **kwargs)
    eval_recall = recall_at_k(*args, **kwargs)
    eval_rmse = rmse(test, top_k, col_user='UserId', col_item='MovieId', col_rating='Rating', col_prediction='Prediction')
    eval_mae = mae(test, top_k, col_user='UserId', col_item='MovieId', col_rating='Rating', col_prediction='Prediction')
    eval_rsquared = rsquared(test, top_k, col_user='UserId', col_item='MovieId', col_rating='Rating', col_prediction='Prediction')
    # Keep the metrics of the parameter combination with the lowest MAE so far
    if eval_mae < best_mae:
        best_mae = eval_mae

        selected_map = eval_map
        selected_ndcg = eval_ndcg
        selected_precision = eval_precision
        selected_recall = eval_recall
        selected_rmse = eval_rmse
        selected_mae = eval_mae
        selected_rsquared = eval_rsquared

        best_param = param

2023-03-25 20:28:52,387 INFO     Collecting user affinity matrix
2023-03-25 20:28:52,392 INFO     Calculating time-decayed affinities
2023-03-25 20:28:52,457 INFO     Creating index columns


('jaccard', 10, True, True, 10)


2023-03-25 20:28:52,648 INFO     Building user affinity sparse matrix
2023-03-25 20:28:52,662 INFO     Calculating item co-occurrence
2023-03-25 20:28:52,973 INFO     Calculating item similarity
2023-03-25 20:28:52,974 INFO     Using jaccard based similarity
2023-03-25 20:28:53,046 INFO     Done training
2023-03-25 20:28:53,049 INFO     Calculating recommendation scores
2023-03-25 20:28:53,171 INFO     Removing seen items
2023-03-25 20:28:53,791 INFO     Collecting user affinity matrix
2023-03-25 20:28:53,796 INFO     Calculating time-decayed affinities
2023-03-25 20:28:53,851 INFO     Creating index columns


('jaccard', 10, True, False, 10)


2023-03-25 20:28:54,012 INFO     Building user affinity sparse matrix
2023-03-25 20:28:54,025 INFO     Calculating item co-occurrence
2023-03-25 20:28:54,322 INFO     Calculating item similarity
2023-03-25 20:28:54,324 INFO     Using jaccard based similarity
2023-03-25 20:28:54,383 INFO     Done training
2023-03-25 20:28:54,387 INFO     Calculating recommendation scores
2023-03-25 20:28:55,064 INFO     Collecting user affinity matrix
2023-03-25 20:28:55,068 INFO     Creating index columns
2023-03-25 20:28:55,222 INFO     Building user affinity sparse matrix
2023-03-25 20:28:55,232 INFO     Calculating item co-occurrence


('jaccard', 10, False, True, 10)


2023-03-25 20:28:55,545 INFO     Calculating item similarity
2023-03-25 20:28:55,547 INFO     Using jaccard based similarity
2023-03-25 20:28:55,602 INFO     Done training
2023-03-25 20:28:55,605 INFO     Calculating recommendation scores
2023-03-25 20:28:55,665 INFO     Removing seen items
2023-03-25 20:28:56,255 INFO     Collecting user affinity matrix
2023-03-25 20:28:56,259 INFO     Creating index columns
2023-03-25 20:28:56,414 INFO     Building user affinity sparse matrix
2023-03-25 20:28:56,424 INFO     Calculating item co-occurrence


('jaccard', 10, False, False, 10)


2023-03-25 20:28:56,690 INFO     Calculating item similarity
2023-03-25 20:28:56,691 INFO     Using jaccard based similarity
2023-03-25 20:28:56,733 INFO     Done training
2023-03-25 20:28:56,736 INFO     Calculating recommendation scores
2023-03-25 20:28:57,301 INFO     Collecting user affinity matrix
2023-03-25 20:28:57,306 INFO     Calculating time-decayed affinities
2023-03-25 20:28:57,345 INFO     Creating index columns


('jaccard', 20, True, True, 10)


2023-03-25 20:28:57,499 INFO     Building user affinity sparse matrix
2023-03-25 20:28:57,507 INFO     Calculating item co-occurrence
2023-03-25 20:28:57,763 INFO     Calculating item similarity
2023-03-25 20:28:57,765 INFO     Using jaccard based similarity
2023-03-25 20:28:57,816 INFO     Done training
2023-03-25 20:28:57,819 INFO     Calculating recommendation scores
2023-03-25 20:28:57,918 INFO     Removing seen items
2023-03-25 20:28:58,404 INFO     Collecting user affinity matrix
2023-03-25 20:28:58,408 INFO     Calculating time-decayed affinities
2023-03-25 20:28:58,451 INFO     Creating index columns


('jaccard', 20, True, False, 10)


2023-03-25 20:28:58,607 INFO     Building user affinity sparse matrix
2023-03-25 20:28:58,617 INFO     Calculating item co-occurrence
2023-03-25 20:28:58,887 INFO     Calculating item similarity
2023-03-25 20:28:58,888 INFO     Using jaccard based similarity
2023-03-25 20:28:58,942 INFO     Done training
2023-03-25 20:28:58,945 INFO     Calculating recommendation scores
2023-03-25 20:28:59,553 INFO     Collecting user affinity matrix
2023-03-25 20:28:59,556 INFO     Creating index columns
2023-03-25 20:28:59,692 INFO     Building user affinity sparse matrix
2023-03-25 20:28:59,704 INFO     Calculating item co-occurrence


('jaccard', 20, False, True, 10)


2023-03-25 20:28:59,976 INFO     Calculating item similarity
2023-03-25 20:28:59,977 INFO     Using jaccard based similarity
2023-03-25 20:29:00,017 INFO     Done training
2023-03-25 20:29:00,021 INFO     Calculating recommendation scores
2023-03-25 20:29:00,086 INFO     Removing seen items
2023-03-25 20:29:00,609 INFO     Collecting user affinity matrix
2023-03-25 20:29:00,612 INFO     Creating index columns
2023-03-25 20:29:00,746 INFO     Building user affinity sparse matrix
2023-03-25 20:29:00,756 INFO     Calculating item co-occurrence


('jaccard', 20, False, False, 10)


2023-03-25 20:29:01,021 INFO     Calculating item similarity
2023-03-25 20:29:01,022 INFO     Using jaccard based similarity
2023-03-25 20:29:01,070 INFO     Done training
2023-03-25 20:29:01,073 INFO     Calculating recommendation scores
2023-03-25 20:29:01,664 INFO     Collecting user affinity matrix
2023-03-25 20:29:01,668 INFO     Calculating time-decayed affinities
2023-03-25 20:29:01,713 INFO     Creating index columns


('jaccard', 30, True, True, 10)


2023-03-25 20:29:01,862 INFO     Building user affinity sparse matrix
2023-03-25 20:29:01,872 INFO     Calculating item co-occurrence
2023-03-25 20:29:02,135 INFO     Calculating item similarity
2023-03-25 20:29:02,136 INFO     Using jaccard based similarity
2023-03-25 20:29:02,188 INFO     Done training
2023-03-25 20:29:02,192 INFO     Calculating recommendation scores
2023-03-25 20:29:02,299 INFO     Removing seen items
2023-03-25 20:29:02,886 INFO     Collecting user affinity matrix
2023-03-25 20:29:02,890 INFO     Calculating time-decayed affinities
2023-03-25 20:29:02,933 INFO     Creating index columns
2023-03-25 20:29:03,053 INFO     Building user affinity sparse matrix
2023-03-25 20:29:03,062 INFO     Calculating item co-occurrence


('jaccard', 30, True, False, 10)


2023-03-25 20:29:03,322 INFO     Calculating item similarity
2023-03-25 20:29:03,323 INFO     Using jaccard based similarity
2023-03-25 20:29:03,369 INFO     Done training
2023-03-25 20:29:03,372 INFO     Calculating recommendation scores
2023-03-25 20:29:03,988 INFO     Collecting user affinity matrix
2023-03-25 20:29:03,992 INFO     Creating index columns
2023-03-25 20:29:04,144 INFO     Building user affinity sparse matrix
2023-03-25 20:29:04,153 INFO     Calculating item co-occurrence


('jaccard', 30, False, True, 10)


2023-03-25 20:29:04,410 INFO     Calculating item similarity
2023-03-25 20:29:04,412 INFO     Using jaccard based similarity
2023-03-25 20:29:04,459 INFO     Done training
2023-03-25 20:29:04,462 INFO     Calculating recommendation scores
2023-03-25 20:29:04,518 INFO     Removing seen items
2023-03-25 20:29:05,068 INFO     Collecting user affinity matrix
2023-03-25 20:29:05,071 INFO     Creating index columns
2023-03-25 20:29:05,211 INFO     Building user affinity sparse matrix
2023-03-25 20:29:05,221 INFO     Calculating item co-occurrence


('jaccard', 30, False, False, 10)


2023-03-25 20:29:05,472 INFO     Calculating item similarity
2023-03-25 20:29:05,473 INFO     Using jaccard based similarity
2023-03-25 20:29:05,516 INFO     Done training
2023-03-25 20:29:05,519 INFO     Calculating recommendation scores
2023-03-25 20:29:06,112 INFO     Collecting user affinity matrix
2023-03-25 20:29:06,117 INFO     Calculating time-decayed affinities
2023-03-25 20:29:06,160 INFO     Creating index columns
2023-03-25 20:29:06,284 INFO     Building user affinity sparse matrix


('cosine', 10, True, True, 10)


2023-03-25 20:29:06,312 INFO     Calculating item co-occurrence
2023-03-25 20:29:06,638 INFO     Calculating item similarity
2023-03-25 20:29:06,640 INFO     Using cosine similarity
2023-03-25 20:29:06,692 INFO     Done training
2023-03-25 20:29:06,695 INFO     Calculating recommendation scores
2023-03-25 20:29:06,801 INFO     Removing seen items
2023-03-25 20:29:07,359 INFO     Collecting user affinity matrix
2023-03-25 20:29:07,363 INFO     Calculating time-decayed affinities
2023-03-25 20:29:07,397 INFO     Creating index columns
2023-03-25 20:29:07,519 INFO     Building user affinity sparse matrix
2023-03-25 20:29:07,527 INFO     Calculating item co-occurrence


('cosine', 10, True, False, 10)


2023-03-25 20:29:07,771 INFO     Calculating item similarity
2023-03-25 20:29:07,772 INFO     Using cosine similarity
2023-03-25 20:29:07,806 INFO     Done training
2023-03-25 20:29:07,809 INFO     Calculating recommendation scores
2023-03-25 20:29:08,418 INFO     Collecting user affinity matrix
2023-03-25 20:29:08,422 INFO     Creating index columns
2023-03-25 20:29:08,547 INFO     Building user affinity sparse matrix
2023-03-25 20:29:08,556 INFO     Calculating item co-occurrence


('cosine', 10, False, True, 10)


2023-03-25 20:29:08,807 INFO     Calculating item similarity
2023-03-25 20:29:08,809 INFO     Using cosine similarity
2023-03-25 20:29:08,836 INFO     Done training
2023-03-25 20:29:08,840 INFO     Calculating recommendation scores
2023-03-25 20:29:08,895 INFO     Removing seen items
2023-03-25 20:29:09,414 INFO     Collecting user affinity matrix
2023-03-25 20:29:09,417 INFO     Creating index columns
2023-03-25 20:29:09,551 INFO     Building user affinity sparse matrix
2023-03-25 20:29:09,560 INFO     Calculating item co-occurrence


('cosine', 10, False, False, 10)


2023-03-25 20:29:09,812 INFO     Calculating item similarity
2023-03-25 20:29:09,813 INFO     Using cosine similarity
2023-03-25 20:29:09,842 INFO     Done training
2023-03-25 20:29:09,846 INFO     Calculating recommendation scores
2023-03-25 20:29:10,447 INFO     Collecting user affinity matrix
2023-03-25 20:29:10,451 INFO     Calculating time-decayed affinities
2023-03-25 20:29:10,486 INFO     Creating index columns
2023-03-25 20:29:10,634 INFO     Building user affinity sparse matrix


('cosine', 20, True, True, 10)


2023-03-25 20:29:10,644 INFO     Calculating item co-occurrence
2023-03-25 20:29:10,921 INFO     Calculating item similarity
2023-03-25 20:29:10,923 INFO     Using cosine similarity
2023-03-25 20:29:10,961 INFO     Done training
2023-03-25 20:29:10,965 INFO     Calculating recommendation scores
2023-03-25 20:29:11,082 INFO     Removing seen items
2023-03-25 20:29:11,650 INFO     Collecting user affinity matrix
2023-03-25 20:29:11,655 INFO     Calculating time-decayed affinities
2023-03-25 20:29:11,698 INFO     Creating index columns


('cosine', 20, True, False, 10)


2023-03-25 20:29:11,856 INFO     Building user affinity sparse matrix
2023-03-25 20:29:11,867 INFO     Calculating item co-occurrence
2023-03-25 20:29:12,177 INFO     Calculating item similarity
2023-03-25 20:29:12,178 INFO     Using cosine similarity
2023-03-25 20:29:12,215 INFO     Done training
2023-03-25 20:29:12,217 INFO     Calculating recommendation scores
2023-03-25 20:29:12,879 INFO     Collecting user affinity matrix
2023-03-25 20:29:12,883 INFO     Creating index columns
2023-03-25 20:29:13,041 INFO     Building user affinity sparse matrix
2023-03-25 20:29:13,051 INFO     Calculating item co-occurrence


('cosine', 20, False, True, 10)


2023-03-25 20:29:13,309 INFO     Calculating item similarity
2023-03-25 20:29:13,311 INFO     Using cosine similarity
2023-03-25 20:29:13,332 INFO     Done training
2023-03-25 20:29:13,335 INFO     Calculating recommendation scores
2023-03-25 20:29:13,380 INFO     Removing seen items
2023-03-25 20:29:13,901 INFO     Collecting user affinity matrix
2023-03-25 20:29:13,904 INFO     Creating index columns
2023-03-25 20:29:14,036 INFO     Building user affinity sparse matrix
2023-03-25 20:29:14,045 INFO     Calculating item co-occurrence


('cosine', 20, False, False, 10)


2023-03-25 20:29:14,309 INFO     Calculating item similarity
2023-03-25 20:29:14,311 INFO     Using cosine similarity
2023-03-25 20:29:14,340 INFO     Done training
2023-03-25 20:29:14,343 INFO     Calculating recommendation scores
2023-03-25 20:29:14,871 INFO     Collecting user affinity matrix
2023-03-25 20:29:14,874 INFO     Calculating time-decayed affinities
2023-03-25 20:29:14,909 INFO     Creating index columns
2023-03-25 20:29:15,047 INFO     Building user affinity sparse matrix
2023-03-25 20:29:15,056 INFO     Calculating item co-occurrence


('cosine', 30, True, True, 10)


2023-03-25 20:29:15,293 INFO     Calculating item similarity
2023-03-25 20:29:15,294 INFO     Using cosine similarity
2023-03-25 20:29:15,327 INFO     Done training
2023-03-25 20:29:15,330 INFO     Calculating recommendation scores
2023-03-25 20:29:15,421 INFO     Removing seen items
2023-03-25 20:29:15,972 INFO     Collecting user affinity matrix
2023-03-25 20:29:15,976 INFO     Calculating time-decayed affinities
2023-03-25 20:29:16,012 INFO     Creating index columns
2023-03-25 20:29:16,151 INFO     Building user affinity sparse matrix
2023-03-25 20:29:16,159 INFO     Calculating item co-occurrence


('cosine', 30, True, False, 10)


2023-03-25 20:29:16,412 INFO     Calculating item similarity
2023-03-25 20:29:16,413 INFO     Using cosine similarity
2023-03-25 20:29:16,451 INFO     Done training
2023-03-25 20:29:16,454 INFO     Calculating recommendation scores
2023-03-25 20:29:17,042 INFO     Collecting user affinity matrix
2023-03-25 20:29:17,045 INFO     Creating index columns
2023-03-25 20:29:17,157 INFO     Building user affinity sparse matrix
2023-03-25 20:29:17,166 INFO     Calculating item co-occurrence


('cosine', 30, False, True, 10)


2023-03-25 20:29:17,409 INFO     Calculating item similarity
2023-03-25 20:29:17,410 INFO     Using cosine similarity
2023-03-25 20:29:17,432 INFO     Done training
2023-03-25 20:29:17,434 INFO     Calculating recommendation scores
2023-03-25 20:29:17,479 INFO     Removing seen items
2023-03-25 20:29:17,942 INFO     Collecting user affinity matrix
2023-03-25 20:29:17,946 INFO     Creating index columns
2023-03-25 20:29:18,046 INFO     Building user affinity sparse matrix
2023-03-25 20:29:18,055 INFO     Calculating item co-occurrence


('cosine', 30, False, False, 10)


2023-03-25 20:29:18,301 INFO     Calculating item similarity
2023-03-25 20:29:18,303 INFO     Using cosine similarity
2023-03-25 20:29:18,326 INFO     Done training
2023-03-25 20:29:18,329 INFO     Calculating recommendation scores
2023-03-25 20:29:18,837 INFO     Collecting user affinity matrix
2023-03-25 20:29:18,840 INFO     Calculating time-decayed affinities
2023-03-25 20:29:18,870 INFO     Creating index columns
2023-03-25 20:29:18,971 INFO     Building user affinity sparse matrix
2023-03-25 20:29:18,978 INFO     Calculating item co-occurrence


('lift', 10, True, True, 10)


2023-03-25 20:29:19,215 INFO     Calculating item similarity
2023-03-25 20:29:19,216 INFO     Using lift based similarity
2023-03-25 20:29:19,241 INFO     Done training
2023-03-25 20:29:19,243 INFO     Calculating recommendation scores
2023-03-25 20:29:19,333 INFO     Removing seen items
2023-03-25 20:29:19,744 INFO     Collecting user affinity matrix
2023-03-25 20:29:19,747 INFO     Calculating time-decayed affinities
2023-03-25 20:29:19,776 INFO     Creating index columns
2023-03-25 20:29:19,887 INFO     Building user affinity sparse matrix
2023-03-25 20:29:19,895 INFO     Calculating item co-occurrence


('lift', 10, True, False, 10)


2023-03-25 20:29:20,134 INFO     Calculating item similarity
2023-03-25 20:29:20,135 INFO     Using lift based similarity
2023-03-25 20:29:20,163 INFO     Done training
2023-03-25 20:29:20,166 INFO     Calculating recommendation scores
2023-03-25 20:29:20,681 INFO     Collecting user affinity matrix
2023-03-25 20:29:20,684 INFO     Creating index columns
2023-03-25 20:29:20,788 INFO     Building user affinity sparse matrix
2023-03-25 20:29:20,795 INFO     Calculating item co-occurrence


('lift', 10, False, True, 10)


2023-03-25 20:29:21,009 INFO     Calculating item similarity
2023-03-25 20:29:21,010 INFO     Using lift based similarity
2023-03-25 20:29:21,031 INFO     Done training
2023-03-25 20:29:21,034 INFO     Calculating recommendation scores
2023-03-25 20:29:21,080 INFO     Removing seen items
2023-03-25 20:29:21,492 INFO     Collecting user affinity matrix
2023-03-25 20:29:21,495 INFO     Creating index columns
2023-03-25 20:29:21,595 INFO     Building user affinity sparse matrix
2023-03-25 20:29:21,602 INFO     Calculating item co-occurrence


('lift', 10, False, False, 10)


2023-03-25 20:29:21,816 INFO     Calculating item similarity
2023-03-25 20:29:21,817 INFO     Using lift based similarity
2023-03-25 20:29:21,838 INFO     Done training
2023-03-25 20:29:21,840 INFO     Calculating recommendation scores
2023-03-25 20:29:22,323 INFO     Collecting user affinity matrix
2023-03-25 20:29:22,327 INFO     Calculating time-decayed affinities
2023-03-25 20:29:22,367 INFO     Creating index columns
2023-03-25 20:29:22,488 INFO     Building user affinity sparse matrix
2023-03-25 20:29:22,496 INFO     Calculating item co-occurrence


('lift', 20, True, True, 10)


2023-03-25 20:29:22,744 INFO     Calculating item similarity
2023-03-25 20:29:22,745 INFO     Using lift based similarity
2023-03-25 20:29:22,779 INFO     Done training
2023-03-25 20:29:22,782 INFO     Calculating recommendation scores
2023-03-25 20:29:22,888 INFO     Removing seen items
2023-03-25 20:29:23,374 INFO     Collecting user affinity matrix
2023-03-25 20:29:23,378 INFO     Calculating time-decayed affinities
2023-03-25 20:29:23,417 INFO     Creating index columns


('lift', 20, True, False, 10)


2023-03-25 20:29:23,569 INFO     Building user affinity sparse matrix
2023-03-25 20:29:23,577 INFO     Calculating item co-occurrence
2023-03-25 20:29:23,795 INFO     Calculating item similarity
2023-03-25 20:29:23,796 INFO     Using lift based similarity
2023-03-25 20:29:23,823 INFO     Done training
2023-03-25 20:29:23,826 INFO     Calculating recommendation scores
2023-03-25 20:29:24,370 INFO     Collecting user affinity matrix
2023-03-25 20:29:24,372 INFO     Creating index columns
2023-03-25 20:29:24,487 INFO     Building user affinity sparse matrix
2023-03-25 20:29:24,495 INFO     Calculating item co-occurrence


('lift', 20, False, True, 10)


2023-03-25 20:29:24,724 INFO     Calculating item similarity
2023-03-25 20:29:24,725 INFO     Using lift based similarity
2023-03-25 20:29:24,745 INFO     Done training
2023-03-25 20:29:24,748 INFO     Calculating recommendation scores
2023-03-25 20:29:24,795 INFO     Removing seen items
2023-03-25 20:29:25,243 INFO     Collecting user affinity matrix
2023-03-25 20:29:25,247 INFO     Creating index columns
2023-03-25 20:29:25,366 INFO     Building user affinity sparse matrix
2023-03-25 20:29:25,374 INFO     Calculating item co-occurrence


('lift', 20, False, False, 10)


2023-03-25 20:29:25,593 INFO     Calculating item similarity
2023-03-25 20:29:25,594 INFO     Using lift based similarity
2023-03-25 20:29:25,615 INFO     Done training
2023-03-25 20:29:25,618 INFO     Calculating recommendation scores
2023-03-25 20:29:26,094 INFO     Collecting user affinity matrix
2023-03-25 20:29:26,096 INFO     Calculating time-decayed affinities
2023-03-25 20:29:26,125 INFO     Creating index columns
2023-03-25 20:29:26,233 INFO     Building user affinity sparse matrix
2023-03-25 20:29:26,244 INFO     Calculating item co-occurrence


('lift', 30, True, True, 10)


2023-03-25 20:29:26,481 INFO     Calculating item similarity
2023-03-25 20:29:26,483 INFO     Using lift based similarity
2023-03-25 20:29:26,515 INFO     Done training
2023-03-25 20:29:26,518 INFO     Calculating recommendation scores
2023-03-25 20:29:26,616 INFO     Removing seen items
2023-03-25 20:29:27,059 INFO     Collecting user affinity matrix
2023-03-25 20:29:27,062 INFO     Calculating time-decayed affinities
2023-03-25 20:29:27,091 INFO     Creating index columns
2023-03-25 20:29:27,207 INFO     Building user affinity sparse matrix
2023-03-25 20:29:27,214 INFO     Calculating item co-occurrence


('lift', 30, True, False, 10)


2023-03-25 20:29:27,438 INFO     Calculating item similarity
2023-03-25 20:29:27,439 INFO     Using lift based similarity
2023-03-25 20:29:27,467 INFO     Done training
2023-03-25 20:29:27,469 INFO     Calculating recommendation scores
2023-03-25 20:29:27,982 INFO     Collecting user affinity matrix
2023-03-25 20:29:27,984 INFO     Creating index columns
2023-03-25 20:29:28,097 INFO     Building user affinity sparse matrix
2023-03-25 20:29:28,104 INFO     Calculating item co-occurrence


('lift', 30, False, True, 10)


2023-03-25 20:29:28,313 INFO     Calculating item similarity
2023-03-25 20:29:28,315 INFO     Using lift based similarity
2023-03-25 20:29:28,338 INFO     Done training
2023-03-25 20:29:28,341 INFO     Calculating recommendation scores
2023-03-25 20:29:28,382 INFO     Removing seen items
2023-03-25 20:29:28,809 INFO     Collecting user affinity matrix
2023-03-25 20:29:28,813 INFO     Creating index columns
2023-03-25 20:29:28,918 INFO     Building user affinity sparse matrix
2023-03-25 20:29:28,926 INFO     Calculating item co-occurrence


('lift', 30, False, False, 10)


2023-03-25 20:29:29,164 INFO     Calculating item similarity
2023-03-25 20:29:29,165 INFO     Using lift based similarity
2023-03-25 20:29:29,188 INFO     Done training
2023-03-25 20:29:29,190 INFO     Calculating recommendation scores
2023-03-25 20:29:29,680 INFO     Collecting user affinity matrix
2023-03-25 20:29:29,684 INFO     Calculating time-decayed affinities
2023-03-25 20:29:29,714 INFO     Creating index columns
2023-03-25 20:29:29,839 INFO     Building user affinity sparse matrix
2023-03-25 20:29:29,846 INFO     Calculating item co-occurrence


('cooccurrence', 10, True, True, 10)


2023-03-25 20:29:30,070 INFO     Calculating item similarity
2023-03-25 20:29:30,071 INFO     Using co-occurrence based similarity
2023-03-25 20:29:30,072 INFO     Done training
2023-03-25 20:29:30,076 INFO     Calculating recommendation scores
2023-03-25 20:29:30,590 INFO     Removing seen items
2023-03-25 20:29:31,066 INFO     Collecting user affinity matrix
2023-03-25 20:29:31,069 INFO     Calculating time-decayed affinities
2023-03-25 20:29:31,101 INFO     Creating index columns
2023-03-25 20:29:31,222 INFO     Building user affinity sparse matrix
2023-03-25 20:29:31,231 INFO     Calculating item co-occurrence


('cooccurrence', 10, True, False, 10)


2023-03-25 20:29:31,464 INFO     Calculating item similarity
2023-03-25 20:29:31,465 INFO     Using co-occurrence based similarity
2023-03-25 20:29:31,466 INFO     Done training
2023-03-25 20:29:31,471 INFO     Calculating recommendation scores
2023-03-25 20:29:32,605 INFO     Collecting user affinity matrix
2023-03-25 20:29:32,609 INFO     Creating index columns
2023-03-25 20:29:32,737 INFO     Building user affinity sparse matrix
2023-03-25 20:29:32,745 INFO     Calculating item co-occurrence


('cooccurrence', 10, False, True, 10)


2023-03-25 20:29:32,992 INFO     Calculating item similarity
2023-03-25 20:29:32,993 INFO     Using co-occurrence based similarity
2023-03-25 20:29:32,994 INFO     Done training
2023-03-25 20:29:32,998 INFO     Calculating recommendation scores
2023-03-25 20:29:33,552 INFO     Removing seen items
2023-03-25 20:29:34,037 INFO     Collecting user affinity matrix
2023-03-25 20:29:34,041 INFO     Creating index columns
2023-03-25 20:29:34,159 INFO     Building user affinity sparse matrix
2023-03-25 20:29:34,167 INFO     Calculating item co-occurrence


('cooccurrence', 10, False, False, 10)


2023-03-25 20:29:34,388 INFO     Calculating item similarity
2023-03-25 20:29:34,389 INFO     Using co-occurrence based similarity
2023-03-25 20:29:34,390 INFO     Done training
2023-03-25 20:29:34,394 INFO     Calculating recommendation scores
2023-03-25 20:29:35,382 INFO     Collecting user affinity matrix
2023-03-25 20:29:35,385 INFO     Calculating time-decayed affinities
2023-03-25 20:29:35,414 INFO     Creating index columns
2023-03-25 20:29:35,537 INFO     Building user affinity sparse matrix
2023-03-25 20:29:35,544 INFO     Calculating item co-occurrence


('cooccurrence', 20, True, True, 10)


2023-03-25 20:29:35,762 INFO     Calculating item similarity
2023-03-25 20:29:35,764 INFO     Using co-occurrence based similarity
2023-03-25 20:29:35,765 INFO     Done training
2023-03-25 20:29:35,768 INFO     Calculating recommendation scores
2023-03-25 20:29:36,287 INFO     Removing seen items
2023-03-25 20:29:36,731 INFO     Collecting user affinity matrix
2023-03-25 20:29:36,734 INFO     Calculating time-decayed affinities
2023-03-25 20:29:36,774 INFO     Creating index columns
2023-03-25 20:29:36,877 INFO     Building user affinity sparse matrix
2023-03-25 20:29:36,885 INFO     Calculating item co-occurrence


('cooccurrence', 20, True, False, 10)


2023-03-25 20:29:37,101 INFO     Calculating item similarity
2023-03-25 20:29:37,103 INFO     Using co-occurrence based similarity
2023-03-25 20:29:37,104 INFO     Done training
2023-03-25 20:29:37,106 INFO     Calculating recommendation scores
2023-03-25 20:29:38,017 INFO     Collecting user affinity matrix
2023-03-25 20:29:38,020 INFO     Creating index columns
2023-03-25 20:29:38,127 INFO     Building user affinity sparse matrix
2023-03-25 20:29:38,135 INFO     Calculating item co-occurrence


('cooccurrence', 20, False, True, 10)


2023-03-25 20:29:38,359 INFO     Calculating item similarity
2023-03-25 20:29:38,361 INFO     Using co-occurrence based similarity
2023-03-25 20:29:38,362 INFO     Done training
2023-03-25 20:29:38,366 INFO     Calculating recommendation scores
2023-03-25 20:29:38,910 INFO     Removing seen items
2023-03-25 20:29:39,352 INFO     Collecting user affinity matrix
2023-03-25 20:29:39,355 INFO     Creating index columns
2023-03-25 20:29:39,479 INFO     Building user affinity sparse matrix
2023-03-25 20:29:39,488 INFO     Calculating item co-occurrence


('cooccurrence', 20, False, False, 10)


2023-03-25 20:29:39,710 INFO     Calculating item similarity
2023-03-25 20:29:39,712 INFO     Using co-occurrence based similarity
2023-03-25 20:29:39,713 INFO     Done training
2023-03-25 20:29:39,717 INFO     Calculating recommendation scores
2023-03-25 20:29:40,862 INFO     Collecting user affinity matrix
2023-03-25 20:29:40,865 INFO     Calculating time-decayed affinities
2023-03-25 20:29:40,905 INFO     Creating index columns
2023-03-25 20:29:41,022 INFO     Building user affinity sparse matrix
2023-03-25 20:29:41,031 INFO     Calculating item co-occurrence


('cooccurrence', 30, True, True, 10)


2023-03-25 20:29:41,273 INFO     Calculating item similarity
2023-03-25 20:29:41,274 INFO     Using co-occurrence based similarity
2023-03-25 20:29:41,275 INFO     Done training
2023-03-25 20:29:41,279 INFO     Calculating recommendation scores
2023-03-25 20:29:41,798 INFO     Removing seen items
2023-03-25 20:29:42,239 INFO     Collecting user affinity matrix
2023-03-25 20:29:42,242 INFO     Calculating time-decayed affinities
2023-03-25 20:29:42,274 INFO     Creating index columns
2023-03-25 20:29:42,388 INFO     Building user affinity sparse matrix
2023-03-25 20:29:42,395 INFO     Calculating item co-occurrence


('cooccurrence', 30, True, False, 10)


2023-03-25 20:29:42,606 INFO     Calculating item similarity
2023-03-25 20:29:42,607 INFO     Using co-occurrence based similarity
2023-03-25 20:29:42,608 INFO     Done training
2023-03-25 20:29:42,612 INFO     Calculating recommendation scores
2023-03-25 20:29:43,551 INFO     Collecting user affinity matrix
2023-03-25 20:29:43,555 INFO     Creating index columns
2023-03-25 20:29:43,672 INFO     Building user affinity sparse matrix
2023-03-25 20:29:43,680 INFO     Calculating item co-occurrence


('cooccurrence', 30, False, True, 10)


2023-03-25 20:29:43,920 INFO     Calculating item similarity
2023-03-25 20:29:43,921 INFO     Using co-occurrence based similarity
2023-03-25 20:29:43,922 INFO     Done training
2023-03-25 20:29:43,925 INFO     Calculating recommendation scores
2023-03-25 20:29:44,466 INFO     Removing seen items
2023-03-25 20:29:44,936 INFO     Collecting user affinity matrix
2023-03-25 20:29:44,941 INFO     Creating index columns
2023-03-25 20:29:45,080 INFO     Building user affinity sparse matrix
2023-03-25 20:29:45,090 INFO     Calculating item co-occurrence


('cooccurrence', 30, False, False, 10)


2023-03-25 20:29:45,327 INFO     Calculating item similarity
2023-03-25 20:29:45,329 INFO     Using co-occurrence based similarity
2023-03-25 20:29:45,330 INFO     Done training
2023-03-25 20:29:45,334 INFO     Calculating recommendation scores


In [33]:
print(selected_precision)
print(selected_map)
print(selected_ndcg)
print(selected_precision)
print(selected_recall)
print(selected_rmse)
print(best_mae)
print(selected_rsquared)


print(param)
#model.fit(train)

0.001166489925768823
0.00014852465968896158
0.0012178224110199845
0.001166489925768823
0.0005812385334748113
2.1864836
1.6770335
-4.164875047621508
('cooccurrence', 30, False, False, 10)


In [None]:
#top_k = model.recommend_k_items(test, top_k=TOP_K, remove_seen=True)

The `recommend_k_items` method returns a recommendation score for each user-item pair, shown as follows.

In [None]:
# top_k_with_titles = (top_k.join(data[['MovieId', 'Title']].drop_duplicates().set_index('MovieId'), 
#                                 on='MovieId', 
#                                 how='inner').sort_values(by=['UserId', 'Prediction'], ascending=False))
# display(top_k_with_titles.head(10))

### 3.3 Evaluate the results

Note that the recommendation scores generated by multiplying the user affinity matrix $A$ and the item similarity matrix $S$ **do not** share the scale of the original explicit ratings in the MovieLens dataset. In other words, the SAR algorithm is meant for *recommending relevant items to users* rather than *predicting explicit ratings for user-item pairs*.
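As a minimal sketch of why the scales differ, the snippet below multiplies a toy affinity matrix by a toy similarity matrix. Both matrices are hypothetical values chosen for illustration (they are not taken from MovieLens), and raw co-occurrence counts are assumed as the similarity metric.

```python
import numpy as np

# Hypothetical toy affinity matrix A: 2 users x 3 items, built from
# explicit ratings on a 1-5 scale (0 means the user has not rated the item).
A = np.array([[5.0, 4.0, 0.0],
              [0.0, 3.0, 5.0]])

# Hypothetical item similarity matrix S: raw co-occurrence counts,
# symmetric with the largest value of each row on the diagonal.
S = np.array([[3.0, 2.0, 1.0],
              [2.0, 4.0, 2.0],
              [1.0, 2.0, 3.0]])

# Recommendation scores are the matrix product A x S.
scores = A @ S
print(scores)

# The scores are useful for ordering items per user, not as rating estimates.
ranking = np.argsort(-scores, axis=1)
print(ranking)
```

The largest score here is well above 5, so comparing these values with ratings through RMSE or MAE is not meaningful, while the per-user ordering in `ranking` still is.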

For this reason, ranking metrics such as precision@k and recall@k are better suited to evaluating SAR. The following cells show how to evaluate a SAR model using the evaluation functions provided in the `recommenders` library.
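To make the two metrics concrete, here is a simplified sketch of precision@k and recall@k written directly in pandas. It is not the `recommenders` implementation: the helper `precision_recall_at_k` and the toy frames are hypothetical, every item in the test set is assumed to be relevant, and each (user, item) pair is assumed to appear at most once per frame.

```python
import pandas as pd


def precision_recall_at_k(test, top_k, k, col_user="UserId", col_item="MovieId"):
    """Average per-user precision@k and recall@k (illustrative helper)."""
    # A "hit" is a recommended item that also appears in the user's test set.
    hits = pd.merge(test[[col_user, col_item]],
                    top_k[[col_user, col_item]],
                    on=[col_user, col_item])
    actual_counts = test.groupby(col_user).size()
    # Users without any hit get a count of zero.
    hit_counts = hits.groupby(col_user).size().reindex(actual_counts.index,
                                                       fill_value=0)
    precision = (hit_counts / k).mean()             # hits / items recommended
    recall = (hit_counts / actual_counts).mean()    # hits / relevant items
    return precision, recall


# Hypothetical held-out interactions and top-2 recommendations.
toy_test = pd.DataFrame({"UserId": [1, 1, 2],
                         "MovieId": [10, 20, 30]})
toy_top_k = pd.DataFrame({"UserId": [1, 1, 2, 2],
                          "MovieId": [10, 40, 50, 60],
                          "Prediction": [26.0, 13.0, 22.0, 21.0]})

precision, recall = precision_recall_at_k(toy_test, toy_top_k, k=2)
print(f"Precision@2: {precision:.2f}, Recall@2: {recall:.2f}")
```

Only the identity of the recommended items matters here; the `Prediction` values are never compared with ratings, which is why these metrics suit SAR's unscaled scores.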

In [None]:
# all ranking metrics have the same arguments
# args = [test, top_k]
# kwargs = dict(col_user='UserId', 
#               col_item='MovieId', 
#               col_rating='Rating', 
#               col_prediction='Prediction', 
#               relevancy_method='top_k', 
#               k=TOP_K)

# eval_map = map_at_k(*args, **kwargs)
# eval_ndcg = ndcg_at_k(*args, **kwargs)
# eval_precision = precision_at_k(*args, **kwargs)
# eval_recall = recall_at_k(*args, **kwargs)

In [None]:
print(f"Model:",
      f"Top K:\t\t {TOP_K}",
      f"MAP:\t\t {eval_map:f}",
      f"NDCG:\t\t {eval_ndcg:f}",
      f"Precision@K:\t {eval_precision:f}",
      f"Recall@K:\t {eval_recall:f}", sep='\n')

## References
Note that SAR is a combinational algorithm that implements several industry heuristics. The following references may be helpful in understanding the SAR logic and implementation.

1. Badrul Sarwar, *et al.*, "Item-based collaborative filtering recommendation algorithms", WWW, 2001.
2. SciPy (sparse matrix), url: https://docs.scipy.org/doc/scipy/reference/sparse.html
3. Asela Gunawardana and Guy Shani, "A survey of accuracy evaluation metrics of recommendation tasks", The Journal of Machine Learning Research, vol. 10, pp. 2935-2962, 2009.