<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>

<i>Licensed under the MIT License.</i>

# SAR Single Node on MovieLens (Python, CPU)

Smart Adaptive Recommender (SAR) is a fast and scalable algorithm for personalized recommendations based on user transaction history. It produces easily explainable and interpretable recommendations and handles "cold item" and "semi-cold user" scenarios. SAR is a neighborhood-based algorithm (as discussed in [Recommender Systems by Aggarwal](https://dl.acm.org/citation.cfm?id=2931100)) intended for ranking the top items for each user. More details about SAR can be found in the [deep dive notebook](../02_model/sar_deep_dive.ipynb).

SAR recommends items that are most ***similar*** to the ones that the user already has an existing ***affinity*** for. Two items are ***similar*** if the users that interacted with one item are also likely to have interacted with the other. A user has an ***affinity*** to an item if they have interacted with it in the past.

### Advantages of SAR:
- High accuracy for an algorithm that is easy to train and deploy
- Fast training, requiring only simple counting to construct the matrices used at prediction time
- Fast scoring, involving only the multiplication of the similarity matrix with an affinity vector

### Notes on using SAR properly:
- Since it does not use item or user features, it can be at a disadvantage against algorithms that do.
- It's memory-hungry, requiring the creation of an $m \times m$ sparse square matrix (where $m$ is the number of items). This can also be a problem for many matrix factorization algorithms.
- SAR favors an implicit rating scenario and it does not predict ratings.
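
As a rough illustration of the memory note above, the following back-of-the-envelope sketch estimates the size of a dense $m \times m$ matrix. The helper name `dense_similarity_bytes` is hypothetical, the 8 bytes per value assume `float64` storage, and 1,642 is the number of unique training items reported later in this notebook. A sparse matrix would typically need less, so treat the result as an upper bound.

```python
def dense_similarity_bytes(n_items, bytes_per_value=8):
    """Bytes needed for a dense n_items x n_items matrix (assumes float64 by default)."""
    return n_items * n_items * bytes_per_value

# 1,642 unique items in the MovieLens 100k training split used in this notebook
n_items = 1642
size_mb = dense_similarity_bytes(n_items) / 1e6
print("Dense {0}x{0} matrix: about {1:.1f} MB".format(n_items, size_mb))
```

Because the cost grows with the square of the item count, a catalog ten times larger needs roughly one hundred times the memory.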

This notebook provides an example of how to utilize and evaluate SAR in Python on a CPU.

# 0 Global Settings and Imports

In [61]:
# set the environment path to find Recommenders
import sys
sys.path.append("../../")

import logging
import time

import numpy as np
import pandas as pd
import papermill as pm

from reco_utils.dataset import movielens
from reco_utils.dataset.python_splitters import python_random_split
from reco_utils.evaluation.python_evaluation import map_at_k, ndcg_at_k, precision_at_k, recall_at_k
from reco_utils.recommender.sar import SAR

print("System version: {}".format(sys.version))
print("Pandas version: {}".format(pd.__version__))

System version: 3.6.8 |Anaconda, Inc.| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)]
Pandas version: 0.24.2


# 1 Load Data

SAR is intended to be used on interactions with the following schema:
`<User ID>, <Item ID>, <Time>, [<Event Type>], [<Event Weight>]`.

Each row represents a single interaction between a user and an item. These interactions might be different types of events on an e-commerce website, such as a user clicking to view an item, adding it to a shopping basket, or following a recommendation link. Each event type can be assigned a different weight. For example, we might assign a "buy" event a weight of 10, while a "view" event might only have a weight of 1.
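
As a minimal sketch of the weighting idea, the snippet below turns a small event log into the interaction schema. The event log, the `EVENT_WEIGHTS` mapping, and the column names are assumptions made up for illustration; only the 10 and 1 weights come from the example in the text.

```python
import pandas as pd

# Hypothetical event log; the rows are illustrative only.
events = pd.DataFrame({
    "userID": [1, 1, 2, 2],
    "itemID": [10, 11, 10, 12],
    "timestamp": [881250949, 881250999, 881251100, 881251200],
    "event": ["view", "buy", "view", "view"],
})

# Assumed mapping from event type to weight (values from the example above).
EVENT_WEIGHTS = {"view": 1.0, "buy": 10.0}

# The weight takes the place of the rating column in the schema.
events["rating"] = events["event"].map(EVENT_WEIGHTS)
interactions = events[["userID", "itemID", "rating", "timestamp"]]
print(interactions)
```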

The MovieLens dataset consists of well-formatted interactions in which users rate movies (the movie ratings are used as the event weight). We will use it for the rest of the example.

In [62]:
# top k items to recommend
TOP_K = 10

# Select MovieLens data size: 100k, 1m, 10m, or 20m
MOVIELENS_DATA_SIZE = '100k'

### 1.1 Download and use the MovieLens Dataset

In [63]:
data = movielens.load_pandas_df(
    size=MOVIELENS_DATA_SIZE
)

# Convert the float precision to 32-bit in order to reduce memory consumption 
data['rating'] = data['rating'].astype(np.float32)

data.head()

4.93MB [00:01, 3.38MB/s]


Unnamed: 0,userID,itemID,rating,timestamp
0,196,242,3.0,881250949
1,186,302,3.0,891717742
2,22,377,1.0,878887116
3,244,51,2.0,880606923
4,166,346,1.0,886397596


### 1.2 Split the data using the Python random splitter provided in utilities:

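We use the provided `python_random_split` function to split the data randomly into `train` and `test` datasets. Called with its default arguments, as in the next cell, it selects 75% of the user-item ratings at random for training and keeps the remaining 25% for testing (75,000 and 25,000 ratings respectively for the 100k dataset, as the counts printed below confirm). Pass a different `ratio` to change the proportion, for example `ratio=0.8` for an 80/20 split. Other options are available in the `dataset.python_splitters` module, which provide more control over how the split occurs.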
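
To make the idea of a random split concrete, here is a minimal sketch in plain pandas. It is not the library's implementation: `random_split_sketch`, its `ratio` and `seed` parameters, and the toy frame are all hypothetical names chosen for illustration.

```python
import pandas as pd

def random_split_sketch(df, ratio=0.75, seed=42):
    """Shuffle the rows, then cut at `ratio` (illustrative, not the library code)."""
    shuffled = df.sample(frac=1.0, random_state=seed)
    cut = int(round(ratio * len(shuffled)))
    return shuffled.iloc[:cut], shuffled.iloc[cut:]

# Toy data with eight ratings
toy = pd.DataFrame({
    "userID": list(range(8)),
    "itemID": list(range(8)),
    "rating": [3.0] * 8,
})
toy_train, toy_test = random_split_sketch(toy, ratio=0.75)
print(len(toy_train), len(toy_test))
```

Because rows are sampled without regard to user, a user can end up with all of their ratings in one split, which is one reason to consider the other splitters for stricter evaluation setups.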

In [64]:
train, test = python_random_split(data)

In [65]:
print("""
Train:
Total Ratings: {train_total}
Unique Users: {train_users}
Unique Items: {train_items}

Test:
Total Ratings: {test_total}
Unique Users: {test_users}
Unique Items: {test_items}
""".format(
    train_total=len(train),
    train_users=len(train['userID'].unique()),
    train_items=len(train['itemID'].unique()),
    test_total=len(test),
    test_users=len(test['userID'].unique()),
    test_items=len(test['itemID'].unique()),
))


Train:
Total Ratings: 75000
Unique Users: 943
Unique Items: 1642

Test:
Total Ratings: 25000
Unique Users: 942
Unique Items: 1453



# 2 Train the SAR Model

### 2.1 Instantiate the SAR algorithm and set the index

We will use the single-node implementation of SAR and specify the column names to match our dataset (the timestamp column is optional; it is used here for time decay and can be omitted if your dataset does not contain it).

Other options are specified to control the behavior of the algorithm as described in the [deep dive notebook](../02_model/sar_deep_dive.ipynb).

In [66]:
logging.basicConfig(level=logging.DEBUG, 
                    format='%(asctime)s %(levelname)-8s %(message)s')

model = SAR(
    col_user='userID',
    col_item='itemID',
    col_rating='rating',
    col_timestamp='timestamp',
    similarity_type="jaccard", 
    time_decay_coefficient=30, 
    timedecay_formula=True
)

### 2.2 Train the SAR model on our training data, and get the top-k recommendations for our testing data

SAR first computes an item-to-item ***co-occurrence matrix***. Co-occurrence represents the number of times two items appear together for any given user. Once we have the co-occurrence matrix, we compute an ***item similarity matrix*** by rescaling the co-occurrences by a given metric (Jaccard similarity in this example).
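
The two steps can be sketched with NumPy on a toy binary user-item matrix. This is an illustration of the idea under the assumption of dense arrays and made-up data, not the SAR implementation itself.

```python
import numpy as np

# Toy binary user-item matrix: rows are users, columns are items (illustrative data).
interactions = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
    [1, 0, 0],
])

# cooccurrence[i, j] = number of users who interacted with both item i and item j.
# The diagonal holds the number of users per item.
cooccurrence = interactions.T @ interactions

# Jaccard similarity: users in common divided by users who touched either item.
item_counts = np.diag(cooccurrence)
union = item_counts[:, None] + item_counts[None, :] - cooccurrence
jaccard = cooccurrence / union

print(cooccurrence)
print(jaccard.round(3))
```

Every item in the toy matrix has at least one user, so the division is safe; real data would need a guard against items with no interactions.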

We also compute an ***affinity matrix*** to capture the strength of the relationship between each user and each item. Affinity is driven by the type of event (such as *rating* or *viewing* a movie) and by the time of the event.
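
A minimal sketch of a time-decayed affinity follows. It assumes a half-life form in which each event's weight halves every `half_life_days` days, measured back from the latest timestamp in the data; the function name, the reference-time choice, and the toy data are assumptions for illustration, and the deep dive notebook remains the authority on the exact formula SAR uses.

```python
import pandas as pd

SECONDS_PER_DAY = 24 * 60 * 60

def time_decayed_affinity(df, half_life_days=30.0):
    """Sum ratings per (user, item), halving each event's weight every half-life."""
    t_ref = df["timestamp"].max()  # assumed reference time: the latest event
    age_days = (t_ref - df["timestamp"]) / SECONDS_PER_DAY
    decayed = df["rating"] * 0.5 ** (age_days / half_life_days)
    return decayed.groupby([df["userID"], df["itemID"]]).sum()

# Toy data: user 1 rated item 10 twice, 30 days apart, and item 11 once.
toy = pd.DataFrame({
    "userID": [1, 1, 1],
    "itemID": [10, 10, 11],
    "rating": [4.0, 2.0, 5.0],
    "timestamp": [0, 30 * SECONDS_PER_DAY, 30 * SECONDS_PER_DAY],
})
affinity = time_decayed_affinity(toy)
print(affinity)
```

In the toy data the older rating of 4.0 is exactly one half-life old, so it contributes 2.0 and the pair (1, 10) ends up with an affinity of 4.0.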

Recommendations are obtained by multiplying the affinity matrix $A$ by the similarity matrix $S$. The result is a ***recommendation score matrix*** $R$. We compute the ***top-k*** results for each user with the `recommend_k_items` method shown below.
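
The scoring step, $R = A S$, and the top-k selection can be sketched with NumPy as follows. The matrices are toy values and `top_k_unseen` is a hypothetical helper; it mimics the effect of removing already-seen items but is not the library's code.

```python
import numpy as np

# Toy affinity matrix A (2 users x 3 items) and item similarity matrix S (3 x 3).
A = np.array([
    [5.0, 0.0, 0.0],
    [0.0, 3.0, 0.0],
])
S = np.array([
    [1.0, 0.5, 0.25],
    [0.5, 1.0, 0.6],
    [0.25, 0.6, 1.0],
])

# Recommendation scores: one row per user, one column per item.
R = A @ S

def top_k_unseen(scores, affinity, k):
    """Indices of the k best-scoring items each user has not interacted with."""
    masked = np.where(affinity > 0, -np.inf, scores)  # hide items already seen
    order = np.argsort(-masked, axis=1)               # best score first
    return order[:, :k]

top_items = top_k_unseen(R, A, k=2)
print(R)
print(top_items)
```

Each user here has exactly two unseen items, so `k=2` returns only unseen ones; with a larger `k` the masked items would reappear at the end of the ranking and would need to be filtered out.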

A full walkthrough of the SAR algorithm can be found [here](../02_model/sar_deep_dive.ipynb).

In [67]:
start_time = time.time()

model.fit(train)

train_time = time.time() - start_time
print("Took {} seconds for training.".format(train_time))

2019-04-22 15:57:19,452 INFO     Collecting user affinity matrix
2019-04-22 15:57:19,457 INFO     Calculating time-decayed affinities
2019-04-22 15:57:19,502 INFO     Creating index columns
2019-04-22 15:57:19,590 INFO     Building user affinity sparse matrix
2019-04-22 15:57:19,599 INFO     Calculating item co-occurrence
2019-04-22 15:57:19,771 INFO     Calculating item similarity
2019-04-22 15:57:19,772 INFO     Calculating jaccard
2019-04-22 15:57:19,910 INFO     Done training


Took 0.46326375007629395 seconds for training.


In [68]:
start_time = time.time()

top_k = model.recommend_k_items(test, remove_seen=True)

test_time = time.time() - start_time
print("Took {} seconds for prediction.".format(test_time))

2019-04-22 15:57:19,921 INFO     Calculating recommendation scores
2019-04-22 15:57:20,020 INFO     Removing seen items


Took 0.15295743942260742 seconds for prediction.


In [69]:
display(top_k.head())

Unnamed: 0,userID,itemID,prediction
0,877,153,2.311135
1,877,234,2.311886
2,877,238,2.315627
3,877,168,2.39381
4,877,97,2.410769


### 2.3 Evaluate how well SAR performs

We evaluate SAR with a few common ranking metrics provided in the `python_evaluation` module in utilities. We consider Mean Average Precision (MAP), Normalized Discounted Cumulative Gain (NDCG), Precision, and Recall for the top-k items per user computed with SAR. The user, item, and rating column names are specified in each evaluation call.
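
For intuition about two of these metrics, here is a minimal single-user sketch of precision@k and recall@k. The function name and the item lists are illustrative assumptions; the library functions used below work on DataFrames and aggregate over all users, so their exact handling of edge cases may differ.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for one user.

    recommended: ranked list of item IDs, best first
    relevant: set of item IDs the user interacted with in the test set
    """
    hits = sum(1 for item in recommended[:k] if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Illustrative values: one of five recommendations is among four relevant items.
precision, recall = precision_recall_at_k(
    recommended=[56, 153, 234, 238, 168],
    relevant={56, 70, 170, 744},
    k=5,
)
print(precision, recall)
```

Precision asks how many of the k recommendations were relevant, while recall asks how many of the relevant items were recovered within the top k.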

In [70]:
eval_map = map_at_k(test, top_k, col_user='userID', col_item='itemID', col_rating='rating', k=TOP_K)

In [71]:
eval_ndcg = ndcg_at_k(test, top_k, col_user='userID', col_item='itemID', col_rating='rating', k=TOP_K)

In [72]:
eval_precision = precision_at_k(test, top_k, col_user='userID', col_item='itemID', col_rating='rating', k=TOP_K)

In [73]:
eval_recall = recall_at_k(test, top_k, col_user='userID', col_item='itemID', col_rating='rating', k=TOP_K)

In [74]:
print("Model:\t" + model.model_str,
      "Top K:\t%d" % TOP_K,
      "MAP:\t%f" % eval_map,
      "NDCG:\t%f" % eval_ndcg,
      "Precision@K:\t%f" % eval_precision,
      "Recall@K:\t%f" % eval_recall, sep='\n')

Model:	sar_ref
Top K:	10
MAP:	0.104852
NDCG:	0.370487
Precision@K:	0.321125
Recall@K:	0.174170


In [75]:
# Now let's look at the results for a specific user
user_id = 877

ground_truth = test[test['userID']==user_id].sort_values(by='rating', ascending=False)[:TOP_K]
prediction = model.recommend_k_items(pd.DataFrame(dict(userID=[user_id])), remove_seen=True) 
pd.merge(ground_truth, prediction, on=['userID', 'itemID'], how='left')

2019-04-22 15:57:20,462 INFO     Calculating recommendation scores
2019-04-22 15:57:20,556 INFO     Removing seen items


Unnamed: 0,userID,itemID,rating,timestamp,prediction
0,877,744,5.0,882677280,
1,877,56,5.0,882678483,2.366669
2,877,70,5.0,882677012,
3,877,170,5.0,882677012,
4,877,333,4.0,882676259,
5,877,690,4.0,882676098,
6,877,584,4.0,882677507,
7,877,566,4.0,882678547,
8,877,52,4.0,882677507,
9,877,241,4.0,882678194,


Above, we see that one of the highest-rated items from the test set was recovered by the model's top-k recommendations, but the others were not. Offline evaluation is difficult because it can only use what was seen previously in the test set, which may not represent the user's actual preferences across the entire set of items. Adjusting how the data is split, which algorithm is used, and the hyperparameters can improve the results here.

In [76]:
# Record results with papermill for tests - ignore this cell
pm.record("map", eval_map)
pm.record("ndcg", eval_ndcg)
pm.record("precision", eval_precision)
pm.record("recall", eval_recall)
pm.record("train_time", train_time)
pm.record("test_time", test_time)