<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>

<i>Licensed under the MIT License.</i>

# SLi_Rec: Adaptive User Modeling with Long and Short-Term Preferences for Personalized Recommendation
Unlike general recommenders such as Matrix Factorization or xDeepFM (both in this repo), which do not consider the order of a user's activities, sequential recommender systems take the sequence of user behaviors as context. The goal is to predict the items the user will interact with in the near future (in the extreme case, the single item the user will interact with next).

This notebook gives a quick example of how to train a sequential model on a public Amazon dataset. Currently, we support GRU4Rec \[2\], Caser \[3\] and SLi_Rec \[1\]. Without loss of generality, this notebook uses the [SLi_Rec model](https://www.microsoft.com/en-us/research/uploads/prod/2019/07/IJCAI19-ready_v1.pdf) as the example.
SLi_Rec \[1\] is a deep learning-based model that aims to capture both long-term and short-term user preferences for precise recommendation. In summary, SLi_Rec has the following key properties:

* It adopts the attentive "Asymmetric-SVD" paradigm for long-term modeling.
* It takes both time irregularity and semantic irregularity into account by modifying the gating logic in LSTM.
* It uses an attention mechanism to dynamically fuse the long-term and short-term components.

In this notebook, we test SLi_Rec on a subset of a public dataset: [Amazon_reviews](http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Movies_and_TV_5.json.gz) and [Amazon_metadata](http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/meta_Movies_and_TV.json.gz).

## 0. Global Settings and Imports

In [1]:
import sys
sys.path.append("../../")
import os
import logging
import papermill as pm
from tempfile import TemporaryDirectory

import tensorflow as tf

from reco_utils.common.constants import SEED
from reco_utils.recommender.deeprec.deeprec_utils import (
    prepare_hparams
)
from reco_utils.dataset.amazon_reviews import download_and_extract, data_preprocessing
from reco_utils.dataset.download_utils import maybe_download
from reco_utils.recommender.deeprec.models.sequential.sli_rec import SLI_RECModel
from reco_utils.recommender.deeprec.IO.sequential_iterator import SequentialIterator

print("System version: {}".format(sys.version))
print("Tensorflow version: {}".format(tf.__version__))

tmpdir = os.path.join("..", "..", "tests", "resources", "deeprec", "slirec")

System version: 3.6.8 |Anaconda, Inc.| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)]
Tensorflow version: 1.12.0


#### Parameters

In [2]:
EPOCHS = 10
BATCH_SIZE = 400
RANDOM_SEED = SEED  # Set None for non-deterministic result

##  1. Input data format
The input data contains 8 columns, i.e., `<label> <user_id> <item_id> <category_id> <timestamp> <history_item_ids> <history_category_ids> <history_timestamp>`. Columns are separated by `"\t"`. `item_id` and `category_id` denote the target item and its category: for this instance, we want to predict whether user `user_id` will interact with `item_id` at `timestamp`. The `<history_*>` columns record the user's behavior list up to `<timestamp>`; their elements are separated by commas. `<label>` is a binary value, 1 for positive instances and 0 for negative instances. An example instance is `1       A1QQ86H5M2LVW2  B0059XTU1S      Movies  1377561600      B002ZG97WE,B004IK30PA,B000BNX3AU,B0017ANB08,B005LAIHW2  Movies,Movies,Movies,Movies,Movies   1304294400,1304812800,1315785600,1316304000,1356998400` <br>
Only the SLi_Rec model is time-aware. For the other models, you can fill the timestamp columns with placeholder values to satisfy the format; those models will ignore these columns.
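
As an illustration of the format described above, the following sketch parses one tab-separated line into typed fields. The `Instance` type and the `parse_instance` helper are hypothetical names introduced here for illustration; they are not part of `reco_utils`.

```python
from typing import List, NamedTuple


class Instance(NamedTuple):
    """One line of the 8-column input format (hypothetical helper type)."""
    label: int
    user_id: str
    item_id: str
    category_id: str
    timestamp: int
    history_item_ids: List[str]
    history_category_ids: List[str]
    history_timestamps: List[int]


def parse_instance(line: str) -> Instance:
    """Split one tab-separated line into its 8 typed fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError("expected 8 tab-separated columns, got {}".format(len(fields)))
    label, user, item, cate, ts, h_items, h_cates, h_ts = fields
    return Instance(
        label=int(label),
        user_id=user,
        item_id=item,
        category_id=cate,
        timestamp=int(ts),
        # History columns are comma-separated; skip empty elements.
        history_item_ids=[x for x in h_items.split(",") if x],
        history_category_ids=[x for x in h_cates.split(",") if x],
        history_timestamps=[int(x) for x in h_ts.split(",") if x],
    )


# The example instance from the text, rebuilt with explicit tab separators.
example = "\t".join([
    "1",
    "A1QQ86H5M2LVW2",
    "B0059XTU1S",
    "Movies",
    "1377561600",
    "B002ZG97WE,B004IK30PA,B000BNX3AU,B0017ANB08,B005LAIHW2",
    "Movies,Movies,Movies,Movies,Movies",
    "1304294400,1304812800,1315785600,1316304000,1356998400",
])
inst = parse_instance(example)
print(inst.user_id, inst.item_id, len(inst.history_item_ids))
```

A quick consistency check on real data is that the three history lists of an instance have the same length and that every history timestamp is no later than `timestamp`.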

We use softmax as the loss function. In the training and evaluation stages, we group 1 positive instance with num_ngs negative instances. Pair-wise ranking can be regarded as a special case of softmax ranking, where num_ngs is set to 1. 

More specifically, for training and evaluation, you need to organize the data file such that each positive instance is followed by num_ngs negative instances. Our program takes 1+num_ngs lines as a unit for the softmax calculation. num_ngs is a parameter you need to pass to the `prepare_hparams`, `fit` and `run_eval` functions. In evaluation, the model calculates metrics among the 1+num_ngs instances. For the `predict` function, since we only need to calculate a score for each individual instance, no num_ngs setting is needed. More details and examples are provided in the following sections.
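
To make the grouping concrete, here is a minimal sketch of a softmax cross-entropy computed over consecutive groups of `1 + num_ngs` scores, with the positive instance first in each group. The function name `group_softmax_loss` is hypothetical, and the code only illustrates the idea; it is not the library's implementation.

```python
import math
from typing import List


def group_softmax_loss(scores: List[float], num_ngs: int) -> float:
    """Mean softmax cross-entropy over groups of 1 + num_ngs scores.

    Assumes the first score of every group belongs to the positive instance.
    """
    group_size = 1 + num_ngs
    if not scores or len(scores) % group_size != 0:
        raise ValueError("number of scores must be a positive multiple of 1 + num_ngs")
    losses = []
    for start in range(0, len(scores), group_size):
        group = scores[start:start + group_size]
        # Numerically stable log-sum-exp over the group.
        top = max(group)
        log_sum_exp = top + math.log(sum(math.exp(s - top) for s in group))
        # Negative log-probability assigned to the positive instance.
        losses.append(log_sum_exp - group[0])
    return sum(losses) / len(losses)


# An untrained model that scores every candidate equally.
uniform_loss = group_softmax_loss([0.0] * 5, num_ngs=4)
print(round(uniform_loss, 4))
```

With `num_ngs=4` and identical scores, the loss equals ln 5 ≈ 1.609, which is in line with the `total_loss` values of about 1.61 logged at the start of training later in this notebook. With `num_ngs=1`, the expression reduces to the pairwise logistic loss `log(1 + exp(s_neg - s_pos))`.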

For the training stage, if you don't want to prepare negative instances, you can provide only positive instances and pass `need_sample=True, train_num_ngs=train_num_ngs` to `prepare_hparams`. The model will then dynamically sample `train_num_ngs` negative instances for each positive instance in each mini-batch.
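
The idea of dynamic negative sampling can be sketched as follows. This is an illustration under stated assumptions, not the library's sampler: the `sample_negatives` helper is hypothetical, it draws uniformly with replacement from a caller-supplied candidate pool, and the library may choose its candidates differently.

```python
import random
from typing import List, Sequence


def sample_negatives(positive_item: str,
                     candidate_items: Sequence[str],
                     num_ngs: int,
                     rng: random.Random) -> List[str]:
    """Draw num_ngs items (with replacement) that differ from the positive item."""
    pool = [item for item in candidate_items if item != positive_item]
    if not pool:
        raise ValueError("no candidate differs from the positive item")
    return [rng.choice(pool) for _ in range(num_ngs)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
negatives = sample_negatives("B0059XTU1S",
                             ["B0059XTU1S", "B002ZG97WE", "B004IK30PA"],
                             num_ngs=4,
                             rng=rng)
print(negatives)
```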

###  Amazon dataset
Now let's start with a public dataset containing product reviews and metadata from Amazon, which is widely used as a benchmark in the field of recommendation systems.

In [3]:
data_path = tmpdir

# for test
yaml_file = '../../reco_utils/recommender/deeprec/config/sli_rec.yaml'
train_file = os.path.join(data_path, r'train_data')
valid_file = os.path.join(data_path, r'valid_data')
test_file = os.path.join(data_path, r'test_data')
user_vocab = os.path.join(data_path, r'user_vocab.pkl')
item_vocab = os.path.join(data_path, r'item_vocab.pkl')
cate_vocab = os.path.join(data_path, r'category_vocab.pkl')
output_file = os.path.join(data_path, r'output.txt')

reviews_name = 'reviews_Movies_and_TV_5.json'
meta_name = 'meta_Movies_and_TV.json'
reviews_file = os.path.join(data_path, reviews_name)
meta_file = os.path.join(data_path, meta_name)
train_num_ngs = 4 # number of negative instances with a positive instance for training
valid_num_ngs = 4 # number of negative instances with a positive instance for validation
test_num_ngs = 9 # number of negative instances with a positive instance for testing
sample_rate = 0.01 # sample a small item set for training and testing here for example

input_files = [reviews_file, meta_file, train_file, valid_file, test_file, user_vocab, item_vocab, cate_vocab]

if not os.path.exists(train_file):
    download_and_extract(reviews_name, reviews_file)
    download_and_extract(meta_name, meta_file)
    data_preprocessing(*input_files, sample_rate=sample_rate, valid_num_ngs=valid_num_ngs, test_num_ngs=test_num_ngs)

100%|██████████████████████████████████████████████████████████████████████████████| 692k/692k [03:18<00:00, 3.48kKB/s]
100%|████████████████████████████████████████████████████████████████████████████| 97.5k/97.5k [00:32<00:00, 3.03kKB/s]


start reviews preprocessing...
start meta preprocessing...
start create instances...
creating item2cate dict
getting sampled data...
start data processing...
data generating...
vocab generating...
start valid negative sampling
start test negative sampling


#### 1.1 Prepare hyper-parameters
`prepare_hparams()` creates a full set of hyper-parameters for model training, such as learning rate, feature number, and dropout ratio. We can put those parameters in a yaml file (a complete list of parameters can be found under our config folder), or pass them as arguments to the function (which will override the yaml settings).

Parameter hints: <br>
`need_sample` controls whether to perform dynamic negative sampling in each mini-batch. 
`train_num_ngs` indicates how many negative instances follow each positive instance.  <br>
Examples: <br>
`need_sample=True and train_num_ngs=4`: There are only positive instances in your training file. Our model will dynamically sample 4 negative instances for each positive instance in the mini-batch. Note that if need_sample is set to True, train_num_ngs should be greater than zero. <br>
`need_sample=False and train_num_ngs=4`: In your training file, each positive line is followed by 4 negative lines. Note that if need_sample is set to False, you must provide a training file with negative instances, and train_num_ngs should match the number of negative instances per positive instance in your training file.
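
When `need_sample=False`, it can help to verify the file layout before training. The sketch below checks that every group of `1 + num_ngs` lines starts with a positive label followed only by negative labels. Both helper names are hypothetical and not part of `reco_utils`.

```python
from typing import Iterable, List


def labels_from_lines(lines: Iterable[str]) -> List[int]:
    """Read the label (first tab-separated column) of each line."""
    return [int(line.split("\t", 1)[0]) for line in lines]


def check_group_layout(labels: List[int], num_ngs: int) -> bool:
    """Return True if each positive label is followed by exactly num_ngs negatives."""
    group_size = 1 + num_ngs
    if not labels or len(labels) % group_size != 0:
        return False
    for start in range(0, len(labels), group_size):
        group = labels[start:start + group_size]
        if group[0] != 1 or any(label != 0 for label in group[1:]):
            return False
    return True


# Toy lines with only the first two columns filled in, for illustration.
toy_lines = ["1\tu1", "0\tu1", "0\tu1", "1\tu2", "0\tu2", "0\tu2"]
print(check_group_layout(labels_from_lines(toy_lines), num_ngs=2))
```

On a real file, the same check can be applied by passing the open file object to `labels_from_lines`.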

In [4]:
hparams = prepare_hparams(yaml_file, 
                          embed_l2=0., 
                          layer_l2=0., 
                          learning_rate=0.001, 
                          epochs=EPOCHS,
                          batch_size=BATCH_SIZE,
                          show_step=20,
                          MODEL_DIR=os.path.join(data_path, "model/"),
                          SUMMARIES_DIR=os.path.join(data_path, "summary/"),
                          user_vocab=user_vocab,
                          item_vocab=item_vocab,
                          cate_vocab=cate_vocab,
                          need_sample=True,
                          train_num_ngs=train_num_ngs, # provides the number of negative instances for each positive instance for loss computation.
            )

#### 1.2 Create data loader
Designate a data iterator for the model. All our sequential models use SequentialIterator. 
The data format is introduced above. 

<br>Validation and testing data are files produced by offline negative sampling, with `valid_num_ngs` and `test_num_ngs` negative instances per positive instance, respectively.

In [5]:
input_creator = SequentialIterator

## 2. Create model
When both hyper-parameters and data iterator are ready, we can create a model:

In [6]:
model = SLI_RECModel(hparams, input_creator, seed=RANDOM_SEED)

## sometimes we don't want to train a model from scratch
## then we can load a pre-trained model like this: 
#model.load_model(r'your_model_path')

Now let's see how the model performs at this point (before any training):

In [7]:
print(model.run_eval(test_file, num_ngs=test_num_ngs)) # test_num_ngs is the number of negative lines after each positive line in your test_file

{'auc': 0.4908, 'logloss': 0.6931, 'mean_mrr': 0.2711, 'ndcg2': 0.4365, 'ndcg4': 0.4365, 'ndcg6': 0.4365, 'ndcg8': 0.4365, 'ndcg10': 0.4365, 'group_auc': 0.4896}


An AUC of 0.5 corresponds to random guessing. We can see that before training, the model behaves like a random guesser.
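
For intuition about these baseline numbers, the sketch below computes a reciprocal rank and an AUC within a single group of `1 + num_ngs` scores (positive first), plus the MRR expected when the positive lands at a uniformly random rank. All function names are hypothetical, and the library's exact metric definitions may differ in details such as tie handling.

```python
from typing import List


def reciprocal_rank(scores: List[float]) -> float:
    """1 / rank of the positive (index 0), ranking by descending score.

    Ties are counted against the positive (pessimistic rank).
    """
    rank = 1 + sum(1 for s in scores[1:] if s >= scores[0])
    return 1.0 / rank


def single_positive_auc(scores: List[float]) -> float:
    """Fraction of negatives scored below the positive; ties count as half."""
    positive, negatives = scores[0], scores[1:]
    wins = sum(1.0 if positive > s else 0.5 if positive == s else 0.0
               for s in negatives)
    return wins / len(negatives)


def expected_random_mrr(num_ngs: int) -> float:
    """Mean of 1/rank when the positive's rank is uniform over 1..1+num_ngs."""
    group_size = 1 + num_ngs
    return sum(1.0 / r for r in range(1, group_size + 1)) / group_size


print(reciprocal_rank([0.2, 0.5, 0.1]))      # positive is ranked second
print(single_positive_auc([0.2, 0.5, 0.1]))  # beats one of two negatives
print(round(expected_random_mrr(9), 4))      # 1 positive + 9 negatives
```

With `test_num_ngs = 9`, the random-rank MRR is about 0.293, which is in the same neighborhood as the `mean_mrr` of 0.2711 reported above for the untrained model.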

#### 2.1 Train model
Next we want to train the model on a training set, and check the performance on a validation dataset. Training the model is as simple as a function call:

In [8]:
model = model.fit(train_file, valid_file, valid_num_ngs=valid_num_ngs) 
# valid_num_ngs is the number of negative lines after each positive line in your valid_file 
# we will evaluate the performance of model on valid_file every epoch

step 20 , total_loss: 1.6127, data_loss: 1.6127
step 40 , total_loss: 1.6054, data_loss: 1.6054
eval valid at epoch 1: auc:0.5005,logloss:0.6934,mean_mrr:0.4536,ndcg2:0.5874,ndcg4:0.5874,ndcg6:0.5874,ndcg8:0.5874,ndcg10:0.5874,group_auc:0.4999
step 20 , total_loss: 1.6017, data_loss: 1.6017
step 40 , total_loss: 1.5899, data_loss: 1.5899
eval valid at epoch 2: auc:0.5171,logloss:0.6931,mean_mrr:0.463,ndcg2:0.5946,ndcg4:0.5946,ndcg6:0.5946,ndcg8:0.5946,ndcg10:0.5946,group_auc:0.5109
step 20 , total_loss: 1.5589, data_loss: 1.5589
step 40 , total_loss: 1.4709, data_loss: 1.4709
eval valid at epoch 3: auc:0.6636,logloss:0.6642,mean_mrr:0.5743,ndcg2:0.6799,ndcg4:0.6799,ndcg6:0.6799,ndcg8:0.6799,ndcg10:0.6799,group_auc:0.6457
step 20 , total_loss: 1.3083, data_loss: 1.3083
step 40 , total_loss: 1.2443, data_loss: 1.2443
eval valid at epoch 4: auc:0.7143,logloss:0.6273,mean_mrr:0.6349,ndcg2:0.726,ndcg4:0.726,ndcg6:0.726,ndcg8:0.726,ndcg10:0.726,group_auc:0.7087
step 20 , total_loss: 1.2744, 

#### 2.2  Evaluate model

Again, let's see how the model performs now (after training):

In [9]:
res_syn = model.run_eval(test_file, num_ngs=test_num_ngs)
print(res_syn)
pm.record("res_syn", res_syn)

{'auc': 0.7184, 'logloss': 0.6533, 'mean_mrr': 0.4766, 'ndcg2': 0.6004, 'ndcg4': 0.6004, 'ndcg6': 0.6004, 'ndcg8': 0.6004, 'ndcg10': 0.6004, 'group_auc': 0.7036}


If we want to get the full prediction scores rather than evaluation metrics, we can do this:

In [10]:
model.predict(test_file, output_file)

<reco_utils.recommender.deeprec.models.sequential.sli_rec.SLI_RECModel at 0x241df67ba58>
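
Once scores are available, candidates can be ranked within each group. The sketch below assumes, without confirmation from this notebook, that the prediction output holds one score per line in the same order as the lines of `test_file`; the `rank_groups` helper is a hypothetical name.

```python
from typing import Iterable, List


def rank_groups(score_lines: Iterable[str], num_ngs: int) -> List[List[int]]:
    """For each group of 1 + num_ngs scores, return candidate indices by descending score.

    Assumption: one score per line, in the same order as the input file.
    """
    scores = [float(line) for line in score_lines if line.strip()]
    group_size = 1 + num_ngs
    if len(scores) % group_size != 0:
        raise ValueError("number of scores is not a multiple of 1 + num_ngs")
    rankings = []
    for start in range(0, len(scores), group_size):
        group = scores[start:start + group_size]
        rankings.append(sorted(range(group_size), key=lambda i: -group[i]))
    return rankings


# Two toy groups of 1 positive + 2 negatives each.
toy_scores = ["0.1", "0.9", "0.3", "0.8", "0.2", "0.5"]
rankings = rank_groups(toy_scores, num_ngs=2)
print(rankings)
```

Index 0 in each ranking refers to the positive instance of that group, so a ranking that starts with 0 means the positive item was scored highest.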

In [11]:
# The data was downloaded in tmpdir folder. You can delete them manually if you do not need them any more.

Here are the results of popular sequential models on the full Amazon dataset. 
<br>Settings for reproducing the results:
<br>`learning_rate=0.001, dropout=0.3, item_embedding_dim=32, cate_embedding_dim=8, l2_norm=0, batch_size=400`


| Models | AUC | g-AUC | NDCG@2 | NDCG@10 | config |
| :------| :------: | :------: | :------: | :------: | :------ |
| ASVD | 0.8251 | 0.8178 | 0.2922 | 0.4264| N/A|
| GRU4Rec | 0.8411 | 0.8332 | 0.3213 | 0.4547|max_seq_length=50, hidden_size=40|
| Caser | 0.8244 | 0.8171 | 0.283 | 0.4194| T=1, n_v=128, n_h=128, L=3, min_seq_length=5|
| SLi_Rec | 0.8631 | 0.8519 | 0.3491 | 0.4842| attention_size=40, max_seq_length=50, hidden_size=40|

Note that the four models were grid-searched at a coarse granularity, so the results are for reference only. 

## Reference
\[1\] Zeping Yu, Jianxun Lian, Ahmad Mahmoody, Gongshen Liu, Xing Xie. Adaptive User Modeling with Long and Short-Term Preferences for Personalized Recommendation. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI'19, pages 4213-4219. AAAI Press, 2019.

\[2\] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, Domonkos Tikk. Session-based Recommendations with Recurrent Neural Networks. ICLR (Poster), 2016.

\[3\] Jiaxi Tang, Ke Wang. Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 2018.