[View in Colaboratory](https://colab.research.google.com/github/ylongqi/openrec/blob/master/tutorials/Temporal_aware_recommendation.ipynb)

<p align="center">
  <img src ="https://recsys.acm.org/wp-content/uploads/2017/07/recsys-18-small.png" height="40" /> <font size="4">Recsys 2018 Tutorial</font>
</p>
<p align="center">
  <font size="4"><b>Modularizing Deep Neural Network-Inspired Recommendation Algorithms</b></font>
</p>
<p align="center">
  <font size="4">Hands on: Temporal-aware recommendations</font>
</p>

Install the OpenRec library
---



In [0]:
!pip install openrec


RNN-based temporal recommendation model (RNN-Rec)
---
The RNN-Rec model takes as input the list of items a user has consumed and predicts the item that the user is likely to consume next.

<p align="center">
  <img src ="https://s3.amazonaws.com/cornell-tech-sdl-openrec/tutorials/rnn_rec.png" height="200" />
</p>

To implement such a model using OpenRec, one first needs to decide how the recommender should be decomposed into subgraphs, i.e., **inputgraph**, **usergraph**, **itemgraph**, **interactiongraph** and **optimizergraph**. For example, the training graph of RNN-Rec can be decomposed as follows.


<p align="center">
  <img src ="https://s3.amazonaws.com/cornell-tech-sdl-openrec/tutorials/rnn_rec_module.png" height="200" />
</p>

* **inputgraph**: provides the item consumption history and the groundtruth label.
* **usergraph**: left empty, because no user-specific latent factor is needed.
* **itemgraph**: extracts latent factors for items.
* **interactiongraph**: uses an RNN and softmax to model user-item interactions.
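
To make the roles of the itemgraph and interactiongraph concrete, here is a minimal NumPy sketch of the computation they describe: an embedding lookup followed by a recurrent pass and a softmax over all items. It is only an illustration under stated assumptions, not OpenRec's implementation: the function `rnn_softmax_scores`, the plain tanh recurrence, the weight names and the toy sizes are all hypothetical.

```python
import numpy as np

def rnn_softmax_scores(seq_item_vec, seq_len, w_in, w_rec, w_out):
    """Run a plain tanh RNN over one embedded sequence and return a
    probability distribution over all items for the next step."""
    hidden = np.zeros(w_rec.shape[0])
    # Only the first seq_len positions are real items; the rest is padding.
    for t in range(seq_len):
        hidden = np.tanh(seq_item_vec[t] @ w_in + hidden @ w_rec)
    logits = hidden @ w_out
    logits = logits - logits.max()  # Stabilise the exponentials
    exp = np.exp(logits)
    return exp / exp.sum()

rng = np.random.default_rng(0)
dim_item_embed, num_units, total_items = 4, 3, 10  # Toy sizes (assumed)

# Hypothetical stand-in for the itemgraph: one latent vector per item
item_embedding = rng.normal(size=(total_items, dim_item_embed))

history = np.array([2, 7, 1, 0, 0])  # Item ids, padded to length 5
seq_len = 3                          # Number of real items in history

probs = rnn_softmax_scores(
    item_embedding[history], seq_len,
    rng.normal(size=(dim_item_embed, num_units)),
    rng.normal(size=(num_units, num_units)),
    rng.normal(size=(num_units, total_items)))
print(probs.shape, probs.sum())
```

During training, the groundtruth label would select one entry of `probs` for a cross-entropy loss; during serving, the full vector can be used to rank items.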

After the subgraphs are defined, their interfaces and connections need to be specified. One possible specification for RNN-Rec is as follows.

<p align="center">
  <img src ="https://s3.amazonaws.com/cornell-tech-sdl-openrec/tutorials/interface_connections.png" height="200" />
</p>

The serving graph of RNN-Rec needs to be defined similarly.

Define an RNN recommender in OpenRec
---



In [0]:
from openrec.recommenders import VanillaYouTubeRec
from openrec.modules.interactions import RNNSoftmax
import tensorflow as tf

def RNNRec(batch_size, dim_item_embed, max_seq_len, total_items, num_units, l2_reg=None,
    init_model_dir=None, save_model_dir='Recommender/', train=True, serve=False):
    
    ## By default, reuse everything from VanillaYouTubeRec
    rec = VanillaYouTubeRec(batch_size=batch_size,
                            dim_item_embed=dim_item_embed,
                            max_seq_len=max_seq_len,
                            total_items=total_items,
                            l2_reg_embed=l2_reg,
                            init_model_dir=init_model_dir, 
                            save_model_dir=save_model_dir, 
                            train=train, serve=serve)
    
    ## [TODO] Please define input ports (Hint: using "ins" parameter)
    ## @rec.traingraph.interactiongraph(ins=[?FILL IN?])
    ## Answer:
    @rec.traingraph.interactiongraph(ins=['seq_item_vec', 'seq_len', 'label'])
    def f(subgraph):
        RNNSoftmax(seq_item_vec=subgraph['seq_item_vec'], 
                   seq_len=subgraph['seq_len'], 
                   num_units=num_units, 
                   total_items=total_items, 
                   label=subgraph['label'], 
                   train=True, 
                   subgraph=subgraph, 
                   scope='RNNSoftmax')
    
    ## [TODO] Please define input ports (Hint: using "ins" parameter)
    ## @rec.servegraph.interactiongraph(ins=[?FILL IN?])
    ## Answer:
    @rec.servegraph.interactiongraph(ins=['seq_item_vec', 'seq_len'])
    def f(subgraph):
        RNNSoftmax(seq_item_vec=subgraph['seq_item_vec'], 
                   seq_len=subgraph['seq_len'],
                   num_units=num_units, 
                   total_items=total_items, 
                   train=False, 
                   subgraph=subgraph, 
                   scope='RNNSoftmax')
    
    return rec

Training and testing the RNN-Rec model
---
* Download LastFM dataset




In [0]:
import requests

dataset_prefix = 'http://s3.amazonaws.com/cornell-tech-sdl-openrec'
# Download both splits, closing each file as soon as it has been written
for split in ('test', 'train'):
    r = requests.get('%s/lastfm/lastfm_%s.npy' % (dataset_prefix, split))
    r.raise_for_status()  # Stop early if the download failed
    with open('lastfm_%s.npy' % split, 'wb') as f:
        f.write(r.content)


* Load LastFM dataset



In [0]:
import numpy as np

lastfm_train = np.load('lastfm_train.npy')  # The structured Numpy array for training
lastfm_test = np.load('lastfm_test.npy')    # The structured Numpy array for testing

total_users = 992     # Total number of users in the dataset
total_items = 14598   # Total number of items in the dataset

* Inspect LastFM dataset

In [0]:
print('Keys in LastFM training data:', lastfm_train.dtype.names)
print('Number of training records:', len(lastfm_train))
print('Keys in LastFM testing data:', lastfm_test.dtype.names)
print('Number of testing records:', len(lastfm_test))
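
For readers unfamiliar with structured NumPy arrays, the following toy example shows how named fields are inspected and how records can be ordered by a timestamp field. The field names `user_id` and `item_id` are assumptions made for illustration (only `ts` appears in this tutorial); the actual keys are whatever the cell above prints.

```python
import numpy as np

# A tiny structured array with assumed field names
toy = np.array(
    [(0, 5, 30), (1, 2, 10), (0, 3, 20)],
    dtype=[('user_id', np.int32), ('item_id', np.int32), ('ts', np.int32)])

field_names = toy.dtype.names          # Same attribute as inspected above
sorted_toy = np.sort(toy, order='ts')  # Order records by timestamp

# Items consumed by user 0, oldest first
first_user_items = sorted_toy['item_id'][sorted_toy['user_id'] == 0]
print(field_names, first_user_items)
```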

* Set values for hyperparameters of the RNN model



In [0]:
dim_item_embed = 50       # Dimensionality of the item embedding
max_seq_len = 100         # Maximum sequence length used for prediction
num_units = 32            # Number of units in the RNN model
batch_size = 256          # Training batch size
total_iter = 1000         # Total number of training iterations
eval_iter = 100           # Evaluate the model every eval_iter iterations
save_iter = total_iter    # Save the model every save_iter iterations (here, once at the end)

* Initiate datasets and data samplers for training and testing



In [0]:
from openrec.utils import Dataset
from openrec.utils.samplers import TemporalSampler
from openrec.utils.samplers import TemporalEvaluationSampler

train_dataset = Dataset(raw_data=lastfm_train, 
                        total_users=total_users,
                        total_items=total_items, 
                        sortby='ts', name='Train')
# "sortby" keyword is used to sort records based on timestamp

test_dataset = Dataset(raw_data=lastfm_test, 
                       total_users=total_users,
                       total_items=total_items, 
                       sortby='ts', name='Test')
# "sortby" keyword is used to sort records based on timestamp

train_sampler = TemporalSampler(batch_size=batch_size, 
                                max_seq_len=max_seq_len, 
                                dataset=train_dataset, 
                                num_process=1)
test_sampler = TemporalEvaluationSampler(dataset=test_dataset, 
                                         max_seq_len=max_seq_len)
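
The idea behind temporal sampling can be sketched as follows: given a user's time-ordered item history, take the items preceding a target position as the input sequence, pad it to `max_seq_len`, and use the item at the target position as the label. This is a rough sketch under assumptions, not the logic of `TemporalSampler` itself; the helper `make_example` and the zero-padding scheme are hypothetical.

```python
import numpy as np

def make_example(item_history, target_pos, max_seq_len):
    """Return (padded sequence, true length, label) for predicting the
    item at target_pos from the items that precede it."""
    start = max(0, target_pos - max_seq_len)
    context = item_history[start:target_pos]
    seq = np.zeros(max_seq_len, dtype=np.int32)  # Zero-padded (assumed)
    seq[:len(context)] = context
    return seq, len(context), item_history[target_pos]

history = [11, 42, 7, 19, 3]  # Item ids already sorted by timestamp

# Short history: three real items followed by one padding slot
seq, seq_len, label = make_example(history, target_pos=3, max_seq_len=4)

# Long history: only the two most recent items are kept
seq2, seq_len2, label2 = make_example(history, target_pos=4, max_seq_len=2)
print(seq, seq_len, label)
```

Because padding reuses the value 0, the true sequence length is returned alongside the sequence so that padded positions can be ignored downstream.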

* Instantiate a recommender and a corresponding model trainer



In [0]:
from openrec import ModelTrainer

rnn_model = RNNRec(batch_size=batch_size, 
                   dim_item_embed=dim_item_embed, 
                   max_seq_len=max_seq_len, 
                   total_items=train_dataset.total_items(), 
                   num_units=num_units, 
                   save_model_dir='rnn_recommender/', 
                   train=True, serve=True)

model_trainer = ModelTrainer(model=rnn_model)

* Define evaluators to be used for testing



In [0]:
from openrec.utils.evaluators import AUC, Recall

auc_evaluator = AUC()
recall_evaluator = Recall(recall_at=[100, 500]) 
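
As a rough guide to what these metrics measure for next-item prediction, the sketch below computes them for a single test record with one groundtruth item. The helpers `recall_at_k` and `auc_single_positive` are hypothetical and follow common textbook definitions; OpenRec's `Recall` and `AUC` evaluators may differ in detail.

```python
import numpy as np

def recall_at_k(scores, true_item, k):
    """Return 1.0 if true_item is among the k highest-scoring items."""
    top_k = np.argsort(-scores)[:k]
    return float(true_item in top_k)

def auc_single_positive(scores, true_item):
    """Fraction of the other items that score below the true item."""
    others = np.delete(scores, true_item)
    return float(np.mean(others < scores[true_item]))

scores = np.array([0.1, 0.7, 0.05, 0.15])  # One score per item (toy values)

hit = recall_at_k(scores, true_item=3, k=2)
miss = recall_at_k(scores, true_item=2, k=2)
auc = auc_single_positive(scores, true_item=3)
print(hit, miss, auc)
```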

* Start training



In [0]:
model_trainer.train(total_iter=total_iter, 
                    eval_iter=eval_iter, 
                    save_iter=save_iter, 
                    train_sampler=train_sampler, 
                    eval_samplers=[test_sampler], 
                    evaluators=[auc_evaluator, recall_evaluator])