

<h1>NARM Demo on YOOCHOOSE Dataset</h1>

- **Introduction to Method**
- **Code**
    - **Data**
        - **Data Loading**
        - **Data Preprocessing**
        - **Data Loader**
    - **Model**
        - **Model Definition**
        - **Metrics for Method**
    - **Training, Validation and Testing**

## Introduction to Method

The method covered in this demo is <a href="https://arxiv.org/pdf/1711.04725.pdf" title="Neural Attentive Session-based Recommendation">**Neural Attentive Session-based Recommendation**</a>.


<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
img {
  display: block;
  margin-left: auto;
  margin-right: auto;
}
</style>
</head>
<body>



<figure>
<img src="https://user-images.githubusercontent.com/34673511/188919387-1d514803-32fe-4b7a-907b-c3e41171776f.PNG" alt="" style="width:50%;">
<figcaption align = "center"></figcaption>
</figure>

</body>
</html>

In e-commerce scenarios where user profiles are invisible, session-based recommendations are proposed to generate recommendation results from short sessions. In previous works, only sequential behavior in the current session was considered, while the **user's main purpose** in the current session was neglected. The aim of this method is to propose a Neural Attentive Recommendation Machine (NARM) to solve this problem. To model the user's sequential behavior and capture the user's main purpose in the current session, **a hybrid encoder with an attention mechanism** is developed, which is later merged into a unified session representation. Using this unified session representation, the recommendation scores for each candidate item are computed. The NARM is trained by jointly learning item and session representations. NARM also performs significantly better on long sessions, which demonstrates its advantage of modeling the user's sequential behavior and main purpose simultaneously.

<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
img {
  display: block;
  margin-left: auto;
  margin-right: auto;
}
</style>
</head>
<body>



<figure>
<img src="https://user-images.githubusercontent.com/34673511/188919520-d7588b06-edd9-4ade-aa4d-9ba66aa78646.png" alt="" style="width:80%;">
<figcaption align = "center"><b>Global recommenders model the user's whole sequential behavior to make recommendations, while local recommenders capture the user's primary motivation for making recommendations. Each recommender produces a recommendation score, which is displayed above the items. (b) shows the item in the red dashed box as being more relevant to the user's current intention. As the item's importance increases, the red line gets thicker.   </b></figcaption>
</figure>

</body>
</html>

# Code  

In [1]:
import time
import csv
import pickle
import operator
import datetime
import os
from tqdm import tqdm
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
from torch.utils.data import Dataset
import numpy as np
import random
from os.path import join
from torch.utils.data import DataLoader
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR
from torch.autograd import Variable
from torch.backends import cudnn

## Data

In [2]:
path_to_data = "/ssd003/projects/aieng/public/recsys_datasets/yoochoose/"

As in the previous demo, we train and evaluate this model on the **YOOCHOOSE dataset**, which was released as part of the RecSys Challenge 2015 and contains click-stream data collected from an e-commerce site. After filtering out sessions of length 1 and items that appear fewer than 5 times, 7981580 sessions and 37483 items remain. We use the sessions of the final day for testing and filter out clicks from the test set whose items do not appear in the training set.
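
The two filtering rules can be illustrated on a toy example. This is a minimal sketch, not the notebook's actual preprocessing code: `filter_sessions` and the toy dictionary are hypothetical, and the item-count threshold is lowered to 2 so that the effect is visible on four sessions.

```python
from collections import Counter


def filter_sessions(sessions, min_item_count=5, min_len=2):
    """Apply the two filtering rules to a {session_id: [item, ...]} dict."""
    # Rule 1: drop sessions with fewer than `min_len` clicks.
    kept = {s: items for s, items in sessions.items() if len(items) >= min_len}
    # Count how often each item occurs in the remaining sessions.
    counts = Counter(item for items in kept.values() for item in items)
    # Rule 2: remove rare items, then drop sessions that became too short.
    filtered = {}
    for s, items in kept.items():
        frequent = [i for i in items if counts[i] >= min_item_count]
        if len(frequent) >= min_len:
            filtered[s] = frequent
    return filtered


# Toy data (hypothetical): session "b" has a single click, items 3 and 4 are rare.
toy = {"a": [1, 2, 3], "b": [1], "c": [1, 2], "d": [4, 2]}
result = filter_sessions(toy, min_item_count=2)
print(result)
```

Session `"d"` is removed even though it started with two clicks, because only one of its items survives the rare-item filter.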

### Data Loading

#### Add a header for YOOCHOOSE dataset

In [3]:
with open(path_to_data + 'yoochoose-clicks.dat', 'r') as f, open('../../yoochoose-clicks-withHeader.dat', 'w') as fn:
    fn.write('sessionId,timestamp,itemId,category'+'\n')
    for line in f:
        fn.write(line)

dataset = '../../yoochoose-clicks-withHeader.dat'

print("-- Starting @ %ss" % datetime.datetime.now())
with open(dataset, "r") as f:
    reader = csv.DictReader(f, delimiter=',')
    sess_clicks = {}
    sess_date = {}
    ctr = 0
    curid = -1
    curdate = None
    for data in tqdm(reader):
        sessid = data['sessionId']
        if curdate and not curid == sessid:
            date = time.mktime(time.strptime(curdate[:19], '%Y-%m-%dT%H:%M:%S'))
            sess_date[curid] = date
        curid = sessid
        item = data['itemId']
        curdate = data['timestamp']

        if sessid in sess_clicks:
            sess_clicks[sessid] += [item]
        else:
            sess_clicks[sessid] = [item]
        ctr += 1
    date = ''
    date = time.mktime(time.strptime(curdate[:19], '%Y-%m-%dT%H:%M:%S'))
    sess_date[curid] = date

-- Starting @ 2022-09-13 16:02:11.285843s


33003944it [05:03, 108893.82it/s]


#### Filter out sessions with length 1

In [4]:
for s in list(sess_clicks):
    if len(sess_clicks[s]) == 1:
        del sess_clicks[s]
        del sess_date[s]

#### Count number of times each item appears

In [5]:
iid_counts = {}
for s in sess_clicks:
    seq = sess_clicks[s]
    for iid in seq:
        if iid in iid_counts:
            iid_counts[iid] += 1
        else:
            iid_counts[iid] = 1

sorted_counts = sorted(iid_counts.items(), key=operator.itemgetter(1))

length = len(sess_clicks)
for s in list(sess_clicks):
    curseq = sess_clicks[s]
    filseq = list(filter(lambda i: iid_counts[i] >= 5, curseq))
    if len(filseq) < 2:
        del sess_clicks[s]
        del sess_date[s]
    else:
        sess_clicks[s] = filseq

#### Split out test set based on dates

In [6]:
dates = list(sess_date.items())
maxdate = dates[0][1]

for _, date in dates:
    if maxdate < date:
        maxdate = date

# 1 day for test
splitdate = maxdate - 86400 * 1  # the number of seconds in a day: 86400


print('Splitting date', splitdate)
tra_sess = filter(lambda x: x[1] < splitdate, dates)
tes_sess = filter(lambda x: x[1] > splitdate, dates)

Splitting date 1411973999.0


#### Sort sessions by date

In [7]:
tra_sess = sorted(tra_sess, key=operator.itemgetter(1))     # [(sessionId, timestamp), (), ]
tes_sess = sorted(tes_sess, key=operator.itemgetter(1))     # [(sessionId, timestamp), (), ]
print(len(tra_sess))    # 186670    # 7966257
print(len(tes_sess))    # 15979     # 15324
print(tra_sess[:3])
print(tes_sess[:3])
print("-- Splitting train set and test set @ %ss" % datetime.datetime.now())

7966257
15324
[('171168', 1396335632.0), ('345618', 1396335675.0), ('263073', 1396335702.0)]
[('11532683', 1411974053.0), ('11464959', 1411974071.0), ('11296119', 1411974095.0)]
-- Splitting train set and test set @ 2022-09-13 16:08:12.520311s


#### Convert training sessions to sequences and renumber items to start from 1

In [8]:
item_dict = {}

def obtian_tra():
    train_ids = []
    train_seqs = []
    train_dates = []
    item_ctr = 1
    for s, date in tra_sess:
        seq = sess_clicks[s]
        outseq = []
        for i in seq:
            if i in item_dict:
                outseq += [item_dict[i]]
            else:
                outseq += [item_ctr]
                item_dict[i] = item_ctr
                item_ctr += 1
        if len(outseq) < 2:  # Doesn't occur
            continue
        train_ids += [s]
        train_dates += [date]
        train_seqs += [outseq]
    print(item_ctr)     # 43098, 37484
    return train_ids, train_dates, train_seqs

#### Convert test sessions to sequences, ignoring items that do not appear in training set

In [9]:
def obtian_tes():
    test_ids = []
    test_seqs = []
    test_dates = []
    for s, date in tes_sess:
        seq = sess_clicks[s]
        outseq = []
        for i in seq:
            if i in item_dict:
                outseq += [item_dict[i]]
        if len(outseq) < 2:
            continue
        test_ids += [s]
        test_dates += [date]
        test_seqs += [outseq]
    return test_ids, test_dates, test_seqs
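
The renumbering logic of the two functions above can be checked on a toy example. This is a simplified sketch under stated assumptions: `renumber` is a hypothetical helper that drops the session ids and dates, and the string item ids are made up.

```python
def renumber(train_sessions, test_sessions):
    """Map raw item ids to consecutive integers starting from 1.

    Id 0 is left unused so that it can serve as the padding index later.
    """
    item_dict = {}
    train_seqs = []
    for seq in train_sessions:
        out = []
        for item in seq:
            if item not in item_dict:
                item_dict[item] = len(item_dict) + 1
            out.append(item_dict[item])
        train_seqs.append(out)

    test_seqs = []
    for seq in test_sessions:
        # Items never seen in training are silently dropped.
        out = [item_dict[i] for i in seq if i in item_dict]
        # Sessions left with fewer than two clicks cannot form a (prefix, label) pair.
        if len(out) >= 2:
            test_seqs.append(out)
    return item_dict, train_seqs, test_seqs


item_map, toy_train, toy_test = renumber(
    [["a", "b"], ["b", "c"]],            # training sessions
    [["c", "z", "a"], ["z", "b"]],       # test sessions; "z" is unseen
)
print(item_map, toy_train, toy_test)
```

The second test session is discarded because only one of its clicks refers to an item known from training.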

In [10]:
tra_ids, tra_dates, tra_seqs = obtian_tra()
tes_ids, tes_dates, tes_seqs = obtian_tes()

37484


### Data Preprocessing

Since NARM is not trained in a session-parallel manner, a sequence-splitting preprocessing step is required. For an input session $[x_1, x_2, ..., x_{n-1}, x_n]$, we generate the sequences and corresponding labels $([x_1], V(x_2)), ([x_1, x_2], V(x_3)), ..., ([x_1, x_2, ..., x_{n-1}], V(x_n))$ for training on YOOCHOOSE. Each label $V(x_i)$ is the click that immediately follows its prefix, i.e. the last click of the session truncated at position $i$. Because YOOCHOOSE is large, we sort the training sequences by time and also report results for models trained on the most recent 1/64 and 1/4 of the training sequences. Since these models are trained only on the more recent fractions, some items from the test set will not appear in their training data.
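
The splitting rule can be sketched for a single session as follows. `split_session` is a hypothetical helper used only for illustration; it emits the longest prefix first, which is an assumption chosen to mirror the iteration order of the notebook code, and the four item ids are example values.

```python
def split_session(seq):
    """Return (prefix, label) pairs for one session, longest prefix first."""
    pairs = []
    for i in range(1, len(seq)):
        # The label is the click that directly follows the prefix.
        pairs.append((seq[:-i], seq[-i]))
    return pairs


pairs = split_session([33611, 37169, 6409, 33128])
for prefix, label in pairs:
    print(prefix, "->", label)
```

A session of length n therefore yields n - 1 training examples, which is why the number of sequences is much larger than the number of sessions.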

In [11]:
def process_seqs(iseqs, idates):
    out_seqs = []
    out_dates = []
    labs = []
    ids = []
    for id, seq, date in zip(range(len(iseqs)), iseqs, idates):
        for i in range(1, len(seq)):
            tar = seq[-i]
            labs += [tar]
            out_seqs += [seq[:-i]]
            out_dates += [date]
            ids += [id]
    return out_seqs, out_dates, labs, ids

In [12]:
tr_seqs, tr_dates, tr_labs, tr_ids = process_seqs(tra_seqs, tra_dates)
te_seqs, te_dates, te_labs, te_ids = process_seqs(tes_seqs, tes_dates)
tra = (tr_seqs, tr_labs)
tes = (te_seqs, te_labs)
print(len(tr_seqs))
print(len(te_seqs))
print(tr_seqs[:3], tr_dates[:3], tr_labs[:3])
print(te_seqs[:3], te_dates[:3], te_labs[:3])

total_len = 0  # avoid shadowing the built-in all()

for seq in tra_seqs:
    total_len += len(seq)
for seq in tes_seqs:
    total_len += len(seq)
print('avg length: ', total_len/(len(tra_seqs) + len(tes_seqs) * 1.0))

if not os.path.exists('../../yoochoose'):
    os.makedirs('../../yoochoose')
if not os.path.exists('../../yoochoose1_4'):
    os.makedirs('../../yoochoose1_4')
if not os.path.exists('../../yoochoose1_64'):
    os.makedirs('../../yoochoose1_64')

pickle.dump(tes, open('../../yoochoose/test.txt', 'wb'))
pickle.dump(tes, open('../../yoochoose1_4/test.txt', 'wb'))
pickle.dump(tes, open('../../yoochoose1_64/test.txt', 'wb'))

split4, split64 = int(len(tr_seqs) / 4), int(len(tr_seqs) / 64)
print(len(tr_seqs[-split4:]))
print(len(tr_seqs[-split64:]))

tra4, tra64 = (tr_seqs[-split4:], tr_labs[-split4:]), (tr_seqs[-split64:], tr_labs[-split64:])
seq4, seq64 = tra_seqs[tr_ids[-split4]:], tra_seqs[tr_ids[-split64]:]

pickle.dump((tr_seqs, tr_labs), open('../../yoochoose/train.txt', 'wb'))
pickle.dump(tra_seqs, open('../../yoochoose/all_train_seq.txt', 'wb'))

pickle.dump(tra4, open('../../yoochoose1_4/train.txt', 'wb'))
pickle.dump(seq4, open('../../yoochoose1_4/all_train_seq.txt', 'wb'))

pickle.dump(tra64, open('../../yoochoose1_64/train.txt', 'wb'))
pickle.dump(seq64, open('../../yoochoose1_64/all_train_seq.txt', 'wb'))

23670982
55898
[[1], [3], [5, 5]] [1396335632.0, 1396335675.0, 1396335702.0] [2, 4, 5]
[[33611, 37169, 6409], [33611, 37169], [33611]] [1411974053.0, 1411974053.0, 1411974053.0] [33128, 6409, 37169]
avg length:  3.9727042800167034
5917745
369859


### Data Loader

In [13]:
def collate_fn(data):
    """This function will be used to pad the sessions to max length
       in the batch and transpose the batch from 
       batch_size x max_seq_len to max_seq_len x batch_size.
       It will return padded vectors, labels and lengths of each session (before padding)
       It will be used in the Dataloader
    """
    data.sort(key=lambda x: len(x[0]), reverse=True)
    lens = [len(sess) for sess, label in data]
    labels = []
    padded_sesss = torch.zeros(len(data), max(lens)).long()
    for i, (sess, label) in enumerate(data):
        padded_sesss[i,:lens[i]] = torch.LongTensor(sess)
        labels.append(label)
    
    padded_sesss = padded_sesss.transpose(0,1)
    return padded_sesss, torch.tensor(labels).long(), lens
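
To see what the collate step produces, the same pad-and-transpose logic can be written with plain Python lists. This is an illustrative sketch only: `pad_and_transpose` is a hypothetical stand-in that returns lists instead of tensors, and the toy batch is made up.

```python
def pad_and_transpose(batch, pad_id=0):
    """batch: list of (session, label) pairs.

    Returns the padded sessions in time-major layout (max_len x batch_size),
    the labels, and the unpadded lengths, all ordered by decreasing length.
    """
    batch = sorted(batch, key=lambda pair: len(pair[0]), reverse=True)
    lens = [len(sess) for sess, _ in batch]
    max_len = max(lens)
    # Right-pad every session with `pad_id` up to the longest one in the batch.
    padded = [sess + [pad_id] * (max_len - len(sess)) for sess, _ in batch]
    labels = [label for _, label in batch]
    # Transpose batch_size x max_len -> max_len x batch_size.
    transposed = [list(step) for step in zip(*padded)]
    return transposed, labels, lens


toy_batch = [([5], 7), ([1, 2, 3], 4), ([8, 9], 6)]
padded, labels, lens = pad_and_transpose(toy_batch)
print(padded)
print(labels, lens)
```

Sorting by decreasing length matters because `pack_padded_sequence`, which the model calls with its default arguments, expects the batch in that order.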

In [14]:
def load_data(root, valid_portion=0.1, maxlen=19, sort_by_len=False):
    '''Loads the dataset

    :type root: String
    :param root: The path to the directory holding train.txt and test.txt
        (here RSC2015).
    :type valid_portion: float
    :param valid_portion: The proportion of the full train set used for
        the validation set.
    :type maxlen: None or positive int
    :param maxlen: The max sequence length we use in the train/valid/test
        set; longer sequences are truncated to their first maxlen items.
    :type sort_by_len: bool
    :param sort_by_len: Sort by the sequence length for the valid
        and test set. This allows faster execution as it causes
        less padding per minibatch.

    '''

    # Load the dataset
    path_train_data = root + 'train.txt'
    path_test_data = root + 'test.txt'
    with open(path_train_data, 'rb') as f1:
        train_set = pickle.load(f1)

    with open(path_test_data, 'rb') as f2:
        test_set = pickle.load(f2)

    if maxlen:
        new_train_set_x = []
        new_train_set_y = []
        for x, y in zip(train_set[0], train_set[1]):
            if len(x) < maxlen:
                new_train_set_x.append(x)
                new_train_set_y.append(y)
            else:
                new_train_set_x.append(x[:maxlen])
                new_train_set_y.append(y)
        train_set = (new_train_set_x, new_train_set_y)
        del new_train_set_x, new_train_set_y

        new_test_set_x = []
        new_test_set_y = []
        for xx, yy in zip(test_set[0], test_set[1]):
            if len(xx) < maxlen:
                new_test_set_x.append(xx)
                new_test_set_y.append(yy)
            else:
                new_test_set_x.append(xx[:maxlen])
                new_test_set_y.append(yy)
        test_set = (new_test_set_x, new_test_set_y)
        del new_test_set_x, new_test_set_y

    # split training set into validation set
    train_set_x, train_set_y = train_set
    n_samples = len(train_set_x)
    sidx = np.arange(n_samples, dtype='int32')
    np.random.shuffle(sidx)
    n_train = int(np.round(n_samples * (1. - valid_portion)))
    valid_set_x = [train_set_x[s] for s in sidx[n_train:]]
    valid_set_y = [train_set_y[s] for s in sidx[n_train:]]
    train_set_x = [train_set_x[s] for s in sidx[:n_train]]
    train_set_y = [train_set_y[s] for s in sidx[:n_train]]

    (test_set_x, test_set_y) = test_set

    def len_argsort(seq):
        return sorted(range(len(seq)), key=lambda x: len(seq[x]))

    if sort_by_len:
        sorted_index = len_argsort(test_set_x)
        test_set_x = [test_set_x[i] for i in sorted_index]
        test_set_y = [test_set_y[i] for i in sorted_index]

        sorted_index = len_argsort(valid_set_x)
        valid_set_x = [valid_set_x[i] for i in sorted_index]
        valid_set_y = [valid_set_y[i] for i in sorted_index]

    train = (train_set_x, train_set_y)
    valid = (valid_set_x, valid_set_y)
    test = (test_set_x, test_set_y)

    return train, valid, test


class RecSysDataset(Dataset):
    """define the pytorch Dataset class for yoochoose datasets.
    """
    def __init__(self, data):
        self.data = data
        print('-'*50)
        print('Dataset info:')
        print('Number of sessions: {}'.format(len(data[0])))
        print('-'*50)
        
    def __getitem__(self, index):
        session_items = self.data[0][index]
        target_item = self.data[1][index]
        return session_items, target_item

    def __len__(self):
        return len(self.data[0])

## Model

<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
img {
  display: block;
  margin-left: auto;
  margin-right: auto;
}
</style>
</head>
<body>



<figure>
<img src="https://user-images.githubusercontent.com/34673511/188940697-319e650c-bfe8-4113-9d7c-a236fde7c8f2.png" alt="" style="width:100%;">
<figcaption align = "center"></figcaption>
</figure>

</body>
</html>

The global encoder and the local encoder in NARM.

<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
img {
  display: block;
  margin-left: auto;
  margin-right: auto;
}
</style>
</head>
<body>



<figure>
<img src="https://user-images.githubusercontent.com/34673511/188919605-ab0abfb5-cdb1-4579-9189-b4a4f75bcd7e.PNG" alt="" style="width:80%;">
<figcaption align = "center"></figcaption>
</figure>

</body>
</html>

The graphical model of NARM, where the session feature $c_{t}$ is the concatenation of the vectors $c_{t}^{g}$ and $c_{t}^{l}$. Note that $h_{t}^{g}$ and $h_{t}^{l}$ have the same values but play different roles: the last hidden state of the global encoder, $h_{t}^{g}$, encodes the entire sequence of input clicks, while the last hidden state of the local encoder, $h_{t}^{l}$, is used to compute attention weights over the previous hidden states.
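
The computation described above can be summarized in equations. This is a sketch that follows the layer names used in the model code of this notebook ($A_1$, $A_2$, $v$ and $B$ stand for `a_1`, `a_2`, `v_t` and `b`); the assignment of $A_1$ and $A_2$ to $h_j$ and $h_t$ is taken from that code and may be labeled differently in the paper. Here $h_j$ is the GRU hidden state at step $j$, $\sigma$ is the sigmoid function, and $\mathrm{emb}_i$ is the embedding of candidate item $i$.

$$
\begin{aligned}
c_t^{g} &= h_t \\
\alpha_{tj} &= v^{\top} \sigma\left(A_1 h_j + A_2 h_t\right) \\
c_t^{l} &= \sum_{j=1}^{t} \alpha_{tj} h_j \\
c_t &= \left[c_t^{g};\, c_t^{l}\right] \\
S_i &= \mathrm{emb}_i^{\top} B\, c_t
\end{aligned}
$$

With hidden size $H$ and embedding dimension $D$, $c_t$ has $2H$ entries, so $B$ maps between the $D$-dimensional item embeddings and the $2H$-dimensional session representation. The scores $S_i$ are turned into probabilities by the softmax inside the cross-entropy loss.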

### Model Definition

In [15]:
class NARM(nn.Module):
    """Neural Attentive Session Based Recommendation Model Class

    Args:
        n_items(int): the number of items
        hidden_size(int): the hidden size of gru
        embedding_dim(int): the dimension of item embedding
        batch_size(int): 
        n_layers(int): the number of gru layers

    """
    def __init__(self, n_items, hidden_size, embedding_dim, batch_size, n_layers = 1):
        super(NARM, self).__init__()
        self.n_items = n_items
        self.hidden_size = hidden_size
        self.batch_size = batch_size
        self.n_layers = n_layers
        self.embedding_dim = embedding_dim
        self.emb = nn.Embedding(self.n_items, self.embedding_dim, padding_idx = 0)
        self.emb_dropout = nn.Dropout(0.25)
        self.gru = nn.GRU(self.embedding_dim, self.hidden_size, self.n_layers)
        self.a_1 = nn.Linear(self.hidden_size, self.hidden_size, bias=False)
        self.a_2 = nn.Linear(self.hidden_size, self.hidden_size, bias=False)
        self.v_t = nn.Linear(self.hidden_size, 1, bias=False)
        self.ct_dropout = nn.Dropout(0.5)
        self.b = nn.Linear(self.embedding_dim, 2 * self.hidden_size, bias=False)

        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    def forward(self, seq, lengths):
        hidden = self.init_hidden(seq.size(1))
        embs = self.emb_dropout(self.emb(seq))
        embs = pack_padded_sequence(embs, lengths)
        gru_out, hidden = self.gru(embs, hidden)
        gru_out, lengths = pad_packed_sequence(gru_out)

        # fetch the last hidden state of last timestamp
        ht = hidden[-1]
        gru_out = gru_out.permute(1, 0, 2)

        c_global = ht
        q1 = self.a_1(gru_out.contiguous().view(-1, self.hidden_size)).view(gru_out.size())  
        q2 = self.a_2(ht)

        mask = torch.where(seq.permute(1, 0) > 0, torch.tensor([1.], device = self.device), torch.tensor([0.], device = self.device))
        q2_expand = q2.unsqueeze(1).expand_as(q1)
        q2_masked = mask.unsqueeze(2).expand_as(q1) * q2_expand

        alpha = self.v_t(torch.sigmoid(q1 + q2_masked).view(-1, self.hidden_size)).view(mask.size())
        c_local = torch.sum(alpha.unsqueeze(2).expand_as(gru_out) * gru_out, 1)

        c_t = torch.cat([c_local, c_global], 1)
        c_t = self.ct_dropout(c_t)
        
        item_embs = self.emb(torch.arange(self.n_items).to(self.device))
        scores = torch.matmul(c_t, self.b(item_embs).permute(1, 0))

        return scores

    def init_hidden(self, batch_size):
        return torch.zeros((self.n_layers, batch_size, self.hidden_size), requires_grad=True).to(self.device)

We can evaluate the method with the following metrics:

- **Recall at K (Recall@K)**: The proportion of test cases in which the desired item is among the top-k recommended items (a test case receives a score of 1 if the target item appears in the top-k list, and 0 otherwise).

<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
img {
  display: block;
  margin-left: auto;
  margin-right: auto;
}
</style>
</head>
<body>



<figure>
<img src="https://miro.medium.com/max/1400/1*4idLDQc9FiyCMXy8Ck-LSA.png" alt="" width="300" height="200">
<figcaption align = "center"></figcaption>
</figure>

</body>
</html>

- **Mean Reciprocal Rank at K (MRR@K)**: The average of the reciprocal ranks of the desired items over all test cases. The reciprocal rank is set to zero if the rank is larger than k.

<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
img {
  display: block;
  margin-left: auto;
  margin-right: auto;
}
</style>
</head>
<body>



<figure>
<img src="https://miro.medium.com/max/872/1*Yz8One3GN-vJfQxy6n2rAw.png" alt="" width="220">
<figcaption align = "center"></figcaption>
</figure>

</body>
</html>
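
Both metrics can be worked through by hand on a small example. The sketch below is a plain-Python illustration of the definitions above, not the tensor-based implementation used in this notebook; `recall_at_k`, `mrr_at_k` and the toy rankings are hypothetical.

```python
def recall_at_k(ranked_lists, targets, k):
    """Fraction of test cases whose target appears in the top-k list."""
    hits = sum(1 for ranked, t in zip(ranked_lists, targets) if t in ranked[:k])
    return hits / len(targets)


def mrr_at_k(ranked_lists, targets, k):
    """Mean reciprocal rank; a target outside the top-k contributes 0."""
    total = 0.0
    for ranked, t in zip(ranked_lists, targets):
        top_k = ranked[:k]
        if t in top_k:
            total += 1.0 / (top_k.index(t) + 1)  # ranks are 1-based
    return total / len(targets)


# Three test cases, each with a ranked list of recommended item ids.
ranked_lists = [[3, 1, 2], [5, 4, 6], [7, 8, 9]]
targets = [1, 6, 0]  # found at rank 2, found at rank 3, not found

recall = recall_at_k(ranked_lists, targets, k=3)
mrr = mrr_at_k(ranked_lists, targets, k=3)
print(recall, mrr)
```

Here two of three targets are found, so Recall@3 is 2/3, and MRR@3 is (1/2 + 1/3 + 0) / 3 = 5/18.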

### Metrics for Method

In [16]:
def get_recall(indices, targets):
    """
    Calculates the recall score for the given predictions and targets

    Args:
        indices (Bxk): torch.LongTensor. top-k indices predicted by the model.
        targets (B): torch.LongTensor. actual target indices.

    Returns:
        recall (float): the recall score
    """

    targets = targets.view(-1, 1).expand_as(indices)
    hits = (targets == indices).nonzero()
    if len(hits) == 0:
        return 0.0
    # each row of hits is one (sample, rank) match
    n_hits = hits.size(0)
    recall = float(n_hits) / targets.size(0)
    return recall


def get_mrr(indices, targets):
    """
    Calculates the MRR score for the given predictions and targets
    Args:
        indices (Bxk): torch.LongTensor. top-k indices predicted by the model.
        targets (B): torch.LongTensor. actual target indices.

    Returns:
        mrr (float): the mrr score
    """

    tmp = targets.view(-1, 1)
    targets = tmp.expand_as(indices)
    hits = (targets == indices).nonzero()
    ranks = hits[:, -1] + 1
    ranks = ranks.float()
    rranks = torch.reciprocal(ranks)
    mrr = torch.sum(rranks).data / targets.size(0)
    return mrr.item()


def evaluate(logits, targets, k=20):
    """
    Evaluates the model using Recall@K, MRR@K scores.

    Args:
        logits (B,C): torch.FloatTensor. The predicted scores for the next items.
        targets (B): torch.LongTensor. actual target indices.
        k (int): the number of top-scoring items to keep per sample.

    Returns:
        recall (float): the recall score
        mrr (float): the mrr score
    """
    # keep only the indices of the k highest-scoring items per sample
    _, indices = torch.topk(logits, k, -1)
    recall = get_recall(indices, targets)
    mrr = get_mrr(indices, targets)
    return recall, mrr

## Training, Validation and Testing

In [17]:
def trainForEpoch(train_loader, model, optimizer, epoch, num_epochs, criterion, log_aggr=1):
    model.train()

    sum_epoch_loss = 0

    start = time.time()
    for i, (seq, target, lens) in tqdm(enumerate(train_loader), total=len(train_loader)):
        seq = seq.to(device)
        target = target.to(device)
        
        optimizer.zero_grad()
        outputs = model(seq, lens)
        loss = criterion(outputs, target)
        loss.backward()
        optimizer.step() 

        loss_val = loss.item()
        sum_epoch_loss += loss_val

        if i % log_aggr == 0:
            # seq has shape (max_seq_len, batch_size), so the batch size is seq.size(1)
            print('[TRAIN] epoch %d/%d batch loss: %.4f (avg %.4f) (%.2f seq/s)'
                % (epoch, num_epochs, loss_val, sum_epoch_loss / (i + 1),
                  seq.size(1) / (time.time() - start)))

        start = time.time()


def validate(valid_loader, model):
    model.eval()
    recalls = []
    mrrs = []
    with torch.no_grad():
        for seq, target, lens in tqdm(valid_loader):
            seq = seq.to(device)
            target = target.to(device)
            outputs = model(seq, lens)
            logits = F.softmax(outputs, dim = 1)
            recall, mrr = evaluate(logits, target, k = topk)
            recalls.append(recall)
            mrrs.append(mrr)
    
    mean_recall = np.mean(recalls)
    mean_mrr = np.mean(mrrs)
    return mean_recall, mean_mrr

### Train on all training sequences of YOOCHOOSE

In [19]:
dataset_path = "../../yoochoose/"
valid_portion = 0.1
batch_size = 512
hidden_size = 100
embed_dim = 50
epoch = 10
lr = 0.001
lr_dc = 0.1
lr_dc_step = 80
topk = 20

In [20]:
print('Loading data...')
train, valid, test = load_data(dataset_path, valid_portion=valid_portion)

train_data = RecSysDataset(train)
valid_data = RecSysDataset(valid)
test_data = RecSysDataset(test)
train_loader = DataLoader(train_data, batch_size = batch_size, shuffle = True, collate_fn = collate_fn)
valid_loader = DataLoader(valid_data, batch_size = batch_size, shuffle = False, collate_fn = collate_fn)
test_loader = DataLoader(test_data, batch_size = batch_size, shuffle = False, collate_fn = collate_fn)

n_items = 37484
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = NARM(n_items, hidden_size, embed_dim, batch_size).to(device)

optimizer = optim.Adam(model.parameters(), lr)
criterion = nn.CrossEntropyLoss()
scheduler = StepLR(optimizer, step_size = lr_dc_step, gamma = lr_dc)

Loading data...
--------------------------------------------------
Dataset info:
Number of sessions: 21303884
--------------------------------------------------
--------------------------------------------------
Dataset info:
Number of sessions: 2367098
--------------------------------------------------
--------------------------------------------------
Dataset info:
Number of sessions: 55898
--------------------------------------------------


In [None]:
num_epochs = epoch  # keep the total number of epochs separate from the loop variable
for epoch in tqdm(range(1, num_epochs + 1)):
    # train for one epoch
    trainForEpoch(train_loader, model, optimizer, epoch, num_epochs, criterion, log_aggr = 200)
    # update the learning-rate schedule after this epoch's optimizer steps
    scheduler.step()

    recall, mrr = validate(valid_loader, model)
    print('Epoch {} validation: Recall@{}: {:.4f}, MRR@{}: {:.4f} \n'.format(epoch, topk, recall, topk, mrr))

    # save a checkpoint of the latest model
    ckpt_dict = {
        'epoch': epoch,
        'state_dict': model.state_dict(),
        'optimizer': optimizer.state_dict()
    }

    torch.save(ckpt_dict, '../../latest_checkpoint_yoochoose.pth')

In [21]:
ckpt = torch.load('/ssd003/projects/aieng/public/recsys_ckpts/latest_checkpoint_yoochoose.pth', map_location=device)
model.load_state_dict(ckpt['state_dict'])
recall, mrr = validate(test_loader, model)
print("Test: Recall@{}: {:.4f}, MRR@{}: {:.4f}".format(topk, recall, topk, mrr))

100%|██████████| 110/110 [00:02<00:00, 41.59it/s]


Test: Recall@20: 0.6726, MRR@20: 0.2774


<h2>Results Compared to Baselines</h2>

| Methods/Metrics | Recall@20 | MRR@20 |
| :--- | :----: | ---: |
| POP | 0.0050 | 0.0012 |
| S-POP | 0.2672 | 0.1775 |
| BPR-MF | 0.2574 | 0.0618 |
| Item-KNN | 0.5065 | 0.2048 |
| GRU4Rec | **0.7200** | **0.3160** |
| **NARM** | 0.6726 | 0.2774 |

### Train on the 1/4 of training sequences of YOOCHOOSE

In [22]:
dataset_path = "../../yoochoose1_4/"
valid_portion = 0.1
batch_size = 512
hidden_size = 100
embed_dim = 50
epoch = 100
lr = 0.001
lr_dc = 0.1
lr_dc_step = 80
topk = 20
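
To make the effect of `lr_dc` and `lr_dc_step` concrete, the sketch below evaluates the closed-form step-decay rule that `StepLR` is documented to follow: the base learning rate is multiplied by the decay factor once every `step_size` scheduler steps. `step_lr` is a hypothetical helper written for illustration, not a PyTorch function, and the variable names are chosen so as not to overwrite the settings above.

```python
def step_lr(base_lr, gamma, step_size, n_steps):
    """Learning rate after `n_steps` scheduler steps under step decay."""
    return base_lr * gamma ** (n_steps // step_size)


# Same values as lr, lr_dc and lr_dc_step in the configuration above.
base_lr, gamma, step_size = 0.001, 0.1, 80

schedule = {n: step_lr(base_lr, gamma, step_size, n) for n in (0, 79, 80, 99)}
for n, value in schedule.items():
    print(n, value)
```

With 100 epochs, the learning rate stays at its initial value for the first 80 scheduler steps and is ten times smaller for the remaining ones. In the run on the full dataset, which uses only 10 epochs, the decay never takes effect.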

In [23]:
print('Loading data...')
train, valid, test = load_data(dataset_path, valid_portion=valid_portion)

train_data = RecSysDataset(train)
valid_data = RecSysDataset(valid)
test_data = RecSysDataset(test)
train_loader = DataLoader(train_data, batch_size = batch_size, shuffle = True, collate_fn = collate_fn)
valid_loader = DataLoader(valid_data, batch_size = batch_size, shuffle = False, collate_fn = collate_fn)
test_loader = DataLoader(test_data, batch_size = batch_size, shuffle = False, collate_fn = collate_fn)

n_items = 37484
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = NARM(n_items, hidden_size, embed_dim, batch_size).to(device)

optimizer = optim.Adam(model.parameters(), lr)
criterion = nn.CrossEntropyLoss()
scheduler = StepLR(optimizer, step_size = lr_dc_step, gamma = lr_dc)

Loading data...
--------------------------------------------------
Dataset info:
Number of sessions: 5325970
--------------------------------------------------
--------------------------------------------------
Dataset info:
Number of sessions: 591775
--------------------------------------------------
--------------------------------------------------
Dataset info:
Number of sessions: 55898
--------------------------------------------------


In [None]:
num_epochs = epoch  # keep the total number of epochs separate from the loop variable
for epoch in tqdm(range(1, num_epochs + 1)):
    # train for one epoch
    trainForEpoch(train_loader, model, optimizer, epoch, num_epochs, criterion, log_aggr = 200)
    # update the learning-rate schedule after this epoch's optimizer steps
    scheduler.step()

    recall, mrr = validate(valid_loader, model)
    print('Epoch {} validation: Recall@{}: {:.4f}, MRR@{}: {:.4f} \n'.format(epoch, topk, recall, topk, mrr))

    # save a checkpoint of the latest model
    ckpt_dict = {
        'epoch': epoch,
        'state_dict': model.state_dict(),
        'optimizer': optimizer.state_dict()
    }

    torch.save(ckpt_dict, '../../latest_checkpoint_yoochoose1_4.pth')

In [24]:
ckpt = torch.load('/ssd003/projects/aieng/public/recsys_ckpts/latest_checkpoint_yoochoose1_4.pth', map_location=device)
model.load_state_dict(ckpt['state_dict'])
recall, mrr = validate(test_loader, model)
print("Test: Recall@{}: {:.4f}, MRR@{}: {:.4f}".format(topk, recall, topk, mrr))

100%|██████████| 110/110 [00:01<00:00, 56.45it/s]

Test: Recall@20: 0.7042, MRR@20: 0.2994





<h2>Results Compared to Baselines</h2>

| Methods/Metrics | Recall@20 | MRR@20 |
| :--- | :----: | ---: |
| POP | 0.133 | 0.030 |
| S-POP | 0.2708 | 0.1775 |
| BPR-MF | 0.340 | 0.2170 |
| Item-KNN | 0.5231 | 0.157 |
| GRU4Rec | 0.5953 | 0.2260 |
| **NARM** | **0.7042** | **0.2994** |

### Train on the 1/64 of training sequences of YOOCHOOSE

In [25]:
dataset_path = "../../yoochoose1_64/"
valid_portion = 0.1
batch_size = 512
hidden_size = 100
embed_dim = 50
epoch = 100
lr = 0.001
lr_dc = 0.1
lr_dc_step = 80
topk = 20

In [26]:
print('Loading data...')
train, valid, test = load_data(dataset_path, valid_portion=valid_portion)

train_data = RecSysDataset(train)
valid_data = RecSysDataset(valid)
test_data = RecSysDataset(test)
train_loader = DataLoader(train_data, batch_size = batch_size, shuffle = True, collate_fn = collate_fn)
valid_loader = DataLoader(valid_data, batch_size = batch_size, shuffle = False, collate_fn = collate_fn)
test_loader = DataLoader(test_data, batch_size = batch_size, shuffle = False, collate_fn = collate_fn)

n_items = 37484
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = NARM(n_items, hidden_size, embed_dim, batch_size).to(device)

optimizer = optim.Adam(model.parameters(), lr)
criterion = nn.CrossEntropyLoss()
scheduler = StepLR(optimizer, step_size = lr_dc_step, gamma = lr_dc)

Loading data...
--------------------------------------------------
Dataset info:
Number of sessions: 332873
--------------------------------------------------
--------------------------------------------------
Dataset info:
Number of sessions: 36986
--------------------------------------------------
--------------------------------------------------
Dataset info:
Number of sessions: 55898
--------------------------------------------------


In [None]:
num_epochs = epoch  # keep the total number of epochs separate from the loop variable
for epoch in tqdm(range(1, num_epochs + 1)):
    # train for one epoch
    trainForEpoch(train_loader, model, optimizer, epoch, num_epochs, criterion, log_aggr = 200)
    # update the learning-rate schedule after this epoch's optimizer steps
    scheduler.step()

    recall, mrr = validate(valid_loader, model)
    print('Epoch {} validation: Recall@{}: {:.4f}, MRR@{}: {:.4f} \n'.format(epoch, topk, recall, topk, mrr))

    # save a checkpoint of the latest model
    ckpt_dict = {
        'epoch': epoch,
        'state_dict': model.state_dict(),
        'optimizer': optimizer.state_dict()
    }

    torch.save(ckpt_dict, '../../latest_checkpoint_yoochoose1_64.pth')

In [27]:
ckpt = torch.load('/ssd003/projects/aieng/public/recsys_ckpts/latest_checkpoint_yoochoose1_64.pth', map_location=device)
model.load_state_dict(ckpt['state_dict'])
recall, mrr = validate(test_loader, model)
print("Test: Recall@{}: {:.4f}, MRR@{}: {:.4f}".format(topk, recall, topk, mrr))

100%|██████████| 110/110 [00:01<00:00, 58.99it/s]

Test: Recall@20: 0.6888, MRR@20: 0.2936





<h2>Results Compared to Baselines</h2>

| Methods/Metrics | Recall@20 | MRR@20 |
| :--- | :----: | ---: |
| POP | 0.671 | 0.165 |
| S-POP | 0.3044 | 0.1835 |
| BPR-MF | 0.340 | 0.2170 |
| Item-KNN | 0.5160 | 0.2181 |
| GRU4Rec | 0.6064 | 0.2289 |
| **NARM** | **0.6888** | **0.2936** |

# References

1. <a href="https://arxiv.org/pdf/1711.04725.pdf" title="Neural Attentive Session-based Recommendation">Li, Jing, et al. Neural Attentive Session-based Recommendation.</a>
2. https://github.com/Wang-Shuo/Neural-Attentive-Session-Based-Recommendation-PyTorch