
# Section 1: interact_status

In [1]:
import pandas as pd
import numpy as np
import random

data = [[1,11,2,13142],[1,12,5,24132],[2,21,3,35123],[2,22,4,22121],[2,23,1,23111],[3,31,2,11312],[3,32,1,13412]]
df = pd.DataFrame(data)
df.columns = ['userId', 'movieId', 'rating', 'timestamp']
print("[Data]")
print(df)
print('\n')
item_pool = set(df['movieId'].unique())

#NCF-PT_ejlee
interact_status = df.groupby('userId')['movieId'].apply(set).reset_index().rename(columns={'movieId': 'interacted_items'})
print("[Interact_status: A]")
print(interact_status)
print('\n')
print("[Interact_status: B]")
interact_status['negative_items'] = interact_status['interacted_items'].apply(lambda x: item_pool - x)
print(interact_status)
print('\n')
print("[Interact_status: C]")
# random.sample() needs a sequence; passing a set raises TypeError since Python 3.11
interact_status['negative_samples'] = interact_status['negative_items'].apply(lambda x: random.sample(sorted(x), 3))
print(interact_status)


[Data]
   userId  movieId  rating  timestamp
0       1       11       2      13142
1       1       12       5      24132
2       2       21       3      35123
3       2       22       4      22121
4       2       23       1      23111
5       3       31       2      11312
6       3       32       1      13412


[Interact_status: A]
   userId interacted_items
0       1         {11, 12}
1       2     {21, 22, 23}
2       3         {32, 31}


[Interact_status: B]
   userId interacted_items        negative_items
0       1         {11, 12}  {32, 21, 22, 23, 31}
1       2     {21, 22, 23}      {32, 11, 12, 31}
2       3         {32, 31}  {11, 12, 21, 22, 23}


[Interact_status: C]
   userId interacted_items        negative_items negative_samples
0       1         {11, 12}  {32, 21, 22, 23, 31}     [31, 22, 23]
1       2     {21, 22, 23}      {32, 11, 12, 31}     [32, 31, 11]
2       3         {32, 31}  {11, 12, 21, 22, 23}     [22, 21, 11]


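interacted_items: the list of movies the user has watched

negative_items: the list of movies the user has not watched

In practice 99 items are sampled, but because this example dataset is small, only 3 are sampled here -> negative_samples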
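
The note above can be sketched in plain Python, without pandas, to make the set arithmetic explicit. The helper name `sample_negatives`, the fixed seed, and the cap on `k` are assumptions added for illustration; they are not part of the notebook.

```python
import random

# Toy interactions from the notebook: (userId, movieId)
interactions = [(1, 11), (1, 12), (2, 21), (2, 22), (2, 23), (3, 31), (3, 32)]

item_pool = {movie for _, movie in interactions}

# interacted_items: movies each user has watched
interacted = {}
for user, movie in interactions:
    interacted.setdefault(user, set()).add(movie)

def sample_negatives(watched, pool, k, rng):
    """Draw up to k unwatched movies (hypothetical helper)."""
    candidates = sorted(pool - watched)   # random.sample needs a sequence
    k = min(k, len(candidates))           # avoid ValueError when the pool is small
    return rng.sample(candidates, k)

rng = random.Random(0)  # fixed seed so reruns give the same samples
negative_samples = {u: sample_negatives(w, item_pool, 3, rng)
                    for u, w in interacted.items()}
print(negative_samples)
```

With the cap in place, asking for 99 negatives on this toy data would return only the 4 or 5 unwatched movies each user has, rather than raising an error.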


# Section 2: train


In [2]:
df['rank_latest'] = df.groupby(['userId'])['timestamp'].rank(method='first', ascending=False)
print(df)
print('\n')
print("[Train Data]")
train = df[df['rank_latest']>1]
print(train)

   userId  movieId  rating  timestamp  rank_latest
0       1       11       2      13142          2.0
1       1       12       5      24132          1.0
2       2       21       3      35123          1.0
3       2       22       4      22121          3.0
4       2       23       1      23111          2.0
5       3       31       2      11312          2.0
6       3       32       1      13412          1.0


[Train Data]
   userId  movieId  rating  timestamp  rank_latest
0       1       11       2      13142          2.0
3       2       22       4      22121          3.0
4       2       23       1      23111          2.0
5       3       31       2      11312          2.0


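1. Group by userId, then rank the rows within each group by timestamp (descending, so rank 1 is the most recent).

2. Rows whose rank_latest is greater than 1 go to the train data (otherwise, test data).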
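
The test side of the split is not shown in the cell. A minimal plain-Python sketch of the full leave-one-out split, assuming the test set is simply each user's most recent row (the one ranked 1); the variable names are illustrative:

```python
# Toy rows from the notebook: (userId, movieId, rating, timestamp)
rows = [
    (1, 11, 2, 13142), (1, 12, 5, 24132),
    (2, 21, 3, 35123), (2, 22, 4, 22121), (2, 23, 1, 23111),
    (3, 31, 2, 11312), (3, 32, 1, 13412),
]

# Find each user's most recent row (the one pandas would rank 1).
latest = {}
for row in rows:
    user, timestamp = row[0], row[3]
    if user not in latest or timestamp > latest[user][3]:
        latest[user] = row

test = sorted(latest.values())
train = [row for row in rows if row not in test]

print("train:", train)
print("test: ", test)
```

Every user contributes exactly one row to `test`, and the remaining rows match the `[Train Data]` output above.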

# Section 3: train_rating

In [3]:
train_rating = pd.merge(train,interact_status[['userId','negative_items']], on='userId')
print("[Train + Interact_status]")
print(train_rating)

[Train + Interact_status]
   userId  movieId  rating  timestamp  rank_latest        negative_items
0       1       11       2      13142          2.0  {32, 21, 22, 23, 31}
1       2       22       4      22121          3.0      {32, 11, 12, 31}
2       2       23       1      23111          2.0      {32, 11, 12, 31}
3       3       31       2      11312          2.0  {11, 12, 21, 22, 23}


Merging the `train` data with `interact_status` on `userId` gives `train_rating`.

# Section 4: Sampling (num_negatives)

In [6]:
# sorted() turns the set into a sequence, which random.sample() requires since Python 3.11
train_rating['negatives'] = train_rating['negative_items'].apply(lambda x: random.sample(sorted(x), 4))
print(train_rating)

   userId  movieId  rating  ...  rank_latest        negative_items         negatives
0       1       11       2  ...          2.0  {32, 21, 22, 23, 31}  [31, 32, 23, 21]
1       2       22       4  ...          3.0      {32, 11, 12, 31}  [12, 11, 31, 32]
2       2       23       1  ...          2.0      {32, 11, 12, 31}  [11, 31, 32, 12]
3       3       31       2  ...          2.0  {11, 12, 21, 22, 23}  [23, 12, 22, 21]

[4 rows x 7 columns]


For each row of train_rating, draw num_negatives (here, 4) negative samples.

# Section 5: Training

In [8]:
for row in train_rating.itertuples(index=False):
  print(row)

Pandas(userId=1, movieId=11, rating=2, timestamp=13142, rank_latest=2.0, negative_items={32, 21, 22, 23, 31}, negatives=[31, 32, 23, 21])
Pandas(userId=2, movieId=22, rating=4, timestamp=22121, rank_latest=3.0, negative_items={32, 11, 12, 31}, negatives=[12, 11, 31, 32])
Pandas(userId=2, movieId=23, rating=1, timestamp=23111, rank_latest=2.0, negative_items={32, 11, 12, 31}, negatives=[11, 31, 32, 12])
Pandas(userId=3, movieId=31, rating=2, timestamp=11312, rank_latest=2.0, negative_items={11, 12, 21, 22, 23}, negatives=[23, 12, 22, 21])


In [17]:
users, items, ratings = [], [], []
for row in train_rating.itertuples():
  users.append(int(row.userId))
  items.append(int(row.movieId))
  ratings.append(int(float(row.rating)))
  for i in range(4):
    users.append(int(row.userId))
    items.append(int(row.negatives[i]))
    ratings.append(float(0))
print('[RESULT]')
print('users: {}'.format(users))
print('items: {}'.format(items))
print('ratings: {}'.format(ratings))

[RESULT]
users: [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
items: [11, 31, 32, 23, 21, 22, 12, 11, 31, 32, 23, 11, 31, 32, 12, 31, 23, 12, 22, 21]
ratings: [2, 0.0, 0.0, 0.0, 0.0, 4, 0.0, 0.0, 0.0, 0.0, 1, 0.0, 0.0, 0.0, 0.0, 2, 0.0, 0.0, 0.0, 0.0]
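
The loop above hard-codes 4 negatives per positive. A sketch of the same flattening with the count as a parameter follows; `build_instances` and its argument names are hypothetical, the input tuples are copied from the `train_rating` output above, and labels are stored as floats throughout (unlike the mixed int/float list above).

```python
def build_instances(train_rows, num_negatives):
    """Flatten (user, positive item, rating, negatives) rows into
    parallel users / items / ratings lists (hypothetical helper)."""
    users, items, ratings = [], [], []
    for user, movie, rating, negatives in train_rows:
        # one positive instance carrying the observed rating
        users.append(user)
        items.append(movie)
        ratings.append(float(rating))
        # up to num_negatives instances labeled 0 for the same user
        for neg in negatives[:num_negatives]:
            users.append(user)
            items.append(neg)
            ratings.append(0.0)
    return users, items, ratings

train_rows = [
    (1, 11, 2, [31, 32, 23, 21]),
    (2, 22, 4, [12, 11, 31, 32]),
    (2, 23, 1, [11, 31, 32, 12]),
    (3, 31, 2, [23, 12, 22, 21]),
]
users, items, ratings = build_instances(train_rows, num_negatives=4)
print(len(users), len(items), len(ratings))
```

With 4 rows and 4 negatives each, all three lists have 4 × (1 + 4) = 20 entries, matching the `[RESULT]` output.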


In [18]:
import torch
from torch.utils.data import DataLoader, Dataset

class UserItemRatingDataset(Dataset):
  def __init__(self, user_tensor, item_tensor, target_tensor):
    self.user_tensor = user_tensor
    self.item_tensor = item_tensor
    self.target_tensor = target_tensor
  def __getitem__(self, index):
    return self.user_tensor[index], self.item_tensor[index], self.target_tensor[index]
  def __len__(self):
    return self.user_tensor.size(0)

In [19]:
dataset = UserItemRatingDataset(user_tensor = torch.LongTensor(users), 
                                item_tensor = torch.LongTensor(items), 
                                target_tensor = torch.FloatTensor(ratings))
# shuffle=False here so the batches follow the original order (training normally uses shuffle=True)
for batch_id, batch in enumerate(DataLoader(dataset, batch_size=1, shuffle=False)):
  user, item, rating = batch[0], batch[1], batch[2]
  print('user: {}, item: {}, rating: {}'.format(user,item,rating))

user: tensor([1]), item: tensor([11]), rating: tensor([2.])
user: tensor([1]), item: tensor([31]), rating: tensor([0.])
user: tensor([1]), item: tensor([32]), rating: tensor([0.])
user: tensor([1]), item: tensor([23]), rating: tensor([0.])
user: tensor([1]), item: tensor([21]), rating: tensor([0.])
user: tensor([2]), item: tensor([22]), rating: tensor([4.])
user: tensor([2]), item: tensor([12]), rating: tensor([0.])
user: tensor([2]), item: tensor([11]), rating: tensor([0.])
user: tensor([2]), item: tensor([31]), rating: tensor([0.])
user: tensor([2]), item: tensor([32]), rating: tensor([0.])
user: tensor([2]), item: tensor([23]), rating: tensor([1.])
user: tensor([2]), item: tensor([11]), rating: tensor([0.])
user: tensor([2]), item: tensor([31]), rating: tensor([0.])
user: tensor([2]), item: tensor([32]), rating: tensor([0.])
user: tensor([2]), item: tensor([12]), rating: tensor([0.])
user: tensor([3]), item: tensor([31]), rating: tensor([2.])
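user: tensor([3]), item: tensor([23]), rating: tensor([0.])
user: tensor([3]), item: tensor([12]), rating: tensor([0.])
user: tensor([3]), item: tensor([22]), rating: tensor([0.])
user: tensor([3]), item: tensor([21]), rating: tensor([0.])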

In [20]:
# shuffle=False again, now with batch_size=3
for batch_id, batch in enumerate(DataLoader(dataset, batch_size=3, shuffle=False)):
  user, item, rating = batch[0], batch[1], batch[2]
  print('user: {}, item: {}, rating: {}'.format(user,item,rating))

user: tensor([1, 1, 1]), item: tensor([11, 31, 32]), rating: tensor([2., 0., 0.])
user: tensor([1, 1, 2]), item: tensor([23, 21, 22]), rating: tensor([0., 0., 4.])
user: tensor([2, 2, 2]), item: tensor([12, 11, 31]), rating: tensor([0., 0., 0.])
user: tensor([2, 2, 2]), item: tensor([32, 23, 11]), rating: tensor([0., 1., 0.])
user: tensor([2, 2, 2]), item: tensor([31, 32, 12]), rating: tensor([0., 0., 0.])
user: tensor([3, 3, 3]), item: tensor([31, 23, 12]), rating: tensor([2., 0., 0.])
user: tensor([3, 3]), item: tensor([22, 21]), rating: tensor([0., 0.])


In [21]:
for batch_id, batch in enumerate(DataLoader(dataset, batch_size=3, shuffle=True)):
  user, item, rating = batch[0], batch[1], batch[2]
  print('user: {}, item: {}, rating: {}'.format(user,item,rating))

user: tensor([1, 2, 2]), item: tensor([21, 11, 22]), rating: tensor([0., 0., 4.])
user: tensor([2, 2, 1]), item: tensor([12, 11, 23]), rating: tensor([0., 0., 0.])
user: tensor([1, 3, 2]), item: tensor([31, 31, 23]), rating: tensor([0., 2., 1.])
user: tensor([2, 2, 1]), item: tensor([32, 31, 11]), rating: tensor([0., 0., 2.])
user: tensor([2, 2, 3]), item: tensor([31, 12, 22]), rating: tensor([0., 0., 0.])
user: tensor([1, 3, 2]), item: tensor([32, 21, 32]), rating: tensor([0., 0., 0.])
user: tensor([3, 3]), item: tensor([12, 23]), rating: tensor([0., 0.])
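
What the three `DataLoader` runs show can be imitated in plain Python. This is a simplified stand-in for illustration, not how `DataLoader` is implemented; `iter_batches` and its `seed` argument are hypothetical.

```python
import random

def iter_batches(n_samples, batch_size, shuffle, seed=0):
    """Yield lists of sample indices, one list per batch."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)  # new order, same samples
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]

ordered = list(iter_batches(20, batch_size=3, shuffle=False))
shuffled = list(iter_batches(20, batch_size=3, shuffle=True))

print(len(ordered), len(ordered[-1]))   # number of batches, size of the last one
print(shuffled[0])
```

With 20 samples and a batch size of 3 there are 7 batches and the last one holds only 2 samples, as in the outputs above. Shuffling changes which samples share a batch, not the set of samples.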


# NCF
## NCF_official
```python3
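# Excerpt written for Python 2 / Keras 1.x: xrange and nb_epoch are range and epochs in Python 3 / Keras 2+
for epoch in xrange(num_epochs):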
  t1 = time()
  user_input, item_input, labels = get_train_instances(train, num_negatives)
  #Training
  hist = model.fit([np.array(user_input), np.array(item_input)], np.array(labels), batch_size = batch_size, nb_epoch = 1, verbose = 0, shuffle = True)
```

## NCF-PT_ejlee
### train_an_epoch
```python3
def train_an_epoch(self, train_loader, epoch_id):
    assert hasattr(self, 'model'), 'Please specify the exact model !'
    self.model.train()
    total_loss = 0
    for batch_id, batch in enumerate(train_loader):
        assert isinstance(batch[0], torch.LongTensor)
        user, item, rating = batch[0], batch[1], batch[2]
        rating = rating.float()
        loss = self.train_single_batch(user, item, rating)
        print('[Training Epoch {}] Batch {}, Loss {}'.format(epoch_id, batch_id, loss))
        total_loss += loss
    self._writer.add_scalar('model/loss', total_loss, epoch_id)
```
### train_single_batch
```python3
def train_single_batch(self, users, items, ratings):
    assert hasattr(self, 'model'), 'Please specify the exact model !'
    if self.config['use_cuda'] is True:
        users, items, ratings = users.cuda(), items.cuda(), ratings.cuda()
    self.opt.zero_grad()
    ratings_pred = self.model(users, items)
    loss = self.crit(ratings_pred.view(-1), ratings)
    loss.backward()
    self.opt.step()
    loss = loss.item()
    return loss
```

# NGCF
## NGCF-PT_ejlee
### train
```python3
def train(model, data_generator, optimizer):
    model.train()
    n_batch = data_generator.n_train // data_generator.batch_size + 1
    running_loss=0
    for _ in range(n_batch):
        u, i, j = data_generator.sample()
        optimizer.zero_grad()
        loss = model(u,i,j)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    return running_loss
```
### data_generator.sample()
```
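# watched / unwatched: hypothetical dicts mapping a user id to that user's watched / unwatched movie ids
users = rd.sample(exist_users, batch_size)
pos_items = [rd.sample(watched[u], batch_size) for u in users]
neg_items = [rd.sample(unwatched[u], batch_size) for u in users]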
```
Example
- users: `[1,3]`
- pos_items: `[[12,16],[32,37]]`
- neg_items: `[[27,58], [45,59]]`




# NCF vs NGCF

For a single user:

NCF (num_negatives=4)
- users : `[1,1,1,1,1]`
- items : `[11,31,22,25,37]`
- ratings : `[2,0,0,0,0]`
```
self.model(users,items)
```
> When batch_size = 3, each batch holds 3 users, 3 items, and 3 ratings (labels):
> 
> `user: tensor([1, 2, 2]), item: tensor([21, 11, 22]), rating: tensor([0., 0., 4.])`


NGCF (batch_size = 3)
- users : `[1,2,3]`
- pos_items : `[[11,12,16],[21,24,28],[31,33,32]]`
- neg_items : `[[32,37,41],[11,14,31],[21,24,28]]`
```
model(users, pos_items, neg_items)
```
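
The difference in inputs goes with a difference in objective: a pointwise model scores one (user, item) pair against a label, while a pairwise model compares a watched item against an unwatched one. A minimal sketch, assuming binary cross-entropy on the NCF side and a BPR-style loss on the NGCF side (neither loss is shown in the excerpts above, and the scores below are made-up numbers):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pointwise_bce(pred, label):
    """Binary cross-entropy for one (user, item, label) instance.
    pred is a probability in (0, 1); label is 0 or 1."""
    return -(label * math.log(pred) + (1 - label) * math.log(1 - pred))

def pairwise_bpr(pos_score, neg_score):
    """BPR-style loss for one (user, pos_item, neg_item) triple:
    small when the watched item is scored above the unwatched one."""
    return -math.log(sigmoid(pos_score - neg_score))

# NCF-style input: one positive and one negative instance for a user
print(pointwise_bce(pred=0.9, label=1))   # positive predicted high -> low loss
print(pointwise_bce(pred=0.9, label=0))   # negative predicted high -> high loss

# NGCF-style input: positive item vs. negative item for the same user
print(pairwise_bpr(pos_score=2.0, neg_score=-1.0))  # correct ordering -> low loss
print(pairwise_bpr(pos_score=-1.0, neg_score=2.0))  # wrong ordering -> high loss
```

Under this assumption the pointwise labels would have to be 0 or 1, so the raw ratings used in the toy example (2, 4, 1) would first be mapped to 1.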

# Solution

 