<a href="https://colab.research.google.com/github/tamara-kostova/IIS/blob/master/lab4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [22]:
!pip install torch
!pip install torch_geometric
!pip install torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.2.0+cpu.html

Looking in links: https://data.pyg.org/whl/torch-2.2.0+cpu.html


In [23]:
import torch
from torch_geometric.nn import TransE, ComplEx


**Function for training**

In [36]:
def train(model, data_loader, optimizer, epochs=10):
    for epoch in range(epochs):
        model.train()
        total_loss = 0
        total_examples = 0

        for head_index, rel_type, tail_index in data_loader:
            optimizer.zero_grad()
            loss = model.loss(head_index, rel_type, tail_index)
            loss.backward()
            optimizer.step()
            total_loss += float(loss) * head_index.numel()
            total_examples += head_index.numel()

        loss = total_loss / total_examples
        print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')

**Function for evaluating**

In [37]:
def evaluate(model, data_loader):
    hits1_list = []
    hits3_list = []
    hits10_list = []
    mr_list = []
    mrr_list = []

    for head_index, rel_type, tail_index in data_loader:
        head_embeds = model.node_emb(head_index)
        relation_embeds = model.rel_emb(rel_type)
        tail_embeds = model.node_emb(tail_index)

        if isinstance(model, TransE):
            scores = torch.norm(head_embeds + relation_embeds - tail_embeds, p=1, dim=1)

        elif isinstance(model, ComplEx):
            # Get real and imaginary parts
            re_relation, im_relation = torch.chunk(relation_embeds, 2, dim=1)
            re_head, im_head = torch.chunk(head_embeds, 2, dim=1)
            re_tail, im_tail = torch.chunk(tail_embeds, 2, dim=1)

            # Compute scores
            re_score = re_head * re_relation - im_head * im_relation
            im_score = re_head * im_relation + im_head * re_relation
            scores = (re_score * re_tail + im_score * im_tail)

            # Negate as we want to rank scores in ascending order, lower the better
            scores = - scores.sum(dim=1)

        else:
            raise ValueError(f'Unsupported model.')

        scores = scores.view(-1, head_embeds.size()[0])

        hits1, hits3, hits10, mr, mrr = eval_metrics(scores)
        hits1_list.append(hits1.item())
        hits3_list.append(hits3.item())
        hits10_list.append(hits10.item())
        mr_list.append(mr.item())
        mrr_list.append(mrr.item())

    hits1 = sum(hits1_list) / len(hits1_list)
    hits3 = sum(hits3_list) / len(hits1_list)
    hits10 = sum(hits10_list) / len(hits1_list)
    mr = sum(mr_list) / len(hits1_list)
    mrr = sum(mrr_list) / len(hits1_list)

    return hits1, hits3, hits10, mr, mrr

**Evaluation metrics**

In [38]:
def eval_metrics(y_pred):
    argsort = torch.argsort(y_pred, dim=1, descending=False)
    # not using argsort to do the rankings to avoid bias when the scores are equal
    ranking_list = torch.nonzero(argsort == 0, as_tuple=False)
    ranking_list = ranking_list[:, 1] + 1
    hits1_list = (ranking_list <= 1).to(torch.float)
    hits3_list = (ranking_list <= 3).to(torch.float)
    hits10_list = (ranking_list <= 10).to(torch.float)
    mr_list = ranking_list.to(torch.float)
    mrr_list = 1. / ranking_list.to(torch.float)

    return hits1_list.mean(), hits3_list.mean(), hits10_list.mean(), mr_list.mean(), mrr_list.mean()

**Load data**

In [39]:
from torch_geometric.datasets import FB15k_237

train_data = FB15k_237('../data/FB15k', split='train')[0]
val_data = FB15k_237('../data/FB15k', split='val')[0]
test_data = FB15k_237('../data/FB15k', split='test')[0]


# **EXERCISE 1**

TransE Knowledge Graph

In [40]:
from torch.optim import Adam

model = TransE(num_nodes=train_data.num_nodes,
                   num_relations=train_data.num_edge_types,
                   hidden_channels=50)

loader = model.loader(head_index=train_data.edge_index[0],
                          rel_type=train_data.edge_type,
                          tail_index=train_data.edge_index[1],
                          batch_size=1000,
                          shuffle=True)

optimizer = Adam(model.parameters(), lr=0.01)

**Train model**

In [41]:
train(model, loader, optimizer)

Epoch: 000, Loss: 0.7605
Epoch: 001, Loss: 0.5579
Epoch: 002, Loss: 0.4360
Epoch: 003, Loss: 0.3519
Epoch: 004, Loss: 0.2976
Epoch: 005, Loss: 0.2639
Epoch: 006, Loss: 0.2420
Epoch: 007, Loss: 0.2263
Epoch: 008, Loss: 0.2143
Epoch: 009, Loss: 0.2032


**Results**

In [42]:
hits1, hits3, hits10, mr, mrr = evaluate(model,loader)

In [43]:
print(f'Mean Rank: {mr:.2f}, Mean Reciprocal Rank: {mrr:.4f}, '
      f'Hits@1: {hits1:.4f}, Hits@3: {hits3:.4f}, Hits@10: {hits10:.4f}')

Mean Rank: 528.74, Mean Reciprocal Rank: 0.0082, Hits@1: 0.0000, Hits@3: 0.0073, Hits@10: 0.0110


# **EXERCISE 2**

In [44]:
model2 = ComplEx(num_nodes=train_data.num_nodes,
                   num_relations=train_data.num_edge_types,
                   hidden_channels=50)

loader2 = model2.loader(head_index=train_data.edge_index[0],
                          rel_type=train_data.edge_type,
                          tail_index=train_data.edge_index[1],
                          batch_size=1000,
                          shuffle=True)

optimizer2 = Adam(model.parameters(), lr=0.01)

**Train model**

In [45]:
train(model2, loader2, optimizer2)

Epoch: 000, Loss: 0.6931
Epoch: 001, Loss: 0.6931
Epoch: 002, Loss: 0.6931
Epoch: 003, Loss: 0.6931
Epoch: 004, Loss: 0.6931
Epoch: 005, Loss: 0.6931
Epoch: 006, Loss: 0.6931
Epoch: 007, Loss: 0.6931
Epoch: 008, Loss: 0.6931
Epoch: 009, Loss: 0.6931


**Results**

In [46]:
hits1, hits3, hits10, mr, mrr = evaluate(model2,loader2)

In [47]:
print(f'Mean Rank: {mr:.2f}, Mean Reciprocal Rank: {mrr:.4f}, '
      f'Hits@1: {hits1:.4f}, Hits@3: {hits3:.4f}, Hits@10: {hits10:.4f}')

Mean Rank: 498.27, Mean Reciprocal Rank: 0.0093, Hits@1: 0.0000, Hits@3: 0.0037, Hits@10: 0.0183


## **Conclusion**

The **ComplEx** model has better MR and MRR scores which meand it outperforms it on average (*The Mean Rank is the average rank of the first correct answer - so **lower** values are better, whereas Mean Reciprocial Rank is the average of the reciprocal ranks of the first correct answer for each query which means **higher** numbers are better*).


---


The **TransE** model has a higher Hits@3 than the **ComplEx** model, indicating that it is better at recommending relevant items within the top 3 predictions, whereas the **ComplEx** model is better when it comes to the top 10 predictions.