# Seminar on Graphs for NLP: Vector representations

## Plan for today:

#### 0. What a taxonomy is. Taxonomy Enrichment task.
#### 1. Node2vec model. Implementation of node2vec.
#### 2. Embedding generation for the OOV words for node2vec. Linear transformation model.
#### 3. Graph Neural Networks: GCN and GAT
#### 4. GraphBERT: Only Attention is Needed for Learning Graph Representations

# 0. Taxonomy

A taxonomy is a hierarchical structure of units in terms of class inclusion such that superordinate units in the hierarchy include, or subsume, all items in subordinate units. Taxonomies are typically represented as having tree structures.

![](https://www.digital-mr.com/media/cache/51/6f/516f493d37a7b4895f678843b6383e48.png)


Taxonomies can be represented as graphs!
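To make this concrete, here is a toy sketch in plain Python with made-up names (not taken from the dataset): each edge points from a hypernym (parent) to a hyponym (child), and the set of all ancestors of a node is exactly the set of classes that subsume it. A node may have more than one parent (WordNet contains such cases), so in general the structure is a directed acyclic graph rather than a strict tree.

```python
from collections import defaultdict

# Toy taxonomy with made-up names: each edge goes from a hypernym (parent) to a hyponym (child).
edges = [
    ("animal", "dog"),
    ("animal", "cat"),
    ("dog", "hound"),
    ("dog", "spaniel"),
    ("pet", "dog"),  # a second parent for "dog": the hierarchy is a DAG, not a strict tree
]

# Index the graph by child so we can walk upwards to the hypernyms.
parents = defaultdict(set)
for parent, child in edges:
    parents[child].add(parent)


def ancestors(node):
    """Return every class that subsumes `node`, i.e. all of its transitive hypernyms."""
    seen, stack = set(), [node]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


print(sorted(ancestors("spaniel")))  # ['animal', 'dog', 'pet']
```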

Let us download a graph built from the most popular and well-known taxonomy, WordNet. You can also access WordNet directly with `from nltk.corpus import wordnet as wn`, but keep in mind that the version shipped with NLTK may be an earlier one than the version used to build this graph.

In [1]:
import os
import torch
os.environ['TORCH'] = torch.__version__
print(torch.__version__)

!pip install -q torch-scatter -f https://data.pyg.org/whl/torch-${TORCH}.html
!pip install -q torch-sparse -f https://data.pyg.org/whl/torch-${TORCH}.html
!pip install -q git+https://github.com/pyg-team/pytorch_geometric.git

1.11.0+cu113
Building wheel for torch-geometric (setup.py) ... done


In [2]:
!pip install tensorboardX

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tensorboardX
  Downloading tensorboardX-2.5.1-py2.py3-none-any.whl (125 kB)
Installing collected packages: tensorboardX
Successfully installed tensorboardX-2.5.1


In [3]:
!pip install --upgrade gensim

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting gensim
  Downloading gensim-4.2.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (24.1 MB)
Installing collected packages: gensim
  Attempting uninstall: gensim
    Found existing installation: gensim 3.6.0
    Uninstalling gensim-3.6.0:
      Successfully uninstalled gensim-3.6.0
Successfully installed gensim-4.2.0


In [4]:
!gdown --id 1avRebH3BMsolRxmthVFNPoLwyRpAV2tx

Downloading...
From: https://drive.google.com/uc?id=1avRebH3BMsolRxmthVFNPoLwyRpAV2tx
To: /content/wordnet_n_is_directed_1_en_synsets.zip
100% 217M/217M [00:04<00:00, 46.4MB/s]


In [5]:
!unzip wordnet_n_is_directed_1_en_synsets.zip

Archive:  wordnet_n_is_directed_1_en_synsets.zip
   creating: wordnet_n_is_directed_1_en_synsets/
  inflating: wordnet_n_is_directed_1_en_synsets/link  
   creating: wordnet_n_is_directed_1_en_synsets/.ipynb_checkpoints/
  inflating: wordnet_n_is_directed_1_en_synsets/.ipynb_checkpoints/link-checkpoint  
  inflating: wordnet_n_is_directed_1_en_synsets/node  


In [6]:
import nltk
nltk.download('omw-1.4')

[nltk_data] Downloading package omw-1.4 to /root/nltk_data...


True

In [7]:
from gensim.models.poincare import PoincareModel
import numpy as np
import time
import os

In [10]:
from nltk.corpus import wordnet as wn

In [9]:
import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [11]:
wn.synset("guy.n.01").lemmas()

[Lemma('guy.n.01.guy'),
 Lemma('guy.n.01.cat'),
 Lemma('guy.n.01.hombre'),
 Lemma('guy.n.01.bozo')]

In [13]:
path = "wordnet_n_is_directed_1_en_synsets/"

link_path = os.path.join(path, "link")
node_path = os.path.join(path, "node")

In [14]:
id2synset = {}
fasttext_dict = {}

with open(node_path) as f:
    for line in f:
        line_split = line.split("\t")
        id2synset[line_split[0].strip()] = line_split[-1].strip()
        fasttext_dict[line_split[-1].strip()] = np.array([float(num) for num in line_split[1:-1]])

In [15]:
link_pairs = set()
with open(link_path) as f:
    for line in f:
        line_split = line.split("\t")
        link_pairs.add((id2synset[line_split[0].strip()], id2synset[line_split[-1].strip()]))

# 3. Graph Neural Networks

In [16]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [17]:
import torch_geometric.nn as pyg_nn
import torch_geometric.utils as pyg_utils

In [18]:
import time
from datetime import datetime

import networkx as nx
import numpy as np
import torch
import torch.optim as optim

from torch_geometric.datasets import TUDataset
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import DataLoader  # torch_geometric.data.DataLoader is deprecated in PyG 2.x
from torch_geometric.utils import train_test_split_edges
import torch_geometric.transforms as T
from torch_geometric.data import Data

from tensorboardX import SummaryWriter
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt


## Data preparation

In [19]:
from gensim.models.keyedvectors import KeyedVectors

fasttext = KeyedVectors(vector_size=300)
fasttext.add_vectors(list(fasttext_dict.keys()), list(fasttext_dict.values()))

In [20]:
import networkx as nx

In [21]:
G = nx.DiGraph()

for pair in link_pairs:
    G.add_edge(*pair)

In [22]:
def create_edge_list(G):
    starts = []
    ends = []
    for left, right in G.edges:
        if left in fasttext.key_to_index and right in fasttext.key_to_index:
            starts.append(fasttext.key_to_index[left])
            ends.append(fasttext.key_to_index[right])
    return torch.tensor([starts, ends], dtype=torch.long)

In [23]:
index_to_key = dict(map(reversed, fasttext.key_to_index.items()))

In [24]:
edge_index = create_edge_list(G)

In [25]:
# fasttext.vectors already stores the rows in index order; converting the whole array at once
# avoids the very slow "tensor from a list of ndarrays" path
x = torch.tensor(fasttext.vectors, dtype=torch.float)


In [29]:
x.shape

torch.Size([78748, 300])

In [26]:
data = Data(x=x, edge_index=edge_index)
#data = train_test_split_edges(data)

In [27]:
from torch_geometric.transforms import RandomLinkSplit

In [28]:
transform = RandomLinkSplit(is_undirected=True, split_labels=True)
train_data, val_data, test_data = transform(data)

### GCN and GAT Encoder

The following cell defines the `Encoder` module, a two-layer network built from either GCN or GAT convolutions, together with the `train` and `test` helpers.

In [30]:
class Encoder(torch.nn.Module):
    """Two-layer GNN encoder: in_channels -> 2 * out_channels -> out_channels."""
    def __init__(self, in_channels, out_channels, mode="gcn"):
        super().__init__()
        if mode == "gcn":
            # cached=True reuses the normalized adjacency, which is fine because the graph is fixed
            self.conv1 = pyg_nn.GCNConv(in_channels, 2 * out_channels, cached=True)
            self.conv2 = pyg_nn.GCNConv(2 * out_channels, out_channels, cached=True)
        elif mode == "gat":
            self.conv1 = pyg_nn.GATConv(in_channels, 2 * out_channels)
            self.conv2 = pyg_nn.GATConv(2 * out_channels, out_channels)
        else:
            raise ValueError(f"Encoder mode {mode!r} is not recognized, try 'gcn' or 'gat'")

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)  # no activation: these are the latent node embeddings


def train(epoch):
    # relies on the globals model, optimizer, x, train_pos_edge_index and writer defined below
    model.train()
    optimizer.zero_grad()
    z = model.encode(x, train_pos_edge_index)
    # recon_loss samples negative edges itself when none are passed
    loss = model.recon_loss(z, train_pos_edge_index)
    loss.backward()
    optimizer.step()
    writer.add_scalar("loss", loss.item(), epoch)
    return loss.item()


def test(pos_edge_index, neg_edge_index):
    model.eval()
    with torch.no_grad():
        # message passing only over training edges, so the evaluated edges stay unseen
        z = model.encode(x, train_pos_edge_index)
    return model.test(z, pos_edge_index, neg_edge_index)  # returns (AUC, AP)
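For intuition about what each `GCNConv` layer in the encoder computes: the textbook GCN rule is H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where D is the degree matrix of A + I. Below is a minimal dense NumPy sketch of that rule on a toy graph; the function name `gcn_layer` is ours, and PyG implements the same idea on sparse `edge_index` tensors (with `cached=True` it stores the normalized adjacency after the first call).

```python
import numpy as np


def gcn_layer(adj, h, w, activation=True):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 H W), with D the degree matrix of A + I."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    deg = a_hat.sum(axis=1)                      # degrees, self-loop included (always >= 1)
    d_inv_sqrt = np.diag(deg ** -0.5)
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt   # symmetric normalization
    out = norm_adj @ h @ w                       # aggregate neighbours, then project
    return np.maximum(out, 0.0) if activation else out


# Toy undirected path graph 0 - 1 - 2 with 4 input features and 2 output channels.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))
w = rng.normal(size=(4, 2))
print(gcn_layer(adj, h, w).shape)  # (3, 2)
```

The `Encoder` stacks two such layers with a ReLU in between and leaves the second layer linear, so its output can serve directly as the latent embedding.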

In [31]:
writer = SummaryWriter("./log/" + datetime.now().strftime("%Y%m%d-%H%M%S"))

channels = 64
dev = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('CUDA availability:', torch.cuda.is_available())

CUDA availability: True


## Variational Graph Auto-Encoders

https://arxiv.org/pdf/1611.07308.pdf

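The pipeline works as follows: first, we train a graph autoencoder with a GCN or GAT encoder under the hood (the cell below uses PyG's `GAE`, i.e. the non-variational variant described in the paper). The latent node representations produced by the encoder are the embeddings we are looking for; during evaluation, the decoder scores held-out test edges against sampled negative edges, which yields the AUC and AP values printed below.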
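PyG's `GAE` uses an inner-product decoder by default, and `recon_loss` is a binary cross-entropy over the positive training edges plus randomly sampled negative edges. The NumPy sketch below re-implements both pieces for illustration only (it is not PyG's code; the function names are ours):

```python
import numpy as np

EPS = 1e-15  # keeps the logarithms finite


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def decode(z, edge_index):
    """Inner-product decoder: P(edge i -> j) = sigmoid(z_i . z_j)."""
    src, dst = edge_index
    return sigmoid((z[src] * z[dst]).sum(axis=1))


def recon_loss(z, pos_edge_index, neg_edge_index):
    """Binary cross-entropy: observed edges should score 1, sampled non-edges 0."""
    pos_loss = -np.log(decode(z, pos_edge_index) + EPS).mean()
    neg_loss = -np.log(1.0 - decode(z, neg_edge_index) + EPS).mean()
    return pos_loss + neg_loss


# Toy latent vectors for 4 nodes; edges are [sources, targets], like a PyG edge_index.
z = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [0.0, 1.0]])
pos = np.array([[0], [1]])  # an observed edge between two similar embeddings
neg = np.array([[0], [2]])  # a sampled non-edge between opposite embeddings
print(decode(z, pos), decode(z, neg), recon_loss(z, pos, neg))
```

Because the decoder is a plain dot product, training pulls connected nodes' embeddings towards each other, which is why the encoder output is usable as an embedding for link prediction.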

In [32]:
model = pyg_nn.GAE(Encoder(300, channels, 'gcn')).to(dev)
x, train_pos_edge_index = train_data.x.to(dev), train_data.pos_edge_label_index.to(dev)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(1, 401):
    loss = train(epoch)
    auc, ap = test(test_data.pos_edge_label_index, test_data.neg_edge_label_index)
    writer.add_scalar("AUC", auc, epoch)
    writer.add_scalar("AP", ap, epoch)
    if epoch % 10 == 0:
        print('Epoch: {:03d}, AUC: {:.4f}, AP: {:.4f}, Loss: {:.4f}'.format(epoch, auc, ap, loss))

Epoch: 010, AUC: 0.8235, AP: 0.8087, Loss: 0.9674
Epoch: 020, AUC: 0.8606, AP: 0.8509, Loss: 0.8795
Epoch: 030, AUC: 0.8839, AP: 0.8770, Loss: 0.8350
Epoch: 040, AUC: 0.8899, AP: 0.8866, Loss: 0.8199
Epoch: 050, AUC: 0.8975, AP: 0.8956, Loss: 0.8090
Epoch: 060, AUC: 0.9017, AP: 0.9012, Loss: 0.7974
Epoch: 070, AUC: 0.9057, AP: 0.9064, Loss: 0.7931
Epoch: 080, AUC: 0.9069, AP: 0.9091, Loss: 0.7868
Epoch: 090, AUC: 0.9095, AP: 0.9122, Loss: 0.7862
Epoch: 100, AUC: 0.9112, AP: 0.9142, Loss: 0.7817
Epoch: 110, AUC: 0.9111, AP: 0.9152, Loss: 0.7829
Epoch: 120, AUC: 0.9125, AP: 0.9167, Loss: 0.7790
Epoch: 130, AUC: 0.9124, AP: 0.9172, Loss: 0.7727
Epoch: 140, AUC: 0.9133, AP: 0.9184, Loss: 0.7729
Epoch: 150, AUC: 0.9136, AP: 0.9189, Loss: 0.7673
Epoch: 160, AUC: 0.9135, AP: 0.9192, Loss: 0.7675
Epoch: 170, AUC: 0.9153, AP: 0.9210, Loss: 0.7674
Epoch: 180, AUC: 0.9150, AP: 0.9213, Loss: 0.7641
Epoch: 190, AUC: 0.9148, AP: 0.9213, Loss: 0.7646
Epoch: 200, AUC: 0.9142, AP: 0.9209, Loss: 0.7619


#### Examples

Let us look at the unseen words, i.e. the `ORPHAN_*` nodes: for each of them we print its true parent, followed by the ten nodes with the highest predicted link score.

In [33]:
model.eval()
with torch.no_grad():
    # same node features as x above; message passing uses the training edges
    new_x = torch.tensor(fasttext.vectors, dtype=torch.float).to(dev)
    z = model.encode(new_x, train_pos_edge_index)

In [34]:
id2syns = {}
syns2id = {}
with open('wordnet_n_is_directed_1_en_synsets/node') as f:
    for line in f:
        fields = line.split()
        id2syns[fields[0]] = fields[-1]
        syns2id[fields[-1]] = fields[0]

In [35]:
par2orph = {}
orph2par = {}
with open('wordnet_n_is_directed_1_en_synsets/link') as f:
    for line in f:
        fields = line.split()
        par_id, child_id = fields[0], fields[-1]

        # keep only links whose child is an unseen (ORPHAN_*) node
        if "ORPHAN_" in id2syns[child_id]:
            par2orph[id2syns[par_id]] = id2syns[child_id]
            orph2par[id2syns[child_id]] = id2syns[par_id]

In [37]:
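c = 0
num_nodes = z.size(0)  # one candidate per embedded node
with torch.no_grad():
    for word, cur_index in fasttext.key_to_index.items():
        if ".n." in word:  # skip known synsets; only the ORPHAN_* nodes are unseen
            continue
        # candidate edges from the unseen node to every node in the graph
        candidates = torch.stack([torch.full((num_nodes,), cur_index, dtype=torch.long),
                                  torch.arange(num_nodes)]).to(dev)
        scores = model.decode(z, candidates)  # sigmoid of inner products
        top_scores, top_ids = scores.topk(10)
        top10 = [(index_to_key[int(i)], round(float(s), 4)) for s, i in zip(top_scores, top_ids)]
        print(orph2par[word], ":", top10)  # true parent, then the 10 highest-scoring candidates
        print("=" * 10)
        c += 1
        if c == 20:
            break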

course.n.04 : [('way.n.05', 0.9996), ('ORPHAN_100000000', 0.9995), ('golf_course.n.01', 0.9993), ('appetizer.n.01', 0.9986), ('course.n.04', 0.9983), ('dessert.n.01', 0.9981), ('course.n.07', 0.9976), ('course.n.09', 0.9976), ('scheme.n.01', 0.9968), ('system.n.07', 0.9961)]
recovery.n.03 : [('pump_action.n.01', 0.9676), ('movement.n.10', 0.9666), ('transmission.n.01', 0.9653), ('recovery.n.03', 0.9651), ('reclamation.n.02', 0.9643), ('conservation.n.02', 0.9641), ('action.n.01', 0.9632), ('ORPHAN_100000062', 0.9618), ('ORPHAN_100000001', 0.9603), ('quitclaim.n.01', 0.9586)]
disappearance.n.01 : [('due.n.02', 0.9858), ('ORPHAN_100000002', 0.9849), ('vanishing.n.01', 0.9683), ('cheap_shot.n.02', 0.9667), ('flowage.n.01', 0.96), ('outflow.n.02', 0.9593), ('flood.n.01', 0.9541), ('relict.n.01', 0.9534), ('erasure.n.02', 0.9525), ('inpouring.n.01', 0.9512)]
hit.n.03 : [('hit.n.07', 0.9921), ('base_hit.n.01', 0.9712), ('hit.n.01', 0.9557), ('collision.n.01', 0.9536), ('cog.n.01', 0.9503), (

## GraphBERT

https://github.com/jwzhanggy/Graph-Bert

Yet another model for embedding generation is GRAPH-BERT. Instead of feeding it the whole large input graph, we train GRAPH-BERT on sampled subgraphs that capture each node's local context. The input vector embeddings fed to the graph-transformer model consist of four parts: (1) raw feature vector embedding, (2) Weisfeiler-Lehman absolute role embedding, (3) intimacy-based relative positional embedding, and (4) hop-based relative distance embedding.
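As we read the paper, each integer index (WL code, intimacy position, hop distance) is mapped to a vector with Transformer-style sinusoids, and the four parts are summed into the node's initial representation. The NumPy sketch below follows that description under our own naming; the `max_wl_role_index`, `max_hop_dis_index` and `max_inti_pos_index` arguments passed to `GraphBertConfig` later in this notebook suggest the repository may use learned lookup tables instead, so treat it as an illustration. The sizes (300 raw features, hidden size 32) mirror the defaults used later.

```python
import numpy as np


def sinusoidal(index, dim):
    """Transformer-style encoding of an integer index into a `dim`-dimensional vector."""
    if dim % 2:
        raise ValueError("dim must be even")
    half = np.arange(dim // 2)
    angles = index / (10000 ** (2 * half / dim))
    out = np.zeros(dim)
    out[0::2] = np.sin(angles)
    out[1::2] = np.cos(angles)
    return out


def input_embedding(raw_x, w_raw, wl_code, intimacy_pos, hop_dist):
    """Initial representation of one subgraph node: the sum of the four GRAPH-BERT input parts."""
    dim = w_raw.shape[1]
    e_x = raw_x @ w_raw                  # (1) raw feature embedding (linear projection)
    e_r = sinusoidal(wl_code, dim)       # (2) WL absolute role embedding
    e_p = sinusoidal(intimacy_pos, dim)  # (3) intimacy-based relative positional embedding
    e_d = sinusoidal(hop_dist, dim)      # (4) hop-based relative distance embedding
    return e_x + e_r + e_p + e_d


rng = np.random.default_rng(0)
h0 = input_embedding(rng.normal(size=300), rng.normal(size=(300, 32)), wl_code=3, intimacy_pos=1, hop_dist=2)
print(h0.shape)  # (32,)
```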

GRAPH-BERT is pre-trained on two tasks: node attribute reconstruction and graph structure recovery.

![](https://github.com/jwzhanggy/Graph-Bert/raw/master/result/screenshot/model.png)
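The two pre-training objectives can be sketched as follows, as we understand them from the GRAPH-BERT paper: node attribute reconstruction compares each node's raw features with the features reconstructed from its representation, and graph structure recovery compares the cosine similarity of two node representations with their intimacy score. This NumPy code is our own illustration with hypothetical function names; the repository's exact normalisation may differ.

```python
import numpy as np


def node_reconstruction_loss(x, x_hat):
    """Average squared error between raw node features and their reconstructions."""
    return float(np.mean(np.sum((x - x_hat) ** 2, axis=1)))


def cosine_matrix(z):
    """Pairwise cosine similarities between node embeddings (rows of z)."""
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    unit = z / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


def structure_recovery_loss(z, s):
    """Squared Frobenius distance between embedding cosines and target intimacy scores, per node pair."""
    return float(np.sum((s - cosine_matrix(z)) ** 2) / s.size)


x = np.array([[1.0, 2.0], [0.0, 1.0]])
print(node_reconstruction_loss(x, x + 0.1))
```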

## Subgraph Sampling

![](https://i.ibb.co/5cbjJZ6/photo-2021-12-07-16-41-32.jpg)
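Subgraph sampling is driven by an intimacy matrix. The paper defines S = alpha * (I - (1 - alpha) * A_bar)^-1, with A_bar = A D^-1 the column-normalised adjacency matrix and alpha usually set to 0.15; the context of a target node is its top-k most intimate nodes (the preprocessing log below uses k = 5). A dense NumPy sketch of that definition follows; the function names are ours, the repository's preprocessing may normalise the adjacency differently, and a graph with ~80k nodes like ours would need a sparse or batched approach rather than a full matrix inverse.

```python
import numpy as np


def intimacy_matrix(adj, alpha=0.15):
    """S = alpha * (I - (1 - alpha) * A_bar)^-1 with the column-normalised A_bar = A D^-1."""
    deg = adj.sum(axis=0)
    deg[deg == 0] = 1.0                 # isolated nodes: avoid division by zero
    a_bar = adj / deg                   # broadcasting divides column j by its degree
    n = adj.shape[0]
    return alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * a_bar)


def top_k_context(s, target, k):
    """The k nodes with the highest intimacy to `target` (target excluded), most intimate first."""
    scores = s[target].astype(float)
    scores[target] = -np.inf
    return [int(i) for i in np.argsort(-scores, kind="stable")[:k]]


# Toy path graph 0 - 1 - 2 - 3: the 2-node context of node 0 should be its nearest nodes.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
print(top_k_context(intimacy_matrix(adj), target=0, k=2))  # [1, 2]
```

Since A_bar is column-stochastic, every column of S sums to 1, so each column behaves like a personalised PageRank distribution.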

## Positional embeddings

### Weisfeiler-Lehman Absolute Role Embedding

![](https://i.ibb.co/bgT7gqb/wl.png)
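The WL absolute role of a node comes from Weisfeiler-Lehman colour refinement computed on the complete graph (hence "absolute"), so structurally equivalent nodes receive the same code. Below is a generic plain-Python sketch of the refinement; it is our own code, not the repository's implementation, and the iteration cap is an arbitrary choice.

```python
def wl_codes(adj_list, max_iter=10):
    """Weisfeiler-Lehman colour refinement over a whole graph.

    Every node starts with the same colour; in each round a node's new colour is a compressed
    integer code for (its own colour, sorted multiset of neighbour colours). Refinement stops
    once the number of colour classes no longer grows.
    """
    colors = {v: 1 for v in adj_list}
    for _ in range(max_iter):
        signatures = {v: (colors[v], tuple(sorted(colors[u] for u in adj_list[v]))) for v in adj_list}
        codes = {sig: i + 1 for i, sig in enumerate(sorted(set(signatures.values())))}
        new_colors = {v: codes[signatures[v]] for v in adj_list}
        stable = len(set(new_colors.values())) == len(set(colors.values()))
        colors = new_colors
        if stable:
            break
    return colors


# Toy path graph 0 - 1 - 2 - 3: the two ends share one role, the two middle nodes another.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(wl_codes(path))  # {0: 1, 1: 2, 2: 2, 3: 1}
```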

### Intimacy based Relative Positional Embedding

![](https://i.ibb.co/34FvCf0/photo-2021-12-07-16-52-30.jpg)
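As we read the paper, the sampled subgraph is serialised into a sequence with the target node first and the context nodes ordered by decreasing intimacy; each node's position in that sequence is the index that gets embedded. A plain-Python sketch with hypothetical intimacy scores (in practice they would come from the intimacy matrix used for sampling):

```python
def intimacy_positions(target, intimacy):
    """Serialise a sampled subgraph: the target first, then context nodes by decreasing intimacy.

    Returns each node's position in that sequence, which is the index to be embedded.
    """
    context = sorted((v for v in intimacy if v != target), key=lambda v: -intimacy[v])
    return {v: pos for pos, v in enumerate([target] + context)}


# Hypothetical intimacy scores of three context nodes with respect to target node 0.
print(intimacy_positions(0, {1: 0.4, 2: 0.1, 3: 0.3}))  # {0: 0, 1: 1, 3: 2, 2: 3}
```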

### Hop based Relative Distance Embedding
![](https://i.ibb.co/tCzRcfK/hops-drawio.png)
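The hop-based relative distance of a context node is its shortest-path hop count from the target in the original graph, which a breadth-first search gives directly. A plain-Python sketch; the `unreachable` placeholder index is an arbitrary choice of ours for nodes with no path from the target.

```python
from collections import deque


def hop_distances(adj_list, source):
    """Breadth-first search: number of hops from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj_list[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def subgraph_hops(adj_list, target, context, unreachable=99):
    """Hop index of the target (0) and of each context node, measured in the full graph.

    `unreachable` is an arbitrary placeholder index for nodes with no path from the target.
    """
    dist = hop_distances(adj_list, target)
    return [dist.get(v, unreachable) for v in [target, *context]]


graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: []}
print(subgraph_hops(graph, 0, [1, 3, 4]))  # [0, 1, 3, 99]
```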

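In practice, you only need to run two scripts: `script_1_preprocess.py` (WL codes, subgraph batching and hop distances, as the log below shows) and `script_2_pre_train.py` (pre-training on node attribute reconstruction, then graph structure recovery).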

In [None]:
!git clone https://github.com/jwzhanggy/Graph-Bert.git

Cloning into 'Graph-Bert'...
remote: Enumerating objects: 442, done.
remote: Counting objects: 100% (128/128), done.
remote: Compressing objects: 100% (50/50), done.
remote: Total 442 (delta 102), reused 78 (delta 78), pack-reused 314
Receiving objects: 100% (442/442), 2.23 MiB | 21.11 MiB/s, done.
Resolving deltas: 100% (228/228), done.


In [None]:
%cd Graph-Bert
!python3 script_1_preprocess.py

[Errno 2] No such file or directory: 'Graph-Bert'
/home/nikishina/Graph-Bert
************ Start ************
WL, dataset: wordnet_n_is_directed_1_en_synsets_2.0
Loading wordnet_n_is_directed_1_en_synsets_2.0 dataset...
************ Finish ************
************ Start ************
Subgraph Batching, dataset: wordnet_n_is_directed_1_en_synsets_2.0, k: 5
Loading wordnet_n_is_directed_1_en_synsets_2.0 dataset...
  r_inv = np.power(rowsum, -0.5).flatten()
************ Finish ************
************ Start ************
HopDistance, dataset: wordnet_n_is_directed_1_en_synsets_2.0, k: 5
Loading wordnet_n_is_directed_1_en_synsets_2.0 dataset...
************ Finish ************


In [None]:
!python3 script_2_pre_train.py

************ Start ************
GrapBert, dataset: wordnet_n_is_directed_1_en_synsets_2.0, Pre-training, Node Attribute Reconstruction.
Loading wordnet_n_is_directed_1_en_synsets_2.0 dataset...
Load WL Dictionary
Load Hop Distance Dictionary
Load Subgraph Batches
Epoch: 0001 loss_train: 0.0067 time: 0.3729s
Epoch: 0051 loss_train: 0.0027 time: 0.1480s
Epoch: 0101 loss_train: 0.0026 time: 0.3395s
Epoch: 0151 loss_train: 0.0025 time: 0.3424s
Optimization Finished!
Total time elapsed: 67.0946s
Save pretrained model in ./result/PreTrained_GraphBert/wordnet_n_is_directed_1_en_synsets_2.0/node_reconstruct_model/
************ Finish ************
************ Start ************
GrapBert, dataset: wordnet_n_is_directed_1_en_synsets_2.0, Pre-training, Graph Structure Recovery.
Load pretrained model from ./result/PreTrained_GraphBert/wordnet_n_is_directed_1_en_synsets_2.0/node_reconstruct_model/
Loading wordnet_n_is_directed_1_en_synsets_2.0 dataset...
Load WL Dictionary
Load Hop Distance Diction

After the model has been trained, we compute embeddings for the new (unseen) words and look up their nearest neighbours.
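The nearest-neighbour lookups used later (gensim's `similar_by_word` and `similar_by_vector`) rank the stored vectors by cosine similarity to the query. A small NumPy sketch of that ranking with toy keys and vectors; `nearest_neighbours` is a hypothetical helper, not part of gensim.

```python
import numpy as np


def nearest_neighbours(query, keys, vectors, topn=10):
    """Rank stored vectors by cosine similarity to `query`, highest first."""
    query = np.asarray(query, dtype=float)
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(query)
    sims = vectors @ query / np.where(norms == 0, 1.0, norms)
    best = np.argsort(-sims, kind="stable")[:topn]
    return [(keys[i], float(sims[i])) for i in best]


keys = ["a", "b", "c"]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(nearest_neighbours([1.0, 0.0], keys, vectors, topn=2))
```

For an unseen word, the query is its test embedding and the stored vectors are the training embeddings, so the top neighbours are candidate hypernyms or siblings.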

In [None]:
import os
import sys

import numpy as np
from nltk.corpus import wordnet as wn

sys.path.append("/home/nikishina/Graph-Bert/code")
sys.path.append("/home/nikishina/Graph-Bert/")
from DatasetLoader import DatasetLoader
from MethodBertComp import GraphBertConfig
from MethodGraphBertGraphRecovery import MethodGraphBertGraphRecovery
from MethodGraphBertNodeConstruct import MethodGraphBertNodeConstruct
from itertools import combinations
os.environ["CUDA_VISIBLE_DEVICES"] = "1"


def load_data(dataset_path, k, device):
    data_obj = DatasetLoader()
    data_obj.dataset_source_folder_path = '/home/nikishina/Graph-Bert/data/' + dataset_path + '/'
    data_obj.dataset_name = dataset_path
    data_obj.k = k
    data_obj.device = device
    data_obj.load_all_tag = True
    return data_obj.load()


def get_query_embedding(word, final_embeddings, index_id_map):
    # index_id_map maps row index -> node id; node ids are expected to be WordNet offsets
    offset = wn.synset(word).offset()
    for index, node_id in index_id_map.items():
        if node_id == offset:
            return final_embeddings[index]
    raise KeyError(f"{word} (offset {offset}) is not in index_id_map")

In [None]:
class GraphBERTEmbeddingsSaver:
    def __init__(self, model_name, model, x_size=300, device='cpu', max_index=132, intermediate_size=32,
                 num_attention_heads=2, num_hidden_layers=2, y_size=0, residual_type='graph_raw', k=5, nfeature=300):

        pretrained_path = './result/PreTrained_GraphBert/' + model_name
        bert_config = GraphBertConfig(residual_type=residual_type, k=k, x_size=x_size, y_size=y_size,
                                      hidden_size=intermediate_size, intermediate_size=intermediate_size,
                                      num_attention_heads=num_attention_heads, num_hidden_layers=num_hidden_layers,
                                      max_wl_role_index=max_index, max_hop_dis_index=max_index,
                                      max_inti_pos_index=max_index)

        self.model = model(bert_config, pretrained_path, device=device)
        self.model.eval()
        self.nfeature = nfeature

    def compute_and_save_embeddings(self, data, test_synsets, index_id_map, id2label, result_dir):
        final_embeddings = self.compute_embeddings(data, index_id_map, id2label)
        self.save_embeddings(test_synsets, final_embeddings, result_dir)

    def compute_embeddings(self, data, index_id_map, id2label):
        final_embeddings = np.zeros(shape=(len(index_id_map), self.nfeature), dtype=np.float32)

        for _index, raw_f, wl, init, hop in zip(index_id_map, *data):
            final_embeddings[_index, :] = np.array(
                self.model(raw_f.unsqueeze(0), wl.unsqueeze(0), init.unsqueeze(0), hop.unsqueeze(0))[0]
                    .cpu().detach())
        return self.get_embeddings_dict(final_embeddings, index_id_map, id2label)

    @staticmethod
    def get_embeddings_dict(embeddings, index2id_map, id2label):
        # index2id_map: row index in `embeddings` -> node id; id2label: node id -> synset name
        return {id2label[node_id]: embeddings[index] for index, node_id in index2id_map.items()}

    def save_embeddings(self, test_synsets, embeddings, result_dir):
        prefix = self.model.__class__.__name__
        train_path = os.path.join(result_dir, f"{prefix}_model_train_embeddings.txt")
        test_path = os.path.join(result_dir, f"{prefix}_model_test_embeddings.txt")
        with open(train_path, 'w') as w1, open(test_path, 'w') as w2:
            for synset_name, embedding in embeddings.items():
                text_embedding = " ".join(str(e) for e in embedding)
                # unseen (test) synsets go to the test file, all others to the train file
                out = w2 if synset_name in test_synsets else w1
                out.write(f"{synset_name} {text_embedding}\n")

In [None]:
loaded_data = load_data('wordnet_n_is_directed_1_en_synsets_2.0', 5, 'cpu')
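# order must match the unpacking (raw_f, wl, init, hop) in compute_embeddings:
# intimacy-based positions come before hop distances
dataset = (loaded_data['raw_embeddings'], loaded_data['wl_embedding'], loaded_data['int_embeddings'],
           loaded_data['hop_embeddings'])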

Loading wordnet_n_is_directed_1_en_synsets_2.0 dataset...
Load WL Dictionary
Load Hop Distance Dictionary
Load Subgraph Batches


In [None]:
index_id_map = loaded_data['index_id_map']

In [None]:
idx_features_labels = np.genfromtxt("{}/node".format('/home/nikishina/Graph-Bert/data/wordnet_n_is_directed_1_en_synsets_2.0/'), dtype=np.dtype(str))
id2label = {int(i): j for i, j in zip(idx_features_labels[:, 0], idx_features_labels[:, -1])}

In [None]:
saver = GraphBERTEmbeddingsSaver('wordnet_n_is_directed_1_en_synsets_2.0/node_reconstruct_model', MethodGraphBertNodeConstruct)
saver.compute_and_save_embeddings(dataset, new_words, index_id_map, id2label, "../")

Load pretrained model from ./result/PreTrained_GraphBert/wordnet_n_is_directed_1_en_synsets_2.0/node_reconstruct_model


In [None]:
saver = GraphBERTEmbeddingsSaver('wordnet_n_is_directed_1_en_synsets_2.0/node_graph_reconstruct_model', MethodGraphBertGraphRecovery)
saver.compute_and_save_embeddings(dataset, new_words, index_id_map, id2label, "../")

Load pretrained model from ./result/PreTrained_GraphBert/wordnet_n_is_directed_1_en_synsets_2.0/node_graph_reconstruct_model


## View and evaluate results

In [38]:
!gdown 1IAfd9tRgtVtdosM5vuDdxh-VSBFp3mzI
!gdown 1LItbxEcchOfU4TrlLBZjQweC8jpQ3b3Q
!gdown 1VLLLyu9YyLX3uCojiTm_VLtgK2gKCCfW
!gdown 1h5sSbFeCJbouH96fKIZDF2xugNiKf3La

Downloading...
From: https://drive.google.com/uc?id=1IAfd9tRgtVtdosM5vuDdxh-VSBFp3mzI
To: /content/MethodGraphBertGraphRecovery_model_test_embeddings.txt
100% 258k/258k [00:00<00:00, 96.0MB/s]
Downloading...
From: https://drive.google.com/uc?id=1LItbxEcchOfU4TrlLBZjQweC8jpQ3b3Q
To: /content/MethodGraphBertGraphRecovery_model_train_embeddings_.txt
100% 196M/196M [00:02<00:00, 80.6MB/s]
Downloading...
From: https://drive.google.com/uc?id=1VLLLyu9YyLX3uCojiTm_VLtgK2gKCCfW
To: /content/MethodGraphBertNodeConstruct_model_train_embeddings_.txt
100% 288M/288M [00:04<00:00, 68.5MB/s]
Downloading...
From: https://drive.google.com/uc?id=1h5sSbFeCJbouH96fKIZDF2xugNiKf3La
To: /content/MethodGraphBertNodeConstruct_model_test_embeddings.txt
100% 362k/362k [00:00<00:00, 129MB/s]


In [39]:
from gensim.models import KeyedVectors

In [40]:
graphBertNode_train = KeyedVectors.load_word2vec_format("MethodGraphBertNodeConstruct_model_train_embeddings_.txt")
graphBertNode_test = KeyedVectors.load_word2vec_format("MethodGraphBertNodeConstruct_model_test_embeddings.txt")

In [51]:
graphBertNode_train.similar_by_word("dog.n.01")

[('hound.n.01', 0.8980603814125061),
 ('working_dog.n.01', 0.8848254680633545),
 ('dandy.n.01', 0.8751208782196045),
 ('old_man.n.01', 0.8639864325523376),
 ('professional.n.01', 0.8605043888092041),
 ('gravida.n.02', 0.849956750869751),
 ('child.n.02', 0.8499069809913635),
 ('spaniel.n.01', 0.8490362167358398),
 ('subordinate.n.01', 0.8471304178237915),
 ('parent.n.01', 0.8452426195144653)]

In [48]:
wn.synset("depression.n.01").hypernyms()

[Synset('psychological_state.n.01')]

In [50]:
graphbert_node_predicts = {}

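for word in fasttext.key_to_index:
    # unseen words are the ORPHAN_* nodes; the test file is keyed by them, not by their parents
    if ".n." not in word and word in graphBertNode_test and word in orph2par:
        graphbert_node_predicts[orph2par[word]] = graphBertNode_train.similar_by_vector(graphBertNode_test[word])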