## Import Libraries

In [1]:
import numpy as np
import torch
import torch.nn as nn
from torch.nn import functional as F
import json
from gensim.test.utils import datapath
from gensim.models import KeyedVectors
from gensim.scripts.glove2word2vec import glove2word2vec
from scipy.spatial.distance import cosine
from scipy.stats import spearmanr

### 1. Compare Skip-gram, Skip-gram negative sampling, GloVe models on training loss, training time.



The models were trained on a hand-picked subset of the Reuters corpus from NLTK. The subset contains 1000 of the 54716 available passages and 4152 tokens out of 1728932 in total.





#### Training Loss
| Model                            | Average Training Loss |
|----------------------------------|-----------------------|
| Skip-gram                        | 8.051740              |
| Skip-gram with Negative Sampling | 1.905731              |
| GloVe Scratch                    | 1.391059              |



#### Training Time

| Model                          | Total Training Time |
|--------------------------------|---------------------|
| Skip-gram                      | 30m 36s            |
| Skip-gram with Negative Sampling | 29m 22s            |
| GloVe Scratch                    | 3m 12s              |


GloVe recorded the lowest average training loss (1.391059) and by far the shortest training time (3m 12s, versus roughly 30 minutes for each Skip-gram variant). Skip-gram with Negative Sampling reduced the loss from 8.051740 to 1.905731 relative to plain Skip-gram while training only slightly faster (29m 22s versus 30m 36s). The loss values should be read with care: the three models minimize different objectives (a full-softmax cross-entropy, a negative-sampling log-sigmoid loss, and a weighted squared error), so a lower number does not by itself indicate better embeddings. Training time is directly comparable and clearly favors GloVe on this dataset.
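
Because the three losses come from different objectives, their magnitudes have different reference points. As a rough sanity check, the sketch below computes the loss that each Skip-gram objective would assign to a model whose embeddings are all zero. It assumes that the vocabulary size equals the 4152 tokens quoted above and that the negative-sampling loss is the sum of two log-sigmoid terms, as in the `SkipgramNeg` class defined later in this notebook. Both are assumptions made for illustration.

```python
import math

# Assumption: the vocabulary size equals the 4152 tokens quoted in the text.
VOCAB_SIZE = 4152

# Full-softmax skip-gram: zero embeddings give a uniform distribution over the
# vocabulary, so the negative log-likelihood of any target word is ln(V).
uniform_softmax_loss = math.log(VOCAB_SIZE)

# Negative sampling with two log-sigmoid terms: all scores are zero, so each
# term is log(sigmoid(0)) = -ln(2) and the loss is 2 * ln(2).
zero_score_neg_sampling_loss = 2 * math.log(2)

print(f"Uniform softmax baseline:     {uniform_softmax_loss:.4f}")
print(f"Zero-score negative sampling: {zero_score_neg_sampling_loss:.4f}")
```

Under these assumptions the reference points are about 8.33 and 1.39, which is why a softmax loss of 8.05 and a negative-sampling loss of 1.91 cannot be ranked against each other, or against GloVe's weighted squared error, on the raw numbers alone.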

### 2. Use the word analogies dataset to calculate syntactic and semantic accuracy

In [63]:
def extract_analogy_tasks(file_path):
    """
    Extracts analogy tasks from a specified file, focusing on specific categories.
    Categories processed include 'capital-common-countries' and 'gram7-past-tense'.
    Stops processing upon reaching the 'gram8-plural' category.

    Parameters:
    - file_path: The path to the file containing analogy tasks.

    Returns:
    - A tuple of two lists: one for 'capital-common-countries' tasks and another for 'gram7-past-tense' tasks.
    """
    with open(file_path, 'r') as file:
        lines = file.readlines()

    # Initialize lists for specific analogy types
    capital_common_countries = []
    gram7_past_tense = []

    # Variable to track the current category of analogies being processed
    current_section = None

    for line in lines[1:]:  # Skip the first line assuming it's a header or format descriptor
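        # Check and update the current section based on the line content.
        # Note: headers of other categories do not reset current_section, so their
        # four-word lines are appended to whichever list was selected last.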
        if ': capital-common-countries' in line:
            current_section = capital_common_countries
            continue
        elif ': gram7-past-tense' in line:
            current_section = gram7_past_tense
            continue
        elif ': gram8-plural' in line:
            break  # Exit the loop upon reaching this category

        # If the line belongs to a current section, process and add it to the appropriate list
        if current_section is not None:
            words = line.strip().split()
            if len(words) == 4:  # Ensure the line has exactly four words, as expected for analogy tasks
                current_section.append(tuple(words))

    return capital_common_countries, gram7_past_tense

# Load the data
file_path = 'word-test.v1.txt'
capital_common_countries, past_tense = extract_analogy_tasks(file_path)


In [6]:
capital_common_countries[0], capital_common_countries[-1]

(('Athens', 'Greece', 'Baghdad', 'Iraq'),
 ('Ukraine', 'Ukrainian', 'Switzerland', 'Swiss'))

In [7]:
past_tense[0], past_tense[-1]

(('dancing', 'danced', 'decreasing', 'decreased'),
 ('writing', 'wrote', 'walking', 'walked'))
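
In the Google analogy file, `Ukraine Ukrainian Switzerland Swiss` belongs to the `gram6-nationality-adjective` category, so the last element shown for `capital_common_countries` suggests that lines from intervening categories were collected as well. A minimal sketch of a parser that resets the target list at every `:` header is given below. The helper `extract_sections` and the inline sample are hypothetical and not part of the original notebook.

```python
def extract_sections(lines, wanted):
    """Collect four-word analogies only from the section headers listed in `wanted`."""
    sections = {name: [] for name in wanted}
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(':'):
            # Every header resets the target list; unwanted sections map to None.
            current = sections.get(line[1:].strip())
            continue
        words = line.split()
        if current is not None and len(words) == 4:
            current.append(tuple(words))
    return sections


# Tiny inline sample in the same layout as the analogy file (illustrative only).
sample = """\
: capital-common-countries
Athens Greece Baghdad Iraq
: gram6-nationality-adjective
Ukraine Ukrainian Switzerland Swiss
: gram7-past-tense
dancing danced decreasing decreased
""".splitlines()

result = extract_sections(sample, ["capital-common-countries", "gram7-past-tense"])
print(result["capital-common-countries"])
print(result["gram7-past-tense"])
```

With this variant the nationality line is dropped, because its header is not in the requested list. Re-running the evaluation on lists built this way would change the reported accuracies.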

In [30]:

def calculate_accuracy_skipgram(model, dataset, word2index, index2word):
    correct = 0
    total = 0

    for word1, word1_target, word2, word2_target in dataset:
        # Skip analogy if any word is not in the model's vocabulary to avoid errors
        if all(word in word2index for word in [word1, word1_target, word2, word2_target]):
            total += 1

            # Convert words to indices for model processing
            word1_idx = word2index[word1]
            word1_target_idx = word2index[word1_target]
            word2_idx = word2index[word2]

            # Obtain embeddings by averaging center and context (outside) embeddings
            word1_emb = (model.embedding_center(torch.tensor([word1_idx], dtype=torch.long)).squeeze(0) +
                         model.embedding_outside(torch.tensor([word1_idx], dtype=torch.long)).squeeze(0)) / 2
            word1_target_emb = (model.embedding_center(torch.tensor([word1_target_idx], dtype=torch.long)).squeeze(0) +
                                model.embedding_outside(torch.tensor([word1_target_idx], dtype=torch.long)).squeeze(0)) / 2
            word2_emb = (model.embedding_center(torch.tensor([word2_idx], dtype=torch.long)).squeeze(0) +
                         model.embedding_outside(torch.tensor([word2_idx], dtype=torch.long)).squeeze(0)) / 2

            # Vector arithmetic to predict the target word's embedding
            expected_emb = word2_emb - word1_emb + word1_target_emb

            # Compute cosine similarity between expected embedding and all others
            similarities = F.cosine_similarity((model.embedding_center.weight + model.embedding_outside.weight) / 2, 
                                               expected_emb.unsqueeze(0), dim=1)


            # Exclude the three input words so they cannot be returned as the prediction
            indices_to_exclude = [word1_idx, word1_target_idx, word2_idx]
            for idx in indices_to_exclude:
                similarities[idx] = -1

            # Identify the most similar embedding as the predicted word
            max_similarity_idx = torch.argmax(similarities).item()

            # If the predicted word matches the target, count it as correct
            if index2word[str(max_similarity_idx)] == word2_target:
                correct += 1

    # Calculate and return the overall accuracy
    accuracy = correct / total if total > 0 else 0
    return accuracy
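
The analogy step in `calculate_accuracy_skipgram` computes `word2 - word1 + word1_target`, ranks every vocabulary vector by cosine similarity to the result, and excludes the three input words. A dependency-free toy version is sketched below. The 2-D vectors are hand-made and purely illustrative.

```python
import math

# Hand-made 2-D vectors chosen so that man:woman :: king:queen holds exactly.
embeddings = {
    "man":   [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king":  [3.0, 0.0],
    "queen": [3.0, 1.0],
    "apple": [-1.0, 2.0],
}

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def solve_analogy(word1, word1_target, word2):
    # expected = word2 - word1 + word1_target, as in the function above
    expected = [c - a + b for a, b, c in zip(embeddings[word1],
                                            embeddings[word1_target],
                                            embeddings[word2])]
    # Exclude the three input words, then take the most similar remaining word.
    candidates = {w: cosine_similarity(vec, expected)
                  for w, vec in embeddings.items()
                  if w not in (word1, word1_target, word2)}
    return max(candidates, key=candidates.get)

print(solve_analogy("man", "woman", "king"))
```

Here the expected vector is `[3.0, 1.0]`, which coincides with the toy vector for `queen`, so that word is returned.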


In [32]:
def calculate_accuracy_GloVe(model, dataset, word2index, index2word):
    correct = 0
    total = 0

    for word1, word1_target, word2, word2_target in dataset:
        # Ensure all words are in the model's vocabulary to avoid processing OOV words
        if all(word in word2index for word in [word1, word1_target, word2, word2_target]):
            total += 1

            # Retrieve indices for each word to access embeddings
            word1_idx, word1_target_idx, word2_idx = [word2index[word] for word in [word1, word1_target, word2]]

            # Average center and outside embeddings to get a comprehensive word representation
            word1_emb, word1_target_emb, word2_emb = [
                (model.center_embedding(torch.tensor([idx], dtype=torch.long)).squeeze(0) +
                 model.outside_embedding(torch.tensor([idx], dtype=torch.long)).squeeze(0)) / 2
                for idx in [word1_idx, word1_target_idx, word2_idx]
            ]

            # Vector arithmetic to predict the embedding of the target word
            expected_emb = word2_emb - word1_emb + word1_target_emb

            # Compute cosine similarities between the predicted embedding and all vocabulary embeddings
            similarities = F.cosine_similarity((model.center_embedding.weight + model.outside_embedding.weight) / 2,
                                               expected_emb.unsqueeze(0), dim=1)

            # Exclude the analogy words from the similarity search to ensure fairness
            indices_to_exclude = [word2index[word] for word in [word1, word1_target, word2] if word in word2index]
            for idx in indices_to_exclude:
                similarities[idx] = -1  # Set excluded indices to a low similarity

            # Identify the word most similar to the calculated embedding
            max_similarity_idx = torch.argmax(similarities).item()

            # Check if the most similar word matches the target word
            if index2word[str(max_similarity_idx)] == word2_target:
                correct += 1  # Increment correct count if prediction is accurate

    # Calculate and return the model's accuracy on the dataset
    accuracy = correct / total if total > 0 else 0
    return accuracy


In [48]:
def calculate_accuracy_GloVe_gensim(model, dataset):
    correct = 0
    total = 0

    for word1, word1_target, word2, word2_target in dataset:
        # Check if all words are in the model's vocabulary, skip the analogy if any word is OOV
        if all(word in model.key_to_index for word in [word1, word1_target, word2, word2_target]):
            total += 1

            # Get the embeddings for each word
            word1_emb = model[word1]
            word1_target_emb = model[word1_target]
            word2_emb = model[word2]

            # Compute the expected embedding for the target word
            expected_emb = word2_emb - word1_emb + word1_target_emb

            # Calculate similarities between the expected embedding and all word embeddings in the vocabulary
            all_embeddings = model.vectors
            similarities = np.dot(all_embeddings, expected_emb) / (np.linalg.norm(all_embeddings, axis=1) * np.linalg.norm(expected_emb))

            # Exclude original words from consideration
            for word in [word1, word1_target, word2]:
                if word in model.key_to_index:
                    similarities[model.key_to_index[word]] = -1

            max_similarity_idx = np.argmax(similarities)

            # Check if the word with the maximum similarity is the target word
            if model.index_to_key[max_similarity_idx] == word2_target:
                correct += 1

    accuracy = correct / total if total > 0 else 0
    return accuracy

In [22]:
# Define the Skipgram model class
class Skipgram(nn.Module):
    def __init__(self, voc_size, emb_size):
        super(Skipgram, self).__init__()
        # Embedding layers for center and outside words
        self.embedding_center = nn.Embedding(voc_size, emb_size)
        self.embedding_outside = nn.Embedding(voc_size, emb_size)
    
    def forward(self, center, outside, all_vocabs):
        # Obtain embeddings for center, outside, and all vocabulary words
        center_embedding = self.embedding_center(center)
        outside_embedding = self.embedding_outside(outside)
        all_vocabs_embedding = self.embedding_outside(all_vocabs)
        
        # Calculate top and lower terms for loss computation
        top_term = torch.exp(outside_embedding.bmm(center_embedding.transpose(1, 2)).squeeze(2))
        lower_term = all_vocabs_embedding.bmm(center_embedding.transpose(1, 2)).squeeze(2)
        lower_term_sum = torch.sum(torch.exp(lower_term), 1)
        
        # Calculate and return loss
        loss = -torch.mean(torch.log(top_term / lower_term_sum))
        return loss

word2index_path = './model/word2index_skipgram.json'  
index2word_path = './model/index2word_skipgram.json' 
model_path = './model/word2vec_model_skipgram.pth'
config_path = './model/word2vec_config_skipgram.json'        
with open(config_path, 'r') as config_file:
    config_skipgram = json.load(config_file)

# Initialize model with loaded configuration
loaded_model_Skipgram = Skipgram(voc_size=config_skipgram['voc_size'], emb_size=config_skipgram['emb_size'])

# Load model state
loaded_model_Skipgram.load_state_dict(torch.load(model_path, map_location=torch.device('cpu')))
loaded_model_Skipgram.eval()  

with open(word2index_path, 'r') as file:
    word2index_skipgram = json.load(file)

with open(index2word_path, 'r') as file:
    index2word_skipgram = json.load(file)
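
The `forward` method of `Skipgram` exponentiates the scores before taking the logarithm, which can overflow for large dot products. The same loss can be written in log-sum-exp form, $-u_o^\top v_c + \log\sum_w \exp(u_w^\top v_c)$, which avoids that. The sketch below illustrates the idea on plain Python lists. `softmax_nll` is a hypothetical helper and not part of the trained model.

```python
import math

def softmax_nll(scores, target_index):
    """Negative log-likelihood of `target_index` under a softmax over `scores`."""
    # Subtracting the maximum keeps every exponent <= 0, so exp() cannot overflow.
    max_score = max(scores)
    log_sum_exp = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return log_sum_exp - scores[target_index]

# Equal scores give a uniform distribution, so the loss is ln(4).
print(softmax_nll([0.0, 0.0, 0.0, 0.0], 0))

# A naive math.exp(1000.0) would raise OverflowError; the shifted form does not.
print(softmax_nll([1000.0, 0.0], 0))
```

In PyTorch the same effect is usually obtained with `torch.logsumexp` or `F.log_softmax` rather than a hand-written shift.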

In [31]:
# Calculate the semantic accuracy on the 'capital-common-countries' analogy dataset
semantic_accuracy_Skipgram = calculate_accuracy_skipgram(loaded_model_Skipgram, capital_common_countries, word2index_skipgram, index2word_skipgram)

# Calculate the syntactic accuracy on the 'gram7-past-tense' analogy dataset
syntactic_accuracy_Skipgram = calculate_accuracy_skipgram(loaded_model_Skipgram, past_tense, word2index_skipgram, index2word_skipgram)

# Print the semantic accuracy as a percentage
print(f"Skipgram Semantic Accuracy: {semantic_accuracy_Skipgram * 100:.2f}%")

# Print the syntactic accuracy as a percentage
print(f"Skipgram Syntactic Accuracy: {syntactic_accuracy_Skipgram * 100:.2f}%")

Skipgram Semantic Accuracy: 0.00%
Skipgram Syntactic Accuracy: 0.00%
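
Because `calculate_accuracy_skipgram` skips any analogy containing an out-of-vocabulary word and returns 0 when nothing is evaluated, a score of 0.00% does not distinguish "every prediction was wrong" from "no analogy could be evaluated". A small coverage check makes the difference visible. The helper `analogy_coverage` and the toy data below are hypothetical.

```python
def analogy_coverage(dataset, vocabulary):
    """Return (evaluable, total): analogies whose four words are all in `vocabulary`."""
    evaluable = sum(all(word in vocabulary for word in analogy) for analogy in dataset)
    return evaluable, len(dataset)

# Toy data for illustration; in the notebook these would be an analogy list
# and the matching word2index dictionary.
toy_dataset = [
    ("Athens", "Greece", "Baghdad", "Iraq"),
    ("dancing", "danced", "decreasing", "decreased"),
]
toy_vocabulary = {"dancing": 0, "danced": 1, "decreasing": 2, "decreased": 3, "Greece": 4}

evaluable, total = analogy_coverage(toy_dataset, toy_vocabulary)
print(f"{evaluable} of {total} analogies are evaluable")
```

Reporting this pair next to each accuracy would show how many analogies the percentage is actually based on.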


In [35]:
class SkipgramNeg(nn.Module):
    
    def __init__(self, voc_size, emb_size):
        super(SkipgramNeg, self).__init__()
        self.embedding_center  = nn.Embedding(voc_size, emb_size)
        self.embedding_outside = nn.Embedding(voc_size, emb_size)
        self.logsigmoid        = nn.LogSigmoid()
    
    def forward(self, center, outside, negative):
        #center, outside:  (bs, 1)
        #negative       :  (bs, k)
        
        center_embed   = self.embedding_center(center) #(bs, 1, emb_size)
        outside_embed  = self.embedding_outside(outside) #(bs, 1, emb_size)
        negative_embed = self.embedding_outside(negative) #(bs, k, emb_size)
        
        uovc           = outside_embed.bmm(center_embed.transpose(1, 2)).squeeze(2) #(bs, 1)
        ukvc           = -negative_embed.bmm(center_embed.transpose(1, 2)).squeeze(2) #(bs, k)
        ukvc_sum       = torch.sum(ukvc, 1).reshape(-1, 1) #(bs, 1)
        
        loss           = self.logsigmoid(uovc) + self.logsigmoid(ukvc_sum)
        
        return -torch.mean(loss)


word2index_path = './model/word2index_skipgram_neg.json'  
index2word_path = './model/index2word_skipgram_neg.json' 
model_path = './model/word2vec_model_skipgram_neg.pth'
config_path = './model/word2vec_config_skipgram_neg.json'

with open(word2index_path, 'r') as file:
    word2index_SkipgramNeg = json.load(file)  # Load the word2index dictionary from the JSON file

with open(index2word_path, 'r') as file:
    index2word_SkipgramNeg = json.load(file)

# Load the model's configuration from a JSON file
with open(config_path, 'r') as config_file:
    config_SkipgramNeg = json.load(config_file)

# Retrieve the configuration values
voc_size = config_SkipgramNeg['voc_size']  # Vocabulary size
emb_size = config_SkipgramNeg['emb_size']  # Embedding size

# Initialize a new Word2Vec model with the loaded configuration
loaded_model_SkipgramNeg = SkipgramNeg(voc_size, emb_size)

# Load the state dictionary (model parameters) into the initialized model;
# map_location keeps loading working on CPU-only machines
loaded_model_SkipgramNeg.load_state_dict(torch.load(model_path, map_location=torch.device('cpu')))

# Set the model to evaluation mode (useful for inference)
loaded_model_SkipgramNeg.eval()

# Confirm successful model loading
print("Model loaded successfully")

Model loaded successfully
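
The `forward` method of `SkipgramNeg` applies the log-sigmoid once to the sum of the negated negative scores. The formulation commonly cited from the original word2vec paper (Mikolov et al., 2013) applies the log-sigmoid to each negative sample and then sums. The sketch below compares the two on made-up dot products. The function names are hypothetical, and nothing here claims which form the training notebook intended.

```python
import math

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) for either sign of x.
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def loss_sum_then_sigmoid(positive_score, negative_scores):
    # Mirrors the forward pass above: one log-sigmoid over the summed negatives.
    return -(log_sigmoid(positive_score) + log_sigmoid(-sum(negative_scores)))

def loss_sigmoid_then_sum(positive_score, negative_scores):
    # Per-sample form: one log-sigmoid term per negative sample.
    return -(log_sigmoid(positive_score)
             + sum(log_sigmoid(-s) for s in negative_scores))

# Toy dot products: one positive pair and two negative samples.
positive, negatives = 2.0, [1.0, -1.0]
print(loss_sum_then_sigmoid(positive, negatives))
print(loss_sigmoid_then_sum(positive, negatives))
```

In this example the negative scores cancel in the summed form, so the high-scoring negative sample contributes no penalty, whereas the per-sample form penalizes it.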


In [37]:
# Calculate semantic accuracy on 'capital-common-countries' analogies
semantic_accuracy_Skipgram_neg = calculate_accuracy_skipgram(loaded_model_SkipgramNeg, capital_common_countries, word2index_SkipgramNeg, index2word_SkipgramNeg)
# Calculate syntactic accuracy on 'gram7-past-tense' analogies
syntactic_accuracy_Skipgram_neg = calculate_accuracy_skipgram(loaded_model_SkipgramNeg, past_tense, word2index_SkipgramNeg, index2word_SkipgramNeg)

# Print the results
print(f"Skipgram-neg Semantic Accuracy: {semantic_accuracy_Skipgram_neg * 100:.2f}%")
print(f"Skipgram-neg Syntactic Accuracy: {syntactic_accuracy_Skipgram_neg * 100:.2f}%")

Skipgram-neg Semantic Accuracy: 0.00%
Skipgram-neg Syntactic Accuracy: 0.00%


In [39]:
class Glove(nn.Module):
    
    def __init__(self, voc_size, emb_size):
        super(Glove, self).__init__()
        # Embeddings for center words
        self.center_embedding  = nn.Embedding(voc_size, emb_size)
        # Embeddings for context (outside) words
        self.outside_embedding = nn.Embedding(voc_size, emb_size)
        
        # Bias terms for center words
        self.center_bias       = nn.Embedding(voc_size, 1) 
        # Bias terms for context (outside) words
        self.outside_bias      = nn.Embedding(voc_size, 1)
    
    def forward(self, center, outside, coocs, weighting):
        # Retrieve the embeddings for the center words
        center_embeds  = self.center_embedding(center)  # (batch_size, 1, emb_size)
        # Retrieve the embeddings for the outside words
        outside_embeds = self.outside_embedding(outside)  # (batch_size, 1, emb_size)
        
        # Retrieve and squeeze the bias for the center words
        center_bias    = self.center_bias(center).squeeze(1)
        # Retrieve and squeeze the bias for the outside words
        target_bias    = self.outside_bias(outside).squeeze(1)
        
        # Compute the dot product of center and outside word embeddings
        inner_product  = outside_embeds.bmm(center_embeds.transpose(1, 2)).squeeze(2)
        
        # Compute the GloVe loss as the weighted squared error between
        # the log co-occurrence counts and the model predictions (dot product + biases)
        loss = weighting * torch.pow(inner_product + center_bias + target_bias - coocs, 2)
        
        # Return the sum of the losses for the batch
        return torch.sum(loss)
    
word2index_path = './model/word2index_glove.json'  
index2word_path = './model/index2word_glove.json' 
model_path = './model/word2vec_model_glove.pth'
config_path = './model/word2vec_config_glove.json'
with open(word2index_path, 'r') as file:
    word2index_Glove = json.load(file)  # Load the word2index dictionary from the JSON file

with open(index2word_path, 'r') as file:
    index2word_Glove = json.load(file)  # Load the index2word dictionary from the JSON file
# Load the model's configuration from a JSON file
with open(config_path, 'r') as config_file:
    config_Glove = json.load(config_file)

# Retrieve the configuration values
voc_size = config_Glove['voc_size']  # Vocabulary size
emb_size = config_Glove['emb_size']  # Embedding size

# Initialize a new GloVe model with the loaded configuration
loaded_model_Glove = Glove(voc_size, emb_size)

# Load the state dictionary (model parameters) into the initialized model;
# map_location keeps loading working on CPU-only machines
loaded_model_Glove.load_state_dict(torch.load(model_path, map_location=torch.device('cpu')))

# Set the model to evaluation mode (useful for inference)
loaded_model_Glove.eval()

# Confirm successful model loading
print("Model loaded successfully")

Model loaded successfully
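
The `forward` method of `Glove` receives the `weighting` tensor from outside, so the weighting function itself is not shown in this notebook. For reference, the GloVe paper defines it as $f(x) = (x / x_{\max})^{\alpha}$ for $x < x_{\max}$ and $1$ otherwise, with $x_{\max} = 100$ and $\alpha = 0.75$ as the paper's defaults. The sketch below uses those defaults. Whether the training notebook used the same values is an assumption, and both helper names are hypothetical.

```python
import math

def glove_weight(cooccurrence, x_max=100.0, alpha=0.75):
    """Weighting f(x) from the GloVe paper; x_max and alpha are the paper's defaults."""
    if cooccurrence < x_max:
        return (cooccurrence / x_max) ** alpha
    return 1.0

def glove_term(inner_product, center_bias, outside_bias, cooccurrence):
    """One summand of the loss: f(X_ij) * (w_i . w_j + b_i + b_j - log X_ij)^2."""
    error = inner_product + center_bias + outside_bias - math.log(cooccurrence)
    return glove_weight(cooccurrence) * error ** 2

print(glove_weight(1.0))      # rare pair: small weight
print(glove_weight(250.0))    # frequent pair: weight capped at 1.0
print(glove_term(0.0, 0.0, 0.0, 100.0))
```

The cap keeps very frequent pairs from dominating the sum, while rare pairs still contribute with a reduced weight.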


In [41]:
semantic_accuracy_GloVe = calculate_accuracy_GloVe(loaded_model_Glove, capital_common_countries, word2index_Glove, index2word_Glove)
syntactic_accuracy_GloVe = calculate_accuracy_GloVe(loaded_model_Glove, past_tense, word2index_Glove, index2word_Glove)
print(f"GloVe Semantic Accuracy: {semantic_accuracy_GloVe * 100:.2f}%")
print(f"GloVe Syntactic Accuracy: {syntactic_accuracy_GloVe * 100:.2f}%")


GloVe Semantic Accuracy: 0.00%
GloVe Syntactic Accuracy: 0.00%


In [42]:
pip install gensim



In [46]:
from gensim.scripts.glove2word2vec import glove2word2vec
from gensim.models.keyedvectors import KeyedVectors

# Path to the GloVe file (Replace this with your GloVe file path)
glove_file_path = 'dataset/glove.6B.100d.txt'  # Example path

# File path for the output Word2Vec format file
word2vec_output_file = glove_file_path + '.word2vec'

# Convert the GloVe file format to the Word2Vec file format
glove2word2vec(glove_file_path, word2vec_output_file)

# Load the model from the converted Word2Vec format file
loaded_model_Glove_Gen = KeyedVectors.load_word2vec_format(word2vec_output_file, binary=False)




In [49]:
semantic_accuracy_GloVe_Gen = calculate_accuracy_GloVe_gensim(loaded_model_Glove_Gen, capital_common_countries)
syntactic_accuracy_GloVe_Gen = calculate_accuracy_GloVe_gensim(loaded_model_Glove_Gen, past_tense)

# Print the semantic and syntactic accuracies
print(f"GloVe gensim Semantic Accuracy: {semantic_accuracy_GloVe_Gen * 100:.2f}%")
print(f"GloVe gensim Syntactic Accuracy: {syntactic_accuracy_GloVe_Gen * 100:.2f}%")


GloVe gensim Semantic Accuracy: 54.97%
GloVe gensim Syntactic Accuracy: 53.40%
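
The vocabulary lookups in `calculate_accuracy_GloVe_gensim` are case-sensitive. If the pre-trained file is an uncased release (the `glove.6B` vectors are lowercase), capitalized analogy words such as `Athens` would not be found and those analogies would be skipped. One possible adjustment, sketched below on a toy vocabulary, is to lowercase each analogy before the lookup. Both helpers and the toy dictionary are hypothetical.

```python
def lowercase_analogies(dataset):
    """Return a copy of `dataset` with every word lowercased."""
    return [tuple(word.lower() for word in analogy) for analogy in dataset]

def count_in_vocab(dataset, vocab):
    """Count analogies whose four words are all present in `vocab`."""
    return sum(all(word in vocab for word in analogy) for analogy in dataset)

# Toy stand-in for model.key_to_index from an uncased embedding file.
toy_key_to_index = {"athens": 0, "greece": 1, "baghdad": 2, "iraq": 3}
analogies = [("Athens", "Greece", "Baghdad", "Iraq")]

covered_original = count_in_vocab(analogies, toy_key_to_index)
covered_lowercased = count_in_vocab(lowercase_analogies(analogies), toy_key_to_index)
print(covered_original, covered_lowercased)
```

Lowercasing would change which analogies are evaluated, so accuracies computed this way are not directly comparable with the figures printed above.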


### Model Comparison Summary

| Model                            | Window Size | Training Loss (from training notebooks) | Syntactic Accuracy | Semantic Accuracy |
|----------------------------------|-------------|-----------------------------------------|--------------------|-------------------|
| Skip-gram                        | 2           | 8.051740                                | 0.00%              | 0.00%             |
| Skip-gram with Negative Sampling | 2           | 1.905731                                | 0.00%              | 0.00%             |
| GloVe Scratch                    | 2           | 1.391059                                | 0.00%              | 0.00%             |
| GloVe (Pre-trained Gensim)       | N/A         | N/A                                     | 53.40%             | 54.97%            |



The table summarizes window size, training loss, and analogy accuracy for each model. All three models trained from scratch (Skip-gram, Skip-gram with Negative Sampling, and GloVe Scratch) score 0.00% on both the syntactic and the semantic analogies, regardless of their training loss (8.051740, 1.905731, and 1.391059 respectively), so a lower training loss did not translate into better analogy performance here. The accuracy functions skip any analogy that contains an out-of-vocabulary word and return 0 when none can be evaluated, so with a training subset of only 1000 passages these scores may rest on very few evaluable analogies. The pre-trained GloVe vectors loaded through Gensim, for which window size and training loss are not available, reach 53.40% syntactic and 54.97% semantic accuracy, which illustrates the benefit of embeddings trained on a much larger corpus.

### 3. Use the similarity dataset to find the correlation between your models’ dot product and the provided similarity metrics.

In [50]:
import pandas as pd

path = 'wordsim_similarity_goldstandard.txt'

# Load the dataset
df = pd.read_csv(path, sep='\t', names=['word1', 'word2', 'human_score'])

# Display the first few rows of the DataFrame
print(df.head())


        word1  word2  human_score
0       tiger    cat         7.35
1       tiger  tiger        10.00
2       plane    car         5.77
3       train    car         6.31
4  television  radio         6.77


In [51]:
def calculate_similarity_skipgram_models(model, word_pairs, word2index):
    """
    Calculates the cosine similarity between pairs of words using embeddings from Skipgram or Skipgram-Negative Sampling models.
    
    Args:
    - model: The trained Skipgram or Skipgram-Negative Sampling model.
    - word_pairs: A list of tuples containing word pairs for similarity computation.
    - word2index: A dictionary mapping words to their respective indices in the model's vocabulary.
    
    Returns:
    - A list of cosine similarities for each word pair. Returns None for pairs containing out-of-vocabulary words.
    """
    similarities = []
    for word1, word2 in word_pairs:
        # Verify both words are in the model's vocabulary
        if word1 in word2index and word2 in word2index:
            # Retrieve indices and embeddings for both words
            word1_idx, word2_idx = word2index[word1], word2index[word2]
            word1_emb, word2_emb = [
                (model.embedding_center(torch.tensor([idx], dtype=torch.long)).squeeze(0) +
                 model.embedding_outside(torch.tensor([idx], dtype=torch.long)).squeeze(0)) / 2
                for idx in [word1_idx, word2_idx]
            ]
            
            # Compute cosine similarity
            similarity = 1 - cosine(word1_emb.detach().numpy(), word2_emb.detach().numpy())
            similarities.append(similarity)
        else:
            # Handle out-of-vocabulary words
            similarities.append(None)
            
    return similarities
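
The section title refers to the dot product, whereas the function above scores pairs with cosine similarity (one minus SciPy's cosine distance). The two can rank pairs differently, because the dot product also grows with vector length. A small dependency-free illustration on made-up vectors:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

query = [1.0, 1.0]
long_but_off_axis = [10.0, 0.0]   # large norm, 45 degrees away from the query
short_but_aligned = [0.5, 0.5]    # small norm, same direction as the query

# The dot product prefers the long vector; cosine prefers the aligned one.
print(dot(query, long_but_off_axis), dot(query, short_but_aligned))
print(cosine_similarity(query, long_but_off_axis),
      cosine_similarity(query, short_but_aligned))
```

Since Spearman's correlation depends only on the ranking of the scores, switching between the two measures can change the reported coefficient.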


In [58]:
def calculate_similarity_glove(model, word_pairs, word2index):
    """
    Calculates cosine similarities between pairs of words using GloVe model embeddings.
    
    Args:
    - model: The GloVe model trained from scratch.
    - word_pairs: A list of tuples with word pairs for which to compute similarities.
    - word2index: Dictionary mapping words to their indices in the model's vocabulary.
    
    Returns:
    - List of similarities for each word pair, with None for pairs containing OOV words.
    """
    similarities = []
    for word1, word2 in word_pairs:
        if word1 in word2index and word2 in word2index:
            # Retrieve index for each word
            word1_idx, word2_idx = word2index[word1], word2index[word2]
            
            # Average center and outside embeddings for a comprehensive word representation
            word1_emb, word2_emb = [
                (model.center_embedding(torch.tensor([idx], dtype=torch.long)).squeeze(0) +
                 model.outside_embedding(torch.tensor([idx], dtype=torch.long)).squeeze(0)) / 2
                for idx in [word1_idx, word2_idx]
            ]
            
            # Convert PyTorch tensors to numpy arrays for cosine similarity calculation
            word1_emb_np, word2_emb_np = word1_emb.detach().numpy(), word2_emb.detach().numpy()
            
            # Compute similarity as 1 minus the cosine distance
            similarity = 1 - cosine(word1_emb_np, word2_emb_np)
            similarities.append(similarity)
        else:
            # Append None for word pairs where either word is out of vocabulary
            similarities.append(None)
            
    return similarities


In [53]:
def calculate_similarity_gensim_glove(model, word_pairs):
    """
    Computes cosine similarities for word pairs using a GloVe model loaded with Gensim.
    
    Args:
    - model: Pre-trained GloVe model loaded via Gensim.
    - word_pairs: List of tuples containing word pairs for similarity calculation.
    
    Returns:
    - A list of similarities for each word pair, with None for pairs involving OOV words.
    """
    similarities = []
    for word1, word2 in word_pairs:
        if word1 in model.key_to_index and word2 in model.key_to_index:
            similarity = model.similarity(word1, word2)
            similarities.append(similarity)
        else:
            # Append None if either word is not in the model's vocabulary
            similarities.append(None)
    return similarities

def compute_correlation_with_human_judgment(model, dataset):
    """
    Calculates Spearman's rank correlation between model-derived similarities and human judgment scores.
    
    Args:
    - model: The GloVe model loaded with Gensim for which to compute similarities.
    - dataset: DataFrame with columns 'word1', 'word2', and 'human_score' indicating human-assigned similarity scores.
    
    Returns:
    - Spearman's rank correlation coefficient between model similarities and human scores.
    """
    word_pairs = list(zip(dataset['word1'], dataset['word2']))
    model_similarities = calculate_similarity_gensim_glove(model, word_pairs)
    
    # Exclude pairs with OOV words to align model and human scores
    valid_scores = [(human, model) for human, model in zip(dataset['human_score'], model_similarities) if model is not None]
    filtered_human_scores, filtered_model_scores = zip(*valid_scores)
    
    correlation, _ = spearmanr(filtered_human_scores, filtered_model_scores)
    return correlation


In [56]:
# Compute Skipgram cosine similarities for the word pairs in the DataFrame
model_similarities_skipgram = calculate_similarity_skipgram_models(loaded_model_Skipgram, list(zip(df['word1'], df['word2'])), word2index_skipgram)
# Exclude word pairs with at least one OOV word to ensure valid comparison between model and human scores
filtered_human_scores = [human_score for human_score, model_score in zip(df['human_score'], model_similarities_skipgram) if model_score is not None]
filtered_model_scores = [model_score for model_score in model_similarities_skipgram if model_score is not None]

# Spearman's rank correlation calculation to assess the alignment between model-derived similarities and human judgment
correlation, _ = spearmanr(filtered_human_scores, filtered_model_scores)
# Output the correlation result, indicating the model's performance in mirroring human semantic judgments
print(f"Spearman's rank correlation for Skipgram: {correlation:.3f}")


Spearman's rank correlation for Skipgram: 0.124


In [57]:
# Calculate model similarities using the skipgram negative-sampling model
model_similarities_skipgram_neg = calculate_similarity_skipgram_models(loaded_model_SkipgramNeg, list(zip(df['word1'], df['word2'])), word2index_SkipgramNeg)

# Filter out pairs where at least one word was OOV (Out Of Vocabulary)
filtered_human_scores = [human_score for human_score, model_score in zip(df['human_score'], model_similarities_skipgram_neg) if model_score is not None]
filtered_model_scores = [model_score for model_score in model_similarities_skipgram_neg if model_score is not None]

# Calculate Spearman's rank correlation between human scores and model scores
correlation, _ = spearmanr(filtered_human_scores, filtered_model_scores)

print(f"Spearman's rank correlation for Skipgram-Neg-Sampling: {correlation:.3f}")


Spearman's rank correlation for Skipgram-Neg-Sampling: 0.116


In [59]:
# Calculate model similarities using the GloVe Scratch model
model_similarities_Glove_Scratch = calculate_similarity_glove(loaded_model_Glove, list(zip(df['word1'], df['word2'])), word2index_Glove)

# Filter out pairs where at least one word was OOV (Out Of Vocabulary)
filtered_human_scores = [human_score for human_score, model_score in zip(df['human_score'], model_similarities_Glove_Scratch) if model_score is not None]
filtered_model_scores = [model_score for model_score in model_similarities_Glove_Scratch if model_score is not None]

# Calculate Spearman's rank correlation between human scores and model scores
correlation, _ = spearmanr(filtered_human_scores, filtered_model_scores)

# Print the Spearman's rank correlation coefficient for GloVe Scratch model
print(f"Spearman's rank correlation for GloVe Scratch: {correlation:.3f}")


Spearman's rank correlation for GloVe Scratch: -0.091


In [60]:
# Calculate Spearman's rank correlation using a GloVe Gensim model and a DataFrame
correlation = compute_correlation_with_human_judgment(loaded_model_Glove_Gen, df)

# Print the Spearman's rank correlation coefficient for GloVe Gensim
print(f"Spearman's rank correlation for GloVe Gensim: {correlation:.3f}")


Spearman's rank correlation for GloVe Gensim: 0.602


In [62]:
models = ["Skipgram", "Skipgram-Neg-Sampling", "GloVe Scratch", "GloVe Gensim"]
correlation_coefficients = [0.124, 0.116, -0.091, 0.602]

print("| Model                     | Spearman's Rank Correlation |")
print("|---------------------------|-----------------------------|")

for model, correlation in zip(models, correlation_coefficients):
    print(f"| {model:<25} | {correlation:>27.3f} |")


| Model                     | Spearman's Rank Correlation |
|---------------------------|-----------------------------|
| Skipgram                  |                       0.124 |
| Skipgram-Neg-Sampling     |                       0.116 |
| GloVe Scratch             |                      -0.091 |
| GloVe Gensim              |                       0.602 |


Spearman's rank correlation coefficient measures the strength and direction of the monotonic association between two ranked variables. Here it compares each model's cosine similarity for a word pair with the human-rated similarity score for the same pair, using only pairs in which both words are in the model's vocabulary. A coefficient close to 1 indicates a strong positive correlation, a coefficient close to -1 indicates a strong negative correlation, and a coefficient close to 0 indicates a weak correlation.

The three models trained from scratch all fall near zero: Skipgram (0.124) and Skipgram-Neg-Sampling (0.116) show a weak positive correlation, and GloVe Scratch (-0.091) shows a weak negative one. GloVe Gensim reaches 0.602, a moderately strong positive correlation with human judgment.
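
To make the coefficient concrete, Spearman's correlation can be computed as the Pearson correlation of the two rank vectors. The dependency-free sketch below does this with average ranks for ties. The helper names and the model scores are made up for illustration, and in the notebook itself `scipy.stats.spearmanr` is used instead.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    # Spearman's rho is the Pearson correlation of the two rank vectors.
    return pearson(average_ranks(x), average_ranks(y))

human = [7.35, 10.00, 5.77, 6.31]       # first four human scores from df.head()
made_up_model = [0.4, 0.9, 0.1, 0.3]    # made-up similarities with the same ordering
print(spearman(human, made_up_model))
print(spearman(human, [-s for s in made_up_model]))
```

Only the ordering of the scores matters: the made-up values are on a different scale from the human scores, yet identical ordering gives a coefficient of 1 and reversed ordering gives -1.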