# Def2Vec

This notebook shows usage examples of the [Def2Vec](https://aclanthology.org/2023.icnlsp-1.21) word embedding model.

## Environment preparation

The following preliminary steps set up the environment:
- Mounting the drive (optional)
- Installing dependencies
- Importing packages

### Mount drive

> **NOTE**
>
> This step is required only if you are running the notebook in Google Colab.

Mount the Google Drive storage that contains the embedding model to load.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Set the current working directory to the one that contains the embedding model to load.

In [None]:
import os
os.chdir('/content/drive/MyDrive/Def2Vec')  # Replace with the path to the directory containing the embeddings
os.getcwd()

### Install dependencies

Install all the Python packages necessary to run this notebook.

In [None]:
!pip -q install numpy
!pip -q install nltk
!pip -q install POT
!pip -q install gensim

### Import packages

Import all the Python packages necessary to run this example.

In [None]:
from gensim.models import KeyedVectors

## Load model

Load the embedding model

In [None]:
path = './def2vec_en_wikitionary_300.kv'

In [None]:
def_2_vec = KeyedVectors.load(path)  # keyed vectors mapping each word to its embedding

## Examples

These examples are based on the examples of the *KeyedVectors* in the [Gensim](https://radimrehurek.com/gensim/index.html) package.
Source: [link](https://radimrehurek.com/gensim/models/keyedvectors.html#what-can-i-do-with-word-vectors).

### Embeddings extraction

In [None]:
vector = def_2_vec['computer']  # numpy vector of a word
print(vector.shape)

Apply L2 normalisation, so that the squared components of the vector sum to 1.

In [None]:
vector = def_2_vec.get_vector('office', norm=True)
print(sum(vector ** 2))
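
With `norm=True`, `get_vector` returns the vector scaled to unit L2 length, which is why the sum of squares printed above should be close to 1. The following is a minimal sketch of that scaling in plain Python. The three-dimensional vector is made up for illustration and is not taken from the Def2Vec model.

```python
import math


def unit_normalise(vector):
    """Scale a vector to unit L2 length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


# Hypothetical toy vector, not a real Def2Vec embedding
toy_vector = [3.0, 4.0, 12.0]
unit_vector = unit_normalise(toy_vector)

# The squared components of a unit vector sum to 1 (up to rounding)
print(sum(x * x for x in unit_vector))
```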

### Analogies

Find the word that could solve the analogy.

Using the default *cosine similarity* measure.

In [None]:
result = def_2_vec.most_similar(positive=['woman', 'king'], negative=['man'])
most_similar_key, similarity = result[0]  # look at the first match
print(f"{most_similar_key}: {similarity:.4f}")

Using a different similarity measure: *cosmul*, the multiplicative combination objective.

In [None]:
result = def_2_vec.most_similar_cosmul(positive=['woman', 'king'], negative=['man'])
most_similar_key, similarity = result[0]  # look at the first match
print(f"{most_similar_key}: {similarity:.4f}")
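
To make the difference between the two measures concrete, here is a rough sketch of the additive and multiplicative objectives on a tiny hand-made vocabulary. The two-dimensional vectors and the helper names (`analogy_additive`, `analogy_multiplicative`) are assumptions for illustration only; they are neither Def2Vec embeddings nor Gensim functions, and Gensim's own implementation may differ in details.

```python
import math

# Hypothetical 2-D toy vectors (roughly: royalty, gender); not Def2Vec embeddings
toy = {
    "king": [0.9, 0.8],
    "queen": [0.9, -0.8],
    "man": [0.1, 0.9],
    "woman": [0.1, -0.9],
    "apple": [-0.7, 0.1],
}


def unit(v):
    """Return v scaled to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cos(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(unit(a), unit(b)))


def candidates(positive, negative):
    """All vocabulary words that are not part of the query."""
    return [w for w in toy if w not in positive and w not in negative]


def analogy_additive(positive, negative):
    """Additive objective: closest word to sum(positives) - sum(negatives)."""
    target = [0.0, 0.0]
    signed = [(w, 1.0) for w in positive] + [(w, -1.0) for w in negative]
    for word, sign in signed:
        for i, x in enumerate(unit(toy[word])):
            target[i] += sign * x
    return max(candidates(positive, negative), key=lambda w: cos(toy[w], target))


def analogy_multiplicative(positive, negative, eps=1e-6):
    """Multiplicative objective: ratio of products of similarities shifted to [0, 1]."""
    def shifted(a, b):
        return (1.0 + cos(toy[a], toy[b])) / 2.0

    def score(w):
        numerator = math.prod(shifted(w, p) for p in positive)
        denominator = math.prod(shifted(w, n) for n in negative) + eps
        return numerator / denominator

    return max(candidates(positive, negative), key=score)


print(analogy_additive(["woman", "king"], ["man"]))
print(analogy_multiplicative(["woman", "king"], ["man"]))
```

On these toy vectors both objectives pick the same word. On real embeddings they can rank candidates differently, because the multiplicative form keeps one large similarity term from dominating the score.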

### Outliers

Identify the word that does not belong with the others.

In [None]:
print(def_2_vec.doesnt_match("breakfast cereal dinner lunch".split()))
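
One common way to find such an outlier is to average the unit-normalised vectors of all the words and return the word least similar to that average. The sketch below illustrates the idea with made-up two-dimensional vectors; `odd_one_out` is a hypothetical helper, not part of Gensim, and the library's implementation may differ in details.

```python
import math

# Hypothetical 2-D toy vectors; not Def2Vec embeddings
toy = {
    "breakfast": [1.0, 0.1],
    "cereal": [0.1, 1.0],
    "dinner": [0.9, 0.2],
    "lunch": [1.0, 0.0],
}


def unit(v):
    """Return v scaled to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def odd_one_out(words):
    """Return the word whose unit vector is least similar to the mean vector."""
    units = {w: unit(toy[w]) for w in words}
    dims = len(toy[words[0]])
    mean = unit([sum(units[w][i] for w in words) / len(words) for i in range(dims)])
    return min(words, key=lambda w: sum(x * y for x, y in zip(units[w], mean)))


print(odd_one_out("breakfast cereal dinner lunch".split()))
```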

### Word similarity

Compute word similarity

Word pair

In [None]:
similarity = def_2_vec.similarity('woman', 'man')
print(similarity > 0.8)

Multiple words

In [None]:
similarity = def_2_vec.n_similarity(['sushi', 'shop'], ['japanese', 'restaurant'])
print(f"{similarity:.4f}")
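
Both calls above rely on cosine similarity: for a word pair it is computed between the two word vectors, and for two word lists it is computed between the mean vectors of each list. A minimal sketch with made-up three-dimensional vectors (the values and helper names are assumptions for illustration, not Def2Vec embeddings or Gensim functions):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Component-wise mean of a list of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def set_similarity(vectors_1, vectors_2):
    """Cosine similarity between the mean vectors of two word lists."""
    return cosine_similarity(mean_vector(vectors_1), mean_vector(vectors_2))


# Hypothetical toy vectors; not Def2Vec embeddings
toy = {
    "sushi": [0.9, 0.1, 0.3],
    "shop": [0.1, 0.9, 0.2],
    "japanese": [0.7, 0.2, 0.5],
    "restaurant": [0.3, 0.9, 0.1],
}

pair = cosine_similarity(toy["sushi"], toy["japanese"])
sets = set_similarity(
    [toy["sushi"], toy["shop"]],
    [toy["japanese"], toy["restaurant"]],
)
print(f"{pair:.4f} {sets:.4f}")
```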

### Search words

Search for the most similar word(s)

In [None]:
result = def_2_vec.similar_by_word("cat")
most_similar_key, similarity = result[0]  # look at the first match
print(f"{most_similar_key}: {similarity:.4f}")

### Word Mover's Distance

Compute the distance between two sentences using the *Word Mover's Distance* (lower values mean more similar sentences).

In [None]:
sentence_1 = 'Obama speaks to the media in Illinois'.lower().split()
sentence_2 = 'The president greets the press in Chicago'.lower().split()

In [None]:
distance = def_2_vec.wmdistance(sentence_1, sentence_2)  # a distance: lower means more similar
print(f"{distance:.4f}")

Compute the cosine distance between a word and itself, which should be zero.

In [None]:
distance = def_2_vec.distance("media", "media")
print(f"{distance:.1f}")
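
Cosine distance is one minus cosine similarity, so a vector compared with itself gives a value of zero up to floating-point rounding. A short sketch with made-up vectors (`toy_media` and `toy_press` are hypothetical values, not Def2Vec embeddings):

```python
import math


def cosine_distance(a, b):
    """Cosine distance, defined as 1 minus the cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


# Hypothetical toy vectors; not Def2Vec embeddings
toy_media = [0.2, 0.7, 0.1]
toy_press = [0.3, 0.6, 0.2]

same_word = cosine_distance(toy_media, toy_media)   # zero up to rounding
other_word = cosine_distance(toy_media, toy_press)  # small positive value
print(abs(same_word) < 1e-12, other_word > 0.0)
```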