# Lab.5: Lexical semantics
## Introduction to Human Language Technologies
### Victor Badenas Crespo

***

### Statement:

Given the following (lemma, category) pairs:
```python
('the', 'DT'), ('man', 'NN'), ('swim', 'VB'), ('with', 'PR'), ('a', 'DT'),
('girl', 'NN'), ('and', 'CC'), ('a', 'DT'), ('boy', 'NN'), ('whilst', 'PR'),
('the', 'DT'), ('woman', 'NN'), ('walk', 'VB')
```

- For each pair, when possible, print their most frequent WordNet synset, their corresponding least common subsumer (LCS) and their similarity value, using the following functions:

    - Path Similarity

    - Leacock-Chodorow Similarity

    - Wu-Palmer Similarity

    - Lin Similarity

- Normalize similarity values when necessary. What similarity seems better?
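For reference, the textbook definitions of the four measures are given below. NLTK's implementations may differ in details such as how depth is counted or whether a fake root node is simulated. Here $\mathrm{len}(s_1, s_2)$ is the shortest path length in edges between the two synsets, $D$ the maximum depth of the taxonomy for that part of speech, $\mathrm{LCS}$ the least common subsumer, and $\mathrm{IC}(c) = -\log P(c)$ the information content of a concept estimated from a corpus (the Brown corpus in this lab).

$$
\begin{aligned}
\mathrm{sim}_{\text{path}}(s_1, s_2) &= \frac{1}{1 + \mathrm{len}(s_1, s_2)} \\
\mathrm{sim}_{\text{LCH}}(s_1, s_2) &= -\log \frac{\mathrm{len}(s_1, s_2) + 1}{2D} \\
\mathrm{sim}_{\text{WuP}}(s_1, s_2) &= \frac{2\,\mathrm{depth}(\mathrm{LCS})}{\mathrm{depth}(s_1) + \mathrm{depth}(s_2)} \\
\mathrm{sim}_{\text{Lin}}(s_1, s_2) &= \frac{2\,\mathrm{IC}(\mathrm{LCS})}{\mathrm{IC}(s_1) + \mathrm{IC}(s_2)}
\end{aligned}
$$

Path, Wu-Palmer and Lin already lie in $[0, 1]$. Leacock-Chodorow reaches its maximum $\log(2D)$ for identical synsets ($\mathrm{len} = 0$), so it is the measure that needs normalizing, and that maximum differs between the noun and verb hierarchies because $D$ does.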

*** 

## Solution

Import the necessary packages, download the WordNet resources and declare the global variables.

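```python
import nltk
import numpy as np
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

# WordNet itself and the information-content files used by Lin similarity
nltk.download('wordnet')
nltk.download('wordnet_ic')

DATA = [
    ('the','DT'), ('man','NN'), ('swim','VB'), ('with', 'PR'), ('a', 'DT'),
    ('girl','NN'), ('and', 'CC'), ('a', 'DT'), ('boy', 'NN'), ('whilst', 'PR'),
    ('the', 'DT'), ('woman', 'NN'), ('walk', 'VB')
]

# Information content estimated on the Brown corpus (required by lin_similarity)
brownIc = wordnet_ic.ic('ic-brown.dat')

# Keep only nouns (NN) and verbs (VB): the similarity measures work on the
# WordNet noun and verb hierarchies, and DT/PR/CC words are not covered
FilteredData = list(filter(lambda x: x[1].lower()[0] in ('v', 'n'), DATA))
```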
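The filter keeps a pair only when its tag starts with N or V. A minimal sketch of the same rule as a reusable helper (the names `toWordnetPos` and `TAG_TO_WORDNET_POS` are hypothetical, not part of NLTK) also makes explicit which lemmas end up as the rows and columns of the similarity matrices, and in which order.

```python
from typing import Optional

# Hypothetical helper: maps a tag to the WordNet POS letter accepted by wn.synsets().
# Only nouns and verbs are kept, mirroring the filter in the cell above.
TAG_TO_WORDNET_POS = {"N": "n", "V": "v"}


def toWordnetPos(tag: str) -> Optional[str]:
    """Return 'n' for noun tags, 'v' for verb tags and None otherwise."""
    return TAG_TO_WORDNET_POS.get(tag[:1].upper())


DATA = [
    ('the', 'DT'), ('man', 'NN'), ('swim', 'VB'), ('with', 'PR'), ('a', 'DT'),
    ('girl', 'NN'), ('and', 'CC'), ('a', 'DT'), ('boy', 'NN'), ('whilst', 'PR'),
    ('the', 'DT'), ('woman', 'NN'), ('walk', 'VB'),
]

# (lemma, WordNet POS) pairs in the order they index the similarity matrices
filtered = [(word, toWordnetPos(tag)) for word, tag in DATA if toWordnetPos(tag)]
print(filtered)
```

This prints `[('man', 'n'), ('swim', 'v'), ('girl', 'n'), ('boy', 'n'), ('woman', 'n'), ('walk', 'v')]`, so rows 1 and 5 of the matrices are the two verbs and the remaining rows are nouns.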



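```python
# WordNet orders the synsets of a lemma by sense frequency,
# so the first one is taken as the most frequent synset
synsets = list()
for word, posTag in FilteredData:
    posTag = posTag.lower()[0]  # 'NN' -> 'n', 'VB' -> 'v'
    if posTag in ('n', 'v'):
        synset = wn.synsets(word, posTag)[0]
        synsets.append(synset)
```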

```python
n = len(synsets)
# One matrix per measure; NaN marks pairs for which the measure is not computed
PathSimilarities = np.full((n, n), np.nan)
LCSimilarities = np.full((n, n), np.nan)  # Leacock-Chodorow, not the LCS
WuPalmerSimilarities = np.full((n, n), np.nan)
LinSimilarities = np.full((n, n), np.nan)
# Least common subsumers (lowest common hypernyms) of every pair
LCS = [[None for j in range(n)] for i in range(n)]

for i, sourceSynset in enumerate(synsets):
    for j, targetSynset in enumerate(synsets):
        LCS[i][j] = sourceSynset.lowest_common_hypernyms(targetSynset)
        # Only compare synsets with the same POS that share at least one hypernym
        if len(LCS[i][j]) > 0 and (FilteredData[i][1].lower()[0] == FilteredData[j][1].lower()[0]):
            PathSimilarities[i, j] = sourceSynset.path_similarity(targetSynset)
            LCSimilarities[i, j] = sourceSynset.lch_similarity(targetSynset)
            WuPalmerSimilarities[i, j] = sourceSynset.wup_similarity(targetSynset)
            LinSimilarities[i, j] = sourceSynset.lin_similarity(targetSynset, brownIc)

# Leacock-Chodorow is not bounded by 1: divide by its largest value over the whole matrix
LCSimilarities = LCSimilarities / np.nanmax(LCSimilarities)
```
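Dividing by the global maximum leaves the verb self-similarities below 1 (0.8957 on the diagonal for the two verbs in the output), because the verb hierarchy is shallower than the noun one and its Leacock-Chodorow maximum is therefore lower. One possible alternative, sketched below under the assumption that cross-POS entries are NaN as in the cell above, is to divide each row by its own diagonal entry, i.e. by the self-similarity of a synset with the same POS. The function name `normalizeLchPerPos` and the input values are hypothetical.

```python
import numpy as np


def normalizeLchPerPos(lch: np.ndarray) -> np.ndarray:
    """Scale raw Leacock-Chodorow scores into [0, 1] separately for each POS.

    Assumes lch[i, j] is NaN whenever synsets i and j have a different POS,
    so the diagonal entry lch[i, i] (a synset compared with itself) is the
    largest score reachable within that row's taxonomy.
    """
    selfScores = np.diag(lch)               # per-row maximum (self-similarity)
    return lch / selfScores[:, np.newaxis]  # row-wise division; NaN stays NaN


# Hypothetical raw scores: two nouns (rows 0-1) and one verb (row 2)
raw = np.array([
    [3.6, 1.8, np.nan],
    [1.8, 3.6, np.nan],
    [np.nan, np.nan, 3.2],
])
print(normalizeLchPerPos(raw))
```

With this scheme every diagonal entry becomes exactly 1, so noun and verb scores are both expressed relative to their own hierarchy instead of the noun one.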

```python
print("PathSimilarities:")
print(PathSimilarities)
print("LCSimilarities:")
print(LCSimilarities)
print("WuPalmerSimilarities:")
print(WuPalmerSimilarities)
print("LinSimilarities:")
print(LinSimilarities)
```

```
PathSimilarities:
[[1.                nan 0.25       0.33333333 0.33333333        nan]
 [       nan 1.                nan        nan        nan 0.33333333]
 [0.25              nan 1.         0.16666667 0.5               nan]
 [0.33333333        nan 0.16666667 1.         0.2               nan]
 [0.33333333        nan 0.5        0.2        1.                nan]
 [       nan 0.33333333        nan        nan        nan 1.        ]]
LCSimilarities:
[[1.                nan 0.61889718 0.69798316 0.69798316        nan]
 [       nan 0.89567543        nan        nan        nan 0.59365858]
 [0.61889718        nan 1.         0.50743174 0.80944859        nan]
 [0.69798316        nan 0.50743174 1.         0.55755332        nan]
 [0.69798316        nan 0.80944859 0.55755332 1.                nan]
 [       nan 0.59365858        nan        nan        nan 0.89567543]]
WuPalmerSimilarities:
[[1.                nan 0.63157895 0.66666667 0.66666667        nan]
 [       nan 1.                nan        nan 
```
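To judge which measure seems better, it can help to compare how each one ranks the pairs rather than their raw values. A minimal sketch, with a hypothetical helper `rankPairs`, fed with the noun rows and columns (man, girl, boy, woman) copied from the path and normalized Leacock-Chodorow outputs above. Both give the same ordering, which is expected: within one POS, Leacock-Chodorow is a monotonic function of the same shortest-path length that path similarity uses.

```python
import numpy as np

NOUNS = ["man", "girl", "boy", "woman"]

# Noun rows/columns (0, 2, 3, 4) copied from the PathSimilarities output
pathNouns = np.array([
    [1.0, 0.25, 1 / 3, 1 / 3],
    [0.25, 1.0, 1 / 6, 0.5],
    [1 / 3, 1 / 6, 1.0, 0.2],
    [1 / 3, 0.5, 0.2, 1.0],
])

# The same rows/columns copied from the normalized LCSimilarities output
lchNouns = np.array([
    [1.0, 0.61889718, 0.69798316, 0.69798316],
    [0.61889718, 1.0, 0.50743174, 0.80944859],
    [0.69798316, 0.50743174, 1.0, 0.55755332],
    [0.69798316, 0.80944859, 0.55755332, 1.0],
])


def rankPairs(matrix, labels):
    """Return (label_i, label_j, value) for every defined pair i < j, most similar first."""
    pairs = [
        (labels[i], labels[j], matrix[i, j])
        for i in range(len(labels))
        for j in range(i + 1, len(labels))
        if not np.isnan(matrix[i, j])
    ]
    # sorted() is stable even with reverse=True, so ties keep their row order
    return sorted(pairs, key=lambda p: p[2], reverse=True)


for name, matrix in [("path", pathNouns), ("lch", lchNouns)]:
    print(name, [f"{a}-{b}" for a, b, _ in rankPairs(matrix, NOUNS)])
```

Both lines list girl-woman first and girl-boy last, so for these nouns the two measures only differ in scale, not in ranking.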

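```python
def printSynsetDistanceMatrix(synsets, LCSMatrix, distanceMatrix):
    # For every pair with a defined value, print both synsets, their LCS and the similarity
    for i, sourceSynset in enumerate(synsets):
        for j, targetSynset in enumerate(synsets):
            if not np.isnan(distanceMatrix[i, j]):  # NaN: different POS or no common hypernym
                lcsNames = ", ".join(s.name() for s in LCSMatrix[i][j])
                print(sourceSynset.name(), targetSynset.name(), lcsNames,
                      round(distanceMatrix[i, j], 4))
```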

***

## Conclusions

***

### End of P4