## Loading modules

In [4]:
import codecs
import random
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.tag import pos_tag
from nltk.wsd import lesk
from nltk.corpus import stopwords, wordnet as wn
from nltk.stem import WordNetLemmatizer
from pywsd.lesk import simple_lesk, original_lesk, cosine_lesk, adapted_lesk

lemmatizer = WordNetLemmatizer()
stops = set(stopwords.words('english'))
puncts  = str.maketrans('','',"\'\":,?!.«»()")
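The debug output further down contains tokens such as `'’'` and `'“'`, because the table above lists only straight quotes. A minimal sketch of an extended table, assuming typographic quotes are the only extra characters needed (`puncts_extended` is a hypothetical name; the results in this notebook were produced with the original `puncts`):

```python
# Original characters plus typographic single and double quotes (assumption:
# no other punctuation needs to be stripped)
PUNCT_CHARS = "'\":,?!.«»()" + "\u2018\u2019\u201c\u201d"
puncts_extended = str.maketrans("", "", PUNCT_CHARS)

sample = "“I think I’m the only one,” she said."
cleaned = sample.translate(puncts_extended)
print(cleaned)  # I think Im the only one she said
```

Deleting apostrophes this way still glues contractions and possessives together ("Im" here, like `'tobeys'` and `'yorks'` in the outputs), so it only removes the stray quote tokens.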

### Helper functions

In [5]:
def get_wordnet_pos(treebank_tag):
    if treebank_tag.startswith('J'):
        return wn.ADJ
    elif treebank_tag.startswith('V'):
        return wn.VERB
    elif treebank_tag.startswith('N'):
        return wn.NOUN
    elif treebank_tag.startswith('R'):
        return wn.ADV
    else:
        return ''

In [6]:
def present(sent):
    sent_tkn = word_tokenize(sent.translate(puncts))
    tknz = pos_tag(sent_tkn)
    # print(tknz)
    presentation_sent = []
    for a in tknz:
        wpos = get_wordnet_pos(a[1])
        # print(a, wpos)
        if wpos:
            lm = lemmatizer.lemmatize(a[0], pos=wpos).lower()
            if lm not in stops:
                # print(a[0], wpos)
                presentation_sent.append({"token":a[0], "pos": a[1], "lemma": lm, "wpos": wpos})
    return presentation_sent

## My implementation of the Lesk algorithm

In [7]:
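def custom_lesk(word, sent, examples=False):
    prsent = present(sent)
    # Use the WordNet base form when there is one, otherwise the word as given
    w = wn.morphy(word) or word
    # Find the target word in the sentence to learn its part of speech
    prw = next((i for i in prsent if i["lemma"] == w), None)
    if prw is None:
        raise ValueError("%r was not found among the content words of the sentence" % word)
    # Context lemmas: everything except the target word itself
    # (built as a new list instead of removing items while iterating)
    flatprsent = [a["lemma"] for a in prsent if a["lemma"] != w]
    results = {}
    rels = {}
    for ss in wn.synsets(w):
        # Only consider senses with the same part of speech as the target
        if ss.pos() == prw["wpos"]:
            defprsent = present(ss.definition())
            flatdefprsent = [a["lemma"] for a in defprsent]
            if examples:
                # Optionally extend the signature with the synset's usage examples
                for ex in ss.examples():
                    flatdefprsent.extend(a["lemma"] for a in present(ex))
            rels[ss] = [flatprsent, flatdefprsent]
            # Score = number of distinct lemmas shared by context and signature
            results[ss] = len(set(flatprsent) & set(flatdefprsent))
    if not results:
        raise ValueError("no WordNet synsets for %r with POS %r" % (w, prw["wpos"]))
    # max() returns the first synset among ties, so with zero overlap
    # the first sense in WordNet order is chosen
    best = max(results, key=results.get)
    return [best, results[best], rels[best]]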
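The scoring step can be illustrated without WordNet. A minimal sketch, assuming two made-up glosses stored in a plain dictionary (`toy_glosses` and `overlap_scores` are hypothetical names, not part of NLTK or pywsd): each sense is scored by the number of distinct lemmas its gloss shares with the context.

```python
# Toy stand-in for WordNet: sense name -> lemmatized gloss (made-up data)
toy_glosses = {
    "key.device": ["metal", "device", "insert", "lock", "lock", "mechanism"],
    "key.music": ["tone", "scale", "music", "note"],
}

def overlap_scores(context, glosses):
    """Score each sense by the number of distinct lemmas shared with the context."""
    ctx = set(context)
    return {sense: len(ctx & set(gloss)) for sense, gloss in glosses.items()}

context = ["man", "lock", "door"]
scores = overlap_scores(context, toy_glosses)
best = max(scores, key=scores.get)
print(scores)  # {'key.device': 1, 'key.music': 0}
print(best)    # key.device
```

Because both sides are turned into sets, a lemma that occurs twice in a gloss ("lock" here) still counts only once.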

## Minimal test

In [8]:
rr  = custom_lesk("key", "A man locked the door with a key.")

In [9]:
rr

[Synset('key.n.01'),
 1,
 [['man', 'lock', 'door'],
  ['metal',
   'device',
   'shape',
   'way',
   'insert',
   'appropriate',
   'lock',
   'lock',
   'mechanism',
   'rotate']]]

In [10]:
rr[0].definition()

"metal device shaped in such a way that when it is inserted into the appropriate lock the lock's mechanism can be rotated"

### Evaluation

It seems to work on an arbitrary example.

## Processing corpus

In [11]:
with open('corpus_eng.txt', 'r', encoding='utf-8') as f:
    lines = f.read().splitlines()
allbreaks = []
for line in lines:
    for sent in sent_tokenize(line):
        if "break" in sent:
            tokenized = word_tokenize(sent)
            if "break" in tokenized:
                allbreaks.append(sent)
somebreaks = random.sample(allbreaks, 10)

In [12]:
somebreaks

['On Monday’s Dancing with the Stars , Olympic gymnast Laurie Hernandez took a break from her rehearsal schedule to visit her ailing grandmother.',
 'The first was during the Question Time that came live from Wembley, when she interrupted some military posturing by Boris to sniff: “ I think I’m the only one on this panel who’s ever worn the Queen’s uniform.” The next saw her decline to even break stride as a reporter asked her about Boris crashing out of the race to succeed David Cameron, only smiling lightly that, “Leadership is hard.',
 "To properly remove the stain if the mineral content in water is not an issue, pretreat the sunscreen stain with a with a prewash stain remover spray or gel like Zout or Shout or Spray 'n Wash or a bit of heavy duty liquid detergent ( Wisk , Tide or Persil are rated as the best brands with enough enzymes to break apart the oily component of the stain) or make a paste with powder detergent/water.",
 'After he and his mother fled their home state, pursu

In [13]:
keyword = "break"

In [21]:
for index, sent in enumerate(somebreaks):    
    tokenized = word_tokenize(sent)
    tagged = pos_tag(tokenized)
    for tag in tagged:
        if tag[0] == keyword:
            wpos = get_wordnet_pos(tag[1])
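            # NB: nltk.wsd.lesk expects a list of tokens; the raw string passed here
            # is turned into a set of characters, which weakens the NLTK baseline.
            ss  = lesk(sent, 'break', wpos)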
            tan_ol = original_lesk(sent, 'break')
            tan_sl = simple_lesk(sent, 'break', pos=wpos)
            tan_cl = cosine_lesk(sent, 'break', pos=wpos)
            tan_al = adapted_lesk(sent, 'break', pos=wpos)
            [cl, rate, dbg]  = custom_lesk(keyword, sent)
            [cl2, rate2, dbg2]  = custom_lesk(keyword, sent, examples=True)
            
            print("#"+str(index+1))
            print(sent)
            print("Lesk NLTK:", ss, ss.definition())
            print("Tan Orig:", tan_ol, tan_ol.definition())
            print("Tan Simple:", tan_sl, tan_sl.definition())
            print("Tan Cos:", tan_cl, tan_cl.definition())
            print("Tan Adapt:", tan_al, tan_al.definition())
            print("Custom:", cl, cl.definition())
            print("rate", rate, "data:", dbg)
            print("Custom+Examples:", cl2, cl2.definition())
            print("rate", rate2, "data:", dbg2)
            print("-"*100)

#1
On Monday’s Dancing with the Stars , Olympic gymnast Laurie Hernandez took a break from her rehearsal schedule to visit her ailing grandmother.
Lesk NLTK: Synset('rupture.n.02') a personal or social separation (as between opposing factions)
Tan Orig: Synset('fault.n.04') (geology) a crack in the earth's crust resulting from the displacement of one side with respect to the other
Tan Simple: Synset('respite.n.02') a pause from doing something (as work)
Tan Cos: Synset('respite.n.02') a pause from doing something (as work)
Tan Adapt: Synset('respite.n.02') a pause from doing something (as work)
Custom: Synset('interruption.n.02') some abrupt occurrence that interrupts an ongoing activity
rate 0 data: [['monday', '’', 'dancing', 'stars', 'olympic', 'gymnast', 'laurie', 'hernandez', 'take', 'rehearsal', 'schedule', 'visit', 'ail', 'grandmother'], ['abrupt', 'occurrence', 'interrupt', 'ongoing', 'activity']]
Custom+Examples: Synset('respite.n.02') a pause from doing something (as work)
ra

#6
Ravens at Cowboys: Dak Prescott has the reins, but Ezekiel Elliott will have to find a way to break through against the Ravens' top-ranked rushing defense.
Lesk NLTK: Synset('unwrap.v.02') make known to the public information that was previously known only to a few people or that was meant to be kept a secret
Tan Orig: Synset('fault.n.04') (geology) a crack in the earth's crust resulting from the displacement of one side with respect to the other
Tan Simple: Synset('separate.v.08') discontinue an association or relation; go different ways
Tan Cos: Synset('break_dance.v.01') do a break dance
Tan Adapt: Synset('separate.v.08') discontinue an association or relation; go different ways
Custom: Synset('separate.v.08') discontinue an association or relation; go different ways
rate 1 data: [['ravens', 'cowboys', 'dak', 'prescott', 'rein', 'ezekiel', 'elliott', 'find', 'way', 'ravens', 'top-ranked', 'rushing', 'defense'], ['discontinue', 'association', 'relation', 'go', 'different', 'way']]

## Discussion

### 1
`On Monday’s Dancing with the Stars , Olympic gymnast Laurie Hernandez took a break from her rehearsal schedule to visit her ailing grandmother.
Lesk NLTK: Synset('rupture.n.02') a personal or social separation (as between opposing factions)
Tan Orig: Synset('fault.n.04') (geology) a crack in the earth's crust resulting from the displacement of one side with respect to the other
Tan Simple: Synset('respite.n.02') a pause from doing something (as work)
Tan Cos: Synset('respite.n.02') a pause from doing something (as work)
Tan Adapt: Synset('respite.n.02') a pause from doing something (as work)
Custom: Synset('interruption.n.02') some abrupt occurrence that interrupts an ongoing activity
rate 0 data: [['monday', '’', 'dancing', 'stars', 'olympic', 'gymnast', 'laurie', 'hernandez', 'take', 'rehearsal', 'schedule', 'visit', 'ail', 'grandmother'], ['abrupt', 'occurrence', 'interrupt', 'ongoing', 'activity']]
Custom+Examples: Synset('respite.n.02') a pause from doing something (as work)
rate 1 data: [['monday', '’', 'dancing', 'stars', 'olympic', 'gymnast', 'laurie', 'hernandez', 'take', 'rehearsal', 'schedule', 'visit', 'ail', 'grandmother'], ['pause', 'something', 'work', 'take', '10-minute', 'break', 'take', 'time', 'recuperate']]`
**Generic NLTK fails; most of Tan's methods succeed.  
My implementations:  
definition only fails;   
definition + examples succeeds because of "take".**
### 2
`The first was during the Question Time that came live from Wembley, when she interrupted some military posturing by Boris to sniff: “ I think I’m the only one on this panel who’s ever worn the Queen’s uniform.” The next saw her decline to even break stride as a reporter asked her about Boris crashing out of the race to succeed David Cameron, only smiling lightly that, “Leadership is hard.
Lesk NLTK: Synset('unwrap.v.02') make known to the public information that was previously known only to a few people or that was meant to be kept a secret
Tan Orig: Synset('fault.n.04') (geology) a crack in the earth's crust resulting from the displacement of one side with respect to the other
Tan Simple: Synset('break.v.27') come forth or begin from a state of latency
Tan Cos: Synset('unwrap.v.02') make known to the public information that was previously known only to a few people or that was meant to be kept a secret
Tan Adapt: Synset('break.v.27') come forth or begin from a state of latency
Custom: Synset('break.v.16') come into being
rate 1 data: [['first', 'question', 'time', 'come', 'live', 'wembley', 'interrupt', 'military', 'posturing', 'boris', 'sniff', '“', 'think', '’', 'one', 'panel', '’', 'ever', 'wear', 'queen', '’', 'uniform', 'next', 'saw', 'decline', 'even', 'stride', 'reporter', 'ask', 'boris', 'crash', 'race', 'succeed', 'david', 'cameron', 'smile', 'lightly', '“', 'leadership', 'hard'], ['come']]
Custom+Examples: Synset('break.v.27') come forth or begin from a state of latency   
rate 2 data: [['first', 'question', 'time', 'come', 'live', 'wembley', 'interrupt', 'military', 'posturing', 'boris', 'sniff', '“', 'think', '’', 'one', 'panel', '’', 'ever', 'wear', 'queen', '’', 'uniform', 'next', 'saw', 'decline', 'even', 'stride', 'reporter', 'ask', 'boris', 'crash', 'race', 'succeed', 'david', 'cameron', 'smile', 'lightly', '“', 'leadership', 'hard'], ['come', 'forth', 'begin', 'state', 'latency', 'first', 'winter', 'storm', 'break', 'new', 'york']]`

**The expected sense is Synset('interrupt.v.04'); every method fails.   
My implementations: definition only fails and definition + examples fails; the overlapping words were "first" and "come", which is a meaningless coincidence. The NLTK sentence splitter also worked badly here (two sentences were left joined), which did not help either.**

### 3
`To properly remove the stain if the mineral content in water is not an issue, pretreat the sunscreen stain with a with a prewash stain remover spray or gel like Zout or Shout or Spray 'n Wash or a bit of heavy duty liquid detergent ( Wisk , Tide or Persil are rated as the best brands with enough enzymes to break apart the oily component of the stain) or make a paste with powder detergent/water.
Lesk NLTK: Synset('unwrap.v.02') make known to the public information that was previously known only to a few people or that was meant to be kept a secret
Tan Orig: Synset('break_in.v.01') enter someone's (virtual or real) property in an unauthorized manner, usually with the intent to steal or commit a violent act
Tan Simple: Synset('unwrap.v.02') make known to the public information that was previously known only to a few people or that was meant to be kept a secret
Tan Cos: Synset('violate.v.01') fail to agree with; be in violation of; as of rules or patterns
Tan Adapt: Synset('unwrap.v.02') make known to the public information that was previously known only to a few people or that was meant to be kept a secret
Custom: Synset('break_in.v.06') make submissive, obedient, or useful
rate 1 data: [['properly', 'remove', 'stain', 'mineral', 'content', 'water', 'issue', 'pretreat', 'sunscreen', 'stain', 'prewash', 'stain', 'remover', 'spray', 'gel', 'zout', 'shout', 'spray', 'n', 'wash', 'bit', 'heavy', 'duty', 'liquid', 'detergent', 'wisk', 'tide', 'persil', 'rat', 'best', 'brand', 'enough', 'enzyme', 'apart', 'oily', 'component', 'stain', 'make', 'paste', 'powder', 'detergent/water'], ['make', 'submissive', 'obedient', 'useful']]
Custom+Examples: Synset('break.v.02') become separated into pieces or fragments
rate 1 data: [['properly', 'remove', 'stain', 'mineral', 'content', 'water', 'issue', 'pretreat', 'sunscreen', 'stain', 'prewash', 'stain', 'remover', 'spray', 'gel', 'zout', 'shout', 'spray', 'n', 'wash', 'bit', 'heavy', 'duty', 'liquid', 'detergent', 'wisk', 'tide', 'persil', 'rat', 'best', 'brand', 'enough', 'enzyme', 'apart', 'oily', 'component', 'stain', 'make', 'paste', 'powder', 'detergent/water'], ['become', 'separate', 'piece', 'fragment', 'figurine', 'break', 'freshly', 'bake', 'loaf', 'fell', 'apart']]`

**The expected sense is Synset('break.v.05'). Every method fails, although my definition + examples implementation came closer than the others because of "apart". 
Arguably, this reflects problems in how WordNet describes the lexeme.**

### 4
`After he and his mother fled their home state, pursuing US Marshals got their break after one of the Couch’s cell phones was used to order a Domino’s pizza.
Lesk NLTK: Synset('rupture.n.02') a personal or social separation (as between opposing factions)
Tan Orig: Synset('fault.n.04') (geology) a crack in the earth's crust resulting from the displacement of one side with respect to the other
Tan Simple: Synset('fault.n.04') (geology) a crack in the earth's crust resulting from the displacement of one side with respect to the other
Tan Cos: Synset('break.n.12') (tennis) a score consisting of winning a game when your opponent was serving
Tan Adapt: Synset('fault.n.04') (geology) a crack in the earth's crust resulting from the displacement of one side with respect to the other
Custom: Synset('interruption.n.02') some abrupt occurrence that interrupts an ongoing activity
rate 0 data: [['mother', 'flee', 'home', 'state', 'pursue', 'marshals', 'get', 'couch', '’', 'cell', 'phone', 'use', 'order', 'domino', '’', 'pizza'], ['abrupt', 'occurrence', 'interrupt', 'ongoing', 'activity']]
Custom+Examples: Synset('break.n.02') an unexpected piece of good luck
rate 1 data: [['mother', 'flee', 'home', 'state', 'pursue', 'marshals', 'get', 'couch', '’', 'cell', 'phone', 'use', 'order', 'domino', '’', 'pizza'], ['unexpected', 'piece', 'good', 'luck', 'finally', 'get', 'big', 'break']]`

**The expected sense is Synset('respite.n.02'). My definition-only implementation came closer than the others, but it chose that meaning only because there was no best candidate, so the first meaning was selected.**
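
The "first meaning" fallback follows from how `max` treats ties: over a dictionary it returns the first key that reaches the maximum, and dictionaries keep insertion order (Python 3.7+). A minimal sketch with made-up sense names and scores:

```python
# Scores in insertion order, mimicking the order in which synsets are visited
# (sense names and values are made up for illustration)
all_zero = {"sense_a": 0, "sense_b": 0, "sense_c": 0}
tie_at_one = {"sense_a": 0, "sense_b": 1, "sense_c": 1}

# max() keeps the first key that attains the maximum value
first_when_zero = max(all_zero, key=all_zero.get)
first_when_tied = max(tie_at_one, key=tie_at_one.get)
print(first_when_zero)  # sense_a
print(first_when_tied)  # sense_b
```

So a rate of 0 always yields the first candidate synset, and among equally scored candidates the one visited first wins.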

### 5
`Tobey's mother, Joyce Mitchell, helped Richard Matt and David Sweat break out of New York's maximum security Clinton Correctional Facility last year.
Lesk NLTK: Synset('unwrap.v.02') make known to the public information that was previously known only to a few people or that was meant to be kept a secret
Tan Orig: Synset('break.v.09') force out or release suddenly and often violently something pent up
Tan Simple: Synset('break.v.27') come forth or begin from a state of latency
Tan Cos: Synset('break.v.51') find the solution or key to
Tan Adapt: Synset('break.v.27') come forth or begin from a state of latency
Custom: Synset('interrupt.v.04') terminate
rate 0 data: [['tobeys', 'joyce', 'mitchell', 'help', 'richard', 'matt', 'david', 'sweat', 'new', 'yorks', 'maximum', 'security', 'clinton', 'correctional', 'facility', 'last', 'year'], ['terminate']]
Custom+Examples: Synset('break.v.07') move away or escape suddenly
rate 1 data: [['tobeys', 'joyce', 'mitchell', 'help', 'richard', 'matt', 'david', 'sweat', 'new', 'yorks', 'maximum', 'security', 'clinton', 'correctional', 'facility', 'last', 'year'], ['move', 'away', 'escape', 'suddenly', 'horse', 'break', 'stable', 'inmate', 'break', 'jail', 'nobody', 'break', 'prison', 'high', 'security']]`

**The only success is my definition + examples implementation; it worked because of "security", which actually makes sense.   
Tan Original looks close, but it picks a reflexive meaning, which is wrong here.**

### 6
`Ravens at Cowboys: Dak Prescott has the reins, but Ezekiel Elliott will have to find a way to break through against the Ravens' top-ranked rushing defense.
Lesk NLTK: Synset('unwrap.v.02') make known to the public information that was previously known only to a few people or that was meant to be kept a secret
Tan Orig: Synset('fault.n.04') (geology) a crack in the earth's crust resulting from the displacement of one side with respect to the other
Tan Simple: Synset('separate.v.08') discontinue an association or relation; go different ways
Tan Cos: Synset('break_dance.v.01') do a break dance
Tan Adapt: Synset('separate.v.08') discontinue an association or relation; go different ways
Custom: Synset('separate.v.08') discontinue an association or relation; go different ways
rate 1 data: [['ravens', 'cowboys', 'dak', 'prescott', 'rein', 'ezekiel', 'elliott', 'find', 'way', 'ravens', 'top-ranked', 'rushing', 'defense'], ['discontinue', 'association', 'relation', 'go', 'different', 'way']]
Custom+Examples: Synset('fail.v.04') stop operating or functioning
rate 1 data: [['ravens', 'cowboys', 'dak', 'prescott', 'rein', 'ezekiel', 'elliott', 'find', 'way', 'ravens', 'top-ranked', 'rushing', 'defense'], ['stop', 'operating', 'functioning', 'engine', 'finally', 'go', 'car', 'die', 'road', 'bus', 'travel', 'break', 'way', 'town', 'coffee', 'maker', 'break', 'engine', 'fail', 'way', 'town', 'eyesight', 'go', 'accident']]`

**The right match is Synset('break.v.45') 'pierce or penetrate'.   
Every method fails.  
My definition-only implementation prefers Synset('separate.v.08') because of "way".   
The one with examples selected Synset('fail.v.04'), also because of "way": with the examples included it scores the same as Synset('separate.v.08'), and since it comes earlier in the iteration order, it was chosen.**

### 7
`The Argentine then conjured up a pair of break points at 4-3 with a miscued forehand from Cilic paving the way for Del Potro to complete a stunning fightback -- his first ever from two sets down -- in four hours and 53 minutes.
Lesk NLTK: Synset('rupture.n.02') a personal or social separation (as between opposing factions)
Tan Orig: Synset('fault.n.04') (geology) a crack in the earth's crust resulting from the displacement of one side with respect to the other
Tan Simple: Synset('break.n.12') (tennis) a score consisting of winning a game when your opponent was serving
Tan Cos: Synset('break.n.12') (tennis) a score consisting of winning a game when your opponent was serving
Tan Adapt: Synset('break.n.12') (tennis) a score consisting of winning a game when your opponent was serving
Custom: Synset('interruption.n.02') some abrupt occurrence that interrupts an ongoing activity
rate 0 data: [['argentine', 'conjure', 'pair', 'point', '4-3', 'miscued', 'forehand', 'cilic', 'pave', 'way', 'del', 'potro', 'complete', 'stunning', 'fightback', 'first', 'ever', 'set', 'hour', 'minute'], ['abrupt', 'occurrence', 'interrupt', 'ongoing', 'activity']]
Custom+Examples: Synset('break.n.12') (tennis) a score consisting of winning a game when your opponent was serving
rate 1 data: [['argentine', 'conjure', 'pair', 'point', '4-3', 'miscued', 'forehand', 'cilic', 'pave', 'way', 'del', 'potro', 'complete', 'stunning', 'fightback', 'first', 'ever', 'set', 'hour', 'minute'], ['tennis', 'score', 'consisting', 'win', 'game', 'opponent', 'serve', 'break', 'second', 'set']]`

**All of Tan's Lesk variants (except the original one) succeed.   
My definition-only implementation fails, but definition + examples wins because of "set" in the example.  
However, this could generally be treated as the idiom "break point" rather than the separate word "break".**

### 8
`Al-Yousef said Fastaqim arrested one of its commanders during battle preparations in Aleppo after a rebel offensive launched last week to break the government's siege of the city's rebel neighborhoods.
Lesk NLTK: Synset('unwrap.v.02') make known to the public information that was previously known only to a few people or that was meant to be kept a secret
Tan Orig: Synset('fault.n.04') (geology) a crack in the earth's crust resulting from the displacement of one side with respect to the other
Tan Simple: Synset('break_in.v.01') enter someone's (virtual or real) property in an unauthorized manner, usually with the intent to steal or commit a violent act
Tan Cos: Synset('violate.v.01') fail to agree with; be in violation of; as of rules or patterns
Tan Adapt: Synset('break_in.v.01') enter someone's (virtual or real) property in an unauthorized manner, usually with the intent to steal or commit a violent act
Custom: Synset('interrupt.v.04') terminate
rate 0 data: [['al-yousef', 'say', 'fastaqim', 'arrest', 'commander', 'battle', 'preparation', 'aleppo', 'rebel', 'offensive', 'launch', 'last', 'week', 'government', 'siege', 'city', 'rebel', 'neighborhood'], ['terminate']]
Custom+Examples: Synset('break_in.v.01') enter someone's (virtual or real) property in an unauthorized manner, usually with the intent to steal or commit a violent act
rate 1 data: [['al-yousef', 'say', 'fastaqim', 'arrest', 'commander', 'battle', 'preparation', 'aleppo', 'rebel', 'offensive', 'launch', 'last', 'week', 'government', 'siege', 'city', 'rebel', 'neighborhood'], ['enter', 'someone', 'virtual', 'real', 'property', 'unauthorized', 'manner', 'usually', 'intent', 'steal', 'commit', 'violent', 'act', 'someone', 'break', 'vacation', 'break', 'car', 'steal', 'radio', 'break', 'account', 'last', 'night']]`


**Again, Synset('break.v.45') 'pierce or penetrate' should be chosen. I should also note that the difference between Synset('break.v.45') and Synset('break_in.v.01') is not very clear.    
If we treat Synset('break_in.v.01') as a good enough answer, Tan's Simple and Adapted variants got lucky.   
My definition + examples implementation did too, but, as we can see, only because of "last" in the examples, which is meaningless.**

### 9
`A few hours after taking a break from promotional duties with Iris, Redmayne chatted in a downtown Manhattan hotel about his headlong dive into Rowling's empire, the film's multicultural message and just how many movies he's gotten himself into.
Lesk NLTK: Synset('rupture.n.02') a personal or social separation (as between opposing factions)
Tan Orig: Synset('fault.n.04') (geology) a crack in the earth's crust resulting from the displacement of one side with respect to the other
Tan Simple: Synset('respite.n.02') a pause from doing something (as work)
Tan Cos: Synset('respite.n.02') a pause from doing something (as work)
Tan Adapt: Synset('respite.n.02') a pause from doing something (as work)
Custom: Synset('interruption.n.02') some abrupt occurrence that interrupts an ongoing activity
rate 0 data: [['hour', 'take', 'promotional', 'duty', 'iris', 'redmayne', 'chat', 'downtown', 'manhattan', 'hotel', 'headlong', 'dive', 'rowlings', 'empire', 'film', 'multicultural', 'message', 'many', 'movie', 'get'], ['abrupt', 'occurrence', 'interrupt', 'ongoing', 'activity']]
Custom+Examples: Synset('break.n.02') an unexpected piece of good luck
rate 1 data: [['hour', 'take', 'promotional', 'duty', 'iris', 'redmayne', 'chat', 'downtown', 'manhattan', 'hotel', 'headlong', 'dive', 'rowlings', 'empire', 'film', 'multicultural', 'message', 'many', 'movie', 'get'], ['unexpected', 'piece', 'good', 'luck', 'finally', 'get', 'big', 'break']]`

**Again, Tan's methods were successful (Synset('respite.n.02')).  
My definition-only implementation just chose the first sense, which is wrong.  
My definition + examples implementation chose Synset('break.n.02') because of "get" in the examples, although "he's gotten himself into..." is not related to it in any meaningful way.**

### 10
`If that fails then it's time to break out your nut tool , which are also made to use on stuck cams as well as nuts.
Lesk NLTK: Synset('unwrap.v.02') make known to the public information that was previously known only to a few people or that was meant to be kept a secret
Tan Orig: Synset('open_frame.n.01') any frame in which a bowler fails to make a strike or spare
Tan Simple: Synset('violate.v.01') fail to agree with; be in violation of; as of rules or patterns
Tan Cos: Synset('break_in.v.06') make submissive, obedient, or useful
Tan Adapt: Synset('violate.v.01') fail to agree with; be in violation of; as of rules or patterns
Custom: Synset('break_in.v.06') make submissive, obedient, or useful
rate 1 data: [['fail', 'time', 'nut', 'tool', 'also', 'make', 'use', 'stuck', 'cam', 'well', 'nut'], ['make', 'submissive', 'obedient', 'useful']]
Custom+Examples: Synset('break_in.v.06') make submissive, obedient, or useful
rate 1 data: [['fail', 'time', 'nut', 'tool', 'also', 'make', 'use', 'stuck', 'cam', 'well', 'nut'], ['make', 'submissive', 'obedient', 'useful', 'horse', 'tough', 'break', 'break', 'new', 'intern']]`

**Actually, this is the phrasal verb "break out". It is close to Synset('break.v.46') 'be released or become known'. I also note that Synset('break.v.46') seems to be really close to Synset('unwrap.v.02'), which is a problem of WordNet.
My implementation chose Synset('break_in.v.06') because of "make", both with the definition only and with definition + examples, repeating the mechanism of \#6.**
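
One way to act on this observation is to look up the phrasal verb as a single lemma before falling back to the bare word. WordNet joins multiword lemmas with underscores, as `break_in` and `break_dance` in the results above show. A minimal sketch, assuming a small hand-picked particle list (`PARTICLES` and `phrasal_candidate` are hypothetical names; whether this improves the results on this sample was not tested):

```python
# Illustrative particle list; an assumption, not an exhaustive inventory
PARTICLES = {"out", "in", "through", "apart", "up", "down", "off", "away"}

def phrasal_candidate(tokens, keyword):
    """Return 'keyword_particle' if a particle directly follows the keyword, else the keyword."""
    for i, tok in enumerate(tokens[:-1]):
        if tok.lower() == keyword and tokens[i + 1].lower() in PARTICLES:
            return keyword + "_" + tokens[i + 1].lower()
    return keyword

phrasal = phrasal_candidate("it's time to break out your nut tool".split(), "break")
plain = phrasal_candidate("took a break from work".split(), "break")
print(phrasal)  # break_out
print(plain)    # break
```

The returned string could then be tried as the lookup word, keeping the bare keyword as the fallback when the multiword lemma has no synsets.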

----------------------------------------------------------------------------------------------------


## Conclusions

The best results came from the more sophisticated methods by Liling Tan. He also co-authored the [NLTK WSD](https://www.nltk.org/_modules/nltk/wsd.html) module of the NLTK toolkit, along with Dmitrijs Milajevs, but for some reason, on our dataset, the Lesk algorithm from NLTK performed worse than every alternative, even mine.
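
One likely contributor to the weak NLTK result is the call itself. As far as the module source linked above shows, `nltk.wsd.lesk` builds its context as `set(context_sentence)`, so it expects a list of tokens, while the evaluation loop passes the raw sentence string. With a string, the context becomes a set of single characters and can only match one-letter gloss words such as "a". A minimal sketch of that effect in plain Python (`gloss_overlap` is a hypothetical helper that mirrors the overlap step; it is not the NLTK function, and the sentence is a made-up variant of example 1):

```python
def gloss_overlap(context_sentence, gloss):
    """Count gloss words that also occur in the context, treated as an iterable."""
    context = set(context_sentence)  # a str becomes a set of characters
    return len(context.intersection(gloss.split()))

sent = "Laurie Hernandez took a pause from her rehearsal schedule"
gloss = "a pause from doing something (as work)"

from_string = gloss_overlap(sent, gloss)
from_tokens = gloss_overlap(sent.split(), gloss)
print(from_string)  # 1: only the one-letter word "a" can match
print(from_tokens)  # 3: "a", "pause", "from"
```

Passing `word_tokenize(sent)` instead of `sent` to `lesk` should give the NLTK baseline a fairer comparison; the results above were produced with the raw string.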

In this light, my implementations do not look so bad; however, as we can see, a good choice was sometimes just a meaningless coincidence.

And the best score (by Tan's algorithms) was just 3 out of 10. We should keep in mind that this is only one sample from the corpus, and I am not sure we would get even the same score consistently on other samples.
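
To put a number on that uncertainty, a confidence interval for 3 correct out of 10 can be computed. A minimal sketch using the Wilson score interval (a standard choice for small samples, not something used in the analysis above; `wilson_interval` is a hypothetical helper):

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95% confidence)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

low, high = wilson_interval(3, 10)
print(f"{low:.2f} .. {high:.2f}")  # 0.11 .. 0.60
```

An interval from roughly 11% to 60% means ten sentences cannot separate a weak method from a decent one.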

It is also worth mentioning that WordNet itself is a bad source (BTW, as a project it was discontinued years ago). Some meanings seem to be identical but are listed separately for some reason. I would conclude that the task of aligning a token or sentence to a WordNet synset is not really reasonable. We should rather relate them to entities that we are going to (and are able to) use. For example, if we implement a search engine, it makes more sense to produce embeddings based on our own corpus and then relate new texts (tokens, sentences) to those embeddings than to weird WordNet synsets.

So:
1. Tan's methods are cool, but no purely algorithmic implementation is good enough for production.
2. WordNet is bad, don't use it.
3. ML seems to be the only way to deal with WSD, at least without much pain.