# Lexical Substitution

In this homework we find suitable substitutions for a target word in a sentence.

After three steps, we reach a score of 48.32.

## 1. Implement Baseline / Retrofitting 


In [1]:
from lexsub import *  

First, turn the word vectors into a dictionary that maps each word to its normalized vector:

In [2]:
import math
import numpy as np

def load_wvecs(wvec_file):
    """Turn an iterable of (word, vector) pairs into a dict of L2-normalized numpy vectors."""
    wordVectors = {}
    for key, vector in wvec_file:
        wordVectors[key] = np.array(vector, dtype=float)
        # normalize to unit length (the 1e-6 guards against division by zero)
        wordVectors[key] /= math.sqrt((wordVectors[key] ** 2).sum() + 1e-6)
    return wordVectors

Download glove.6B.100d.magnitude if you do not have it (the code below expects it in ../data/):

!wget http://magnitude.plasticity.ai/glove/medium/glove.6B.100d.magnitude

Then take a look at the first 10 words and the first 5 dimensions of their vectors:

In [6]:
from pymagnitude import *
wv = Magnitude("../data/glove.6B.100d.magnitude")
for key, vector in wv[:10]:
    print(key, vector[:5])

the [-0.0065612 -0.0420655  0.1250817 -0.0686479  0.0142879]
, [-0.0193882  0.0199032  0.1077039 -0.0978882  0.1213604]
. [-0.0622309  0.0383524  0.0848841 -0.1186634 -0.0702856]
of [-0.0242819 -0.0385573  0.1426693  0.0269912  0.0849883]
to [-0.0294079  0.0077549  0.0295846 -0.0076247 -0.0139113]
and [-0.012695   0.0408041  0.004187  -0.0893432  0.0598521]
in [ 0.0140624 -0.0364279  0.0271868  0.0219427  0.0627435]
a [-0.043387   0.007049  -0.0032453 -0.0278637  0.1032215]
" [-0.0462585 -0.0359123  0.0266947 -0.1106516 -0.0430477]
's [ 0.0883406 -0.0303955  0.1102929 -0.1025762 -0.0295324]


In [7]:
word_vector = load_wvecs(wv)  # reuse the Magnitude object opened above

For the lexicon, we tried both WordNet and PPDB-XL; PPDB-XL gave better performance. We read this lexicon into a dictionary that maps each word to the list of its neighbours:

In [8]:
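def load_lexicon(lexicon_file):
    """Map each headword (first token on a line) to its neighbours (the remaining tokens)."""
    lexicon = {}
    with open(lexicon_file, 'r') as f:
        for line in f:
            words = line.lower().strip().split()
            if not words:  # skip empty lines
                continue
            lexicon[words[0]] = words[1:]
    return lexicon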

Then load the file into a dictionary:

In [9]:
lexicon = load_lexicon("../data/lexicons/ppdb-xl.txt")
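
To make the expected file format concrete, here is a small self-contained sketch that parses a few made-up lines the same way `load_lexicon` does: headword first, neighbours after it, everything lower-cased. The sample entries and the name `parse_lexicon` are hypothetical and not taken from `ppdb-xl.txt`; on the real dictionary, `list(lexicon.items())[:5]` shows the first entries in the same way.

```python
# Hypothetical sample in the format load_lexicon expects:
# one entry per line, the headword first, then its neighbours.
SAMPLE_LEXICON = """\
Fast quick rapid speedy
quick fast rapid
slow sluggish
"""

def parse_lexicon(lines):
    """Build {word: [neighbours]} from whitespace-separated lines, lower-casing everything."""
    lexicon = {}
    for line in lines:
        words = line.lower().split()
        if not words:  # skip blank lines
            continue
        lexicon[words[0]] = words[1:]
    return lexicon

lexicon_demo = parse_lexicon(SAMPLE_LEXICON.splitlines())
for word, neighbours in list(lexicon_demo.items())[:5]:
    print(word, neighbours)
```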

Now we have the word vectors and the lexicon. Our goal is to use the semantic relations in the lexicon to improve the pre-trained word vectors with the retrofitting method from the baseline solution: if two words wi and wj are connected by an edge in the lexicon, their vectors qi and qj should be close in vector space, while each qi should also stay close to its original vector. We set the number of iterations to 10 and start with α=β=1. Below is the implementation of the retrofitting function:
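
For reference, the first formula below is the retrofitting update of Faruqui et al. (2015), written with a single α and the same β for every lexicon edit; N(i) is the set of lexicon neighbours of word i that have a vector, q̂_i is the original vector and q_i the retrofitted one. The second formula is what the "retrofitting" function below computes: it divides by |N(i)|(α+β) instead of α+β|N(i)|. For words with more than one neighbour this rescales the updated vector, which also changes how strongly that word pulls on its neighbours in later iterations.

$$
q_i \leftarrow \frac{\alpha\,\hat{q}_i + \beta \sum_{j \in N(i)} q_j}{\alpha + \beta\,|N(i)|}
\qquad \text{vs.} \qquad
q_i \leftarrow \frac{\alpha\,\hat{q}_i + \beta \sum_{j \in N(i)} q_j}{|N(i)|\,(\alpha + \beta)}
$$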

In [10]:
from copy import deepcopy

def retrofitting(wordVectors, lexicon, numberOfIteration, alpha=1, beta=1):
    """Pull the vectors of words that are connected in the lexicon closer together.

    wordVectors: dict word -> original vector (Q_hat), left unchanged
    lexicon: dict word -> list of neighbouring words
    Returns a new dict word -> retrofitted vector (Q).
    """
    q_hat = wordVectors
    q = deepcopy(wordVectors)  # initialize Q to Q_hat, as an independent copy

    wvecKey = set(wordVectors.keys())
    lexiconKey = set(lexicon.keys())
    # only words that have both a vector and a lexicon entry get updated
    exist_word_vector = wvecKey.intersection(lexiconKey)

    for iteration in range(numberOfIteration):
        for word in exist_word_vector:
            # neighbours of this word in the lexicon that also have a vector
            neighbours_words = set(lexicon[word]).intersection(wvecKey)
            num_neighbours_words = len(neighbours_words)
            if num_neighbours_words == 0:  # nothing to pull towards
                continue

            # alpha-weighted original vector plus beta-weighted current neighbour vectors
            numerator = alpha * q_hat[word]
            for nei_word in neighbours_words:
                numerator += beta * q[nei_word]

            denominator = num_neighbours_words * (alpha + beta)

            # updates take effect immediately, so words processed later
            # in the same iteration already see the new vector
            q[word] = numerator / denominator

    return q  # the retrofitted vectors Q
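
As a sanity check of the idea, here is a small self-contained sketch on made-up 2-d vectors. It uses the standard denominator α + β|N(i)| from the retrofitting paper rather than the one in "retrofitting" above, and the names `retrofit_standard`, `cosine` and the toy words are hypothetical. Two words that start orthogonal but are linked in the lexicon end up close together, while an unlinked word keeps its vector.

```python
import numpy as np

def retrofit_standard(q_hat, lexicon, iterations=10, alpha=1.0, beta=1.0):
    """Retrofitting with the standard denominator alpha + beta * |N(i)|.

    q_hat: dict word -> np.ndarray (original vectors, left unchanged)
    lexicon: dict word -> list of neighbour words
    """
    q = {w: v.copy() for w, v in q_hat.items()}
    for _ in range(iterations):
        for word in q_hat:
            neighbours = [n for n in lexicon.get(word, []) if n in q]
            if not neighbours:
                continue  # words without in-vocabulary neighbours keep their vector
            total = alpha * q_hat[word] + beta * sum(q[n] for n in neighbours)
            q[word] = total / (alpha + beta * len(neighbours))
    return q

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy vectors: "car" and "automobile" start orthogonal but are linked in the lexicon.
q_hat = {
    "car": np.array([1.0, 0.0]),
    "automobile": np.array([0.0, 1.0]),
    "banana": np.array([-1.0, 0.2]),
}
toy_lexicon = {"car": ["automobile"], "automobile": ["car"]}

q = retrofit_standard(q_hat, toy_lexicon)
print(cosine(q_hat["car"], q_hat["automobile"]), cosine(q["car"], q["automobile"]))
```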

Then call the function and save the new word vectors after retrofitting:

In [12]:
import os

new_retrofitted_magnitude = os.path.join('..', 'data', 'glove.6B.100d.retrofit.magnitude')
new_retrofitted_txt = os.path.join('..', 'data', 'glove.6B.100d.retrofit.txt')
retrofitted_vector = retrofitting(word_vector, lexicon, 10, alpha=1, beta=1)
# write the vectors as text, then convert the text file to the .magnitude format
save_word_vecs(retrofitted_vector, new_retrofitted_txt)
os.system("python3 -m pymagnitude.converter -i " + new_retrofitted_txt + " -o " + new_retrofitted_magnitude)

0
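
`save_word_vecs` is not defined in this notebook. Assuming it writes one word per line followed by its space-separated values (the GloVe text layout that `pymagnitude.converter` accepts as a `.txt` input), a minimal equivalent could look like the sketch below; `save_word_vecs_txt` and the demo vectors are hypothetical.

```python
import os
import tempfile

import numpy as np

def save_word_vecs_txt(word_vecs, path):
    """Write {word: vector} as GloVe-style text: the word, then its values, space-separated."""
    with open(path, "w", encoding="utf-8") as f:
        for word, vec in word_vecs.items():
            f.write(word + " " + " ".join(f"{x:.6f}" for x in vec) + "\n")

# Round trip on a tiny hypothetical dictionary.
demo = {"car": np.array([0.5, -0.25]), "automobile": np.array([0.125, 1.0])}
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
save_word_vecs_txt(demo, path)
with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(lines)
```

This format assumes no word contains a space, which holds for GloVe tokens.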

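Now we run the baseline solution, without context words. First we define the class "LexSub", whose method "substitutes" returns substitute candidates for the target word: its 15 nearest neighbours in the retrofitted vector space, re-ranked and cut to the top "topn" when context words are used.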

In [13]:
import pymagnitude

class LexSub:

    def __init__(self, retrofitted_magnitude, wvecs, retrofitted_vector, topn=10):
        # retrofitted vectors Q in .magnitude format, used for the nearest-neighbour search
        self.retrofitted_magnitude = pymagnitude.Magnitude(retrofitted_magnitude)
        self.topn = topn
        self.wvecs = wvecs  # original GloVe vectors (Q_hat) as a dict
        self.wvecKey = set(self.wvecs.keys())
        self.retrofitted_vector = retrofitted_vector  # retrofitted vectors Q as a dict

    def substitutes(self, index, sentence, use_context_word=False):
        """Return substitute candidates for the word at sentence[index].

        Without context, these are the 15 nearest neighbours in the retrofitted space;
        with use_context_word=True, those 15 are re-ranked using the context and cut to self.topn.
        """
        neighbours = self.retrofitted_magnitude.most_similar(sentence[index], topn=15)
        substitutes = [word for word, _similarity in neighbours]
        if use_context_word:
            substitutes = incorporating_context_words(self.retrofitted_vector, self.wvecKey, sentence, index, substitutes, self.topn)
        return substitutes
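
Conceptually, the baseline ranks the vocabulary by cosine similarity to the target word. The sketch below does the same with plain NumPy over a dict of vectors, excluding the query word itself, purely as an illustration; it is not pymagnitude's implementation of `most_similar`, and `top_n_similar` and the toy vectors are hypothetical.

```python
import numpy as np

def top_n_similar(word, vectors, topn=10):
    """Return the topn words most cosine-similar to `word`, excluding the word itself."""
    query = vectors[word] / np.linalg.norm(vectors[word])
    scored = []
    for other, vec in vectors.items():
        if other == word:
            continue
        scored.append((other, float(np.dot(query, vec) / np.linalg.norm(vec))))
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return [other for other, _ in scored[:topn]]

toy_vectors = {
    "bright": np.array([1.0, 1.0]),
    "shiny": np.array([0.9, 1.1]),
    "smart": np.array([1.2, 0.7]),
    "dark": np.array([-1.0, -1.0]),
}
print(top_n_similar("bright", toy_vectors, topn=2))
```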

In [16]:
lexsub = LexSub(new_retrofitted_magnitude, word_vector, retrofitted_vector, 10)
    
num_lines = sum(1 for line in open('../data/input/dev.txt','r'))
with open("../data/input/dev.txt") as f:
    for line in tqdm.tqdm(f, total=num_lines):
        fields = line.strip().split('\t')
        print(" ".join(lexsub.substitutes(int(fields[0].strip()), fields[1].strip().split())))

sides bottom back edge along into away way both hand shoulder onto behind over part
sides bottom back edge along into away way both hand shoulder onto behind over part
sides bottom back edge along into away way both hand shoulder onto behind over part
sides bottom back edge along into away way both hand shoulder onto behind over part
sides bottom back edge along into away way both hand shoulder onto behind over part
sides bottom back edge along into away way both hand shoulder onto behind over part
sides bottom back edge along into away way both hand shoulder onto behind over part
sides bottom back edge along into away way both hand shoulder onto behind over part
sides bottom back edge along into away way both hand shoulder onto behind over part
sides bottom back edge along into away way both hand shoulder onto behind over part
said asked interview warned speaking talked acknowledged noted reported tell called stated telling saying quoted
say telling do want hear think know sure how el

coach trainers teams coaching players teammates athletes instructors sports quarterbacks basketball assistants trainer coached administrators
coach trainers teams coaching players teammates athletes instructors sports quarterbacks basketball assistants trainer coached administrators
coaches advocaat coached manager coaching parcells trainer hiddink mcclaren teammates belichick basketball athlete pitino squad
coach trainers teams coaching players teammates athletes instructors sports quarterbacks basketball assistants trainer coached administrators
coaches advocaat coached manager coaching parcells trainer hiddink mcclaren teammates belichick basketball athlete pitino squad
coaches advocaat coached manager coaching parcells trainer hiddink mcclaren teammates belichick basketball athlete pitino squad
damp humid wet moist arid moisture drying soils drier wetland dried acidic salty barren humidity
damp humid wet moist arid moisture drying soils drier wetland dried acidic salty barren humid

pulse impulses legumes oscillations waveforms plugs outputs grains threads harmonics spikes tubers bulbs strands triggers
pulse impulses legumes oscillations waveforms plugs outputs grains threads harmonics spikes tubers bulbs strands triggers
pulses vibration waveform pulsation amplitude heartbeat attenuation propagation coil tuning modulation amplification vibrations electromagnetic oscillations
putting make bring them made out get take further taking could even taken would over
putting make bring them made out get take further taking could even taken would over
putting make bring them made out get take further taking could even taken would over
putting make bring them made out get take further taking could even taken would over
putting make bring them made out get take further taking could even taken would over
putting make bring them made out get take further taking could even taken would over
putting make bring them made out get take further taking could even taken would over
putt

bugs glitch pest worm infestation insect viruses virus plague beetle cockroach parasite flaw insects pests
roaches cockroaches insects beetles parasites pests worms mosquitoes fleas bug maggots bedbugs rodents termites critters
bugs glitch pest worm infestation insect viruses virus plague beetle cockroach parasite flaw insects pests
bugs glitch pest worm infestation insect viruses virus plague beetle cockroach parasite flaw insects pests
bugs glitch pest worm infestation insect viruses virus plague beetle cockroach parasite flaw insects pests
bugs glitch pest worm infestation insect viruses virus plague beetle cockroach parasite flaw insects pests
roaches cockroaches insects beetles parasites pests worms mosquitoes fleas bug maggots bedbugs rodents termites critters
roaches cockroaches insects beetles parasites pests worms mosquitoes fleas bug maggots bedbugs rodents termites critters
bugs glitch pest worm infestation insect viruses virus plague beetle cockroach parasite flaw insects p

match coincidences appearances tournaments sets pairs pairings game matchups match-ups repetitions contests friendlies foursomes fourball
final play game matches replay draw score semifinal scoring pairing qualifying matching semi-final finals chance
final play game matches replay draw score semifinal scoring pairing qualifying matching semi-final finals chance
final play game matches replay draw score semifinal scoring pairing qualifying matching semi-final finals chance
final play game matches replay draw score semifinal scoring pairing qualifying matching semi-final finals chance
final play game matches replay draw score semifinal scoring pairing qualifying matching semi-final finals chance
final play game matches replay draw score semifinal scoring pairing qualifying matching semi-final finals chance
final play game matches replay draw score semifinal scoring pairing qualifying matching semi-final finals chance
final play game matches replay draw score semifinal scoring pairing qua

holy holies latter-day basilica cathedral blessed apostles saint church sepulchre cardinals christ brees parish st.
st. st sant santa holy basilica san church santo sankt cathedral chapel saviour sainte convent
st. st sant santa holy basilica san church santo sankt cathedral chapel saviour sainte convent
st. st sant santa holy basilica san church santo sankt cathedral chapel saviour sainte convent
st. st sant santa holy basilica san church santo sankt cathedral chapel saviour sainte convent
st. st sant santa holy basilica san church santo sankt cathedral chapel saviour sainte convent
st. st sant santa holy basilica san church santo sankt cathedral chapel saviour sainte convent
yet though even already however so once ever although also because nevertheless actually seems indeed
yet though even already however so once ever although also because nevertheless actually seems indeed
yet though even already however so once ever although also because nevertheless actually seems indeed
yet thou

check examined inspected checking searched checks verified monitored identified properly found analyzed notified verify handled
check examined inspected checking searched checks verified monitored identified properly found analyzed notified verify handled
checking checks checked verify monitoring audit inspection control get determine find examine monitor notice handle
removed disposed stripped recovered reclaimed cancelled bulldozed eliminated blocked abandoned withdrawn deleted dismantled temporarily lifted
eliminates removes erases softens cancels disposes dismisses cleans dissolves eases paves postpones reinstates dissipates restores
removed disposed stripped recovered reclaimed cancelled bulldozed eliminated blocked abandoned withdrawn deleted dismantled temporarily lifted
obvious unambiguous definite evident explicit vague clearly absolutely compelling contrary unclear apparent unequivocal straightforward easy
clearance removal disposal logging removing elimination erosion waste 

rejects refuses denies rebuffs expels spurns submits ignores dismiss denounces condemns revokes downplays cancels disallows
reject dismissing deny exclude rejecting refuse refusing ignore disqualify overturn dismisses overrule refused prosecute claim
reject dismissing deny exclude rejecting refuse refusing ignore disqualify overturn dismisses overrule refused prosecute claim
rejected denied refused sacked challenged ruled fired overruled resigned claimed quashed dismissing condemned handed cancelled
bring put make drawn attract drawing get hoping away give further turn play chance take
brought drawing drew draw attracted turned made given taken many being some put presented came
drawn draw showing picture show shows drew painting display collection many some creating put making
brought drawing drew draw attracted turned made given taken many being some put presented came
bring put make drawn attract drawing get hoping away give further turn play chance take
bring put make drawn attract

reformist free-market right-wing liberals centrist left-leaning liberalism left-wing progressive pro-business eldr reform-minded socialist constitutionalist right-leaning
reformist free-market right-wing liberals centrist left-leaning liberalism left-wing progressive pro-business eldr reform-minded socialist constitutionalist right-leaning
reformist free-market right-wing liberals centrist left-leaning liberalism left-wing progressive pro-business eldr reform-minded socialist constitutionalist right-leaning
reformist free-market right-wing liberals centrist left-leaning liberalism left-wing progressive pro-business eldr reform-minded socialist constitutionalist right-leaning
reformist free-market right-wing liberals centrist left-leaning liberalism left-wing progressive pro-business eldr reform-minded socialist constitutionalist right-leaning
reformist free-market right-wing liberals centrist left-leaning liberalism left-wing progressive pro-business eldr reform-minded socialist consti

skipped skip jumping canceling forgoing omitting bypassing breaks cramming contemplating postponing skips bumping delaying cancelling
skipping skipped jump 'll opt go going gonna opting choose hurry opted forgo pick skips
skipped skip jumping canceling forgoing omitting bypassing breaks cramming contemplating postponing skips bumping delaying cancelling
skipping ditched skip bumped hopped missed stumbled knocked kicked pulled threw yanked withdrew tripped jumped
skipping skipped jump 'll opt go going gonna opting choose hurry opted forgo pick skips
skipping skipped jump 'll opt go going gonna opting choose hurry opted forgo pick skips
skipping skipped jump 'll opt go going gonna opting choose hurry opted forgo pick skips
skipping ditched skip bumped hopped missed stumbled knocked kicked pulled threw yanked withdrew tripped jumped
sneaks limps wanders sidesteps slithers crawls hustles omits scampers eases trots saunters ambles strolls stomps
robust sturdy strong resilient dependable rel

deserving worthwhile commendable laudable praiseworthy admirable merit decent exemplary meaningful merits excellent accomplishment meritorious best
deserving worthwhile commendable laudable praiseworthy admirable merit decent exemplary meaningful merits excellent accomplishment meritorious best
deserving worthwhile commendable laudable praiseworthy admirable merit decent exemplary meaningful merits excellent accomplishment meritorious best
deserving worthwhile commendable laudable praiseworthy admirable merit decent exemplary meaningful merits excellent accomplishment meritorious best
deserving worthwhile commendable laudable praiseworthy admirable merit decent exemplary meaningful merits excellent accomplishment meritorious best
deserving worthwhile commendable laudable praiseworthy admirable merit decent exemplary meaningful merits excellent accomplishment meritorious best
deserving worthwhile commendable laudable praiseworthy admirable merit decent exemplary meaningful merits excell

caravans motorcade convoy trailer camper wagon minivan truck car excursion convoys limousine getaway jeep suv
caravans motorcade convoy trailer camper wagon minivan truck car excursion convoys limousine getaway jeep suv
caravans motorcade convoy trailer camper wagon minivan truck car excursion convoys limousine getaway jeep suv
caravans motorcade convoy trailer camper wagon minivan truck car excursion convoys limousine getaway jeep suv
trailers convoys campers caravan lorries trucks wagons trailer vans carts camping minibuses canoes tractors rvs
trailers convoys campers caravan lorries trucks wagons trailer vans carts camping minibuses canoes tractors rvs
caravans motorcade convoy trailer camper wagon minivan truck car excursion convoys limousine getaway jeep suv
caravans motorcade convoy trailer camper wagon minivan truck car excursion convoys limousine getaway jeep suv
caravans motorcade convoy trailer camper wagon minivan truck car excursion convoys limousine getaway jeep suv
carava

suffer withstand survive suffering endured hardship hardships tolerate inflict succumb subjected cope prolonged severe adversity
suffer withstand survive suffering endured hardship hardships tolerate inflict succumb subjected cope prolonged severe adversity
suffered suffering undergone experienced endure subjected hardships suffer faced hardship experiencing inflicted decades prolonged ordeal
suffer withstand survive suffering endured hardship hardships tolerate inflict succumb subjected cope prolonged severe adversity
suffer withstand survive suffering endured hardship hardships tolerate inflict succumb subjected cope prolonged severe adversity
lasting constant genuine persistent profound continuous eternal continuing everlasting strong continual remarkable long-lasting indispensable evident
suffered suffering undergone experienced endure subjected hardships suffer faced hardship experiencing inflicted decades prolonged ordeal
suffer withstand survive suffering endured hardship hardsh

hold holding being met made given conducted taken previously was took brought been entered once
hold holding being met made given conducted taken previously was took brought been entered once
hold holding being met made given conducted taken previously was took brought been entered once
hold held taking keeping keep giving put for putting making gathering own them bring take
hold holding being met made given conducted taken previously was took brought been entered once
hold held taking keeping keep giving put for putting making gathering own them bring take
hold holding being met made given conducted taken previously was took brought been entered once
hold held taking keeping keep giving put for putting making gathering own them bring take
unofficial unstructured impromptu casual formal ad-hoc structured hoc face-to-face non-formal meetings meeting in-person consultation unsanctioned
unofficial unstructured impromptu casual formal ad-hoc structured hoc face-to-face non-formal meetings 

comprehensive inclusive all-year open-ended open-door transparent full reopen informal in-depth opening open-air establishment fledged reopening
comprehensive inclusive all-year open-ended open-door transparent full reopen informal in-depth opening open-air establishment fledged reopening
comprehensive inclusive all-year open-ended open-door transparent full reopen informal in-depth opening open-air establishment fledged reopening
comprehensive inclusive all-year open-ended open-door transparent full reopen informal in-depth opening open-air establishment fledged reopening
comprehensive inclusive all-year open-ended open-door transparent full reopen informal in-depth opening open-air establishment fledged reopening
comprehensive inclusive all-year open-ended open-door transparent full reopen informal in-depth opening open-air establishment fledged reopening
comprehensive inclusive all-year open-ended open-door transparent full reopen informal in-depth opening open-air establishment fle

tempo melody rhythms tune pace guitar riffs cadence vocals melodic song funk melodies harmonies blues
tempo melody rhythms tune pace guitar riffs cadence vocals melodic song funk melodies harmonies blues
rhythm riffs paces tempos syncopated tempo melodies harmonies speeds melodic percussive tunes cadences solos routines
tempo melody rhythms tune pace guitar riffs cadence vocals melodic song funk melodies harmonies blues
tempo melody rhythms tune pace guitar riffs cadence vocals melodic song funk melodies harmonies blues
rhythm riffs paces tempos syncopated tempo melodies harmonies speeds melodic percussive tunes cadences solos routines
tempo melody rhythms tune pace guitar riffs cadence vocals melodic song funk melodies harmonies blues
rhythm riffs paces tempos syncopated tempo melodies harmonies speeds melodic percussive tunes cadences solos routines
scene images footage pictures depictions imagery vignettes depict descriptions episodes sketches narrative portraits narratives flashbac

hit shook rocked knocked rattled broke ripped touched hitting blew jolted shattered fired caught shaken
hit shook rocked knocked rattled broke ripped touched hitting blew jolted shattered fired caught shaken
remarkable dramatic stunning surprising unusual startling impressive spectacular astonishing amazing notable tremendous significant superb extraordinary
strikes walkout stoppage bombing airstrikes halt protest blow stop raid firing bombardment break attack stopped
hit shook rocked knocked rattled broke ripped touched hitting blew jolted shattered fired caught shaken
hit shook rocked knocked rattled broke ripped touched hitting blew jolted shattered fired caught shaken
airstrikes strike attacks bombings bombing bombardment shelling raids blasts raid attack bombed explosions airstrike firing
airstrikes strike attacks bombings bombing bombardment shelling raids blasts raid attack bombed explosions airstrike firing
remarkable dramatic stunning surprising unusual startling impressive sp

genuinely really absolutely indeed definitely surely obviously certainly quite clearly genuine fully very totally actually
genuinely really absolutely indeed definitely surely obviously certainly quite clearly genuine fully very totally actually
genuinely really absolutely indeed definitely surely obviously certainly quite clearly genuine fully very totally actually
genuinely really absolutely indeed definitely surely obviously certainly quite clearly genuine fully very totally actually
genuinely really absolutely indeed definitely surely obviously certainly quite clearly genuine fully very totally actually
genuinely really absolutely indeed definitely surely obviously certainly quite clearly genuine fully very totally actually
genuinely really absolutely indeed definitely surely obviously certainly quite clearly genuine fully very totally actually
genuinely really absolutely indeed definitely surely obviously certainly quite clearly genuine fully very totally actually
genuinely really

executions executed executing execute sentencing sentence trial judgment verdict arrest punishment decision conviction procedure penalty
executions executed executing execute sentencing sentence trial judgment verdict arrest punishment decision conviction procedure penalty
executions executed executing execute sentencing sentence trial judgment verdict arrest punishment decision conviction procedure penalty
executions executed executing execute sentencing sentence trial judgment verdict arrest punishment decision conviction procedure penalty
executions executed executing execute sentencing sentence trial judgment verdict arrest punishment decision conviction procedure penalty
drop rise falls falling rises decline down tumble slump rising jump next plunge dip up
tumbled slipped dropped plunged slid plummeted slumped jumped climbed retreated surged dipped soared sagged fallen
dropped falling plummeted fell risen climbed declined tumbled plunged slipped slumped down surged dropping dipped

print printing papers printed document newspaper documents books copy book journal watermarked pamphlet photocopy bulletin
print printing papers printed document newspaper documents books copy book journal watermarked pamphlet photocopy bulletin
print printing papers printed document newspaper documents books copy book journal watermarked pamphlet photocopy bulletin
print printing papers printed document newspaper documents books copy book journal watermarked pamphlet photocopy bulletin
print printing papers printed document newspaper documents books copy book journal watermarked pamphlet photocopy bulletin
print printing papers printed document newspaper documents books copy book journal watermarked pamphlet photocopy bulletin
documents articles newspapers journals document publication records diaries books files newspaper notes essays written publish
documents articles newspapers journals document publication records diaries books files newspaper notes essays written publish
document

caught shoot fired shooting punched hit knocked kicked killed beaten pulled missed accidentally shots struck
caught shoot fired shooting punched hit knocked kicked killed beaten pulled missed accidentally shots struck
rounds bullets strokes punches shoot volleys blows shot gunshots beatings bursts knock shooters gunshot shooting
caught shoot fired shooting punched hit knocked kicked killed beaten pulled missed accidentally shots struck
rounds bullets strokes punches shoot volleys blows shot gunshots beatings bursts knock shooters gunshot shooting
rounds bullets strokes punches shoot volleys blows shot gunshots beatings bursts knock shooters gunshot shooting
rounds bullets strokes punches shoot volleys blows shot gunshots beatings bursts knock shooters gunshot shooting
caught shoot fired shooting punched hit knocked kicked killed beaten pulled missed accidentally shots struck
rounds bullets strokes punches shoot volleys blows shot gunshots beatings bursts knock shooters gunshot shooting

After running zipout.py and check.py, we get a score of 44.92.

## 2. Incorporate Context Words (score = 32.12)

Following the homework instructions, we next try to extend the baseline so that it uses the context around the target word to find better substitutes. Based on the paper "A Simple Word Embedding Model for Lexical Substitution" (Melamud et al., 2015), we first try the BalAdd measure from its Table 1, a context-sensitive substitutability metric that estimates how suitable a substitute is for a target word in a given sentential context:
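
The BalAdd score for a target word t, a candidate substitute s and a context C (in this notebook, the other words of the sentence that have a vector), with cos denoting cosine similarity, is:

$$
\mathrm{BalAdd}(s \mid t, C) = \frac{|C|\,\cos(s, t) + \sum_{c \in C} \cos(s, c)}{2\,|C|}
$$

Weighting the target term by |C| makes the target as important as the whole context combined, and dividing by 2|C| keeps the score in the same range as a single cosine similarity. Candidates with a higher score are better substitutes.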

In [17]:
def incorporating_context_words(wvecs, wvecKey, sentence, index, substitutes, topn):
    """Re-rank substitutes for sentence[index] with the BalAdd measure and keep topn of them."""
    target = sentence[index]
    # context C: the other words of the sentence that have a vector
    non_target_words = set(sentence[:index] + sentence[index + 1:]).intersection(wvecKey)
    num_non_target_words = len(non_target_words)
    if target not in wvecs or num_non_target_words == 0:
        # nothing to score with: keep the nearest-neighbour order
        return list(substitutes)[:topn]

    sort_substitutes = []
    for substitute in substitutes:
        # BalAdd: (|C| * cos(s, t) + sum of cos(s, c) over C) / (2 * |C|)
        numerator = cos(wvecs[substitute], wvecs[target]) * num_non_target_words
        for non_target_word in non_target_words:
            numerator += cos(wvecs[substitute], wvecs[non_target_word])
        score = numerator / (2 * num_non_target_words)
        sort_substitutes.append((substitute, score))
    # Ascending order puts the LOWEST scores first. That is only right if `cos`
    # (not defined in this notebook, presumably from lexsub) returns a distance
    # such as 1 - cosine similarity; if it returns a similarity, use reverse=True.
    sort_substitutes = sorted(sort_substitutes, key=lambda x: x[1], reverse=False)
    return [word for word, _score in sort_substitutes[:topn]]

Setting use_context_word=True and repeating the steps above (zipout.py and check.py), incorporating context words gives a score of 32.12, which is lower than the baseline score of 44.92.
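
To see the intended effect of the context term, here is a self-contained toy example with made-up 2-d vectors (not GloVe). The target "bright" is equally similar to "smart" and "shiny", so only the context decides the order. It ranks with the highest BalAdd score first, using cosine similarity, and falls back to similarity with the target alone when no context word has a vector; `baladd_rank` and the vectors are hypothetical.

```python
import numpy as np

def cos(u, v):
    """Cosine similarity."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def baladd_rank(vectors, target, context, candidates):
    """Rank candidates by BalAdd, highest score first."""
    context = [c for c in context if c in vectors]
    if not context:
        # No usable context: fall back to similarity with the target alone.
        return sorted(candidates, key=lambda s: cos(vectors[s], vectors[target]), reverse=True)

    def score(s):
        total = len(context) * cos(vectors[s], vectors[target])
        total += sum(cos(vectors[s], vectors[c]) for c in context)
        return total / (2 * len(context))

    return sorted(candidates, key=score, reverse=True)

# Toy vectors: "bright" sits between an "intelligent" sense and a "light" sense.
vectors = {
    "bright": np.array([1.0, 1.0]),
    "smart": np.array([1.0, 0.2]),
    "shiny": np.array([0.2, 1.0]),
    "student": np.array([1.0, 0.0]),
    "lamp": np.array([0.0, 1.0]),
}
print(baladd_rank(vectors, "bright", ["student"], ["shiny", "smart"]))
print(baladd_rank(vectors, "bright", ["lamp"], ["shiny", "smart"]))
```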

## 3. Changing α and β (α=1, β=2.19, score = 48.32)

Instead of incorporating context words, we try to improve the score by changing the values of α and β. To do so, pass different values for alpha and beta when calling "retrofitting(wordVectors, lexicon, numberOfIteration, alpha, beta)", then repeat the substitution and scoring steps. This gives the following relationship between the parameters and the score:

[Figure: score for different values of α and β (not included)]

The best score, 48.32, is reached with α=1 and β=2.19.
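
The search over α and β amounts to a loop that re-runs the whole pipeline for each setting and keeps the best score. A minimal sketch, assuming a hypothetical `score_fn(alpha, beta)` that wraps "retrofitting", the pymagnitude conversion, "LexSub" and check.py; `fake_score` exists only to make the example runnable, and its numbers are not results from this homework.

```python
import itertools

def grid_search(score_fn, alphas, betas):
    """Return (best_alpha, best_beta, best_score) over all (alpha, beta) pairs.

    score_fn(alpha, beta) is assumed to retrofit with those weights, produce
    substitutes for the dev set and return the score reported by check.py.
    """
    best = None
    for alpha, beta in itertools.product(alphas, betas):
        score = score_fn(alpha, beta)
        if best is None or score > best[2]:
            best = (alpha, beta, score)
    return best

# Stand-in scorer for demonstration only (not real results): peaks at alpha=1, beta=2.
def fake_score(alpha, beta):
    return 40.0 - (beta - 2.0) ** 2 - (alpha - 1.0) ** 2

print(grid_search(fake_score, alphas=[0.5, 1.0, 2.0], betas=[1.0, 1.5, 2.0, 2.5]))
```

Each evaluation re-runs retrofitting and the magnitude conversion, so a coarse grid followed by a finer search around the best β keeps the number of runs manageable.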

## Conclusion

In this homework, we completed three tasks:

(1) Implemented the baseline solution with retrofitted word vectors (score 44.92);

(2) Incorporated context words into the baseline solution (score 32.12);

(3) Tuned the parameters α and β of the baseline solution (score 48.32 with α=1, β=2.19).

After these three tasks, our best score is 48.32.