### (a) Loading the model

For this exercise, we will use GloVe word vectors converted into the word2vec format.

The "50d" in the file name "glove.6B.50d.txt" indicates that each word is represented as a 50-dimensional vector.

This size was chosen to keep the file small and inference fast. If you want to see how higher-dimensional vectors behave, you can download the GloVe vectors from [this zip file](https://nlp.stanford.edu/data/glove.6B.zip).

In [None]:
import os

directory = os.getcwd()

In [None]:
import numpy as np

%matplotlib notebook
import matplotlib.pyplot as plt
plt.style.use('ggplot')

from sklearn.decomposition import PCA

from gensim.test.utils import datapath, get_tmpfile
from gensim.models import KeyedVectors
from gensim.scripts.glove2word2vec import glove2word2vec

# datapath() is meant for gensim's bundled test data, so build the path to our own file directly
glove_file = os.path.join(directory, 'glove.6B.50d.txt') # Original glove file here
word2vec_glove_file = get_tmpfile("glove.6B.50d.word2vec.txt")
glove2word2vec(glove_file, word2vec_glove_file)

In [None]:
model = KeyedVectors.load_word2vec_format(word2vec_glove_file)
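
To make the conversion step less of a black box: a GloVe text file has one line per word (the word followed by its vector values), while the word2vec text format is the same thing with one extra header line giving the vocabulary size and the vector dimension. The sketch below illustrates that idea on a tiny made-up example. Note that `add_word2vec_header` and the toy values are hypothetical and only for illustration; this is not gensim's own code.

```python
def add_word2vec_header(glove_lines):
    """Return word2vec-style lines: a 'count dim' header plus the GloVe lines."""
    lines = [line.rstrip("\n") for line in glove_lines if line.strip()]
    if not lines:
        raise ValueError("no vectors found")
    # Each GloVe line is a word followed by `dim` space-separated floats.
    dim = len(lines[0].split(" ")) - 1
    return [f"{len(lines)} {dim}"] + lines


# Made-up 3-dimensional vectors, for illustration only.
toy_glove = [
    "king 0.5 0.7 -0.1",
    "queen 0.4 0.8 -0.2",
]
converted = add_word2vec_header(toy_glove)
print(converted[0])  # header line: vocabulary size, then dimension
```

For the real file, the header would carry the full vocabulary size and the dimension 50.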

### (b) Word relationship and similarities

The "most_similar" function returns the words most similar to your input, together with their similarity scores.

It also accepts a "negative" argument. Used on its own, it ranks words by similarity to the opposite of your input vector, so you get the words least similar to your input.

Give various words a try by assigning a word to the "your_word" variable!

Also give the "analogy" function a try. (Guess what this function does!)

Then try your own set of words in the cell marked "your example".
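
If you are curious what "similar" means here: the ranking is based on cosine similarity between word vectors. Below is a minimal sketch of that idea in plain Python. The 2-dimensional vectors and the `toy_most_similar` helper are invented for illustration and are not part of gensim.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 2-d vectors, invented for illustration only.
toy_vectors = {
    "coffee": [1.0, 0.1],
    "tea": [0.9, 0.2],
    "rain": [0.1, 1.0],
    "snow": [0.0, 0.9],
}


def toy_most_similar(word, topn=2):
    """Rank every other word by cosine similarity to `word`."""
    query = toy_vectors[word]
    scores = [(other, cosine(query, vec))
              for other, vec in toy_vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


ranked = toy_most_similar("coffee")
print(ranked)  # list of (word, similarity) pairs, best match first
```

With these toy vectors, "tea" points in almost the same direction as "coffee", so it comes out on top.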

In [None]:
# fill in the word that you want to try out (be creative!)
your_word = "coffee"  # example placeholder; replace it with your own word

In [None]:
model.most_similar(your_word)

In [None]:
# pass a list: older gensim versions iterate over a bare string character by character
model.most_similar(negative=[your_word])

In [None]:
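def analogy(x1, x2, y1):
    # x1 is to x2 as y1 is to the returned word (roughly: y1 - x1 + x2)
    result = model.most_similar(positive=[y1, x2], negative=[x1])
    return result[0][0]  # top-ranked word, without its similarity score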
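
The vector arithmetic behind this can be sketched with toy data. The example below is a simplification under stated assumptions: the 2-dimensional vectors are made up (first axis loosely "royalty", second loosely "gender"), `toy_analogy` is a hypothetical helper, and gensim additionally normalizes the vectors before combining them.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 2-d vectors, invented for illustration only.
toy_vectors = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.1, 1.0],
    "woman": [0.1, -1.0],
    "apple": [0.5, 0.0],
}


def toy_analogy(x1, x2, y1):
    """Return the word closest to y1 - x1 + x2, skipping the three inputs."""
    target = [b - a + c for a, b, c in
              zip(toy_vectors[x1], toy_vectors[x2], toy_vectors[y1])]
    candidates = [w for w in toy_vectors if w not in (x1, x2, y1)]
    return max(candidates, key=lambda w: cosine(target, toy_vectors[w]))


print(toy_analogy("king", "man", "queen"))
```

The input words are excluded from the candidates, which mirrors how the real function avoids returning one of the words you passed in.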

In [None]:
# example 1
analogy('japan', 'japanese', 'australia')

In [None]:
# example 2
analogy('king', 'man', 'queen')

In [None]:
# your example
# analogy('x1', 'x2', 'y1')  # replace the three placeholders with your own words, then uncomment

### (c) Visualizing the word vector space

The "display_pca_scatterplot" function takes a list of words as input, projects their vectors onto the first two principal components with PCA, and draws them as a 2D scatter plot.

Fill the list "your_words" with your own choice of words (at least 20) and see how they are represented in the vector space.
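
For intuition, the projection step can be sketched without scikit-learn. The snippet below is a minimal version that centers the data and uses an SVD to find the two directions of greatest variance. The `pca_2d` helper and the random stand-in data are assumptions for illustration, and the signs of the axes may differ from what scikit-learn's PCA returns.

```python
import numpy as np


def pca_2d(vectors):
    """Project the rows of `vectors` onto their first two principal components."""
    X = np.asarray(vectors, dtype=float)
    X_centered = X - X.mean(axis=0)  # PCA works on mean-centered data
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:2].T


# Random stand-in for 20 word vectors of dimension 50 (not real GloVe data).
rng = np.random.default_rng(0)
toy = rng.normal(size=(20, 50))
coords = pca_2d(toy)
print(coords.shape)  # (20, 2)
```

Each row of `coords` is the (x, y) position one word would get in the scatter plot.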

In [None]:
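def display_pca_scatterplot(model, words=None, sample=0, filename="gensim.png"):
    if words is None:
        # gensim >= 4.0 exposes the vocabulary as key_to_index; older versions use vocab
        vocab = list(getattr(model, "key_to_index", None) or model.vocab)
        if sample > 0:
            # draw `sample` distinct words at random
            words = np.random.choice(vocab, sample, replace=False)
        else:
            words = vocab

    word_vectors = np.array([model[w] for w in words])

    # keep only the first two principal components for plotting
    twodim = PCA().fit_transform(word_vectors)[:,:2]

    plt.figure(figsize=(6,6))
    plt.scatter(twodim[:,0], twodim[:,1], edgecolors='k', c='r')
    # label each point with its word, slightly offset from the marker
    for word, (x,y) in zip(words, twodim):
        plt.text(x+0.05, y+0.05, word)

    plt.savefig(filename)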

In [None]:
display_pca_scatterplot(model, 
                        ["great", "cool", "brilliant", "wonderful", "well", "amazing",
                        "worth", "sweet", "enjoyable", "boring", "bad", "dumb",
                        "annoying", "female", "male", "queen", "king", "man", "woman", "rain", "snow",
                        "hail", "coffee", "tea"])

In [None]:
# Define your own set of words to see how they are represented
your_words = []

display_pca_scatterplot(model, your_words, filename="yourwords.png")