# [Visualizing ELMo Contextual Vectors](https://towardsdatascience.com/visualizing-elmo-contextual-vectors-94168768fdaa)

> Let's use the ELMo model to generate contextual vectors and project them to 2D with PCA for visualization. The model described in the ELMo paper has three layers of word representations: layer 0 is the character-based, context-independent layer, followed by two Bi-LSTM layers. The authors show empirically that word vectors from the first Bi-LSTM layer capture syntax better, while vectors from the second layer capture semantics better. We will visualize both the layer 1 and layer 2 contextual vectors for three words that have multiple senses: bank, work, and plant.

In [1]:
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import numpy as np
from collections import OrderedDict

In [2]:
from allennlp.modules.elmo import batch_to_ids
from allennlp.commands.elmo import ElmoEmbedder

# Download the options and weight files from https://allennlp.org/elmo
options_file = "/home/b418/jupyter_workspace/yuanxiao/elmo_data/elmo_2x4096_512_2048cnn_2xhighway_5.5B_options.json"
weight_file = "/home/b418/jupyter_workspace/yuanxiao/elmo_data/elmo_2x4096_512_2048cnn_2xhighway_5.5B_weights.hdf5"

# The embedder is created inside the Elmo wrapper class below, so the weights are loaded only once

In [3]:
class Elmo:
    def __init__(self, options_file, weight_file):
        self.elmo = ElmoEmbedder(options_file, weight_file)

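    def get_elmo_vector(self, tokens, layer):
        # embed_sentence returns one vector per token for each of the three
        # layers, indexed as [layer, token, dimension]
        vectors = self.elmo.embed_sentence(tokens)

        # keep only the requested layer: one row per token
        return np.array(vectors[layer])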
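
To make the indexing concrete, here is a minimal sketch of how one layer is selected per sentence and how the per-sentence matrices are stacked later in the notebook. It assumes the embedder returns an array of shape `(3, num_tokens, 1024)`; `fake_embed_sentence` is a hypothetical stand-in that returns zeros, so the example runs without the ELMo weights.

```python
import numpy as np


def fake_embed_sentence(tokens, dim=1024):
    # Hypothetical stand-in for ElmoEmbedder.embed_sentence: one vector per
    # token for each of the 3 layers (0 = character CNN, 1 and 2 = Bi-LSTMs).
    return np.zeros((3, len(tokens), dim), dtype=np.float32)


sentences = [
    "One can deposit money at the bank",
    "He had a nice walk along the river bank",
]

layer = 2
# One (num_tokens, dim) matrix per sentence, taken from the chosen layer
per_sentence = [fake_embed_sentence(s.split())[layer] for s in sentences]

# Stack all tokens of all sentences into a single matrix for PCA
X = np.concatenate(per_sentence, axis=0)
print(X.shape)  # (16, 1024): 7 tokens + 9 tokens
```

Every token of every sentence becomes a row, not only the word of interest, which is why the PCA input for the five "bank" sentences has 37 rows.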

In [4]:
def dim_reduction(X, n):
    pca = PCA(n_components=n)
    results = pca.fit_transform(X)
    print("size of reduced X: {}".format(results.shape))

    for i, ratio in enumerate(pca.explained_variance_ratio_):
        print("Variance retained ratio of PCA-{}: {}".format(i+1, ratio))

    return results
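
The "variance retained" ratios printed by `dim_reduction` say how much of the total variance each principal component keeps. In the runs shown in this notebook, the first two components together retain only about 16% to 29%, so the 2D plots show just part of the structure in the vectors. The sketch below computes the same kind of ratio with plain NumPy on made-up data; the function name `explained_variance_ratio` and the toy data are illustrative assumptions, and it is meant to mirror what PCA reports rather than to replace it.

```python
import numpy as np


def explained_variance_ratio(X):
    # Center the data, then take the singular values: each squared singular
    # value is proportional to the variance along one principal component.
    X_centered = X - X.mean(axis=0)
    singular_values = np.linalg.svd(X_centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


# Toy data (made up): 200 points stretched along the first axis
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 1.0, 1.0, 1.0, 1.0])

ratios = explained_variance_ratio(X)
print("first two components:", ratios[:2])
print("retained by the 2D projection:", ratios[:2].sum())
```

When the first two ratios sum to a small number, as they do for the ELMo vectors here, points that overlap in the plot may still be well separated in the original space.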

In [5]:
def plot(word, token_list, reduced_X, file_name, title):
    fig, ax = plt.subplots()

    # inflected forms that count as the word of interest
    targets = [word, word + 's', word + 'ing', word + 'ed']

    # i is the row of reduced_X for the current token; rows follow the order
    # of the tokens across all sentences
    i = 0
    for j, sentence_tokens in enumerate(token_list):
        color = pick_color(j)
        for w in sentence_tokens:
            # only plot and annotate the word of interest
            if w.lower() in targets:
                ax.plot(reduced_X[i, 0], reduced_X[i, 1], color)

                # label the point with its own sentence, with the word of interest in bold
                text = ' '.join(sentence_tokens)
                text = text.replace(w, r"$\bf{" + w + "}$")
                ax.annotate(text, xy=(reduced_X[i, 0], reduced_X[i, 1]))
            i += 1

    ax.set_title(title)
    ax.set_xlabel("PCA 1")
    ax.set_ylabel("PCA 2")
    fig.savefig(file_name, bbox_inches="tight")

    print("{} saved\n".format(file_name))


def pick_color(i):
    # matplotlib format strings: one color per sentence, cyan for any beyond the fourth
    colors = ['ro', 'bo', 'yo', 'go']
    return colors[i] if 0 <= i < len(colors) else 'co'
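
The plotting code depends on a bookkeeping detail: the PCA input has one row per token across all sentences, so the row of the word of interest has to be found by counting tokens in order. A minimal sketch of that mapping follows; the helper name `target_rows` is hypothetical and not part of the notebook.

```python
def target_rows(word, token_list):
    """Return (sentence_index, row_index) pairs for each occurrence of the word."""
    targets = {word, word + "s", word + "ing", word + "ed"}
    rows = []
    row = 0  # running row index into the stacked matrix
    for sentence_index, tokens in enumerate(token_list):
        for token in tokens:
            if token.lower() in targets:
                rows.append((sentence_index, row))
            row += 1
    return rows


token_list = [s.split() for s in [
    "One can deposit money at the bank",
    "He had a nice walk along the river bank",
    "I withdrew cash from the bank",
]]
print(target_rows("bank", token_list))  # [(0, 6), (1, 15), (2, 21)]
```

Keeping the sentence index next to the row index means each point can be labeled with the sentence it came from, even if a sentence contains the word more than once or not at all.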


In [6]:
model = Elmo(options_file, weight_file)

banks = OrderedDict()
banks[0] = "One can deposit money at the bank"
banks[1] = "He had a nice walk along the river bank"
banks[2] = "I withdrew cash from the bank"
banks[3] = "The river bank was not clean"
banks[4] = "My wife and I have a joint bank account"

works = OrderedDict()
works[0] = "I like this beautiful work by Andy Warhol"
works[1] = "Employee works hard every day"
works[2] = "My sister works at Starbucks"
works[3] = "This amazing work was done in the early nineteenth century"
works[4] = "Hundreds of people work in this building"

plants = OrderedDict()
plants[0] = "The gardener planted some trees in my yard"
plants[1] = "I plan to plant a Joshua tree tomorrow"
plants[2] = "My sister planted a seed and hopes it will grow to a tree"
plants[3] = "This kind of plant only grows in the subtropical region"
plants[4] = "Most of the plants will die without water"

words = {
    "bank": banks,
    "work": works,
    "plant": plants
}

# contextual vectors for ELMo layer 1 and 2
for layer in [1, 2]:
    for word, sentences in words.items():
        print("visualizing word {} using ELMo layer {}".format(word, layer))
        X = np.concatenate([model.get_elmo_vector(tokens=sentence.split(),
                                                  layer=layer)
                            for sentence in sentences.values()], axis=0)

        # The first 2 principal components
        X_reduce = dim_reduction(X=X, n=2)

        token_list = [sentence.split() for sentence in sentences.values()]

        file_name = "{}_elmo_layer_{}.png".format(word, layer)
        title = "Layer {} ELMo vectors of the word {}".format(layer, word)
        plot(word, token_list, X_reduce, file_name, title)


visualizing word bank using ELMo layer 1
size of reduced X: (37, 2)
Variance retained ratio of PCA-1: 0.21842965483665466
Variance retained ratio of PCA-2: 0.07475615292787552


<Figure size 640x480 with 1 Axes>

bank_elmo_layer_1.png saved

visualizing word work using ELMo layer 1
size of reduced X: (35, 2)
Variance retained ratio of PCA-1: 0.0917154923081398
Variance retained ratio of PCA-2: 0.08404156565666199


<Figure size 640x480 with 1 Axes>

work_elmo_layer_1.png saved

visualizing word plant using ELMo layer 1
size of reduced X: (47, 2)
Variance retained ratio of PCA-1: 0.09630362689495087
Variance retained ratio of PCA-2: 0.06646716594696045


<Figure size 640x480 with 1 Axes>

plant_elmo_layer_1.png saved

visualizing word bank using ELMo layer 2
size of reduced X: (37, 2)
Variance retained ratio of PCA-1: 0.12367216497659683
Variance retained ratio of PCA-2: 0.09037613123655319


<Figure size 640x480 with 1 Axes>

bank_elmo_layer_2.png saved

visualizing word work using ELMo layer 2
size of reduced X: (35, 2)
Variance retained ratio of PCA-1: 0.09765800833702087
Variance retained ratio of PCA-2: 0.07844914495944977


<Figure size 640x480 with 1 Axes>

work_elmo_layer_2.png saved

visualizing word plant using ELMo layer 2
size of reduced X: (47, 2)
Variance retained ratio of PCA-1: 0.09669843316078186
Variance retained ratio of PCA-2: 0.08065589517354965


<Figure size 640x480 with 1 Axes>

plant_elmo_layer_2.png saved



## Visualizing bank

+ "One can deposit money at the bank"
+ "He had a nice walk along the river bank"
+ "I withdrew cash from the bank"
+ "The river bank was not clean"
+ "My wife and I have a joint bank account"

![](bank_elmo_layer_1.png)

![](bank_elmo_layer_2.png)

## Visualizing work

+ "I like this beautiful work by Andy Warhol"
+ "Employee works hard every day"
+ "My sister works at Starbucks"
+ "This amazing work was done in the early nineteenth century"
+ "Hundreds of people work in this building"

![](work_elmo_layer_1.png)

![](work_elmo_layer_2.png)

## Visualizing plant

+ "The gardener planted some trees in my yard"
+ "I plan to plant a Joshua tree tomorrow"
+ "My sister planted a seed and hopes it will grow to a tree"
+ "This kind of plant only grows in the subtropical region"
+ "Most of the plants will die without water"

![](plant_elmo_layer_1.png)

![](plant_elmo_layer_2.png)

## Conclusion

+ With contextual vectors, a word gets a different vector in each context, and occurrences with the same sense lie close to each other in vector space, forming a cluster. We can then compute the centroid of each sense cluster and use a simple 1-nearest-neighbor approach to disambiguate the sense of a word in a new sentence.
+ In these plots, ELMo layer 2 vectors with the same sense form clearer clusters, and the distance between cluster centroids is larger than in layer 1.
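
The centroid idea from the first point can be sketched in a few lines. This is only an illustration under stated assumptions: the 2D vectors and the sense labels `finance` and `river` are made up, the function names are hypothetical, and Euclidean distance is one possible choice of metric. In practice the inputs would be the contextual vectors of sentences whose sense has been labeled by hand.

```python
import numpy as np


def sense_centroids(vectors, labels):
    """Average the vectors of each sense label into one centroid per sense."""
    centroids = {}
    for label in sorted(set(labels)):
        members = [v for v, l in zip(vectors, labels) if l == label]
        centroids[label] = np.mean(members, axis=0)
    return centroids


def nearest_sense(vector, centroids):
    """Return the sense whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: np.linalg.norm(vector - centroids[label]))


# Made-up 2D vectors standing in for contextual vectors of "bank"
vectors = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],   # financial sense
                    [-1.0, -0.8], [-1.2, -1.1]])          # river sense
labels = ["finance", "finance", "finance", "river", "river"]

centroids = sense_centroids(vectors, labels)
print(nearest_sense(np.array([0.9, 1.1]), centroids))    # finance
print(nearest_sense(np.array([-0.9, -1.0]), centroids))  # river
```

Classifying against one centroid per sense is the 1-nearest-neighbor rule applied to cluster centers, so it needs only a handful of labeled sentences per sense.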