# ELMo: Deep contextualized word representations

Pre-trained contextual word representations from large-scale bidirectional language models provide large improvements over GloVe / word2vec baselines.

### Applications include:
* question answering
* co-reference 
* semantic role labeling 
* classification 
* syntactic parsing

### References
* [Deep contextualized word representations](http://www.aclweb.org/anthology/N18-1202)
* [AllenNLP ELMo section](https://allennlp.org/elmo)
    

## Contextual representation

The `elmo` command writes the representations from every individual biLM layer, for a dataset of sentences, to an HDF5 file.
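The command reads a plain-text file with one sentence per line, with tokens separated by whitespace. A minimal sketch of preparing such a file is shown here; the sentences and the `write_sentences` helper are hypothetical and are not the data used in this notebook.

```python
from pathlib import Path

# Hypothetical example sentences, already split into tokens.
tokenized_sentences = [
    ["The", "cat", "sat", "on", "the", "mat", "."],
    ["ELMo", "representations", "depend", "on", "context", "."],
]

def write_sentences(path, sentences):
    """Write one whitespace-tokenized sentence per line and return the lines."""
    lines = [" ".join(tokens) for tokens in sentences]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines

lines = write_sentences("sentences.txt", tokenized_sentences)
```

Each line of the input file becomes one entry in the output HDF5 file, keyed by its line index as a string (`"0"`, `"1"`, ...).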


In [1]:
!allennlp elmo sentences.txt elmo_layers.hdf5 --all

2019-02-07 12:43:52,738 - INFO - allennlp.common.registrable - instantiating registered subclass relu of <class 'allennlp.nn.activations.Activation'>
2019-02-07 12:43:52,738 - INFO - allennlp.common.registrable - instantiating registered subclass relu of <class 'allennlp.nn.activations.Activation'>
2019-02-07 12:43:52,741 - INFO - allennlp.common.registrable - instantiating registered subclass relu of <class 'allennlp.nn.activations.Activation'>
2019-02-07 12:43:52,743 - INFO - allennlp.common.registrable - instantiating registered subclass relu of <class 'allennlp.nn.activations.Activation'>
2019-02-07 12:43:52,808 - INFO - allennlp.commands.elmo - Initializing ELMo.
2019-02-07 12:44:07,110 - INFO - allennlp.commands.elmo - Processing sentences.
2it [00:00,  4.77it/s]


## Load the contextual representation

In [9]:
import h5py
h5py_file = h5py.File("elmo_layers.hdf5",'r')
embedding = h5py_file.get("0")
print("Total number of layers : ", len(embedding))
print("Total number of words in the first sentence : ", 
      len(embedding[0]))
print("Dimension of representation : " , len(embedding[0][0]))

Total number of layers :  3
Total number of words in the first sentence :  16
Dimension of representation :  1024


## Using ELMo as a PyTorch Module for fine-tuning

#### allennlp.modules.elmo.Elmo class
   Computes weighted ELMo representations (a learned weighted sum of the biLM layers) as PyTorch tensors.
   
   ![](elmo_eq.png)
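In the ELMo paper, the task-specific representation of token $k$ is $\mathrm{ELMo}_k^{task} = \gamma^{task} \sum_{j=0}^{L} s_j^{task} \mathbf{h}_{k,j}^{LM}$, where the $s_j^{task}$ are softmax-normalized layer weights and $\gamma^{task}$ is a scalar scale. The following is a pure-Python sketch of that weighted sum, assuming this is the equation the image shows; `scalar_mix` and the toy layer values are illustrative and are not the AllenNLP implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scalar_mix(layers, scalar_params, gamma=1.0):
    """Combine biLM layers into one vector per token.

    layers: list of layers, each a list of token vectors (lists of floats).
    scalar_params: one unnormalized weight per layer (learned in practice).
    gamma: overall scale factor (learned in practice).
    """
    weights = softmax(scalar_params)
    num_tokens = len(layers[0])
    dim = len(layers[0][0])
    mixed = []
    for k in range(num_tokens):
        vec = [
            gamma * sum(w * layer[k][d] for w, layer in zip(weights, layers))
            for d in range(dim)
        ]
        mixed.append(vec)
    return mixed

# Toy input: 3 layers, 2 tokens, 2 dimensions.
layers = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[3.0, 0.0], [0.0, 3.0]],
    [[5.0, 0.0], [0.0, 5.0]],
]
# Equal parameters give equal weights, so the mix is a plain average.
mixed = scalar_mix(layers, scalar_params=[0.0, 0.0, 0.0])
```

Requesting several output representations from `Elmo` corresponds to learning several independent sets of these weights.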


In [20]:
from allennlp.modules.elmo import Elmo, batch_to_ids

options_file = "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_options.json"
weight_file = "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5"

# Compute 2 different representations for each token (2 independently weighted layer mixes).
elmo = Elmo(options_file, weight_file, 2, dropout=0)

# Convert sentences to character ids
sentences = [['First','sentence', '.'], ['Another','.']]
character_ids = batch_to_ids(sentences)

embeddings = elmo(character_ids)


In [42]:
embedding_dim = embeddings['elmo_representations'][0].size()
print("The output embedding of size \n", 
      "\n - batch_size={}".format(embedding_dim[0]), 
      "\n - sequence_length={}".format(embedding_dim[1]), 
      "\n - ELMo vector size={}".format(embedding_dim[2]))

The output embedding of size 
 
 - batch_size=2 
 - sequence_length=3 
 - ELMo vector size=1024
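The sequence length is 3 even though the second sentence has only 2 tokens, because shorter sentences are padded to the longest one in the batch. The dictionary returned by `Elmo` also contains a `'mask'` entry marking the real tokens. The sketch below only illustrates the idea of padding and masking with plain lists; `pad_batch` and the `"<pad>"` placeholder are hypothetical and do not reflect how `batch_to_ids` is implemented.

```python
def pad_batch(sentences, pad_token="<pad>"):
    """Pad token lists to a common length and build a 0/1 mask of real tokens."""
    max_len = max(len(s) for s in sentences)
    padded = [s + [pad_token] * (max_len - len(s)) for s in sentences]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in sentences]
    return padded, mask

padded, mask = pad_batch([["First", "sentence", "."], ["Another", "."]])
```

Vectors at masked-out positions should be ignored downstream, for example when pooling token vectors into a sentence vector.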


## Using ELMo interactively

#### allennlp.commands.elmo.ElmoEmbedder
   - Provides an easy way to process sentences with ELMo interactively, for example in Jupyter.
   - 1st layer --> context-insensitive token representation
   - 2nd and 3rd layers --> outputs of the two LSTM layers

In [43]:
from allennlp.commands.elmo import ElmoEmbedder
elmo = ElmoEmbedder()

In [44]:
tokens = ["I", "ate", "an", "apple", "for", "breakfast"]
vectors = elmo.embed_sentence(tokens)

assert len(vectors) == 3  # one array per biLM layer
assert len(vectors[0]) == len(tokens)  # each layer holds one vector per token

In [45]:
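import scipy.spatial.distance
vectors2 = elmo.embed_sentence(["I","ate","carrot", "for", "breakfact"])
# Cosine distance between the top-layer vectors at index 3
# ("apple" in the first sentence, "for" in the second).
scipy.spatial.distance.cosine(vectors[2][3],vectors2[2][3])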

0.6675608456134796
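For reference, cosine distance is one minus the cosine similarity of two vectors: 0 means the same direction, 1 means orthogonal, 2 means opposite. A dependency-free sketch of that definition follows; `cosine_distance` is an illustrative helper, not the SciPy implementation.

```python
import math

def cosine_distance(u, v):
    """Return 1 - (u . v) / (|u| * |v|) for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])
```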

## Adding ELMo to existing AllenNLP models

#### Let's see how to add ELMo to an existing model
  - Adding a single ELMo layer --> only a configuration change
  - Adding two or more layers --> the model code needs to change
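As a rough sketch of the configuration-only case, the fragment below builds the two pieces an experiment config typically needs: an `elmo_characters` token indexer in the dataset reader and an `elmo_token_embedder` in the model's text field embedder. It is written as a Python dictionary and serialized to JSON for illustration. The exact key layout depends on the AllenNLP version and on the model, and the `dropout` value is an assumption, so treat this as a starting point rather than a drop-in config.

```python
import json

# Same pre-trained files as in the Elmo module example above.
OPTIONS_FILE = "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_options.json"
WEIGHT_FILE = "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5"

# Assumed layout: the surrounding keys vary between models and versions.
elmo_overrides = {
    "dataset_reader": {
        "token_indexers": {
            # Index tokens as character ids, which ELMo consumes.
            "elmo": {"type": "elmo_characters"},
        },
    },
    "model": {
        "text_field_embedder": {
            "elmo": {
                "type": "elmo_token_embedder",
                "options_file": OPTIONS_FILE,
                "weight_file": WEIGHT_FILE,
                "do_layer_norm": False,
                "dropout": 0.5,  # assumed value, tune per task
            },
        },
    },
}

config_json = json.dumps(elmo_overrides, indent=2)
```

Because the ELMo vectors here have 1024 dimensions, the input size of whatever layer consumes the embeddings usually has to grow by the same amount.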