# Enhanced LSTM exercise
```In this exercise you will implement the article "Enhanced LSTM for Natural Language Inference", aiming to reproduce the performance reported in its experiments. The main goals of this exercise are:```
1. ```Gain experience with low-level programming of DNNs, specifically RNN and LSTM architectures.```
2. ```Get familiar with the "attention mechanism" in neural networks.```
3. ```Learn some good practices for using neural networks in the NLP domain.```


```The purpose of this exercise is to build skills in working independently and implementing articles on your own.
You need to implement everything yourself; this notebook is only an abstract, non-mandatory guide for your work.
Please keep in mind that you are required to implement the data preprocessing, the models, and the whole pipeline yourself. It is therefore a great opportunity to practice "non-notebook" programming and clean, efficient code design.```

```Note: No Kaggle cheats! This means no computing features by mixing the train and test sets. Think creatively about how to handle train-test differences (very relevant to NLP and embedding layers).
This is a research process. You are going to come across unfamiliar topics, theoretical gaps, and implementation challenges.
Please document all of these challenges in the following format: What is the challenge? How are you going to solve it? What did you conclude from solving it? You will discuss all of the above with your tutor. (The documentation will be yours to keep and use for help in the future.)```

```~Gilad Royz & Gal Eyal```

## Read the article

```Just read the article, and try to understand it completely.
While reading, discuss any questions that come up with your tutor.```

```Tip: try to explain the article to someone not familiar with it. It will help you realize how well you understood it.```

## The data

In [87]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
import joblib
import pickle
import os
from pathlib import Path

```You can find some information about the dataset and the task in the README.txt file (found in the data folder: "snli_1.0")
Here is code that reads the data:```

In [73]:
DATA_PATH = r"C:\Users\VEREDDAS\Documents\Tomer\resources\LSTM"
DATA_PATH

'C:\\Users\\VEREDDAS\\Documents\\Tomer\\resources\\LSTM'

In [65]:
train = pd.read_csv(DATA_PATH+'/snli_1.0/snli_1.0_train.txt', sep='\t')
dev = pd.read_csv(DATA_PATH+'/snli_1.0/snli_1.0_dev.txt', sep='\t')
test = pd.read_csv(DATA_PATH+'/snli_1.0/snli_1.0_test.txt', sep='\t')

In [67]:
# Drop pairs whose gold label is '-' (no annotator consensus), since they have no usable class.
train = train[train['gold_label'] != '-']
dev = dev[dev['gold_label'] != '-']

In [68]:
y_train = train['gold_label'].map({'entailment' : 0, 'neutral' : 1, 'contradiction' : 2})
y_dev = dev['gold_label'].map({'entailment' : 0, 'neutral' : 1, 'contradiction' : 2})

In [69]:
train.head()

Unnamed: 0,gold_label,sentence1_binary_parse,sentence2_binary_parse,sentence1_parse,sentence2_parse,sentence1,sentence2,captionID,pairID,label1,label2,label3,label4,label5
0,neutral,( ( ( A person ) ( on ( a horse ) ) ) ( ( jump...,( ( A person ) ( ( is ( ( training ( his horse...,(ROOT (S (NP (NP (DT A) (NN person)) (PP (IN o...,(ROOT (S (NP (DT A) (NN person)) (VP (VBZ is) ...,A person on a horse jumps over a broken down a...,A person is training his horse for a competition.,3416050480.jpg#4,3416050480.jpg#4r1n,neutral,,,,
1,contradiction,( ( ( A person ) ( on ( a horse ) ) ) ( ( jump...,( ( A person ) ( ( ( ( is ( at ( a diner ) ) )...,(ROOT (S (NP (NP (DT A) (NN person)) (PP (IN o...,(ROOT (S (NP (DT A) (NN person)) (VP (VBZ is) ...,A person on a horse jumps over a broken down a...,"A person is at a diner, ordering an omelette.",3416050480.jpg#4,3416050480.jpg#4r1c,contradiction,,,,
2,entailment,( ( ( A person ) ( on ( a horse ) ) ) ( ( jump...,"( ( A person ) ( ( ( ( is outdoors ) , ) ( on ...",(ROOT (S (NP (NP (DT A) (NN person)) (PP (IN o...,(ROOT (S (NP (DT A) (NN person)) (VP (VBZ is) ...,A person on a horse jumps over a broken down a...,"A person is outdoors, on a horse.",3416050480.jpg#4,3416050480.jpg#4r1e,entailment,,,,
3,neutral,( Children ( ( ( smiling and ) waving ) ( at c...,( They ( are ( smiling ( at ( their parents ) ...,(ROOT (NP (S (NP (NNP Children)) (VP (VBG smil...,(ROOT (S (NP (PRP They)) (VP (VBP are) (VP (VB...,Children smiling and waving at camera,They are smiling at their parents,2267923837.jpg#2,2267923837.jpg#2r1n,neutral,,,,
4,entailment,( Children ( ( ( smiling and ) waving ) ( at c...,( There ( ( are children ) present ) ),(ROOT (NP (S (NP (NNP Children)) (VP (VBG smil...,(ROOT (S (NP (EX There)) (VP (VBP are) (NP (NN...,Children smiling and waving at camera,There are children present,2267923837.jpg#2,2267923837.jpg#2r1e,entailment,,,,


In [70]:
y_train.head()

0    1
1    2
2    0
3    1
4    0
Name: gold_label, dtype: int64

## Embedding

```The article uses "GloVe" word embeddings. You are strongly encouraged to read about them.
Question: How does GloVe differ from word2vec? (answer here)```

```Here is code that loads the embeddings:```

In [89]:
words = []
embeddings = []
with open(DATA_PATH+'/glove.840B.300d.txt', 'r', encoding='utf-8') as f:
    for line in tqdm(f):
        parsed_line = line.replace('\n', '').split(' ', 1)
        word = parsed_line[0]
        vector = np.fromstring(parsed_line[1], sep=' ')

        words.append(word)
        embeddings.append(vector)

2196018it [02:19, 15798.44it/s]


In [95]:
embeddings_dict = {}
with open(DATA_PATH+'/glove.840B.300d.txt', 'r', encoding='utf-8') as f:
    for line in tqdm(f):
        parsed_line = line.replace('\n', '').split(' ', 1)
        word = parsed_line[0]
        vector = np.fromstring(parsed_line[1], sep=' ')

        embeddings_dict[word] = vector

2196018it [02:36, 14076.53it/s]
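
One possible way to turn `embeddings_dict` into a lookup table without touching the test set is to build the vocabulary from the training sentences only and route every unseen token to a shared unknown-word row. The sketch below is only an illustration under stated assumptions: the names `PAD`, `UNK`, `build_vocab`, `build_embedding_matrix`, and `encode` are hypothetical, and the random initialization for tokens missing from GloVe is one option among several, not the article's method.

```python
import numpy as np
from collections import Counter

PAD, UNK = "<PAD>", "<UNK>"  # hypothetical special tokens, not taken from GloVe

def build_vocab(tokenized_sentences, min_count=1):
    """Map every training token seen at least min_count times to an integer id."""
    counts = Counter(tok for sent in tokenized_sentences for tok in sent)
    vocab = {PAD: 0, UNK: 1}
    for tok, n in counts.most_common():
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab

def build_embedding_matrix(vocab, embeddings_dict, dim=300, seed=0):
    """Row i holds the vector of the token with id i.

    Tokens missing from embeddings_dict keep a small random vector
    (an assumption made for this sketch); the PAD row is all zeros.
    """
    rng = np.random.default_rng(seed)
    matrix = rng.normal(0.0, 0.1, size=(len(vocab), dim))
    matrix[vocab[PAD]] = 0.0
    for tok, idx in vocab.items():
        if tok in (PAD, UNK):
            continue
        vec = embeddings_dict.get(tok)
        if vec is not None:
            matrix[idx] = vec
    return matrix

def encode(tokens, vocab):
    """Convert tokens to ids; anything outside the training vocabulary becomes UNK."""
    return [vocab.get(tok, vocab[UNK]) for tok in tokens]

# Toy usage with 4-dimensional vectors instead of the real 300-dimensional GloVe file.
toy_train = [["A", "person", "jumps"], ["A", "dog"]]
toy_glove = {"A": np.ones(4), "dog": np.full(4, 2.0)}
toy_vocab = build_vocab(toy_train)
toy_matrix = build_embedding_matrix(toy_vocab, toy_glove, dim=4)
toy_ids = encode(["A", "cat"], toy_vocab)  # "cat" never appeared in training
```

Raising `min_count` sends rare training words to `UNK` as well, which lets the model see the unknown-word row during training.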


In [102]:
embeddings_dict[";"]

array([ 1.8183e-01,  3.8337e-01,  2.3520e-01, -6.3050e-01,  4.0701e-01,
       -9.5615e-02,  5.5491e-02,  1.6528e-02, -4.0059e-01,  1.9465e+00,
       -2.1942e-01,  2.9755e-01, -1.0451e-01, -1.8876e-01, -1.2594e-01,
       -1.9805e-02, -2.8192e-01,  1.5226e+00, -4.5423e-01,  1.9973e-01,
        4.7333e-02,  2.4086e-01, -1.1775e-01,  3.0111e-01,  1.8821e-01,
       -7.7166e-03,  1.0744e-01,  1.9411e-01,  5.0651e-01, -7.8274e-02,
        3.0363e-02, -5.9477e-01, -4.4038e-01, -8.6185e-02,  4.3342e-01,
       -4.7462e-01,  6.6133e-02,  3.2683e-01,  2.6660e-01,  4.0830e-01,
       -3.5178e-01,  1.2051e-01,  3.6540e-01, -1.1928e-01, -5.4352e-02,
       -1.2229e-01, -6.0345e-01,  1.4667e-01,  2.9509e-01, -7.2388e-02,
        4.5672e-02,  1.5538e-01,  1.3553e-01, -9.5090e-02,  6.5017e-02,
        1.9001e-01,  2.6136e-01,  4.5543e-01, -5.5811e-01, -2.5156e-01,
        7.5236e-02,  9.5047e-03,  5.9743e-01,  3.9508e-01, -4.1390e-01,
       -5.2223e-03,  3.2668e-01, -3.1639e-01, -4.2511e-02,  4.22

In [94]:
print(words[0:40])
print(embeddings[0:40])

[',', '.', 'the', 'and', 'to', 'of', 'a', 'in', '"', ':', 'is', 'for', 'I', ')', '(', 'that', '-', 'on', 'you', 'with', "'s", 'it', 'The', 'are', 'by', 'at', 'be', 'this', 'as', 'from', 'was', 'have', 'or', '...', 'your', 'not', '!', '?', 'will', 'an']
[array([-0.082752 ,  0.67204  , -0.14987  , -0.064983 ,  0.056491 ,
        0.40228  ,  0.0027747, -0.3311   , -0.30691  ,  2.0817   ,
        0.031819 ,  0.013643 ,  0.30265  ,  0.0071297, -0.5819   ,
       -0.2774   , -0.062254 ,  1.1451   , -0.24232  ,  0.1235   ,
       -0.12243  ,  0.33152  , -0.006162 , -0.30541  , -0.13057  ,
       -0.054601 ,  0.037083 , -0.070552 ,  0.5893   , -0.30385  ,
        0.2898   , -0.14653  , -0.27052  ,  0.37161  ,  0.32031  ,
       -0.29125  ,  0.0052483, -0.13212  , -0.052736 ,  0.087349 ,
       -0.26668  , -0.16897  ,  0.015162 , -0.0083746, -0.14871  ,
        0.23413  , -0.20719  , -0.091386 ,  0.40075  , -0.17223  ,
        0.18145  ,  0.37586  , -0.28682  ,  0.37289  , -0.16185  ,
        0

## Implementation

```You are going to implement the models using the "dynet" framework.
It has properties that may help when implementing the models from the article.
Its biggest advantage is its ability to change the network's computation graph dynamically at runtime, which makes it possible to build a different graph for every new sentence. (Cool, huh?)```

```If you want to work with another framework such as PyTorch, TensorFlow, or even Keras (if you feel lucky, punk), get your tutor's approval and suggestions.```

<div class="alert alert-block alert-warning">
<h1> Important NOTE: please write the <b>final</b> implementation in .py files and not in notebooks. This is not negotiable! </h1>
</div>

In [None]:
import dynet
### documentation in: https://dynet.readthedocs.io/en/latest/

### ESIM

```After you have read the article and understand what the ESIM model is, you are going to implement it.
You are expected to implement it from scratch. (With dynet, it's not the end of the world.)```
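
As a reference point while reading the article, the local inference step of ESIM (soft alignment between the two encoded sentences, followed by enhancement with the difference and the element-wise product) can be sketched in plain NumPy. This is a minimal shape-checking sketch, not a DyNet implementation: the function names `softmax`, `soft_align`, and `enhance` are hypothetical, and the inputs are assumed to be the already-encoded premise and hypothesis, one row per token.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def soft_align(a_bar, b_bar):
    """Soft alignment between two encoded sentences.

    a_bar: (len_a, d) encoded premise tokens.
    b_bar: (len_b, d) encoded hypothesis tokens.
    Returns a_tilde (len_a, d) and b_tilde (len_b, d).
    """
    e = a_bar @ b_bar.T                     # e[i, j] = dot(a_bar[i], b_bar[j])
    a_tilde = softmax(e, axis=1) @ b_bar    # each premise token attends over the hypothesis
    b_tilde = softmax(e, axis=0).T @ a_bar  # each hypothesis token attends over the premise
    return a_tilde, b_tilde

def enhance(x_bar, x_tilde):
    """Concatenate [x; x~; x - x~; x * x~] along the feature axis."""
    return np.concatenate([x_bar, x_tilde, x_bar - x_tilde, x_bar * x_tilde], axis=1)

# Toy usage: a 3-token premise and a 5-token hypothesis with 4-dimensional encodings.
rng = np.random.default_rng(0)
a_bar = rng.normal(size=(3, 4))
b_bar = rng.normal(size=(5, 4))
a_tilde, b_tilde = soft_align(a_bar, b_bar)
m_a = enhance(a_bar, a_tilde)  # shape (3, 16)
m_b = enhance(b_bar, b_tilde)  # shape (5, 16)
```

Writing the step once in NumPy gives you expected shapes to compare against when you debug the DyNet version.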

### Tree ESIM

```If you like adventures, you can implement the Tree-LSTM yourself, but there is a Git repository with a good structure that can be useful for the implementation here. (You will need to change and adjust it! No free meals.)```

```The repository address:``` ```https://github.com/clab/dynet/tree/master/examples/treelstm```

```You are also given a parser for the sentence tree expressions in the dataset (an adjusted version of "tree.py" from the repository). You are very welcome to use it (it will save you some time).```

```Here is a demonstration:```

In [None]:
from tree import Tree

In [None]:
sentence = train['sentence1_binary_parse'][0]
sentence

In [None]:
tree_example = Tree.from_sexpr(sentence, '<NODE>')

In [None]:
for i in tree_example.leaves():
    print(i)

In [None]:
for i in tree_example.nonterms():
    print(i)

```The format is: [root [left] [right]], and for the parent of a single leaf: [root leaf]```
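
If you want to inspect the bracket structure independently of the provided `Tree` class, a few lines of plain Python are enough to read a `sentence1_binary_parse` string. The sketch below is a hypothetical stand-in and not the supplied `tree.py`: `parse_binary` and `leaves` are illustrative names, and it assumes the tokens are separated by whitespace and that a bare `(` or `)` always marks structure rather than a word.

```python
def parse_binary(sexpr):
    """Parse a whitespace-separated binary parse string into nested tuples.

    A leaf is returned as a plain string, an inner node as a tuple of children.
    """
    tokens = sexpr.split()
    pos = 0

    def parse():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok                 # a word: this is a leaf
        children = []
        while tokens[pos] != ")":
            children.append(parse())   # recurse into each child
        pos += 1                       # consume the closing ")"
        return tuple(children)

    return parse()

def leaves(tree):
    """Return the words of the tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]

# Toy usage on one of the hypothesis parses shown in train.head().
example_tree = parse_binary("( There ( ( are children ) present ) )")
example_words = leaves(example_tree)
```

The leaves of the binary parse also give you a tokenization of the sentence that is consistent with the tree structure.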

### Hybrid Inference Model (HIM)

```Explained perfectly in the article```

## Some questions to think about

```1. How should words that do not appear in the training data be treated at test time?```

```2. How should rare words be handled?```

```3. What are the input state and the input of the inner nodes in the Tree-LSTM?```

# Good luck!!