## Fundamentals of Deep Learning for Natural Language Processing
Hyungon Ryu, Sr. Solutions Architect hryu@nvidia.com 


NLP is about building systems that process language in order to perform tasks such as:
- Answering questions, as done by assistants such as Apple Siri, Amazon Alexa, Microsoft Cortana, and Samsung Bixby.
- Sentiment analysis: determining whether a sentence or phrase from a newspaper or any other text is positive or negative.
- Image captioning: gathering information from an image and generating a sentence that describes it, similar to the image annotations in the Microsoft COCO dataset.
- Text summarization: gathering information from a text and generating a sentence that summarizes it.
- Machine translation: translating a paragraph of text into another language. This is a huge domain in its own right, and there are many other NLP areas.


For NLP, we build word vectors that represent each word as an n-dimensional vector. Below is one example.

```
NVIDIA    = [00010000] 
Deep      = [10100000]
Learning  = [10001000]
```
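
As a minimal sketch of the simplest word-vector scheme, the snippet below builds one-hot vectors, where each word gets a vector with a single 1 at its vocabulary index. The three-word vocabulary and the helper name `one_hot` are assumptions made for illustration; this code does not reproduce the bit patterns in the listing above.

```python
import numpy as np

# Toy vocabulary (an assumption for illustration, not taken from a real corpus).
vocab = ["deep", "learning", "nvidia"]
word_to_index = {word: i for i, word in enumerate(vocab)}


def one_hot(word):
    """Return a vector with a single 1 at the word's vocabulary index."""
    vec = np.zeros(len(vocab), dtype=int)
    vec[word_to_index[word]] = 1
    return vec


for word in vocab:
    print(word, one_hot(word))
```

One-hot vectors grow with the vocabulary size and carry no notion of similarity between words, which is one motivation for learned embeddings.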
There are many word embedding techniques. Here we look at the word2vec algorithm.

Given a large input corpus, word2vec produces a vector space in which each unique word in the corpus is assigned a corresponding vector. After training, words that share common contexts in the corpus are located close to one another in the space.

Moreover, word2vec revealed linear relationships between different word vectors: the vectors seem to capture different grammatical and semantic concepts. For example, we can express a relationship as a vector operation such as Queen ≈ King - Man + Woman.
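
The vector arithmetic can be sketched with hand-made vectors. The 2-D values below are hypothetical and chosen so the analogy works exactly; they are not trained word2vec embeddings, and the helper `cosine` is just an illustrative name.

```python
import numpy as np

# Hand-made 2-D vectors (hypothetical, NOT trained embeddings).
# Axis 0 loosely encodes "royalty", axis 1 loosely encodes "gender".
vectors = {
    "king":  np.array([1.0, 1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0, 1.0]),
    "woman": np.array([0.0, -1.0]),
    "apple": np.array([-1.0, 0.0]),
}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# king - man + woman
target = vectors["king"] - vectors["man"] + vectors["woman"]

# Rank the remaining words by similarity to the target vector,
# skipping the three words used to build the query.
query_words = {"king", "man", "woman"}
candidates = [w for w in vectors if w not in query_words]
best = max(candidates, key=lambda w: cosine(target, vectors[w]))
print(best)
```

With real embeddings the result is only approximately equal to the target word's vector, which is why a nearest-neighbour search by cosine similarity is typically used.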


In [1]:
!nvidia-smi

Wed Sep 19 23:46:21 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   69C    P8    29W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No ru

In [0]:
'''
Critical option: enable TensorFlow's allow_growth so that GPU memory is
allocated on demand instead of being reserved all at once.
'''
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config )

In [3]:
!nvidia-smi   | grep MiB

| N/A   70C    P0    69W / 149W |    115MiB / 11439MiB |      0%      Default |


In [0]:
''' from keras example 
Example script to generate text from Nietzsche's writings.

At least 20 epochs are required before the generated text
starts sounding coherent.

It is recommended to run this script on GPU, as recurrent
networks are quite computationally intensive.

If you try this script on new data, make sure your corpus
has at least ~100k characters. ~1M is better.
'''

from __future__ import print_function
import numpy as np
import random
import sys
import io


In [7]:
%%time 
from keras.utils import get_file
path = get_file('nietzsche.txt',     origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
#path = get_file('trump-twittter.txt', origin='https://raw.githubusercontent.com/yhgon/SMWU_DL/master/trump-twittter.txt')
!head -n 2 {path} && tail -n 2 {path}

Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt
PREFACE

Buddhists as essential to sanctity, just as they were denounced by the
christian world as the indications of sinfulness.CPU times: user 47.8 ms, sys: 26.9 ms, total: 74.7 ms
Wall time: 2.3 s


In [8]:
%%time 

with io.open(path, encoding='utf-8') as f:
    text = f.read().lower()
print('corpus length:', len(text))

chars = sorted(list(set(text)))
print('total chars:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

# cut the text in semi-redundant sequences of maxlen characters
maxlen = 40
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('nb sequences:', len(sentences))

print('Vectorization...')
# The built-in bool is used because the np.bool alias was removed in newer NumPy releases.
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1




corpus length: 600893
total chars: 57
nb sequences: 200285
Vectorization...
CPU times: user 3.05 s, sys: 258 ms, total: 3.3 s
Wall time: 3.31 s
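
The windowing in the cell above can be sketched on a toy string. The `toy_` names and the short string are assumptions chosen so the result is easy to read; the loop is the same as in the cell, and the last line applies the same range arithmetic to the corpus length, `maxlen` and `step` reported above.

```python
# Toy corpus and window settings (assumptions for illustration).
toy_text = "hello world"
toy_maxlen = 4
toy_step = 3

toy_sentences = []
toy_next_chars = []
# Slide a window of toy_maxlen characters over the text, moving toy_step
# characters each time; the character right after the window is the target.
for i in range(0, len(toy_text) - toy_maxlen, toy_step):
    toy_sentences.append(toy_text[i: i + toy_maxlen])
    toy_next_chars.append(toy_text[i + toy_maxlen])

for window, target in zip(toy_sentences, toy_next_chars):
    print(repr(window), '->', repr(target))

# Same arithmetic with the values printed above: corpus length 600893,
# maxlen 40, step 3.
n_sequences = len(range(0, 600893 - 40, 3))
print('nb sequences:', n_sequences)
```

Each window becomes one row of `x` (one-hot encoded per character) and its following character becomes the matching row of `y`.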


In [0]:
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense, LSTM, Activation, Dropout, Flatten, BatchNormalization, Embedding
from keras.objectives import MSE, MAE
from keras.optimizers import RMSprop, adam
from keras.callbacks import LambdaCallback, EarlyStopping

K.set_session(K.tf.Session(config=config))

K.clear_session()


In [13]:
# build the model: two stacked LSTM layers followed by dense layers
K.clear_session()

print('Build model...')
model = Sequential()
model.add( LSTM( 64 , input_shape=(maxlen, len(chars))  , return_sequences=True, activation='tanh') )
model.add( Dropout(0.10))
model.add( BatchNormalization( ) )
model.add( LSTM( 64 , activation='relu' ) )
model.add( Dropout(0.10))
model.add( BatchNormalization( ) )

model.add( Dense(256 , activation='relu') )
model.add( Dense(len(chars), activation='softmax') )

model.summary()

optimizer = RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)



Build model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (None, 40, 64)            31232     
_________________________________________________________________
dropout_1 (Dropout)          (None, 40, 64)            0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 40, 64)            256       
_________________________________________________________________
lstm_2 (LSTM)                (None, 64)                33024     
_________________________________________________________________
dropout_2 (Dropout)          (None, 64)                0         
_________________________________________________________________
batch_normalization_2 (Batch (None, 64)                256       
_________________________________________________________________
dropout_3 (Dropout)          (None, 64)                0     
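
The parameter counts in the summary above can be checked by hand. This is a small sketch assuming the usual Keras layer layouts: an LSTM has four gates, each with input weights, recurrent weights and a bias, and batch normalization keeps four values per feature. The helper names are illustrative, and 57 is the `total chars` value printed earlier.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * ((input_dim + units + 1) * units)


def batchnorm_params(features):
    """gamma, beta, moving mean and moving variance per feature."""
    return 4 * features


n_chars = 57  # 'total chars' printed by the vectorization cell

print('lstm_1:', lstm_params(n_chars, 64))           # input is one-hot chars
print('lstm_2:', lstm_params(64, 64))                # input is lstm_1 output
print('batch_normalization:', batchnorm_params(64))
```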

In [0]:

def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)


def on_epoch_end(epoch, _):
    # Function invoked at end of each epoch. Prints generated text.
    print()
    print('----- Generating text after Epoch: %d' % epoch)

    start_index = random.randint(0, len(text) - maxlen - 1)
    for diversity in [0.2, 0.5, 1.0, 1.2]:
        print('----- diversity:', diversity)

        generated = ''
        sentence = text[start_index: start_index + maxlen]
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')
        sys.stdout.write(generated)

        for i in range(80):
            x_pred = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_indices[char]] = 1.

            preds = model.predict(x_pred, verbose=0)[0]
            next_index = sample(preds, diversity)
            next_char = indices_char[next_index]

            generated += next_char
            sentence = sentence[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()
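
The effect of the `temperature` argument (called `diversity` in the loop above) can be sketched without a trained model. The helper `apply_temperature` below is a hypothetical name; it repeats the rescaling steps of `sample` but returns the adjusted distribution instead of drawing from it. The probabilities in `toy_preds` are made up for illustration.

```python
import numpy as np


def apply_temperature(preds, temperature=1.0):
    """Rescale a probability vector the way `sample` does, without sampling."""
    preds = np.asarray(preds).astype('float64')
    scaled = np.log(preds) / temperature
    exp_preds = np.exp(scaled)
    return exp_preds / np.sum(exp_preds)


# Hypothetical softmax output over three characters.
toy_preds = [0.6, 0.3, 0.1]

for temperature in [0.2, 0.5, 1.0, 1.2]:
    print(temperature, np.round(apply_temperature(toy_preds, temperature), 3))
```

Temperatures below 1.0 sharpen the distribution toward the most likely character, a temperature of 1.0 leaves it unchanged, and values above 1.0 flatten it so that less likely characters are drawn more often.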



In [0]:
print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

model.fit(x, y,
          batch_size=128,
          epochs=100,
          callbacks=[print_callback])