# Language Modeling

In [None]:
from __future__ import print_function
import tensorflow as tf
# Import from the public tensorflow.keras API; tensorflow.python.* is a private namespace
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import TimeDistributed, Dropout, Embedding
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.utils import get_file
import numpy as np
import random
import sys

## TASK 1: Preprocessing Text
### Loading corpus

Pick the text corpus you would like to work on.

Available corpora: pantadeusz, potop, linux, nietzsche

In [None]:
# TODO: choose a file
path = '../rsc/_.txt'
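# Assumption: the corpus file is UTF-8 encoded; adjust the encoding if it is not
with open(path, encoding='utf-8') as f:
    text = f.read().lower()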
print('corpus length:', len(text))


In [None]:
chars = sorted(list(set(text)))
print('total chars:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

print(char_indices)
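
As a small illustration of the two mappings built above, the sketch below encodes a toy string into indices and decodes it back. The names `toy_text`, `toy_chars`, `toy_char_indices` and `toy_indices_char` are made up for this example and are not part of the notebook.

```python
# Toy corpus (hypothetical); the notebook builds the same mappings from the full text.
toy_text = "abracadabra"

# Sorted vocabulary of unique characters.
toy_chars = sorted(set(toy_text))
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_indices_char = {i: c for i, c in enumerate(toy_chars)}

# Encode the text as integer indices, then decode it back.
encoded = [toy_char_indices[c] for c in toy_text]
decoded = "".join(toy_indices_char[i] for i in encoded)

print(toy_chars)
print(encoded)
print(decoded == toy_text)
```

The round trip works because the two dictionaries are inverses of each other.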

### Select data

For the chosen corpus we need to define the model input (sentences of fixed length) and the target (the next character after each sentence).

In [None]:
maxlen = 40
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    # TODO: select input and target data of the model
    sentences.append( _ )
    next_chars.append( _ )
print('number of sequences:', len(sentences))

As a result, your lists should look like this (depending on the text):

``` python
print(sentences[100:105])
['i obmywał mu twarz; chwilami zatrzymywał',
 'bmywał mu twarz; chwilami zatrzymywał si',
 'wał mu twarz; chwilami zatrzymywał się d',
 ' mu twarz; chwilami zatrzymywał się dla ',
 ' twarz; chwilami zatrzymywał się dla poc']
 
print(next_chars[100:105])
[' ', 'ę', 'l', 'p', 'z']

```
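
One possible reading of the windowing step, sketched on a toy string rather than the real corpus: each window of `toy_maxlen` characters is an input, and the character right after it is the target. The `toy_*` names and values are assumptions chosen to keep the output short.

```python
# Toy settings (hypothetical); the notebook uses maxlen = 40 and step = 3.
toy_text = "hello world"
toy_maxlen = 5
toy_step = 2

toy_sentences = []
toy_next_chars = []
for i in range(0, len(toy_text) - toy_maxlen, toy_step):
    # Input: a window of toy_maxlen characters starting at position i.
    toy_sentences.append(toy_text[i: i + toy_maxlen])
    # Target: the single character that follows the window.
    toy_next_chars.append(toy_text[i + toy_maxlen])

print(toy_sentences)
print(toy_next_chars)
```

Consecutive windows overlap because the step is smaller than the window length, as in the expected output above.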

### Text vectorization

Represent the text as one-hot vectors.


In [None]:
# np.bool was removed in NumPy 1.24; the builtin bool is the supported dtype
X = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)

# TODO: initialize one-hot vectors
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        X[ _ ]  = _
    y[ _ ] = _

As a result you should have:

``` python
print(X[100,...])
[[False False False ..., False False False]
 [False  True False ..., False False False]
 [False False False ..., False False False]
 ..., 
 [False False False ..., False False False]
 [False False False ..., False False False]
 [False False False ..., False False False]]
 
print(y[100,...])
[False  True False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False]


```
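
A minimal sketch of one-hot encoding on a three-character toy vocabulary, assuming the same array layout as `X` and `y` (sample, time step, character). All `toy_*` names are hypothetical.

```python
import numpy as np

# Toy data (hypothetical): two "sentences" of length 2 over a 3-character vocabulary.
toy_sentences = ["ab", "ba"]
toy_next_chars = ["c", "a"]
toy_chars = ["a", "b", "c"]
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_maxlen = 2

X_toy = np.zeros((len(toy_sentences), toy_maxlen, len(toy_chars)), dtype=bool)
y_toy = np.zeros((len(toy_sentences), len(toy_chars)), dtype=bool)

for i, sentence in enumerate(toy_sentences):
    for t, char in enumerate(sentence):
        # Mark the character seen at time step t of sample i.
        X_toy[i, t, toy_char_indices[char]] = True
    # Mark the target character of sample i.
    y_toy[i, toy_char_indices[toy_next_chars[i]]] = True

print(X_toy.astype(int))
print(y_toy.astype(int))
```

Every time step and every target row contains exactly one `True`, which is what makes the representation one-hot.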

## TASK 2: Building NN Model

Build a neural network with Keras.

### First RNN model

A simple RNN architecture is provided below. Before you can compile the model, fill in the blanks.

  * Hint 1: check the presentation.
  * Hint 2:  <span style="color:white"> check the imports in the first block. </span>

The RMSprop optimizer, with gradients computed by backpropagation, is used to minimize the loss (categorical cross-entropy).

In [None]:
## TODO: fill blank spaces
model = Sequential()
model.add( _ (128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars)))
model.add(Activation(" _ "))

optimizer = RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

### Training function and prediction

The function below trains the model.

We use a Long Short-Term Memory (LSTM) layer for recurrence. The probability of the next letter is then computed by the softmax function:

\begin{equation}
P\left(y = j | x ; W\right) = \frac{\exp(x ^T w_j)}{\sum_{k=1}^K \exp(x^T w_k) }
\end{equation}

where the elements of $W$ are trainable.

During prediction we choose:

\begin{equation}
\hat{y} = \operatorname*{arg\,max}_j P\left(y = j | x ; W\right)
\end{equation}
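
The two formulas above can be sketched in NumPy as follows. The scores in `toy_scores` stand in for the values $x^T w_j$ and are invented for illustration; subtracting the maximum before exponentiating is a common numerical-stability step that does not change the result.

```python
import numpy as np

def softmax(scores):
    # Shift by the maximum for numerical stability; the output is unchanged.
    shifted = scores - np.max(scores)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical vocabulary and scores x^T w_j for each character.
toy_chars = ["a", "b", "c"]
toy_scores = np.array([1.0, 3.0, 0.5])

toy_probs = softmax(toy_scores)
# Greedy prediction: the character with the highest probability.
greedy_char = toy_chars[int(np.argmax(toy_probs))]

print(toy_probs)
print(greedy_char)
```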

In the code below you need to add the selection of the next letter.

In [None]:
def run(model, iterations):
    # Run exactly `iterations` training rounds, numbered from 1
    for iteration in range(1, iterations + 1):
        print('-' * 50)
        print('Iteration', iteration)
        model.fit(X, y, batch_size=128, epochs=1)

        start_index = random.randint(0, len(text) - maxlen - 1)

        print()

        generated = ''
        sentence = text[start_index: start_index + maxlen]
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')
        sys.stdout.write(generated)

        for i in range(400):
            x = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(sentence):
                x[0, t, char_indices[char]] = 1.

            preds = model.predict(x, verbose=0)[0]
            preds = np.asarray(preds).astype('float64')
            
            # TODO: How to select the next letter based on preds
            next_char = _

            generated += next_char
            sentence = sentence[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

Now you can run training. 

In [None]:
run(model,20)

## TASK 3: Improve predictions

Always picking the most likely letter may not be the best strategy for text generation (see the output above).

One solution is to sample the next letter from the softmax distribution:

\begin{equation}
\hat{y} \sim P\left(y | x ; W\right) 
\end{equation}
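
A minimal sketch of drawing from a categorical distribution with `np.random.multinomial`, assuming a made-up three-character distribution `toy_probs`. Each draw is a one-hot vector, so `np.argmax` recovers the sampled index.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical softmax output over a 3-character vocabulary.
toy_chars = ["a", "b", "c"]
toy_probs = np.array([0.2, 0.7, 0.1])

# One draw: a one-hot row of shape (1, 3); argmax gives the sampled index.
one_draw = np.random.multinomial(1, toy_probs, 1)
sampled_char = toy_chars[int(np.argmax(one_draw))]

# Many draws: the empirical frequencies approach toy_probs.
draws = np.random.multinomial(1, toy_probs, size=1000)
counts = draws.sum(axis=0)

print(sampled_char)
print(counts / counts.sum())
```

Unlike the arg max rule, less likely characters are still chosen some of the time, which is the source of the extra variety in the generated text.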

Please add random sampling to the function **sample_letter**.

  * Hint: <span style="color:white"> Use np.random.multinomial(1, probabilities, 1). </span>

  
  
### Temperature sampling


The softmax distribution can be reshaped.

In temperature sampling we choose an additional parameter $\tau$ (temperature, diversity):

\begin{equation}
\tilde{P}_{\tau}\left(y = j | x ; W\right) = \frac{\exp(\frac{x ^T w_j}{\tau})}{\sum_{k=1}^K \exp(\frac{x^T w_k}{\tau}) }
\end{equation}

In general, the lower $\tau$ is ($\tau < 1$), the more confident we are in our picks. If $\tau$ is high, the generated text will be more diverse, but it will also contain more mistakes.
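
The effect of $\tau$ can be checked numerically. The sketch below assumes we only have the softmax output (probabilities), not the raw scores; taking the logarithm, dividing by $\tau$ and renormalizing is then equivalent to dividing the scores $x^T w_j$ by $\tau$. The helper name `apply_temperature` and the distribution `toy_probs` are hypothetical.

```python
import numpy as np

def apply_temperature(probs, temperature):
    # Assumes all probabilities are strictly positive (log of 0 is undefined).
    logits = np.log(probs) / temperature
    logits = logits - logits.max()  # numerical stability only
    scaled = np.exp(logits)
    return scaled / scaled.sum()

# Hypothetical softmax output over a 3-character vocabulary.
toy_probs = np.array([0.2, 0.7, 0.1])

sharp = apply_temperature(toy_probs, 0.2)      # tau < 1: more peaked
unchanged = apply_temperature(toy_probs, 1.0)  # tau = 1: same distribution
flat = apply_temperature(toy_probs, 1.2)       # tau > 1: flatter

for name, dist in [("0.2", sharp), ("1.0", unchanged), ("1.2", flat)]:
    print(name, np.round(dist, 3))
```

With a low temperature almost all probability mass moves to the most likely character, while a high temperature spreads it more evenly.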

Please add random temperature sampling to the function **sample_letter**.

In [None]:
def sample_letter(preds, temperature=1.0):
    
    # TODO: 1. transform output distribution

    # TODO: 2. sample from distribution
    
    return next_char

def run_improved(model, iterations):
    for iteration in range(1, iterations+1):
        print()
        print('-' * 50)
        print('Iteration', iteration)
        model.fit(X, y, batch_size=128, epochs=1, validation_split=0.2)

        start_index = random.randint(0, len(text) - maxlen - 1)

        for diversity in [0.2, 0.5, 1.0, 1.2]:
            print()
            print('----- diversity:', diversity)

            generated = ''
            sentence = text[start_index: start_index + maxlen]
            generated += sentence
            print('----- Generating with seed: "' + sentence + '"')
            sys.stdout.write(generated)

            for i in range(400):
                x = np.zeros((1, maxlen, len(chars)))
                for t, char in enumerate(sentence):
                    x[0, t, char_indices[char]] = 1.

                preds = model.predict(x, verbose=0)[0]
                preds = np.asarray(preds).astype('float64')
                next_char = sample_letter(preds, diversity)

                generated += next_char
                sentence = sentence[1:] + next_char

                sys.stdout.write(next_char)
                sys.stdout.flush()
            print()

Now you can run training.


(It may take some time to see predictions after training, because validation is added.)

In [None]:
run_improved(model,20)

## TASK 4: Improve the model

We will make some changes to improve our model.

### Embeddings

We can use an additional layer to transform sparse one-hot vectors into dense vectors, called embeddings (see the presentation).

Please add a layer that transforms the one-hot input into embeddings.

  * Hint: <span style="color:white"> Use TimeDistributed Dense layer </span>
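
To see why a linear layer applied to one-hot input yields embeddings, the NumPy sketch below multiplies one-hot rows by a weight matrix and compares the result with a direct row lookup. The sizes, the random matrix `W` and the indices in `char_ids` are made up for illustration, and the layer is assumed to have no bias and no activation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and weights: 5 characters, 3-dimensional embeddings.
vocab_size, embedding_dim = 5, 3
W = rng.normal(size=(vocab_size, embedding_dim))

# A short sequence of character indices and its one-hot encoding.
char_ids = np.array([3, 0, 3])
one_hot = np.eye(vocab_size)[char_ids]  # shape (3, 5)

# Linear layer on one-hot input: a matrix product per time step.
dense_output = one_hot @ W
# Index lookup: pick the matching rows of W directly.
lookup_output = W[char_ids]

print(np.allclose(dense_output, lookup_output))
```

Both paths return the same vectors; the lookup simply avoids building and multiplying the sparse one-hot matrix.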

### Batch Normalization

We can normalize the inputs to hidden layers within each batch:

  https://keras.io/layers/normalization/#batchnormalization
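
A rough NumPy sketch of what batch normalization computes during training: each feature is shifted and scaled using the mean and variance of the current batch, then rescaled by the learnable parameters `gamma` and `beta`. The function name `batch_norm_train`, the value of `eps` and the toy batch are assumptions for illustration; at inference time layers of this kind typically use running averages instead of batch statistics.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    # Per-feature statistics computed over the batch axis.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    # Normalize, then apply the learnable scale and shift.
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
# Hypothetical batch: 128 samples, 4 features, far from zero mean / unit variance.
toy_batch = rng.normal(loc=5.0, scale=3.0, size=(128, 4))

normalized = batch_norm_train(toy_batch, gamma=np.ones(4), beta=np.zeros(4))

print(normalized.mean(axis=0).round(3))
print(normalized.std(axis=0).round(3))
```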


   

Please add batch normalization to your network.

In [None]:
model_improved = Sequential()

# TODO: Add embedding layer 

model_improved.add( _ (128))
# TODO: Add batch normalization

model_improved.add(Dense(len(chars)))
model_improved.add(Activation( "_" ))

optimizer = RMSprop(lr=0.01, decay=10e-5)
model_improved.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=["acc"])

In [None]:
run_improved(model_improved,30)

### Questions

1. What is the difference between TimeDistributed Dense and Embedding layers?

2. Why should we use validation?

3. How does dropout influence training and validation?

### Experiments

There are many ways to improve your model:

1. Add more dense layers.

2. Experiment with batch normalization and dropout.

3. Try out other activation functions e.g. relu (keep softmax as output activation):

  https://keras.io/activations/

4. Use different or more recurrent layers:

  https://keras.io/layers/recurrent/
    
5. Experiment with other optimizers (or/and learning rate):
  
  https://keras.io/optimizers/
  


### Save your model!

When you are excited by your cool new model, you can save it.




In [None]:
model_improved.save('../models/my_LM.h5')

## Pretrained model

If you have problems with training and would like to experiment with predictions, you can load a pretrained model.

This model requires the 'potop.txt' corpus.

In [None]:
model = load_model('../models/pretrained_LM.h5')
model_improved = load_model('../models/pretrained_LM.h5')