
In [0]:
from __future__ import print_function

from keras.callbacks import LambdaCallback
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.layers import LSTM
from keras.optimizers import RMSprop

import numpy as np
import random
import sys
import io

Using TensorFlow backend.


The training file is a collection of Sherlock Holmes stories. This will teach the algorithm how to write. <br><br>
Open the file and print the length of the text (in characters).

In [5]:
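# Read the file with an explicit encoding and close it when done.
with io.open('train_text.txt', 'r', encoding='utf-8') as f:
    text = f.read().lower()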
print('text length', len(text))

text length 561835


Printing the first 300 characters of it.

In [6]:
print(text[:300])

﻿adventure i. a scandal in bohemia

i.

to sherlock holmes she is always the woman. i have seldom heard
him mention her under any other name. in his eyes she eclipses
and predominates the whole of her sex. it was not that he felt
any emotion akin to love for irene adler. all emotions, and that
one p


Mapping each distinct character to an integer index, and back, so the text can be fed to the algorithm as numbers.

In [7]:
chars = sorted(list(set(text)))
print('total chars: ', len(chars))

char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

total chars:  56
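
As a small illustration of the two lookup tables, the sketch below builds them for a toy string and round-trips a word through them. The `toy_` names and the `encode`/`decode` helpers are hypothetical and are not used anywhere else in this notebook.

```python
# Toy illustration; toy_text, encode and decode are hypothetical names.
toy_text = "hello world"
toy_chars = sorted(set(toy_text))
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_indices_char = {i: c for i, c in enumerate(toy_chars)}


def encode(s):
    """Turn a string into a list of integer indices."""
    return [toy_char_indices[c] for c in s]


def decode(ids):
    """Turn a list of integer indices back into a string."""
    return ''.join(toy_indices_char[i] for i in ids)


ids = encode("hello")
print(ids)
print(decode(ids))
```

Sorting the character set keeps the mapping the same from run to run, which matters when saved weights are reused later.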


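Here I will split the text into overlapping "subsequences": every `step` (3) characters, I take a window of `maxlen` (40) characters as the input and record the character that follows it as the target. Characters such as "\n" or double quotes stay inside the windows, so the algorithm can also learn where it needs to create a new line or end a phrase.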

In [8]:
maxlen = 40
step = 3
sentences = []
next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])

print('nb sequences:', len(sentences))

nb sequences: 187265
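
To make the windowing easier to see, here is a minimal sketch that runs the same loop on a short string with a smaller window. `make_windows` is a hypothetical helper written only for this illustration.

```python
# Toy illustration; make_windows is a hypothetical helper.
def make_windows(s, maxlen, step):
    """Return (windows, targets) using the same loop as the cell above."""
    windows, targets = [], []
    for i in range(0, len(s) - maxlen, step):
        windows.append(s[i: i + maxlen])   # input: maxlen characters
        targets.append(s[i + maxlen])      # target: the character that follows
    return windows, targets


windows, targets = make_windows("sherlock holmes", maxlen=5, step=3)
for w, t in zip(windows, targets):
    print(repr(w), '->', repr(t))
```

A smaller `step` produces more, heavily overlapping training examples; a larger one produces fewer.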


In [9]:
print(sentences[:3])
print(next_chars[:3])

['\ufeffadventure i. a scandal in bohemia\n\ni.\n\n', 'venture i. a scandal in bohemia\n\ni.\n\nto ', 'ture i. a scandal in bohemia\n\ni.\n\nto she']
['t', 's', 'r']


In [0]:
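# One-hot encode the windows (x) and their target characters (y).
# Built-in bool is used because the np.bool alias was deprecated in NumPy 1.20.
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)

for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1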
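
The cell above one-hot encodes the data: with the numbers printed earlier, `x` has shape (187265, 40, 56) and `y` has shape (187265, 56). The sketch below shows the same idea on a three-character alphabet; the `toy_` names are hypothetical.

```python
import numpy as np

# Toy illustration; the toy_ names are hypothetical.
toy_chars = ['a', 'b', 'c']
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}

window = "cab"
x_toy = np.zeros((len(window), len(toy_chars)), dtype=bool)
for t, char in enumerate(window):
    x_toy[t, toy_char_indices[char]] = True  # one True per row

print(x_toy.astype(int))

# Each row has exactly one True, so argmax recovers the original characters.
decoded = ''.join(toy_chars[i] for i in x_toy.argmax(axis=1))
print(decoded)
```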

In [11]:
print(x[:3])
print(y[:3])

[[[False False False ... False False  True]
  [False False False ... False False False]
  [False False False ... False False False]
  ...
  [False False False ... False False False]
  [ True False False ... False False False]
  [ True False False ... False False False]]

 [[False False False ... False False False]
  [False False False ... False False False]
  [False False False ... False False False]
  ...
  [False False False ... False False False]
  [False False False ... False False False]
  [False  True False ... False False False]]

 [[False False False ... False False False]
  [False False False ... False False False]
  [False False False ... False False False]
  ...
  [False False False ... False False False]
  [False False False ... False False False]
  [False False False ... False False False]]]
[[False False False False False False False False False False False False
  False False False False False False False False False False False False
  False False False False False Fals

# Creating the model

This is the "brain", or the neural structure of the algorithm: an LSTM layer with 128 units reads each 40-character window, and a dense softmax layer outputs a probability for each of the 56 possible next characters.

In [0]:
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))
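
For a rough sense of the model's size, the sketch below counts trainable parameters by hand, assuming a standard LSTM layer with biases followed by a dense layer with biases. The two helper functions are hypothetical; the result should agree with `model.summary()` if those assumptions hold.

```python
def lstm_param_count(input_dim, units):
    """4 gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_param_count(input_dim, units):
    """One weight per input-output pair plus one bias per output."""
    return input_dim * units + units


n_chars = 56   # 'total chars' printed earlier in this notebook
n_units = 128  # LSTM size used in the model

lstm_params = lstm_param_count(n_chars, n_units)
dense_params = dense_param_count(n_units, n_chars)
print(lstm_params, dense_params, lstm_params + dense_params)
```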

In [0]:
optimizer = RMSprop(lr=0.01)

model.compile(loss='categorical_crossentropy', optimizer=optimizer)

**These helper functions sample the next character from the model's predictions and print generated text at the end of each epoch. They were taken from the official Keras GitHub repository:** https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py

In [0]:
def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
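
To see what the `temperature` argument (called `diversity` later on) does, the sketch below applies the same rescaling as `sample` but returns the adjusted probabilities instead of drawing from them. `apply_temperature` and the probabilities in `toy_preds` are made up for illustration.

```python
import numpy as np


def apply_temperature(preds, temperature):
    """Rescale a probability vector the way sample() does, without drawing."""
    logits = np.log(np.asarray(preds, dtype='float64')) / temperature
    exp_logits = np.exp(logits)
    return exp_logits / np.sum(exp_logits)


toy_preds = [0.6, 0.3, 0.1]  # made-up probabilities for three characters
for temperature in [0.2, 1.0, 1.2]:
    print(temperature, np.round(apply_temperature(toy_preds, temperature), 3))
```

A temperature below 1 pushes probability toward the most likely character, which gives safer but more repetitive text. A temperature above 1 flattens the distribution, which gives more varied text with more mistakes.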

In [0]:
def on_epoch_end(epoch, logs):
    # Function invoked at end of each epoch. Prints generated text.
    print()
    print('----- Generating text after Epoch: %d' % epoch)

    start_index = random.randint(0, len(text) - maxlen - 1)
    for diversity in [0.2, 0.5, 1.0, 1.2]:
        print('----- diversity:', diversity)

        generated = ''
        sentence = text[start_index: start_index + maxlen]
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')
        sys.stdout.write(generated)

        for i in range(400):
            x_pred = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_indices[char]] = 1.

            preds = model.predict(x_pred, verbose=0)[0]
            next_index = sample(preds, diversity)
            next_char = indices_char[next_index]

            generated += next_char
            sentence = sentence[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

Saving the model weights to a file whenever the training loss improves.

In [0]:
from keras.callbacks import ModelCheckpoint

filepath = "weights.hdf5"

checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')

In [0]:
from keras.callbacks import ReduceLROnPlateau

reduce_lr = ReduceLROnPlateau(monitor='loss', factor=0.2, patience=1, min_lr=0.001)

In [0]:
callbacks = [print_callback, checkpoint, reduce_lr]

# Training the model

The setup ends here. This code snippet starts the learning process of the algorithm. It takes a good amount of time. While it is training, we can see the progress, which is very cool. <br><br>
At the end of each epoch, Keras calls the callbacks that were previously defined: print_callback prints sample text, checkpoint saves the weights to weights.hdf5 when the loss has improved, and reduce_lr lowers the learning rate when the loss stops improving.
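
The sketch below gives a rough idea of how a reduce-on-plateau rule behaves with the settings used in this notebook (starting rate 0.01, `factor=0.2`, `patience=1`, `min_lr=0.001`). It is a simplified stand-in, not Keras's actual implementation, and the loss values are invented.

```python
# Simplified sketch of a reduce-on-plateau rule; this is NOT Keras's actual
# implementation (which has further options such as min_delta and cooldown).
def simulate_reduce_lr(losses, lr=0.01, factor=0.2, patience=1, min_lr=0.001):
    """Return the learning rate in effect after each epoch's loss is seen."""
    best = float('inf')
    wait = 0
    history = []
    for loss in losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)  # never go below min_lr
                wait = 0
        history.append(lr)
    return history


# Made-up loss values, only to exercise the rule.
toy_losses = [2.0, 1.8, 1.9, 1.7, 1.75, 1.76]
print(simulate_reduce_lr(toy_losses))
```

With these settings the rate can only drop twice before it reaches `min_lr`.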


In [22]:
model.fit(x, y, batch_size=128, epochs=5, callbacks=callbacks)

Instructions for updating:
Use tf.cast instead.
Epoch 1/5

----- Generating text after Epoch: 0
----- diversity: 0.2
----- Generating with seed: "out five ft. seven in. in height;
strong"
out five ft. seven in. in height;
strong the strance of the man which the strange and and the strance which the man which had seemed to the stranced to the fact and the charted and the man when he sall the lanced the man and the strong had seemed the man which the past the street of the man and and house the spall the street to the strance the past the stranced to the forser the strance which had seemed the strance to the street and the
----- diversity: 0.5
----- Generating with seed: "out five ft. seven in. in height;
strong"
out five ft. seven in. in height;
strong had the factle, and to saig the stranked the recappens the that in a getter. the machancely with a
contice. i the recaltion. i can with a shermous stanth whom he dades, me have but had real and sight the strance which have ress of the wan

<keras.callbacks.History at 0x7f69121ec4a8>

# Generating text

Now that the algorithm is trained, it's time to test it and generate some text!

In [0]:
def generate_text(length, diversity):
    # Pick a random seed window from the training text.
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated = ''
    sentence = text[start_index: start_index + maxlen]
    generated += sentence

    for i in range(length):
        # One-hot encode the current window.
        x_pred = np.zeros((1, maxlen, len(chars)))
        for t, char in enumerate(sentence):
            x_pred[0, t, char_indices[char]] = 1.

        # Sample the next character and slide the window forward by one.
        preds = model.predict(x_pred, verbose=0)[0]
        next_index = sample(preds, diversity)
        next_char = indices_char[next_index]

        generated += next_char
        sentence = sentence[1:] + next_char

    return generated

In [25]:
print(generate_text(500, 0.2))

rrible affair up."

"you heard nothing your be a strange the conclus and some one of the man which he shall be a surpless the man which he shall be a street of the case and some of the man which i should be the conclus an instant of the man which he had been such the strange and the conclus the man which have been the man who was a street and her the man which he should be a strange in the case and the man who was a strange of the man which he had been an and came to me the conclus an one of the matter which he strucked to me, and the
