In [1]:
import keras
keras.__version__

Using TensorFlow backend.


'2.1.2'

# Text generation with LSTM

This notebook contains the code samples found in Chapter 8, Section 1 of [Deep Learning with Python](https://www.manning.com/books/deep-learning-with-python?a_aid=keras&a_bid=76564dff). Note that the original text features far more content, in particular further explanations and figures: in this notebook, you will only find source code and related comments.

----

[...]

## Implementing character-level LSTM text generation


Let's put these ideas in practice in a Keras implementation. The first thing we need is a lot of text data that we can use to learn a 
language model. You could use any sufficiently large text file or set of text files -- Wikipedia, the Lord of the Rings, etc. In this 
example we will use some of the writings of Nietzsche, the late-19th century German philosopher (translated to English). The language model 
we will learn will thus be specifically a model of Nietzsche's writing style and topics of choice, rather than a more generic model of the 
English language.

## Preparing the data

Let's start by downloading the corpus and converting it to lowercase:

In [3]:
import keras
import numpy as np
import io

path = keras.utils.get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
text = io.open(path, encoding='utf-8').read().lower()
print('Corpus length:', len(text))

Corpus length: 600893



Next, we will extract partially-overlapping sequences of length `maxlen`, one-hot encode them and pack them in a 3D Numpy array `x` of 
shape `(sequences, maxlen, unique_characters)`. Simultaneously, we prepare a array `y` containing the corresponding targets: the one-hot 
encoded characters that come right after each extracted sequence.

In [4]:
# Length of extracted character sequences
maxlen = 60

# We sample a new sequence every `step` characters
step = 3

# This holds our extracted sequences
sentences = []

# This holds the targets (the follow-up characters)
next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of sequences:', len(sentences))

# List of unique characters in the corpus
chars = sorted(list(set(text)))
print('Unique characters:', len(chars))
# Dictionary mapping unique characters to their index in `chars`
char_indices = dict((char, chars.index(char)) for char in chars)

# Next, one-hot encode the characters into binary arrays.
print('Vectorization...')
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=np.bool)
y = np.zeros((len(sentences), len(chars)), dtype=np.bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Number of sequences: 200278
Unique characters: 57
Vectorization...


## Building the network

Our network is a single `LSTM` layer followed by a `Dense` classifier and softmax over all possible characters. But let us note that 
recurrent neural networks are not the only way to do sequence data generation; 1D convnets also have proven extremely successful at it in 
recent times.

In [5]:
from keras import layers

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))

Instructions for updating:
keep_dims is deprecated, use keepdims instead


Since our targets are one-hot encoded, we will use `categorical_crossentropy` as the loss to train the model:

In [6]:
optimizer = keras.optimizers.RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

Instructions for updating:
keep_dims is deprecated, use keepdims instead


## Training the language model and sampling from it


Given a trained model and a seed text snippet, we generate new text by repeatedly:

* 1) Drawing from the model a probability distribution over the next character given the text available so far
* 2) Reweighting the distribution to a certain "temperature"
* 3) Sampling the next character at random according to the reweighted distribution
* 4) Adding the new character at the end of the available text

This is the code we use to reweight the original probability distribution coming out of the model, 
and draw a character index from it (the "sampling function"):

In [7]:
def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)


Finally, this is the loop where we repeatedly train and generated text. We start generating text using a range of different temperatures 
after every epoch. This allows us to see how the generated text evolves as the model starts converging, as well as the impact of 
temperature in the sampling strategy.

In [8]:
import random
import sys

for epoch in range(1, 60):
    print('epoch', epoch)
    # Fit the model for 1 epoch on the available training data
    model.fit(x, y,
              batch_size=128,
              epochs=1)

    # Select a text seed at random
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('--- Generating with seed: "' + generated_text + '"')

    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('------ temperature:', temperature)
        sys.stdout.write(generated_text)

        # We generate 400 characters
        for i in range(400):
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.

            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]

            generated_text += next_char
            generated_text = generated_text[1:]

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

epoch 1
Epoch 1/1
--- Generating with seed: " very
grateful, very patient, very complaisant--but with all"
------ temperature: 0.2
 very
grateful, very patient, very complaisant--but with all the greating of the self and the are the struch and the self the fand of the most and the derious one a man to be the sensical and the intimed to the sensiness of the postion of the beansity of the spincing of a self the concentions of the man the self the greating of the sensing of the good of the sensing and fore the most of the present of the concentions of the self and souls of the sense the 
------ temperature: 0.5
t of the concentions of the self and souls of the sense the finds the spirntnt of his istence and something in the condine and noter to the geners of the spection of a predection of the sundires even to concerding of explanical like the dors and perhingly been that which be thing of the conside to intinetion of the inting and strutdest and the self the strutges to himsestional to t

detoor and developed, it is
there are,
conscience, in forgent "him sharrel xutism, forms
that in a "educte termity thusary intentiom through the disposines to, pation.


13m
slew dont enperisherword, with the p
------ temperature: 1.2
sines to, pation.


13m
slew dont enperisherword, with the paiss attempice?
"my or into fiel scatiblen, they. the marty or, and thinkep" be horic doess, it is creatures avare, and endours: in curious vill"--which setromentmentative friend of teacher it stuct
ongise the oldown to
stregrive "kweps
from a may
should their powerful: much i
enish! 
wor
one men
drend?

the cu cause wartherm, spe
soututice, axists in the germanyd, we himself whole a co undermere 
epoch 9
Epoch 1/1
--- Generating with seed: "time he really sees himself--and what surprises in the proce"
------ temperature: 0.2
time he really sees himself--and what surprises in the process of the problem the same the states of the conscience of the spirit and in the conscience that the same the spi

than skepticism, the mild, pleasing, lulling poppy, the sense of the self-contrious self-gregators and self-greatest and self-gregatomed to the same the same the same the sense of the sense of the stronges and the same the same the sense of the self-contradint the soul and the sense of the stronges the same and the sense of the same the same and the self the self-greatest and the self-positive and and superficial and false in the same and the sam
------ temperature: 0.5
sitive and and superficial and false in the same and the same freed overture, there is something that the value and comparison to periline of the discoveres and domaring in the superficial and oblige the selfish, in the sayes of welly and involuntarily be
many schopenhauerantly and something which and something with the under the german false or as the world be are the freedom of the bearnation of the partially proour in the the whole being thing the taken
------ temperature: 1.0
 the partially proour in the the whole b

  This is separate from the ipykernel package so we can avoid doing imports until


--di regard as
no does towardh-methos that all pride alsay adminights,
is every keep someffue; on his bending far to comman despriofictely ablingem child sympathy, not once they contil pitced--the o"ck of a
certainty
what everything
undirrational
ins
epoch 19
Epoch 1/1
--- Generating with seed: "hind every cave in him there is not, and must
necessarily be"
------ temperature: 0.2
hind every cave in him there is not, and must
necessarily be a desire of the sense of the genting and some proper and an active the sense of the most and the sense of the present the men and more are self-can deceptic of the great the sense of the present men as the sense of the present in the sense of the sense of the sense of the disconced and some spirit of the sense of the sense of the sense of the sense of the free spirits in the most serious and as t
------ temperature: 0.5
f the sense of the free spirits in the most serious and as the same desire as he is a deception in our adoped which and of the manif

e considerity, so the virtues of subjection of the german words, however, which
the devised: underroulty that action of prohapid man is troford. on
these simpon: the bad to the more shamine
fear there an externation by huster mutus our undrocatienful feck! watchs and hard, he is musiclasm, well won hadmary?
sbelime percetutes a condition of yet
impulsc with its romantic
kind, the sapre." it he 
smmbchest, and them delusion. to farcay to oneself
that hiddif
------ temperature: 1.2
mbchest, and them delusion. to farcay to oneself
that hiddifoms
hard belief still creptial idea the quallise merely granctions--out that know
ywating liva reuppidhty,
wles whether alne ari mpmbst egestivor
itself to old productime!
they concevenle. for at owrrals
in them, to that subteht"
seked ts soulfbquoleness.
to below and, animal all religion is rigain digne! as a wine in
while that origeration? witd firsties,
uncer,ary know-rule, etayned rogmord, evi
epoch 27
Epoch 1/1
--- Generating with seed: "ed actua

     say, why should we not, flung at ease neath this all the strong and the strengther and distrust of the extent the present and the states of the strong and sense of the strong and good operation of the condition of the strengther and present the strong and all the present of the sense of the strong and immentive of the present and the standard the consideration of the standard the strong and the strong and all the present the sense of the spirit 
------ temperature: 0.5
 and the strong and all the present the sense of the spirit and from the comparrable into the world a hard individualst the destroys in the strengther will in the world and who are conscious its bellow privilege of the community and something is conscience, and the such personality. in the sense of artistical species and sense and self prover as a new every conscience of mankind is a consideration, and from the conversation and pricious, what go has a peri
------ temperature: 1.0
, and from the conversation and pric

1eehn prevailers action, years to the pimpter is faith even to be sacarve how that why tewtle any deels alone will
himself which
ooken
blood events far(vgs before huither
"free his harm of highested
dominated trrjurings. a dirinte to be new, and cunica burn
we suddenlations if show-countered, our schopenhautian
suitual in danger!

! xvosing.

19p not the outhermence of the
forpared degreem entaution. unto da
epoch 42
Epoch 1/1
--- Generating with seed: "d finally i wish people to put the good amulet,
"gai saber" "
------ temperature: 0.2
d finally i wish people to put the good amulet,
"gai saber" and superficial to a desire and superior and the condition of the supere and also a more in the sensuality of the spirit of the spirit of the world of the world of the sense of the supere and also of the contradict that is the stronger and the condition of the most experience, and the superior and the super and access of the superior and the super of the sentence of the same the condition of t

spirituality that it is at once man of the intellectious and personal evilly dishome and distinction and consequently be disease it is a problem of the most themselves of the freedom of an explanation of the consequently be hold of the consequent states t
------ temperature: 1.0
ation of the consequently be hold of the consequent states that it dreamstomion without nocvant goo-, so than does not one the will no but alas,
from it
is their most god, of the more own time the lovinem, however, the instincy, however, one proce of the lough a domerity, their time,
artonsts--everywhase
reed, echave a psychological spreffqudent, weak which has that in the spected with the is"
a
flight
the time which has finded some faous deftnessity to ou
------ temperature: 1.2
light
the time which has finded some faous deftnessity to our wi made an suffering.

1iing, timide rirdest the houblate us genmincy and dispupiokic gool. -the greal-curposabieken of the charmsulating-contest with the righstqons, let us

afterever with orded thus grand, andnad mermalrs, indictt, the
effect of them betraying without two spirit, the
armpieteded,
oftening must
gener
epoch 57
Epoch 1/1
--- Generating with seed: "ightly expressed
from the soul of a proud viking. such a typ"
------ temperature: 0.2
ightly expressed
from the soul of a proud viking. such a type has a means of the strength of the spirit, and the state of the comparation of the most sense of the storing of the still superficial and something which is not always and in the sense of the sense of the spirit and man should be advises of the strength of the strength and the world of the strength of the strength and more souls of the world and in the man in the sense of the spirit, and somethi
------ temperature: 0.5
world and in the man in the sense of the spirit, and something means its conscience of man, and that it is are tempostion. the individual. the health of the smility of the very soul, as the artists, and something may be are sought and in 


As you can see, a low temperature results in extremely repetitive and predictable text, but where local structure is highly realistic: in 
particular, all words (a word being a local pattern of characters) are real English words. With higher temperatures, the generated text 
becomes more interesting, surprising, even creative; it may sometimes invent completely new words that sound somewhat plausible (such as 
"eterned" or "troveration"). With a high temperature, the local structure starts breaking down and most words look like semi-random strings 
of characters. Without a doubt, here 0.5 is the most interesting temperature for text generation in this specific setup. Always experiment 
with multiple sampling strategies! A clever balance between learned structure and randomness is what makes generation interesting.

Note that by training a bigger model, longer, on more data, you can achieve generated samples that will look much more coherent and 
realistic than ours. But of course, don't expect to ever generate any meaningful text, other than by random chance: all we are doing is 
sampling data from a statistical model of which characters come after which characters. Language is a communication channel, and there is 
a distinction between what communications are about, and the statistical structure of the messages in which communications are encoded. To 
evidence this distinction, here is a thought experiment: what if human language did a better job at compressing communications, much like 
our computers do with most of our digital communications? Then language would be no less meaningful, yet it would lack any intrinsic 
statistical structure, thus making it impossible to learn a language model like we just did.


## Take aways

* We can generate discrete sequence data by training a model to predict the next tokens(s) given previous tokens.
* In the case of text, such a model is called a "language model" and could be based on either words or characters.
* Sampling the next token requires balance between adhering to what the model judges likely, and introducing randomness.
* One way to handle this is the notion of _softmax temperature_. Always experiment with different temperatures to find the "right" one.