# Text generation with LSTM

## Implementing character-level LSTM generation

In this example we will use some of the writings of Nietzsche, the late-19th-century German philosopher (translated to English). The language model we learn will thus be specifically a model of Nietzsche's writing style and topics of choice, rather than a more generic model of the English language.

## Preprocess data

Let's download the corpus and convert it to lowercase:

In [3]:
import numpy as np

from tensorflow import keras

path = keras.utils.get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
# Read the file with an explicit encoding and close it when done
with open(path, encoding='utf-8') as f:
    text = f.read().lower()

print('Corpus length:', len(text))

Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt
Corpus length: 600893


Next, we will extract partially overlapping sequences of length `maxlen`, one-hot encode them, and pack them into a 3D Numpy array `x` of shape `(sequences, maxlen, unique_characters)`. Simultaneously, we prepare an array `y` containing the corresponding targets: the one-hot encoded characters that come right after each extracted sequence.

In [5]:
# Length of extracted character sequences
maxlen = 60

# We sample a new sequence every `step` characters
step = 3

# This holds our extracted sequences
sentences = []

# This holds the targets (the follow-up characters)
next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of sequences:', len(sentences))

# List of unique characters in the corpus
chars = sorted(set(text))

print('Unique characters:', len(chars))

# Dictionary mapping unique characters to their index in `chars`
char_indices = {char: i for i, char in enumerate(chars)}

# Next, one-hot encode the characters into binary arrays.
print('Vectorization...')
# Use the builtin `bool`: the `np.bool` alias was removed in NumPy 1.24
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1
print('...done')

Number of sequences: 200278
Unique characters: 57
Vectorization...
...done
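
As a sanity check on the vectorization, it can help to decode a few one-hot rows back into characters and confirm they match the original windows. The following is a minimal sketch on a toy string rather than the real corpus; the names `toy_text`, `toy_maxlen`, `toy_step`, and `decode` are illustrative assumptions, not part of the notebook.

```python
import numpy as np

# Toy stand-in for the corpus (assumption: the real `text` is the Nietzsche file)
toy_text = "the quick brown fox jumps over the lazy dog"
toy_maxlen, toy_step = 10, 3

toy_chars = sorted(set(toy_text))
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}

# Same windowing scheme as above: one window every `toy_step` characters
starts = range(0, len(toy_text) - toy_maxlen, toy_step)
windows = [toy_text[i: i + toy_maxlen] for i in starts]
targets = [toy_text[i + toy_maxlen] for i in starts]

# One-hot encode windows and targets
x_toy = np.zeros((len(windows), toy_maxlen, len(toy_chars)), dtype=bool)
y_toy = np.zeros((len(windows), len(toy_chars)), dtype=bool)
for i, window in enumerate(windows):
    for t, char in enumerate(window):
        x_toy[i, t, toy_char_indices[char]] = True
    y_toy[i, toy_char_indices[targets[i]]] = True


def decode(one_hot_rows):
    """Map one-hot rows of shape (timesteps, n_chars) back to a string."""
    return "".join(toy_chars[j] for j in one_hot_rows.argmax(axis=-1))


decoded = [decode(row) for row in x_toy]
decoded_targets = [toy_chars[row.argmax()] for row in y_toy]

print(x_toy.shape, y_toy.shape)
print(repr(decoded[0]), "->", repr(decoded_targets[0]))
```

If the round trip reproduces `windows` and `targets` exactly, the index mapping and the array layout are consistent with each other.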


## Building the network

The network is a single `LSTM` layer followed by a `Dense` classifier with a _softmax_ over all possible characters. Note that recurrent neural networks are not the only way to generate sequence data; 1D convnets have also proven extremely successful at it in recent times.

Since the targets are one-hot encoded, we will use `categorical_crossentropy` as the loss to train the model.
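
To make the loss concrete: with a one-hot target, categorical crossentropy reduces to the negative log of the probability the model assigns to the correct character. The sketch below is a plain NumPy illustration under that definition; it is not the Keras implementation, and the helper name and the `eps` clipping value are assumptions made for this example.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over the batch of -sum(y_true * log(y_pred))."""
    # Clip to avoid log(0); the exact epsilon is an assumption of this sketch
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))


# One sample, three possible characters, the second one is correct
y_true = np.array([[0.0, 1.0, 0.0]])
confident = np.array([[0.05, 0.90, 0.05]])
unsure = np.array([[0.40, 0.30, 0.30]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

The confident prediction gets a much smaller loss than the unsure one, which is the behavior training exploits.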

In [7]:
from tensorflow.keras import layers

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))

# `learning_rate` replaces the deprecated `lr` argument
optimizer = keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 128)               95232     
_________________________________________________________________
dense_1 (Dense)              (None, 57)                7353      
=================================================================
Total params: 102,585
Trainable params: 102,585
Non-trainable params: 0
_________________________________________________________________
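
The parameter counts in the summary can be reproduced by hand. This short sketch assumes the standard LSTM layout (four gates, each with an input kernel, a recurrent kernel, and a bias) and a `Dense` layer with a bias; the helper names are illustrative.

```python
def lstm_param_count(units, input_dim):
    # Four gates, each with input weights, recurrent weights, and a bias vector
    return 4 * (units * (input_dim + units) + units)


def dense_param_count(units, input_dim):
    # One weight per (input, output) pair plus one bias per output
    return input_dim * units + units


n_chars = 57  # unique characters reported above
lstm_params = lstm_param_count(128, n_chars)
dense_params = dense_param_count(n_chars, 128)
print(lstm_params, dense_params, lstm_params + dense_params)
```

The three printed values match the `lstm_1`, `dense_1`, and total rows of the summary: 95232, 7353, and 102585.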


## Training the language model and sampling from it

Given a trained model and a seed text snippet, we generate new text by repeatedly:

1) Drawing from the model a probability distribution over the next character given the text available so far;
2) Reweighting the distribution to a certain "temperature";
3) Sampling the next character at random according to the reweighted distribution;
4) Adding the new character at the end of the available text.

This is the code we use to reweight the original probability distribution coming out of the model, and draw a character index from it (the "sampling function"):

In [8]:
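def sample(preds, temperature=1.0):
    # Reweight the predicted distribution: divide the log-probabilities by
    # the temperature, then renormalize so the values sum to 1
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)

    # Draw one sample from the reweighted distribution and return its index
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)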
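
To see what the temperature does before any sampling happens, we can apply the same reweighting to a small hand-made distribution and look at the result. This is a sketch: `reweight`, `entropy`, and the four-value `toy` distribution are assumptions for illustration, and the max-shift inside `reweight` is a numerical-stability tweak that does not appear in `sample`.

```python
import numpy as np


def reweight(preds, temperature=1.0):
    """Apply the same temperature reweighting as `sample`, without sampling."""
    preds = np.asarray(preds, dtype='float64')
    logits = np.log(preds) / temperature
    # Subtracting the max leaves the result unchanged but avoids overflow
    exp_logits = np.exp(logits - np.max(logits))
    return exp_logits / np.sum(exp_logits)


def entropy(p):
    """Shannon entropy in nats; higher means a more uniform distribution."""
    return float(-np.sum(p * np.log(p)))


toy = np.array([0.5, 0.3, 0.15, 0.05])
for temperature in [0.2, 0.5, 1.0, 1.2]:
    p = reweight(toy, temperature)
    print(temperature, np.round(p, 3), round(entropy(p), 3))
```

At temperature 1.0 the distribution is unchanged. Lower temperatures concentrate the mass on the most likely character (lower entropy), while higher temperatures flatten the distribution (higher entropy).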

Finally, this is the loop where we repeatedly train the model and generate text. After every epoch, we generate text using a range of different temperatures. This allows us to see how the generated text evolves as the model starts converging, as well as the impact of temperature on the sampling strategy.

In [9]:
import random
import sys

NUM_EPOCHS = 60
CHAR_GENERATED_TEXT = 400  # Number of characters generated per temperature

# `+ 1` so that the loop really runs NUM_EPOCHS times
for epoch in range(1, NUM_EPOCHS + 1):
    print('epoch', epoch)
    
    # Fit the model for 1 epoch on the available training data
    model.fit(x, y, batch_size=128, epochs=1)

    # Select a text seed at random
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print(f"--- Generating with seed: \"{generated_text}\"")

    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print(f"------ temperature: {temperature}")
        sys.stdout.write(generated_text)

        for i in range(CHAR_GENERATED_TEXT):
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.

            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]

            generated_text += next_char
            generated_text = generated_text[1:]

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

epoch 1
Train on 200278 samples
--- Generating with seed: "ilosophers, any
more than "pleasure self prepared" (sympathy"
------ temperature: 0.2
ilosophers, any
more than "pleasure self prepared" (sympathy in the self--the man we prease and the man for the sense the self--the sense and the man of the man the sense the for the man the man of the self--the some to the confection of the case of the master of the sense of the sense to the man for the sense of the for the man the self--the man we in the soul the some to the man of the stain of the self--the may be all the self--the self--the contrary of
------ temperature: 0.5
the self--the may be all the self--the self--the contrary of the respect preystenity of the sain confectious and pertation of the mores the self--though perhaps who can for only for the for the
conficting and the one sheow in the be more a the man only of the fact to our of the once the onations of the batter upon it is once to the all the some have something be and f

natulred the
ethicl him,
are a turavel. as gederaked through pare drorungces, petil, fath is pleasure and orgablence, it is how
areapticism.g=ic to varies on genylo; and morelve of
a paliss these phino ome well; even efforctis fo? in wemel usmer in brysy enceftited
conscience, he madiled have puracte; which an our inwontero effect,
no usaith in the that dorwably would it miy 
epoch 5
Train on 200278 samples
--- Generating with seed: "g to the
sort of people placed under its spell and protectio"
------ temperature: 0.2
g to the
sort of people placed under its spell and protection of the problem of the strange of the the strangest to the believed the the all the states of the assifice the states of the in the morality of the strong the the strong the morality and the stateshed the soul--the the soul the destructive the morality of the strong the masters of the morality of the states of the one must be man an explained to the strong the states of the for the strong the act
------ temperat

absolutonct orijang meliugh, as being scienticible benefuth, and subject man everythine": the further, hidducquinely; has a deteriotim,
than be, they adducts of taste. the
mothes speak of tendanning is to be
asfuldo-on
a democratice themselves methodas" befowe pride hunder's
------ temperature: 1.2
-on
a democratice themselves methodas" befowe pride hunder's
philosophy--ostrions
wnot can sfeefion, (whather of
habit and dragd, with blessiporatic
srigula which is
no ramition, permlitelory
it, please. knowls ywhy is. in
hudagice and decishistess that io phoskychules;" impressing god to period of our ambice akin
to thiegs our dade--at the rehigaous himself generain in spritvisgs
slood in oblical huad youn barbign inscine
vous (noound, to which in such
ju
epoch 9
Train on 200278 samples
--- Generating with seed: "he ascribes the success, the carrying out of the willing,
to"
------ temperature: 0.2
he ascribes the success, the carrying out of the willing,
to the same one and the contrary and

 of the same and the present of the exertict of the spirit of the same part of the "condition and degree of the false of the spiritually at all still will regard to the feeling of its existence the discleary and the deference to experienced with a standpoint of the strength be the cave of the subtle under the will to lives, the conditing to "passion of the presence of the present, is from the present and degree of the preferse
of the ideas: they made a new
------ temperature: 1.0
ent and degree of the preferse
of the ideas: they made a new cording to the reflaments: they willing youth, of a toble will lose; for the let such we passions;--but extraberden countching the senied of the brood, why develops in ptorder
there are, is all very clove
limited necessarious
always
delicate, as it is not wishests, which yitage, in societ,
ullectity,
of its full domain and has prehard usel, in many praised for
fregerander were any called time a w
------ temperature: 1.2
el, in many praised for
freger

necessary!--whoever examines the conscience of the same the sense of the artist of the strength of the element of the desire of the greatest of the greatest of the spirit of the spirit of the same be the desire of the spirit of the element of the sense of the most person of the most spirit of the contradicts of the self of the desire of the sense of the interparent and self element of the same to the desire of the estimat and stronger to th
------ temperature: 0.5
 of the same to the desire of the estimat and stronger to the same contradicts a stronger the christian to be a massed to the sense of religious experience and belief in the self storice of the interesting of the greates of the spirit and greek it is not the more himself--in the frements of the man, to a soul of the good one may gives a stronger with the most all the desire and common of the assict subject and every one has every present some our bear all 
------ temperature: 1.0
t subject and every one has every present some

who taking pinisomm-admitation of fortundsporaten--ruled thothing
venterstagaton
iinality one marking; at where but for the matterlgely ifor, wo besient, also thirst thrred his, becomeal";
or, 
epoch 20
Train on 200278 samples
--- Generating with seed: "sp more delicately; which scents the hidden and forgotten
tr"
------ temperature: 0.2
sp more delicately; which scents the hidden and forgotten
transforgnhsandes and morality of the problem of the experience, and the same man is always and the more are all the formerly of the same order of the same to a soul of the same the more of the same and probably and all the same many and the same and the servilence of the same and the existence of the subject between the same and the problem of the spirit of the same and soul of the same and the mor
------ temperature: 0.5
m of the spirit of the same and soul of the same and the moral in the struggle but the entire probable, is always also all the greatest man in the endeal strift, not all the f

 the subject of the standards of the stind of the superficial soul of the superficial sense of the same that it is an acts of the subject of the sense of the stronger, and the contrary being the stronger, and the subject of an extrains of the subject and the master. when they will to the same that it is also the same to the contempon the sense of 
------ temperature: 0.5
same that it is also the same to the contempon the sense of the general nature, in the sense of the truth and faith of mankind, which
would not be same something with the below of an age and product of a condition of which it is the such an original something with the only time of the same that is also in the believed of signification the most person of such an indeed the virtues of the superficial, interrence with an exception of conscience to its who sim
------ temperature: 1.0
l, interrence with an exception of conscience to its who simple for which enfar as morally..  one
may gregarous, perhaps as words" whether wo

sokes a scievn of desc me; we have, lost!




machy whoup
epoch 27
Train on 200278 samples
--- Generating with seed: "elp. the worst
is that he seems incapable of communicating h"
------ temperature: 0.2
elp. the worst
is that he seems incapable of communicating his properation of the strong of the strength originate the most decided and the proper of a profound of the strength for the strength of the stronger, and discovered and the strength in the stronger of the strong of the strength originate and the most something the intellectual and the stronger of the strength originate the strong of the proper of the order of the strength originate and the strong
------ temperature: 0.5
proper of the order of the strength originate and the stronger of the disveals of valuestic and deceded of strong and he problem and the scaventable souls" in the strict he another in the most decipbed tentative of the world of the strength will single of the have fain to be a geture, as the experience. the se

 antaction. these that a and mora-conscious cultured," as day.

11 theregrwale difficulty a felt, chain will standprie, parth some
ableat inner domation. in
; remative:
just them very development of enlightenificatee.=--when this
words and moral modest.--he 
preciately the simmendature.


same.ive
gavelyrough every name or esty,--in 'deriske--religir exceocal prejudicem
they formation and zaw acplidd, now, athaining. view what
peoploanly in duty,
one of se
epoch 31
Train on 200278 samples
--- Generating with seed: "le and eternal type upon the stage as kundry,
type vecu, and"
------ temperature: 0.2
le and eternal type upon the stage as kundry,
type vecu, and the condition of the same the state of the state of the soul. the moral soul, as a soul, and the stronger, the condition of the present this soul, and in the same the science of the science of the states of the strong and the states of the science of the conditions of the state of the science of the condition of the science of the

things? in anysagey.

2a in realive us does not all its comiting the estimated epissed in so passion aliugh to kanizing proousion and drefled externing taint,
"guines.toftic
quite living to the sated enws and our neked,
"the forestioss from an
oduong, tough woman, we has that they invent offortall from much as does a mind ow, cave--yples with spiritual with complet 
------ temperature: 1.2
 as does a mind ow, cave--yples with spiritual with complet is
the exenses pocted of morally ascenision, whencerthing
imando, therefore a
thingly, umcentles), and are only and impuloks indian pourauting it way in it is to cesteping.

212] as it healthinks which the word is he was the scriffer, charmual, lawarity of thosen thisiquentobito torchedr, supposing." the eduquoral more strange which a long frubles:
spirituallyce searpss
could upon in inject--by th
epoch 35
Train on 200278 samples
--- Generating with seed: "ause we are so thoroughly trained to it through the
intermin"
------ temperature: 0.2


stronger courghes of the sense, it was a subjective the and mone of the most be the truth, it is an extent of the fact the super of the
sense of the same and deceptions of the most philosopher and realing perpetually a spirit for the danger, and shate opmossious, more differ
------ temperature: 1.0
y a spirit for the danger, and shate opmossious, more different, the sin to him of pererong and heaven and peral the super belief is the strengtt, laughing question. every case with our fell and philosophers, as we dangers, perhaps undisinged as mithad their "works of to ethicality to "find courgh now beaver, audes to be chriatters
for the position cous: he is to
colve a significance of said-anche and decagarant artions from that new?--a smal of unjurgmens
------ temperature: 1.2
 and decagarant artions from that new?--a smal of unjurgmens, bethover
vicons in my "tendement! slave. exception: "yilty evences makes what good are questions--taltery as advact, refined, can natt,
even without reli

excess, for which, owing to good reasons, it is used and the superstition of the subjects of the soul of the same soul who is a soul of the strength and soul with the strength and the strength is there are something to a soul of the same something to the sentiment of the same something the really the sentiment of the strength the same the sentiment of the same something to the same to the moral and superflst of the same superior of the sentiment of t
------ temperature: 0.5
ral and superflst of the same superior of the sentiment of the rich
struggle to the signarity of the heart of the sign of the soul of his own suffering, it is thereby in the historical convention of the popular of the story, and and religious and toul the great distredic of the free spirits who all the invention of the deal men, and
such an our profound and man of the subject and inasmuchua
portation of the antithese of the fundamental and democration of t
------ temperature: 1.0
ion of the antithese of the fundamen

loves age of whom ones
what he very being, catter: ir difner is more who cals": as e"withy reading distivis, that he laid that "mssaverg, of naures.


12nnereked rule.

 
epoch 46
Train on 200278 samples
--- Generating with seed: ", changed? and what
     i am, to you my friends, now am i n"
------ temperature: 0.2
, changed? and what
     i am, to you my friends, now am i no longer and moral such a consequently and formed and soul is the same so called the spirit of the soul of the strength of the strength of the same specialty of the substiance of the same desires of the same desire to the sense of the sense of the strength and personal individual the same such a man and desire to be something and been the strength of the strength of the same such an according to t
------ temperature: 0.5
 strength of the strength of the same such an according to the master of mankind, and the instincts of the visit of strength of the smard and free by the best of a bars of the word
of man and strong

instinct and matter of all ceteding dangers to have typeed it, where all the same times is tygais musicic power, kind with itself because one spirits sometimes from an as man to me whan the same other thougniess withd explinge sense beforeingly their ultimated to kforer, under undens to a spacholity, perhaps with these own political predompiat
it, on may so has been r
------ temperature: 1.2
with these own political predompiat
it, on may so has been rank 
one the -rulity and sensity.and-show.

101. ih to be undomnates to their bestravieful pernained fright, to betawy up.mfostly cruelys; we knowledge, theaver exparain primitiveled but in the olde as is preceptable as betrayn,
means old as , thus another educate,
however wishes to
occaudve his reatitions latt, else--stoo't
thought, barn.

17u bings"
as a position. proprosion, it as any developin
epoch 50
Train on 200278 samples
--- Generating with seed: " view. for there are at least two (perhaps many
more) elemen"
------ temperature: 0.

manwity of the world from its long hard and will even than the savage of the same morality of such an acception and the morality," and in the human opinion of the fact that is to be all the
more probable of the most depth of religious and deceived--so was alone which ca
------ temperature: 1.0
 most depth of religious and deceived--so was alone which called by one who he long possible spirit,
"pessids of early. one of the cirtred by an echowative. spead spirits, "necessary habits in result
the
pritation to wait
double however, have bevengates!

2 so is aljost species to
music. no, an except one man could that proves an by the most sympathy, has
prompon for here aloft--wirly, landred what he let noruing religious reading of namly of minds
from ma
------ temperature: 1.2
t he let noruing religious reading of namly of minds
from many sensing womans. if
keently
about that fursing dejude arhisuntences spirits hope,--"what
here-lexned of
his childash borx yonucienties--which for weeceen, or 

=misconception of the self-plentical sense of the seeming, the same to seem to the same to a desire of the self-conception of the standard of the spirit and stronger and the conception of the sense of the self-conception of the most perfected period of the self-concealed, and all the standard of the standard of the subject of the spirit and present the sense of the seeming of the senses of the sense of the sens
------ temperature: 0.5
 sense of the seeming of the senses of the sense of the sense of this paraded, and they are the most festale the condition of the reverses and standard of the belief in many and the subject and intellect that they are not now a places of any one
in all sense
with no metaphysical nature in a desires of the selection of the world here with standing. and it is one mande to be taken of the fact only sense of the well to his faithther power, and all the concept
------ temperature: 1.0
ense of the well to his faithther power, and all the conception ofnes? andlf

A low temperature (0.2) results in extremely repetitive and predictable text, but its local structure is highly realistic: in particular, nearly all words (a word being a local pattern of characters) are real English words. With higher temperatures, the generated text becomes more interesting, surprising, even creative; it may sometimes invent completely new words that sound somewhat plausible (such as "eterned" or "troveration"). With a high temperature (such as 1.2), the local structure starts breaking down and most words look like semi-random strings of characters. In this specific setup, 0.5 is arguably the most interesting temperature for text generation. Always experiment with multiple sampling strategies! A clever balance between learned structure and randomness is what makes generation interesting.