In [1]:
import tensorflow as tf
from tensorflow import keras
print( 'Tensorflow : ',tf.__version__)
print( ' |-> Keras : ',keras.__version__)

Tensorflow :  2.0.0
 |-> Keras :  2.2.4-tf


# Text generation with LSTM

This notebook contains the code samples found in Chapter 8, Section 1 of [Deep Learning with Python](https://www.manning.com/books/deep-learning-with-python?a_aid=keras&a_bid=76564dff). Note that the original text features far more content, in particular further explanations and figures: in this notebook, you will only find source code and related comments.

----

[...]

## Implementing character-level LSTM text generation


Let's put these ideas in practice in a Keras implementation. The first thing we need is a lot of text data that we can use to learn a 
language model. You could use any sufficiently large text file or set of text files -- Wikipedia, the Lord of the Rings, etc. In this 
example we will use some of the writings of Nietzsche, the late-19th century German philosopher (translated to English). The language model 
we will learn will thus be specifically a model of Nietzsche's writing style and topics of choice, rather than a more generic model of the 
English language.

## Preparing the data

Let's start by downloading the corpus and converting it to lowercase:

In [2]:
#import keras
import numpy as np

path = keras.utils.get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
text = open(path).read().lower()
print('Corpus length:', len(text))

Corpus length: 600901



Next, we will extract partially-overlapping sequences of length `maxlen`, one-hot encode them and pack them in a 3D Numpy array `x` of 
shape `(sequences, maxlen, unique_characters)`. Simultaneously, we prepare an array `y` containing the corresponding targets: the one-hot 
encoded characters that come right after each extracted sequence.

In [3]:
# Length of extracted character sequences
maxlen = 60

# We sample a new sequence every `step` characters
step = 3

# This holds our extracted sequences
sentences = []

# This holds the targets (the follow-up characters)
next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of sequences:', len(sentences))

# List of unique characters in the corpus
chars = sorted(list(set(text)))
print('Unique characters:', len(chars))
# Dictionary mapping unique characters to their index in `chars`
char_indices = {char: index for index, char in enumerate(chars)}

# Next, one-hot encode the characters into binary arrays.
print('Vectorization...')
# Use the builtin `bool`: the `np.bool` alias is deprecated and removed in recent NumPy releases
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Number of sequences: 200281
Unique characters: 59
Vectorization...
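
To make the shapes concrete, here is a minimal sketch of the same windowing and one-hot encoding applied to a tiny string. The names `toy_text`, `toy_maxlen` and `toy_step`, and their values, are assumptions chosen for illustration only; they are not part of the original notebook.

```python
import numpy as np

# Hypothetical toy corpus and window settings, chosen only for illustration
toy_text = "hello world"
toy_maxlen = 4
toy_step = 3

# Slide a window of length `toy_maxlen` over the text, `toy_step` characters at a time
toy_sentences = []
toy_next_chars = []
for i in range(0, len(toy_text) - toy_maxlen, toy_step):
    toy_sentences.append(toy_text[i: i + toy_maxlen])
    toy_next_chars.append(toy_text[i + toy_maxlen])

# Vocabulary and character -> index lookup
toy_chars = sorted(set(toy_text))
toy_char_indices = {char: index for index, char in enumerate(toy_chars)}

# One-hot encode the windows (toy_x) and the characters that follow them (toy_y)
toy_x = np.zeros((len(toy_sentences), toy_maxlen, len(toy_chars)), dtype=bool)
toy_y = np.zeros((len(toy_sentences), len(toy_chars)), dtype=bool)
for i, sentence in enumerate(toy_sentences):
    for t, char in enumerate(sentence):
        toy_x[i, t, toy_char_indices[char]] = 1
    toy_y[i, toy_char_indices[toy_next_chars[i]]] = 1

for sentence, next_char in zip(toy_sentences, toy_next_chars):
    print(repr(sentence), '->', repr(next_char))
print('x shape:', toy_x.shape)
print('y shape:', toy_y.shape)
```

Each window overlaps the previous one by `toy_maxlen - toy_step` characters, and every time step in `toy_x` has exactly one active entry.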


## Building the network

Our network is a single `LSTM` layer followed by a `Dense` classifier and softmax over all possible characters. But let us note that 
recurrent neural networks are not the only way to do sequence data generation; 1D convnets also have proven extremely successful at it in 
recent times.

In [4]:
#from keras import layers

model = keras.models.Sequential()
model.add(keras.layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(keras.layers.Dense(len(chars), activation='softmax'))
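
As a sanity check on the size of this network, the sketch below counts its trainable parameters by hand. It assumes the values reported by the cells above (59 unique characters, 128 LSTM units) and the default layer settings (biases enabled); if the model is built as shown, `model.summary()` should report matching totals.

```python
# Assumed sizes taken from the cells above: 59 unique characters, 128 LSTM units
num_chars = 59
lstm_units = 128

# An LSTM layer has 4 gate/candidate transforms, each with an input kernel,
# a recurrent kernel, and a bias vector
lstm_params = 4 * (num_chars * lstm_units + lstm_units * lstm_units + lstm_units)

# The Dense layer maps the 128-dimensional LSTM output to one score per character
dense_params = lstm_units * num_chars + num_chars

print('LSTM parameters: ', lstm_params)
print('Dense parameters:', dense_params)
print('Total parameters:', lstm_params + dense_params)
```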

Since our targets are one-hot encoded, we will use `categorical_crossentropy` as the loss to train the model:

In [5]:
# `learning_rate` is the current argument name; `lr` is a deprecated alias
optimizer = keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
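
For intuition about this loss, here is a small NumPy sketch of categorical crossentropy for a single one-hot target. The 4-character vocabulary and the probabilities are made up for illustration, and the calculation ignores any numerical-stability details of the Keras implementation.

```python
import numpy as np

# Hypothetical softmax output over a 4-character vocabulary
predicted = np.array([0.1, 0.7, 0.1, 0.1])
# One-hot target: the true next character is at index 1
target = np.array([0.0, 1.0, 0.0, 0.0])

# Categorical crossentropy: -sum(target * log(predicted))
loss = -np.sum(target * np.log(predicted))
print('loss:', loss)
```

With a one-hot target, only the term for the true character survives, so the loss reduces to the negative log-probability the model assigned to that character.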

## Training the language model and sampling from it


Given a trained model and a seed text snippet, we generate new text by repeatedly:

* 1) Drawing from the model a probability distribution over the next character given the text available so far
* 2) Reweighting the distribution to a certain "temperature"
* 3) Sampling the next character at random according to the reweighted distribution
* 4) Adding the new character at the end of the available text

This is the code we use to reweight the original probability distribution coming out of the model, 
and draw a character index from it (the "sampling function"):

In [6]:
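def sample(preds, temperature=1.0):
    # Work in float64 for numerical precision
    preds = np.asarray(preds).astype('float64')
    # Reweight: divide the log-probabilities by the temperature, then renormalize
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    # Draw one sample; `probas` is a one-hot row marking the chosen character
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)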
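
To see what the reweighting step does on its own, the sketch below applies the same arithmetic as `sample` (without the random draw) to a made-up 3-character distribution. The helper name `reweight_distribution` and the values in `toy_preds` are assumptions for illustration.

```python
import numpy as np

def reweight_distribution(original_distribution, temperature=1.0):
    # Same arithmetic as in `sample`, minus the random draw
    distribution = np.log(original_distribution) / temperature
    distribution = np.exp(distribution)
    return distribution / np.sum(distribution)

# Hypothetical next-character distribution over 3 characters
toy_preds = np.array([0.5, 0.3, 0.2])

for temperature in [0.2, 0.5, 1.0, 1.2]:
    print(temperature, np.round(reweight_distribution(toy_preds, temperature), 3))
```

A temperature of 1.0 leaves the distribution unchanged. Temperatures below 1.0 concentrate probability on the most likely character, and temperatures above 1.0 flatten the distribution toward uniform.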


Finally, this is the loop where we repeatedly train and generate text. We start generating text using a range of different temperatures 
after every epoch. This allows us to see how the generated text evolves as the model starts converging, as well as the impact of 
temperature in the sampling strategy.

In [None]:
import random
import sys

for epoch in range(1, 60):
    print('epoch', epoch)
    # Fit the model for 1 epoch on the available training data
    model.fit(x, y,
              batch_size=128,
              epochs=1)

    # Select a text seed at random
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('--- Generating with seed: "' + generated_text + '"')

    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('------ temperature:', temperature)
        sys.stdout.write(generated_text)

        # We generate 400 characters
        for i in range(400):
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.

            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]

            generated_text += next_char
            generated_text = generated_text[1:]

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

epoch 1
Train on 200281 samples
--- Generating with seed: "hat they cannot
refrain from laughter even in holy matters.
"
------ temperature: 0.2
hat they cannot
refrain from laughter even in holy matters.






=must and stand of the stall to the stance of the strong that the stall to the stall to the stand of the man in the stall to the stand in the stand of the most and the stall to which the stall to the stall to the stance of the stand to the stall to the stall to the stall to the stance of the stance of the master of the stand of the stall that it is all they it is the stall to the stall in th
------ temperature: 0.5
stall that it is all they it is the stall to the stall in the most of the stoust know and it as obder that it that we contlee in the
contrist in the sention to the orring for they ver that stord as he be the last with stood it is the develousion of the beact that self-man in be adays, that there is all not it would that pather for the stapent to be a sast of the
stand

edine, and findslemen possibility and axpegnitude. throus conduct tottiness higher 1joytomih in higher betpee upon the
prourtesh, higher equalistines and de"mwerophy of
flastionsrado. to extented sels
doon wene
the tendinede which contequence, means after pnot fuicurader that
is odany of underediltunt--whethers are hercounding amongs as younne hamm, discive.
und evilied a shilled fordanion. dorse dinckeby for nature. as net
also, suspinitus judiom the cosd
epoch 5
Train on 200281 samples
--- Generating with seed: "he period
of lower organisms has been handed down to man the"
------ temperature: 0.2
he period
of lower organisms has been handed down to man the same the man and the south of the conscience of the most the sense of the most the there is the most superficiality and the most internam of the most the most distrust and spirit of the state of the interrignation of the most state of the south the most the conscience of the fact the spirit and the most states and the expression of

iscluse of the sticlates on the profoundes of the religions of the whillcism. why rendered spirits, which is, man bar" their gain. to chan" of greegetive. it is
eutorsely may betto not been knowledged they free hpening, let us has makes the whathers ofed, the europearificatiply not to which hord-also to shame in european
finder religion
of sendung persie ordiving and be dreguhe uptratacly eles,
precimes in the bedventary to gen. one had be concernedub.--as
------ temperature: 1.2
ecimes in the bedventary to gen. one had be concernedub.--as that "would rird be why pitment triouration and onessiany rifaclional has meche: how notiona
gener-like unders, it elleals- sueficing, like
tephan for admi dodumely happily men rejudgion fingsgencyions than "failture cansu of account?" and  lousinging as religion, so the apperor pethicoiunts and metaphysins a, sympatned, he day primodsy
at its the delepe of the rud. even get the, the own inspisit
epoch 9
Train on 200281 samples
--- Generating with se

nds of the same to be a personal and and the superior itself of every state of spied and deep a position is in fach the sufficient the same of early be a throod that the will to lerd one with the history of the supremucted and made even with the subject of the most censures of the proper the belonged thereby the same to be sensition of their
surcimes, in the most pay, from the enoper his experiences, that there is not as as the all so that satists of the m
------ temperature: 1.0
es, that there is not as as the all so that satists of the more first oneself.
filamed) will things music
parader," ever this now of fucte which scas, how say, of smind. he alvolitive 
lines for that sy, however man)chingure not atwaine. the enchal parate
obuture
well so, thus, irrome to favery kinds and part(ann, does a romence weak, by ourselves. an east,
there
like also time of thes of hially
fundamental "stand, notkings which we have herself of life, a
------ temperature: 1.2
undamental "stand, notkings wh

has developed upon the foundation laid of the problem of the problem of the spirit and a surrender to the present himself and the almost of the transfate the most sublime to the formant to the subtle the consciousvable and problem of the fact that it is the greatest the general promise and the present spirit to the alteration of the present himself to the world of the conscience of the conscience of the word of the work to the conscie
------ temperature: 0.5
nce of the conscience of the word of the work to the conscience and the whole of the prepare the world always been a religion of his conscience of its really a god, who is always an emotion to mens of the heart of the strength of the species and self would be conscious ideas of the cause them in the pain of the art of the belief in the end and the really soul the problem of the deceive to be a man to our self-case of the artiness, who not taken one another
------ temperature: 1.0
 to our self-case of the artiness, who not taken one

is. however, it is, un: you may thene atceftion a, and itself, you dirces grawible race. peets (the "beloscisisurs. a

uncontlive from the exhausted the
approveds borce, in means forthly the (the 
epoch 20
Train on 200281 samples
--- Generating with seed: "eaking, the need thereof is
now innate in every one, as a ki"
------ temperature: 0.2
eaking, the need thereof is
now innate in every one, as a kind of the world of the subjection of the self-to comprehends and soul and the subjection of the subjection of the subjection of the standards of the sense of the subjection of the subjection of the subjection of the problem an incereal that the comprehendity of the subjection of the most same the senses of the subjection of the subjection of the soul the subjection of the subjection of the soul an
------ temperature: 0.5
 of the soul the subjection of the subjection of the soul and surbarity of his courses, who are not only in its say in the morality of the sense of view of all that is a go


 in the fact of the same to be a sacrifice of the same to be a say, and as a soul and a soul and something and perhaps as in the same to be a souls of the same to the same to be a man and strict in the conception of the powerful and the sentiment, and thereby
------ temperature: 0.5
he conception of the powerful and the sentiment, and thereby the proper to the stand of the uncleguration therebyer explanment of the world for the present religious the habit of nepthishes the same to the prepare to the species of sang to the sense of the prepare that the most perely a spiritual sadially the "world of the stronger interpelting and good evenment of all the soul and enough of the conception of the preminal and view of the conception of the 
------ temperature: 1.0
onception of the preminal and view of the conception of the world wams there is a spirituality," "other be a servies therefore). not our philosophamiate: there are refined of their say, is privileng-formsten; there are suffering, t


As you can see, a low temperature results in extremely repetitive and predictable text, but one where local structure is highly realistic: in 
particular, nearly all words (a word being a local pattern of characters) are real English words. With higher temperatures, the generated text 
becomes more interesting, surprising, even creative; it may sometimes invent completely new words that sound somewhat plausible (such as 
"eterned" or "troveration"). With a high temperature, the local structure starts breaking down and most words look like semi-random strings 
of characters. In this specific setup, 0.5 arguably gives the most interesting text. Always experiment 
with multiple sampling strategies! A clever balance between learned structure and randomness is what makes generation interesting.

Note that by training a bigger model, longer, on more data, you can achieve generated samples that will look much more coherent and 
realistic than ours. But of course, don't expect to ever generate any meaningful text, other than by random chance: all we are doing is 
sampling data from a statistical model of which characters come after which characters. Language is a communication channel, and there is 
a distinction between what communications are about, and the statistical structure of the messages in which communications are encoded. To 
evidence this distinction, here is a thought experiment: what if human language did a better job at compressing communications, much like 
our computers do with most of our digital communications? Then language would be no less meaningful, yet it would lack any intrinsic 
statistical structure, thus making it impossible to learn a language model like we just did.
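
A rough way to probe this thought experiment is to compare the byte statistics of a text before and after compression. The sketch below is only an illustration under stated assumptions: the stand-in corpus is a random sequence of a few English words (not the Nietzsche text), `byte_entropy` is a hypothetical helper, and single-byte entropy is a crude proxy for "statistical structure".

```python
import math
import random
import zlib
from collections import Counter

def byte_entropy(data):
    # Shannon entropy, in bits per byte, of the empirical byte distribution
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical stand-in corpus: a random sequence drawn from a few English words
rng = random.Random(0)
words = ['the', 'most', 'spirit', 'of', 'conscience', 'and', 'man', 'world', 'is', 'to']
raw = ' '.join(rng.choice(words) for _ in range(20000)).encode('ascii')

# Losslessly compress the same message
compressed = zlib.compress(raw, 9)

print('raw bytes:       ', len(raw), 'entropy per byte:', round(byte_entropy(raw), 2))
print('compressed bytes:', len(compressed), 'entropy per byte:', round(byte_entropy(compressed), 2))
```

The compressed bytes carry exactly the same message, yet their distribution is much closer to uniform, leaving far less regularity for a next-byte model to pick up.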


## Takeaways

* We can generate discrete sequence data by training a model to predict the next token(s) given previous tokens.
* In the case of text, such a model is called a "language model" and could be based on either words or characters.
* Sampling the next token requires balance between adhering to what the model judges likely, and introducing randomness.
* One way to handle this is the notion of _softmax temperature_. Always experiment with different temperatures to find the "right" one.