# DEEP LEARNING: USING NLP TO CREATE NIETZSCHE'S WRITING

Example script to generate text from Nietzsche's writings. At least 20 epochs are required before the generated text starts sounding coherent. It is recommended to run this script on GPU, as recurrent networks are quite computationally intensive. If you try this script on new data, make sure your corpus has at least ~100k characters. ~1M is better.

This Python notebook is adapted from the [Keras team's `lstm_text_generation.py` example on GitHub](https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py).

In [11]:
'''
#Example script to generate text from Nietzsche's writings.
At least 20 epochs are required before the generated text
starts sounding coherent.
It is recommended to run this script on GPU, as recurrent
networks are quite computationally intensive.
If you try this script on new data, make sure your corpus
has at least ~100k characters. ~1M is better.
'''

In [12]:
from __future__ import print_function
from keras.callbacks import LambdaCallback
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.optimizers import RMSprop
from keras.utils import get_file  # public import path; keras.utils.data_utils is an internal module
import numpy as np
import random
import sys
import io

In [13]:
path = get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
with io.open(path, encoding='utf-8') as f:
    text = f.read().lower()
print('corpus length:', len(text))

corpus length: 600893


In [14]:
chars = sorted(list(set(text)))
print('total chars:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

total chars: 57
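
The two dictionaries above form a reversible character-to-integer mapping. A minimal sketch of the round trip follows, using a toy string in place of the Nietzsche corpus. The names `toy_text`, `encode`, and `decode` are illustrative and do not appear in the notebook.

```python
# Toy corpus standing in for the Nietzsche text (hypothetical example data).
toy_text = "hello world"

# Same construction as in the notebook: sorted unique characters.
toy_chars = sorted(set(toy_text))
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_indices_char = {i: c for i, c in enumerate(toy_chars)}


def encode(s):
    """Map a string to a list of integer indices."""
    return [toy_char_indices[c] for c in s]


def decode(indices):
    """Map a list of integer indices back to a string."""
    return "".join(toy_indices_char[i] for i in indices)


encoded = encode("hello")
print(toy_chars)
print(encoded)
print(decode(encoded))
```

Sorting the character set keeps the index assignment deterministic across runs, which matters if a trained model is saved and reloaded later.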


In [15]:
# cut the text in semi-redundant sequences of maxlen characters
maxlen = 40
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('nb sequences:', len(sentences))

nb sequences: 200285
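
The windowing loop can be checked on a small string, and the sequence count follows directly from the `range` arguments. The sketch below wraps the notebook's loop in a hypothetical helper `make_windows` and adds a hypothetical `expected_count` that reproduces the reported count from the corpus length.

```python
import math


def make_windows(text, maxlen, step):
    """Return (windows, next_chars) using the same loop as the notebook."""
    windows, next_chars = [], []
    for i in range(0, len(text) - maxlen, step):
        windows.append(text[i:i + maxlen])
        next_chars.append(text[i + maxlen])
    return windows, next_chars


def expected_count(n_chars, maxlen, step):
    """Number of windows produced by range(0, n_chars - maxlen, step)."""
    return max(0, math.ceil((n_chars - maxlen) / step))


toy_windows, toy_next = make_windows("abcdefghij", maxlen=4, step=3)
print(toy_windows, toy_next)

# Corpus length, maxlen, and step as used in the notebook.
print(expected_count(600893, 40, 3))
```

With `step = 3`, consecutive windows overlap by 37 of their 40 characters, which is why the comment in the cell calls the sequences semi-redundant.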


In [16]:
print('Vectorization...')
# np.bool was removed in NumPy 1.24; the built-in bool gives the same dtype
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Vectorization...
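
Each window becomes a `maxlen` by `len(chars)` matrix with a single 1 per row. The sketch below shows that encoding on a three-letter alphabet using plain Python lists instead of NumPy, then estimates the size of `x` and `y` from the figures printed earlier, assuming one byte per boolean element as NumPy stores them. The helper `one_hot_window` and the toy alphabet are illustrative only.

```python
def one_hot_window(window, char_indices):
    """Return a len(window) x len(char_indices) matrix of 0/1 rows."""
    rows = []
    for ch in window:
        row = [0] * len(char_indices)
        row[char_indices[ch]] = 1
        rows.append(row)
    return rows


# Hypothetical three-character alphabet.
toy_indices = {"a": 0, "b": 1, "c": 2}
encoded_window = one_hot_window("cab", toy_indices)
print(encoded_window)

# Sizes reported by the notebook: 200285 sequences, maxlen 40, 57 characters.
n_sequences, maxlen, n_chars = 200285, 40, 57
x_bytes = n_sequences * maxlen * n_chars  # one byte per boolean element
y_bytes = n_sequences * n_chars
print("x: %.1f MB, y: %.1f MB" % (x_bytes / 1e6, y_bytes / 1e6))
```

The input tensor dominates memory use, so a larger corpus or a longer `maxlen` may call for generating batches on the fly rather than materializing `x` in full.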


In [27]:
# build the model: a single LSTM
print('Build model...')
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
# softmax yields a probability distribution over characters, matching categorical_crossentropy
model.add(Dense(len(chars), activation='softmax'))

optimizer = RMSprop()
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

Build model...
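
To get a feel for the size of this network, the trainable parameter count can be worked out by hand. The sketch below assumes the standard LSTM layout of four gates, each with input weights, recurrent weights, and a bias vector. The helper names are illustrative, and `model.summary()` is the authoritative check.

```python
def lstm_param_count(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * ((input_dim + units) * units + units)


def dense_param_count(input_dim, units):
    """Weight matrix plus bias vector."""
    return input_dim * units + units


n_chars = 57      # 'total chars' printed by the notebook
lstm_units = 128  # as passed to LSTM(...)

lstm_params = lstm_param_count(n_chars, lstm_units)
dense_params = dense_param_count(lstm_units, n_chars)
print(lstm_params, dense_params, lstm_params + dense_params)
```

Most of the parameters sit in the LSTM layer. `maxlen` does not affect the count, since the same weights are reused at every time step.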


In [19]:
def sample(preds, temperature=1.0):
    """Sample an index from a probability array after temperature rescaling."""
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
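
Dividing the log-probabilities by `temperature` is equivalent to raising each probability to the power `1 / temperature` and renormalizing. The sketch below isolates that step in plain Python, without the random draw, to show how the distribution changes. The function `apply_temperature` and the three-element distribution are illustrative, and the inputs are assumed to be strictly positive.

```python
import math


def apply_temperature(probs, temperature):
    """Rescale probabilities: p_i ** (1 / T), then renormalize to sum to 1."""
    # Assumes every probability is > 0; log(0) would raise an error here.
    scaled = [math.log(p) / temperature for p in probs]
    exps = [math.exp(v) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


base = [0.5, 0.3, 0.2]  # hypothetical model output
for t in (0.2, 1.0, 1.2):
    print(t, [round(p, 3) for p in apply_temperature(base, t)])
```

Temperatures below 1 concentrate mass on the most likely character, which produces safer, more repetitive text. Temperatures above 1 flatten the distribution and produce more varied but more error-prone text.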

In [20]:
def on_epoch_end(epoch, _):
    """Callback invoked at the end of each epoch; prints generated text."""
    print()
    print('----- Generating text after Epoch: %d' % epoch)

    start_index = random.randint(0, len(text) - maxlen - 1)
    for diversity in [0.2, 0.5, 1.0, 1.2]:
        print('----- diversity:', diversity)

        generated = ''
        sentence = text[start_index: start_index + maxlen]
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')
        sys.stdout.write(generated)

        for i in range(400):
            x_pred = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_indices[char]] = 1.

            preds = model.predict(x_pred, verbose=0)[0]
            next_index = sample(preds, diversity)
            next_char = indices_char[next_index]

            sentence = sentence[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

print_callback = LambdaCallback(on_epoch_end=on_epoch_end)
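
The inner loop of the callback keeps a fixed-length window: after each prediction it drops the oldest character and appends the new one. The sketch below shows only that mechanism, with a hypothetical stand-in predictor `echo_first` replacing the model and the `sample` function.

```python
def generate(seed, n_steps, predict_next):
    """Autoregressive generation over a fixed-length sliding window."""
    window = seed
    generated = seed
    for _ in range(n_steps):
        next_char = predict_next(window)
        generated += next_char
        # Slide the window: drop the oldest character, append the newest.
        window = window[1:] + next_char
        assert len(window) == len(seed)
    return generated


def echo_first(window):
    """Hypothetical predictor: returns the first character of the window."""
    return window[0]


result = generate("abc", 4, echo_first)
print(result)
```

Because the window length never changes, the model always receives input of shape `(1, maxlen, len(chars))`, no matter how much text has been generated so far.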

In [21]:
model.fit(x, y,
          batch_size=128,
          epochs=20,
          callbacks=[print_callback])

Epoch 1/20

----- Generating text after Epoch: 0
----- diversity: 0.2
----- Generating with seed: "ently celebrated a veritable orgy of bad"
ently celebrated a veritable orgy of bad an the the the the the the the the the the an the the the the the the the the the the the the the the the the the the the the the the the the the the sere the the the the the the the sere the the the the the the the the the the the and and in the an the the the the the the the the the the the the the the the the the the the the the the the the the an the core the the the the the the the the the t
----- diversity: 0.5
----- Generating with seed: "ently celebrated a veritable orgy of bad"
ently celebrated a veritable orgy of bad and the ang be fore in wo an ther sthit and an the the snell of an the the the the an the the sace thint whan s and ing the the the the mund and ang ange tho the mes and the gart in the mins the this in  he mere thit the the sot an th of ar the the carelitreng an the sol to thit an th

ethive wittour notial ond in a compleasive and
kind in their one memely. wher individout that he does adaysposioning welc quilice
merely 
----- diversity: 1.2
----- Generating with seed: "hilosopher on the subject of man is, in "
hilosopher on the subject of man is, in their mose arain: slewal
that i natured, it what as p hones, an by gader. an everything the oberry
man atespluy ; lot womenks, to stuisct, deratenge); hishreciange what a
heritahtatily, yoddy todry as rebigiows withing varites speriet would, slumbst. with, an ordering saibtlur, a
peralenation. well-kels if silounused sureed as
undarnow as in secron" bestores, to, the revilet--and
thit can feed the
Epoch 18/20

----- Generating text after Epoch: 17
----- diversity: 0.2
----- Generating with seed: "f savagery or by a
terrorizing process i"
f savagery or by a
terrorizing process in the sense of the stronger and the soul for the problem of the bad to be the self-man in the morality of the soul for instinction of the fact tha

<keras.callbacks.History at 0x7f713f6aef60>

Investigation ends here.