## Preparing data

In [7]:
import keras
import numpy as np

# path = keras.utils.get_file(
#    'nietzsche.txt',
#    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
# text = open(path).read().lower()
# text = open("/home/trp22/Star_Wars_A_New_Hope.pdf").read().lower()
# Read the whole script and lowercase it to shrink the character vocabulary
with open("/Users/tylerpoel/SW_ANewHope.txt") as f:
    text = f.read().lower()
print('Corpus length:', len(text))

Corpus length: 224485


In [8]:
print(text)

star wars: a new hope 1. 
a long time ago, in a galaxy far, far, away... 
a vast sea of stars serves as the backdrop for the main title. war drums echo through the heavens as a rollup slowly crawls into infinity. 
it is a period of civil war. rebel spaceships, striking from a hidden base, have won their first victory against the evil galactic empire. 
during the battle, rebel spies managed to steal secret plans to the empire's ultimate weapon, the death star, an armored space station with enough power to destroy an entire planet. 
pursued by the empire's sinister agents, princess leia races home aboard her starship, custodian of the stolen plans that can save her people and restore freedom to the galaxy... 
the awesome yellow planet of tatooine emerges from a total eclipse, her two moons glowing against the darkness. a tiny silver spacecraft, a rebel blockade runner firing lasers from the back of the ship, races through space. it is pursed by a giant imperial star destroyer. hundreds o

In [4]:

# Length of extracted character sequences
maxlen = 60

# We sample a new sequence every `step` characters
step = 3

# This holds our extracted sequences
sentences = []

# This holds the targets (the follow-up characters)
next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of sequences:', len(sentences))

# List of unique characters in the corpus
chars = sorted(set(text))
print('Unique characters:', len(chars))
# Dictionary mapping unique characters to their index in `chars`
char_indices = {char: index for index, char in enumerate(chars)}

# Next, one-hot encode the characters into binary arrays.
# `np.bool` was removed in NumPy 1.24; the built-in `bool` is the equivalent dtype.
print('Vectorization...')
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Number of sequences: 66370
Unique characters: 55
Vectorization...
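
To make the windowing and one-hot encoding above easier to inspect, here is a minimal sketch that applies the same steps to a tiny string. The names `toy_text`, `toy_maxlen`, and `toy_step` and their values are assumptions chosen for illustration; they are not part of the notebook.

```python
import numpy as np

# Toy corpus and window settings (illustrative values, not the notebook's)
toy_text = "hello world"
toy_maxlen = 4
toy_step = 3

# Slide a window of `toy_maxlen` characters over the text,
# moving forward `toy_step` characters each time
toy_sentences = []
toy_next_chars = []
for i in range(0, len(toy_text) - toy_maxlen, toy_step):
    toy_sentences.append(toy_text[i: i + toy_maxlen])
    toy_next_chars.append(toy_text[i + toy_maxlen])

# Vocabulary and character-to-index lookup
toy_chars = sorted(set(toy_text))
toy_char_indices = {char: index for index, char in enumerate(toy_chars)}

# One-hot encode: one row per window, one column per vocabulary entry
toy_x = np.zeros((len(toy_sentences), toy_maxlen, len(toy_chars)), dtype=bool)
toy_y = np.zeros((len(toy_sentences), len(toy_chars)), dtype=bool)
for i, sentence in enumerate(toy_sentences):
    for t, char in enumerate(sentence):
        toy_x[i, t, toy_char_indices[char]] = True
    toy_y[i, toy_char_indices[toy_next_chars[i]]] = True

print(toy_sentences)   # the extracted windows
print(toy_next_chars)  # the character that follows each window
print(toy_x.shape, toy_y.shape)
```

Each position in a window sets exactly one entry of `toy_x`, and each target sets exactly one entry of `toy_y`, which is the same layout the full-size `x` and `y` arrays use.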


## Building the network

In [5]:
from keras import layers

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))

In [6]:
# `lr` is deprecated (and rejected by recent Keras releases); use `learning_rate`
optimizer = keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

## Training the model

In [7]:
def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
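
The reweighting step inside `sample` is easier to follow on a small, fixed distribution. The sketch below isolates that step in a helper named `reweight` (a hypothetical name, not part of the notebook) and prints the result for the temperatures used later; `toy_preds` is an assumed example distribution.

```python
import numpy as np

def reweight(preds, temperature=1.0):
    """Return the temperature-adjusted distribution without sampling from it."""
    preds = np.asarray(preds, dtype="float64")
    logits = np.log(preds) / temperature
    exp_logits = np.exp(logits)
    return exp_logits / np.sum(exp_logits)

# Assumed example distribution over three characters
toy_preds = [0.5, 0.3, 0.2]
for temperature in [0.2, 0.5, 1.0, 1.2]:
    print(temperature, np.round(reweight(toy_preds, temperature), 3))
```

A temperature of 1.0 leaves the distribution unchanged. Values below 1.0 concentrate probability on the most likely character, so the output is more repetitive; values above 1.0 flatten the distribution, so the output is more varied and more error-prone.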

In [9]:
import random
import sys

for epoch in range(1, 60):
    print('epoch', epoch)
    # Fit the model for 1 epoch on the available training data
    model.fit(x, y,
              batch_size=128,
              epochs=1)

    # Select a text seed at random
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('--- Generating with seed: "' + generated_text + '"')

    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('------ temperature:', temperature)
        sys.stdout.write(generated_text)

        # We generate 400 characters
        for i in range(400):
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.

            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]

            generated_text += next_char
            generated_text = generated_text[1:]

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()


epoch 1
Epoch 1/1
--- Generating with seed: "nd
nonsense.
ben
i suggest you try it again, luke.
ben place"
------ temperature: 0.2
nd
nonsense.
ben
i suggest you try it again, luke.
ben places the empiround the pirateship is a blowd and whil as
there'r startion the surface and a huge way from the pirateship.
luke
she's the starts the death star surface. to the street.
you wail as they dida's targeting a hull
looking been to the ship.
luke
i can't do you know what is a bound the
empiroas is a fine as
anct's a blown on the death star surface.
tarking the pirateship is a blown on the sur
------ temperature: 0.5
h star surface.
tarking the pirateship is a blown on the surface the empirth and
a ma yout there, twous the remaining.
201

int. death star - main forward bay

338

the tie ship ships for something in the empire?
one with his and they who wings a fighters.
luke
she's the have to which she sens he towking the greatit of the
main for to going to him.
han
what you to have you mas pert
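
In the output above, the temperature 0.5 sample begins with the tail of the temperature 0.2 sample rather than with the seed, because `generated_text` keeps sliding forward across temperatures. If independent samples from the same seed are wanted, one option is to keep the sliding window local to a helper. The following is a minimal sketch under stated assumptions: `generate`, `uniform_predict`, and the toy vocabulary are hypothetical, and `uniform_predict` only stands in for `model.predict` so the example runs without a trained model.

```python
import numpy as np

def generate(predict_fn, seed_text, chars, char_indices, maxlen, length,
             temperature, rng):
    """Generate `length` characters from `seed_text` without modifying the seed."""
    window = seed_text  # local sliding window; the caller's seed stays intact
    output = []
    for _ in range(length):
        # One-hot encode the current window
        sampled = np.zeros((1, maxlen, len(chars)))
        for t, char in enumerate(window):
            sampled[0, t, char_indices[char]] = 1.0

        # Temperature reweighting, the same arithmetic as in `sample`
        preds = np.asarray(predict_fn(sampled)[0], dtype="float64")
        logits = np.log(preds) / temperature
        probs = np.exp(logits) / np.sum(np.exp(logits))

        next_char = chars[rng.choice(len(chars), p=probs)]
        output.append(next_char)
        window = window[1:] + next_char
    return "".join(output)

# Hypothetical stand-in for `model.predict`: a uniform distribution per input
toy_chars = ["a", "b", "c"]
toy_indices = {char: index for index, char in enumerate(toy_chars)}

def uniform_predict(batch):
    return np.full((batch.shape[0], len(toy_chars)), 1.0 / len(toy_chars))

rng = np.random.default_rng(0)
seed = "abca"
for temperature in [0.2, 0.5, 1.0, 1.2]:
    generated = generate(uniform_predict, seed, toy_chars, toy_indices,
                         maxlen=4, length=10, temperature=temperature, rng=rng)
    print(temperature, seed + generated)
```

With the notebook's objects, the call would pass `lambda batch: model.predict(batch, verbose=0)` as `predict_fn` along with `chars`, `char_indices`, and `maxlen`, so every temperature starts from the same 60-character seed.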
