# Star Wars Generated Text

This notebook builds an LSTM model, trains it on the scripts of the
original Star Wars trilogy, and then generates text based on
what it was fed.

This notebook uses code from Chollet, found at:
https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/8.1-text-generation-with-lstm.ipynb

## Preparing The Data

In [2]:
import keras
import numpy as np

# Read the whole corpus and lowercase it; the context manager closes the file
with open("/home/trp22/CS/344/cs344/Project/OriginalTrilogy_script.txt") as f:
    text = f.read().lower()
print('Corpus length:', len(text))

Using TensorFlow backend.


Corpus length: 494395


In [3]:
# Length of extracted character sequences
maxlen = 60

# We sample a new sequence every `step` characters
step = 3

# This holds our extracted sequences
sentences = []

# This holds the targets (the follow-up characters)
next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of sequences:', len(sentences))

# List of unique characters in the corpus
chars = sorted(list(set(text)))
print('Unique characters:', len(chars))
# Dictionary mapping unique characters to their index in `chars`
char_indices = {char: i for i, char in enumerate(chars)}

# Next, one-hot encode the characters into binary arrays.
print('Vectorization...')
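# The builtin `bool` is used because the `np.bool` alias is deprecated
# and was removed in some NumPy releases.
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)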
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Number of sequences: 164779
Unique characters: 60
Vectorization...


Of the configurations tried, a `maxlen` of 60 and a `step` of 3
worked best for this text.
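
To make the roles of `maxlen` and `step` concrete, here is a minimal sketch that applies the same slicing as the cell above to a short toy string. The helper `make_windows` and the toy text are illustrative assumptions, not part of the original notebook. The last lines check that the same arithmetic gives the sequence count printed above for the real corpus.

```python
import math

def make_windows(text, maxlen, step):
    """Slice `text` into overlapping windows plus the character after each.

    Hypothetical helper that mirrors the loop in the cell above.
    """
    sequences, targets = [], []
    for i in range(0, len(text) - maxlen, step):
        sequences.append(text[i: i + maxlen])
        targets.append(text[i + maxlen])
    return sequences, targets

# Toy input (an assumption for illustration, not taken from the corpus)
toy_text = "use the force, luke"
toy_sequences, toy_targets = make_windows(toy_text, maxlen=5, step=3)
for seq, target in zip(toy_sequences, toy_targets):
    print(repr(seq), '->', repr(target))

# The number of windows is ceil((corpus_length - maxlen) / step).
# With the corpus length printed earlier, this gives the reported count.
n_sequences = math.ceil((494395 - 60) / 3)
print('Expected number of sequences:', n_sequences)
```

A smaller `step` yields more, heavily overlapping training sequences; a larger `maxlen` gives the model more context per prediction at the cost of larger input arrays.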

## Building the network

In [4]:
from keras import layers

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))

In [5]:
optimizer = keras.optimizers.RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

## Training the model

In [6]:
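def sample(preds, temperature=1.0):
    '''
    sample() receives a probability distribution from the model,
    reweights it according to a given temperature,
    and returns the index value of the next character.
    '''
    preds = np.asarray(preds).astype('float64')
    # Dividing the log-probabilities by the temperature sharpens the
    # distribution (temperature < 1) or flattens it (temperature > 1)
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    # Renormalize so the reweighted values sum to 1
    preds = exp_preds / np.sum(exp_preds)
    # Draw one sample from the reweighted distribution
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)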
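
The effect of the temperature is easier to see without the random draw. The sketch below repeats only the reweighting step of `sample()` on a small made-up distribution. The function name `reweight` and the values in `toy_preds` are assumptions for illustration only.

```python
import numpy as np

def reweight(preds, temperature=1.0):
    """Return the temperature-reweighted distribution (no sampling).

    Hypothetical helper: the same arithmetic as sample(), minus the
    multinomial draw.
    """
    preds = np.asarray(preds).astype('float64')
    logits = np.log(preds) / temperature
    exp_logits = np.exp(logits)
    return exp_logits / np.sum(exp_logits)

# Made-up three-character distribution, not model output
toy_preds = [0.5, 0.3, 0.2]
for temperature in [0.2, 0.5, 1.0]:
    print('temperature', temperature, np.round(reweight(toy_preds, temperature), 3))
```

At a temperature of 1.0 the distribution is unchanged. Lower temperatures concentrate the probability on the most likely character, which is why low-temperature output tends to be more repetitive.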

In [None]:
import random
import sys

for epoch in range(1, 61):
    print('epoch', epoch)
    # Fit the model for 1 epoch on the available training data
    model.fit(x, y,
              batch_size=128,
              epochs=1)

    # Generate sample text only every 5 epochs
    if epoch % 5 == 0:
        # Select a text seed at random
        start_index = random.randint(0, len(text) - maxlen - 1)
        generated_text = text[start_index: start_index + maxlen]
        print('--- Generating with seed: "' + generated_text + '"')

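        # Note: generated_text is not reset between temperatures, so the
        # second run is seeded with the last `maxlen` characters of the first.
        for temperature in [0.2, 0.5]: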
            print('------ temperature:', temperature)
            sys.stdout.write(generated_text)

            # We generate 600 characters
            for i in range(600):
                sampled = np.zeros((1, maxlen, len(chars)))
                for t, char in enumerate(generated_text):
                    sampled[0, t, char_indices[char]] = 1.

                preds = model.predict(sampled, verbose=0)[0]
                next_index = sample(preds, temperature)
                next_char = chars[next_index]

                generated_text += next_char
                generated_text = generated_text[1:]

                sys.stdout.write(next_char)
                sys.stdout.flush()
            print()


epoch 1
Epoch 1/1
epoch 2
Epoch 1/1
epoch 3
Epoch 1/1
epoch 4
Epoch 1/1
epoch 5
Epoch 1/1
--- Generating with seed: "rom 
and takes a bite.

luke
put that down.  hey!  that's my"
------ temperature: 0.2
rom 
and takes a bite.

luke
put that down.  hey!  that's my face the ship.

luke
there is ships as the with his hand.

luke
you and the ship and and the ship.

luke
the enter and chewie, and the two room is control the artoo.

leia
the room.

luke
i'm not got a small and the screen and the complet of the strang. the strang and the strang. he stands toward the with the falcon and the screen to his hand and the ship.

luke
the control panel, the strang.

luke
the computer the strang. the ship.

leia
you're complet be a star destroyer complet of the 
process and chewbacca and luke and looks toward the ship.

luke
there is the catyon and the the artoo and
------ temperature: 0.5
rd the ship.

luke
there is the catyon and the the artoo and the rebel bead starts moves and restures the screen

