# (Biased) Quote Generation via RNN
##### NOTE: PSAW allows only 180 queries/min.

While many quote generators already exist in the wild, I thought it would be interesting to gather a "biased" set of quotes that represents what the Reddit community at large generally prefers when it comes to quotes. There are no special requirements for posting in the /r/quotes subreddit, so observing the output might give some insight into what exactly makes a quote "memorable" or "significant" to a typical Reddit user.

In [1]:
!pip install --user praw
!pip install --user psaw


## Brief Analysis of (Possible) Dataset

In [2]:
import praw
from psaw import PushshiftAPI

# r = praw.Reddit()
api = PushshiftAPI()

In [3]:
import datetime as dt
import time
import re

# check date diff
start_epoch = int(dt.datetime(2019,4,8).timestamp())
end_epoch = int(dt.datetime(2019,4,15).timestamp())

# weekly diff
epoch_week_time = end_epoch-start_epoch
print(end_epoch-start_epoch)

604800
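
The printed value is exactly one week in seconds (7 × 24 × 60 × 60 = 604800). Calling `.timestamp()` on a naive `datetime` interprets it in the machine's local time zone, so the window above depends on where the notebook runs. The sketch below uses timezone-aware UTC datetimes instead. This is an alternative, not what the cell above does.

```python
import datetime as dt

# One week in seconds: 7 days * 24 h * 60 min * 60 s
SECONDS_PER_WEEK = 7 * 24 * 60 * 60

# Timezone-aware datetimes pin the query window to UTC instead of local time
start_utc = int(dt.datetime(2019, 4, 8, tzinfo=dt.timezone.utc).timestamp())
end_utc = int(dt.datetime(2019, 4, 15, tzinfo=dt.timezone.utc).timestamp())

print(end_utc - start_utc)
```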


In [4]:
# test Reddit query of /r/quotes titles for the week of 2019-04-08 to 2019-04-15
res = list(api.search_submissions(after=start_epoch,
                                  before=end_epoch,
                                  subreddit='quotes',
                                  filter=['title'],
                                  limit=1000))

In [5]:
# check for a reasonable length (i.e. < 1000, the limit passed to the query)
len(res)

415

In [6]:
# Check parsing mechanic (only need quotes from data)
res_parsed = list(map(lambda x:x.title, res))

In [7]:
# Note that text is properly formatted, but quotations vary wildly!
# For now, these quotation marks will be stripped from the text
# Text format most often : ("/'/“)Quote("/'/“) - Person
# Most rudimentary check would be to enforce that first char is not a letter
for quote in res_parsed:
    if not quote[0].isalpha():
        print(quote.strip())

"Take her to the moon for me." - Bing Bong (Inside Out)
["The Devil in the Dark"] impressed me because it presented the idea, unusual in science fiction then and now, that something weird, and even dangerous, need not be malevolent. That is a lesson that many of today's politicians have yet to learn.― Arthur C. Clarke
"We have a system that increasingly taxes work and subsidizes nonwork." — Milton Friedman
‘Men fight for liberty and win it with hard knocks; their children brought up easy, let it slip away again, poor fools. And their grandchildren are once again slaves’ - D. H. Lawrence
"Never drink and park. Accidents cause people."
“Is the poop deck really what I think it is?” -Homer J Simpson
“Where ever you go, there you are”
"You must remember to love people and use things, rather than to love things and use people." ~Venerable Fulton J. Sheen
"The man who moves a mountain begins by carrying away small stones." - Confucius
“Everything I have undertaken, everything I have expatiate

The above seems decent enough. Hopefully, the much larger number of correctly formed quotes will push the network to produce well-formed quotes more often than not.

Note that the characters separating a quote from the person it is attributed to vary (`-`, `—`, `―`, `~`), which might cause issues. This is easily normalized before a quote is written to the dataset. A few other inconsistencies are visible as well, but we ignore them for now since they can be cleaned up with Python later if needed.
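
The sketch below shows one way to normalize those characters. `normalize_quote` is a hypothetical helper. It differs from the scraping cell in two deliberate ways:

- Curly single quotes map to an apostrophe rather than `"`, so `don’t` stays `don't`.
- Em and en dashes are folded into `-` instead of being dropped by the ASCII step.

```python
import re

# Curly double quotes -> straight double quote
DOUBLE_QUOTES = re.compile(r"[“”]")
# Curly single quotes -> straight apostrophe (assumption: keeps "don’t" readable)
SINGLE_QUOTES = re.compile(r"[‘’]")
# Attribution separators -> plain hyphen (assumption: includes em and en dashes)
DASHES = re.compile(r"[~―—–]")

def normalize_quote(title):
    """Normalize quote marks and attribution separators, then drop non-ASCII."""
    text = DOUBLE_QUOTES.sub('"', title)
    text = SINGLE_QUOTES.sub("'", text)
    text = DASHES.sub('-', text)
    # Remove any remaining non-ASCII characters, as the scraping cell does
    return text.encode('ascii', 'ignore').decode('ascii')

print(normalize_quote('“Where ever you go, there you are”'))
```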

## Scraping

In [8]:
# Begin to scrape quotes given the above restriction
MAX_QUOTES = 30000
MAX_QPM = 150  # max queries per min
WAIT_TIME = 61  # wait 61 seconds per MAX_QPM
DATASET_NAME = 'currQuotesDataset'
datasetSize = 0
queryCount = 0

# The query window moves back by one week after every query
startEpoch = int(dt.datetime(2019,4,8).timestamp())
endEpoch = int(dt.datetime(2019,4,15).timestamp())
WEEK_DURATION = endEpoch - startEpoch

with open(DATASET_NAME, 'w') as quoteCompiler:
    while(datasetSize < MAX_QUOTES):
        # record start time
        startTimestamp = time.time()
        
        # make requested amount of queries
        for queryAttempt in range(MAX_QPM):
            if(datasetSize < MAX_QUOTES):
                # query counter (mostly for debugging)
                queryCount += 1
                print('Evaluating query #{}. '.format(queryCount), end='')

                # query reddit for given time
                res = list(api.search_submissions(after=startEpoch,
                                                  before=endEpoch,
                                                  subreddit='quotes',
                                                  filter=['title'],
                                                  limit=1000))

                # update time for next values
                startEpoch -= WEEK_DURATION
                endEpoch -= WEEK_DURATION

                # process the titles retrieved in res
                res_parsed = list(map(lambda x:x.title, res))
                for quote in res_parsed:
                    if not quote[0].isalpha(): # if first char not alpha, likely actual quote
                        # curly quotes (including curly apostrophes) -> '"', and '~'/'―' -> '-';
                        # '—' is not in this class, so em dashes are dropped by the ASCII step below
                        replaced = re.sub(r"[~―]",r'-',re.sub(r"[‘“’”]",r'"', quote))
                        asciiQuote = replaced.encode('ascii', 'ignore').decode("utf-8")   # drop remaining non-ASCII characters
                        quoteCompiler.write(asciiQuote+'\n')    # write quote to dataset
                        datasetSize += 1
                print('Dataset size: {}'.format(datasetSize))
            else:
                break
        
        # wait until at least WAIT_TIME (61) seconds have passed since this batch started
        # (unlikely to trigger, since the queries run sequentially in a single thread)
        while(time.time()-startTimestamp < WAIT_TIME):
            time.sleep(1) # waste a second (here to waste time)

Evaluating query #1. Dataset size: 282
Evaluating query #2. Dataset size: 573
Evaluating query #3. Dataset size: 878
Evaluating query #4. Dataset size: 1192
Evaluating query #5. Dataset size: 1541
Evaluating query #6. Dataset size: 1869
Evaluating query #7. Dataset size: 2180
Evaluating query #8. Dataset size: 2490
Evaluating query #9. Dataset size: 2798
Evaluating query #10. Dataset size: 3079
Evaluating query #11. Dataset size: 3364
Evaluating query #12. Dataset size: 3670
Evaluating query #13. Dataset size: 3951
Evaluating query #14. Dataset size: 4273
Evaluating query #15. Dataset size: 4552
Evaluating query #16. Dataset size: 4832
Evaluating query #17. Dataset size: 5138
Evaluating query #18. Dataset size: 5426
Evaluating query #19. Dataset size: 5703
Evaluating query #20. Dataset size: 5980
Evaluating query #21. Dataset size: 6262
Evaluating query #22. Dataset size: 6555
Evaluating query #23. Dataset size: 6865
Evaluating query #24. Dataset size: 7123
Evaluating query #25. Datase



Dataset size: 7867
Evaluating query #28. Dataset size: 8106
Evaluating query #29. Dataset size: 8339
Evaluating query #30. Dataset size: 8629
Evaluating query #31. Dataset size: 8862
Evaluating query #32. Dataset size: 9107
Evaluating query #33. Dataset size: 9347
Evaluating query #34. Dataset size: 9586
Evaluating query #35. Dataset size: 9789
Evaluating query #36. Dataset size: 10017
Evaluating query #37. Dataset size: 10252
Evaluating query #38. Dataset size: 10483
Evaluating query #39. Dataset size: 10726
Evaluating query #40. Dataset size: 10932
Evaluating query #41. Dataset size: 11123
Evaluating query #42. Dataset size: 11324
Evaluating query #43. Dataset size: 11486
Evaluating query #44. Dataset size: 11686
Evaluating query #45. Dataset size: 11910
Evaluating query #46. Dataset size: 12136
Evaluating query #47. Dataset size: 12383
Evaluating query #48. Dataset size: 12595
Evaluating query #49. Dataset size: 12773
Evaluating query #50. Dataset size: 12992
Evaluating query #51. D
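
The pacing in the scraping loop can be factored into a small helper. The loop issues up to `MAX_QPM` queries, then waits until `WAIT_TIME` seconds have passed since the batch started. The sketch below is hypothetical: the name `paced_batches` is not from any library. It also computes the remaining wait in a single `sleep` call instead of polling once per second. The clock and sleep functions are injectable so the logic can be checked without actually waiting.

```python
import time

def paced_batches(items, batch_size, window_seconds, sleep=time.sleep, clock=time.monotonic):
    """Yield items so that each batch of `batch_size` items spans at least `window_seconds`."""
    batch_start = clock()
    for i, item in enumerate(items):
        if i > 0 and i % batch_size == 0:
            # A full batch has been issued; wait out the rest of the window
            remaining = window_seconds - (clock() - batch_start)
            if remaining > 0:
                sleep(remaining)
            batch_start = clock()
        yield item
```

In the notebook this would wrap the query loop, e.g. `for week in paced_batches(range(n_weeks), MAX_QPM, WAIT_TIME): ...`, where `n_weeks` is a hypothetical count of weeks to scrape.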

## Analysis
A quick look at what the dataset consists of:

In [9]:
DATASET_NAME = 'currQuotesDataset'

with open(DATASET_NAME, 'rb') as inFile:
    dataQuotes = inFile.read().decode(encoding='utf-8')
    
print("Length of dataset: {} characters".format(len(dataQuotes)))

# unique characters
vocab = sorted(set(dataQuotes))
print ('{} unique characters'.format(len(vocab)))
print(vocab)

Length of dataset: 3555109 characters
96 unique characters
['\n', '\r', ' ', '!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '=', '>', '?', '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '[', '\\', ']', '^', '_', '`', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '{', '|', '}']


One thing to note is that the text contains quite a few symbols: 96 unique characters, including the carriage return `\r` from the `\r\n` line endings. Training will be attempted with these left in, but they will likely need to be removed if they impede training. Hopefully the constrained quote format will somewhat alleviate that issue.
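
If symbols do need to go, counting how often each character occurs is a reasonable way to pick candidates. The sketch below uses two hypothetical helpers, `rare_characters` and `strip_characters`, and the frequency threshold is an arbitrary choice. Converting `\r\n` to `\n` first would also remove `\r` from the vocabulary.

```python
from collections import Counter

def rare_characters(text, min_count):
    """Return characters that occur fewer than `min_count` times, rarest first."""
    counts = Counter(text)
    return sorted((c for c, n in counts.items() if n < min_count),
                  key=lambda c: (counts[c], c))

def strip_characters(text, chars):
    """Remove every character in `chars` from `text`."""
    return text.translate({ord(c): None for c in chars})

sample = '"Hi." - A\r\n"Yo!" ~ B\r\n'
cleaned = strip_characters(sample.replace('\r\n', '\n'), '~')
```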

## RNN Setup & Training
For this setup, the char-RNN from the given material will be the primary model. As a possible further experiment, a word-char-RNN might then be used to train the network. That would be a bit iffy, though, because of how punctuation is placed in this unpolished dataset.
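
To illustrate why punctuation complicates a word-level model, the sketch compares two ways of splitting a quote. Splitting on whitespace leaves punctuation glued to words, so `go,` and `go` would become separate vocabulary entries. The regex tokenizer is only an assumption for illustration; the notebook does not build a word-level model.

```python
import re

quote = '"Where ever you go, there you are"'

# Whitespace split: quote marks and commas stay attached to words
naive = quote.split()
# Regex split: runs of word characters and single punctuation marks become separate tokens
tokens = re.findall(r"\w+|[^\w\s]", quote)

print(naive)
print(tokens)
```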

In [10]:
import numpy as np
import tensorflow as tf
tf.enable_eager_execution()

import os
import time

In [11]:
# Creating a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)

text_as_int = np.array([char2idx[c] for c in dataQuotes])

# dictionary preview
print('{')
for char,_ in zip(char2idx, range(20)):
    print('  {:4s}: {:3d},'.format(repr(char), char2idx[char]))
print('  ...\n}')

{
  '\n':   0,
  '\r':   1,
  ' ' :   2,
  '!' :   3,
  '"' :   4,
  '#' :   5,
  '$' :   6,
  '%' :   7,
  '&' :   8,
  "'" :   9,
  '(' :  10,
  ')' :  11,
  '*' :  12,
  '+' :  13,
  ',' :  14,
  '-' :  15,
  '.' :  16,
  '/' :  17,
  '0' :  18,
  '1' :  19,
  ...
}


In [12]:
# Show how the first 13 characters from the text are mapped to integers
print ('{} ---- characters mapped to int ---- > {}'.format(repr(dataQuotes[:13]), text_as_int[:13]))

'"Take her to ' ---- characters mapped to int ---- > [ 4 54 67 77 71  2 74 71 84  2 86 81  2]


In [13]:
# The maximum length sentence we want for a single input in characters
seq_length = 100                                      # HYPERPARAMETER
# each training sequence uses seq_length+1 characters (input plus the shifted target)
examples_per_epoch = len(dataQuotes)//(seq_length+1)

# Create training examples / targets
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

for i in char_dataset.take(5):
  print(idx2char[i.numpy()])

"
T
a
k
e


In [15]:
# sample of batching (which would be used in training)
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

for item in sequences.take(5):
  print(repr(''.join(idx2char[item.numpy()])))

'"Take her to the moon for me." - Bing Bong (Inside Out)\r\n["The Devil in the Dark"] impressed me becau'
'se it presented the idea, unusual in science fiction then and now, that something weird, and even dan'
"gerous, need not be malevolent. That is a lesson that many of today's politicians have yet to learn.-"
' Arthur C. Clarke\r\n"We have a system that increasingly taxes work and subsidizes nonwork."  Milton Fr'
'iedman\r\n"Men fight for liberty and win it with hard knocks; their children brought up easy, let it sl'


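Because batching simply cuts the text into fixed 101-character chunks, a sequence usually starts or ends mid-quote. This might exacerbate the formatting issue, but I'll leave it as is for now. A word-level RNN, on the other hand, would be unable to recreate "names" or "sources" the way a char-RNN can, since it can only emit words from its fixed vocabulary.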
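
One alternative to fixed-size chunks is one row per quote, so that a training sequence never spans two quotes. The sketch below assumes a few things:

- Each line of the dataset is one quote, and `'\n'` is in `char2idx`.
- Quotes longer than `max_len` are simply skipped.
- Extra `'\n'` characters serve as padding. A fuller setup would more likely add a dedicated padding token and mask it out of the loss.

`quotes_to_padded` is a hypothetical helper.

```python
import numpy as np

def quotes_to_padded(text, char2idx, max_len, pad_char='\n'):
    """Return an int array with one row per quote, each ending in '\n' and right-padded."""
    rows = []
    for line in text.splitlines():  # splitlines also consumes '\r\n' line endings
        if not line:
            continue
        seq = line + '\n'
        if len(seq) > max_len:
            continue  # skip quotes that do not fit
        seq = seq + pad_char * (max_len - len(seq))
        rows.append([char2idx[c] for c in seq])
    return np.array(rows, dtype=np.int64).reshape(-1, max_len)

vocab_t = sorted(set('"ab\n -X'))
c2i = {c: i for i, c in enumerate(vocab_t)}
arr = quotes_to_padded('"ab" - X\r\n"a"\r\n', c2i, 10)
```

Inputs and targets would then be `arr[:, :-1]` and `arr[:, 1:]`, mirroring `split_input_target`.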

In [16]:
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)

In [17]:
for input_example, target_example in  dataset.take(1):
    print ('Input data: ', repr(''.join(idx2char[input_example.numpy()])))
    print ('Target data:', repr(''.join(idx2char[target_example.numpy()])))

Input data:  '"Take her to the moon for me." - Bing Bong (Inside Out)\r\n["The Devil in the Dark"] impressed me beca'
Target data: 'Take her to the moon for me." - Bing Bong (Inside Out)\r\n["The Devil in the Dark"] impressed me becau'


In [18]:
for i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])):
    print("Step {:4d}".format(i))
    print("  input: {} ({:s})".format(input_idx, repr(idx2char[input_idx])))
    print("  expected output: {} ({:s})".format(target_idx, repr(idx2char[target_idx])))

Step    0
  input: 4 ('"')
  expected output: 54 ('T')
Step    1
  input: 54 ('T')
  expected output: 67 ('a')
Step    2
  input: 67 ('a')
  expected output: 77 ('k')
Step    3
  input: 77 ('k')
  expected output: 71 ('e')
Step    4
  input: 71 ('e')
  expected output: 2 (' ')


In [19]:
# Batch size 
BATCH_SIZE = 64                                   # HYPERPARAMETER
steps_per_epoch = examples_per_epoch//BATCH_SIZE

# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences, 
# so it doesn't attempt to shuffle the entire sequence in memory. Instead, 
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

dataset

<DatasetV1Adapter shapes: ((64, 100), (64, 100)), types: (tf.int32, tf.int32)>

### MODEL

In [20]:
# Length of the vocabulary in chars
vocab_size = len(vocab)

# The embedding dimension 
embedding_dim = 256

# Number of RNN units
rnn_units = 1024

# COMMENT OR UNCOMMENT THESE BASED ON WHICH NETWORK TO USE
# rnn = tf.keras.layers.CuDNNGRU
rnn = tf.keras.layers.CuDNNLSTM

# AND THESE GET CHOSEN ACCORDINGLY TO ABOVE
# checkpoint_dir = './training_checkpoints'
checkpoint_dir = './training_checkpoints_lstm'
    
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                  batch_input_shape=[batch_size, None]),
        rnn(rnn_units, return_sequences=True,
            recurrent_initializer='glorot_uniform', stateful=True),
        tf.keras.layers.Dense(vocab_size)
    ])
    return model

In [21]:
model = build_model(
  vocab_size = len(vocab), 
  embedding_dim=embedding_dim, 
  rnn_units=rnn_units, 
  batch_size=BATCH_SIZE)

In [22]:
for input_example_batch, target_example_batch in dataset.take(1): 
  example_batch_predictions = model(input_example_batch)
  print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

(64, 100, 96) # (batch_size, sequence_length, vocab_size)


In [23]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (64, None, 256)           24576     
_________________________________________________________________
cu_dnnlstm (CuDNNLSTM)       (64, None, 1024)          5251072   
_________________________________________________________________
dense (Dense)                (64, None, 96)            98400     
Total params: 5,374,048
Trainable params: 5,374,048
Non-trainable params: 0
_________________________________________________________________
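
The parameter counts in the summary can be reproduced by hand:

- **Embedding:** one 256-dimensional vector per character.
- **LSTM:** four gates, each with input weights, recurrent weights and, for `CuDNNLSTM`, two bias vectors (input and recurrent). The bias term is therefore 8 × units rather than the 4 × units of a standard Keras `LSTM`.
- **Dense:** a weight matrix plus one bias per output character.

```python
vocab_size, embedding_dim, rnn_units = 96, 256, 1024

# One embedding vector per character
embedding_params = vocab_size * embedding_dim
# 4 gates x (input weights + recurrent weights + two bias vectors in CuDNNLSTM)
lstm_params = 4 * (rnn_units * embedding_dim + rnn_units * rnn_units + 2 * rnn_units)
# Weight matrix plus one bias per output character
dense_params = rnn_units * vocab_size + vocab_size

total = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total)
```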


In [24]:
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

example_batch_loss  = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)") 
print("scalar_loss:      ", example_batch_loss.numpy().mean())

Prediction shape:  (64, 100, 96)  # (batch_size, sequence_length, vocab_size)
scalar_loss:       4.5639896
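
The untrained loss is a useful sanity check. If the freshly initialized model spreads probability roughly uniformly over the 96 characters, its cross-entropy should be close to ln(96) ≈ 4.564, which matches the value printed above.

```python
import math

vocab_size = 96
# Cross-entropy of a uniform prediction over vocab_size classes: -ln(1/V) = ln(V)
uniform_loss = -math.log(1 / vocab_size)
print(round(uniform_loss, 4))  # 4.5643
```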


In [25]:
model.compile(
    optimizer = tf.train.AdamOptimizer(),
    loss = loss)

# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

In [26]:
EPOCHS=100
history = model.fit(dataset.repeat(), epochs=EPOCHS, steps_per_epoch=steps_per_epoch, callbacks=[checkpoint_callback])

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 6

### Text Generation

Note: to run this section on its own, rerun every cell from [Analysis] onward, but skip the cell that actually runs training (the `model.fit` cell directly above). Make sure `rnn` and `checkpoint_dir` in the model cell point to the same network. The generated text will vary depending on which version of the network is loaded (the GRU version produced the most "deep" output).

In [27]:
# LSTM
tf.train.latest_checkpoint(checkpoint_dir)

'./training_checkpoints_lstm\\ckpt_100'

In [28]:
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)

model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))

model.build(tf.TensorShape([1, None]))

In [29]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (1, None, 256)            24576     
_________________________________________________________________
cu_dnnlstm_1 (CuDNNLSTM)     (1, None, 1024)           5251072   
_________________________________________________________________
dense_1 (Dense)              (1, None, 96)             98400     
Total params: 5,374,048
Trainable params: 5,374,048
Non-trainable params: 0
_________________________________________________________________


In [53]:
def generate_text(model, start_string, temperature=1.0, num_generate=1000):
    # Evaluation step (generating text using the learned model)
    # num_generate: number of characters to generate

    # Convert the start string to character indices (vectorizing)
    input_eval = [char2idx[s] for s in start_string]
    input_eval = tf.expand_dims(input_eval, 0)

    # Empty list to store the generated characters
    text_generated = []

    # Low temperatures result in more predictable text.
    # Higher temperatures result in more surprising text.
    # Experiment to find the best setting.

    # Here batch size == 1; clear any state left over from a previous call
    model.reset_states()
    for i in range(num_generate):
        predictions = model(input_eval)
        # remove the batch dimension
        predictions = tf.squeeze(predictions, 0)

        # sample the next character from a categorical distribution over the scaled logits
        predictions = predictions / temperature
        predicted_id = tf.multinomial(predictions, num_samples=1)[-1,0].numpy()

        # Pass the predicted character as the next input to the model;
        # the stateful LSTM carries the previous hidden state forward
        input_eval = tf.expand_dims([predicted_id], 0)

        text_generated.append(idx2char[predicted_id])

    return (start_string + ''.join(text_generated))
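
`tf.multinomial` treats its input as unnormalized log-probabilities. Sampling from `predictions / temperature` is therefore the same as sampling from softmax(logits / temperature). The NumPy sketch below shows that distribution for a made-up set of three logits. Lower temperatures sharpen it toward the most likely character, and higher temperatures flatten it. `sample_probs` is a hypothetical helper.

```python
import numpy as np

def sample_probs(logits, temperature):
    """Softmax of logits / temperature: the distribution the sampler draws from."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()  # subtract the max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = [2.0, 1.0, 0.0]  # made-up logits for three characters
for t in (0.5, 1.0, 2.0):
    print(t, np.round(sample_probs(logits, t), 3))
```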

### Some examples from the Char-RNN

In [54]:
print(generate_text(model, start_string=u"\""))

" than the one less t leader requirements of the passion you now quite pleasure in prioritics, great ends of the cold, than think." - Colos S. Gabros
"It takes 20 you learn something great work flow you back, and obstronges is a language and has been thcackled by people for the youngency of an equal time that portips first lock the word. But how hard you will find anything you want."
"To announce is the only cum."  George DuDai."-- unknown
"The longer book thing if in another person is the greater"  Matstain Drammos
"Remember need." - Albert Camus
"They make peace with goodn love."-Phil Mamains
"Do what you hoped; far, man, assimistacally to fool today..." - Marcat dinner
" Time we called the meashing complex that bushes what happens to live up the 'realize that I'm to their laugh with you. You will treat you that cruelos Earth is saire, it meanus not what it can be ideased by knowing what the unimportant partic humanity, all a mind has learned your own not working hard to be di

In [55]:
print(generate_text(model, start_string=u"\""))

"S Christian The Leaker (2019)
"Inspiration is we need a quality, but not a miser. We forget them, and even pushed at difficult, fol showlined from say "Never to control that people who are destroyed."  Carl Safavell
"There is nothing meaning because of those who lack down and attempt." - Friedrich Nietzsche
"Live as if you will dare man of violence buy they have human." - Charles Bukowski
"On my langus of nechose. The economy, the needs of people are always unexplorational the only thing that changes you,"
"Success is not final" in nothing, but a man lies instruction will be defensed to keep going." - Nietzscho
" Badey is myself, for every Soly in 1988 - Bring Garbare Fox
"There's never losely as human actual a shit, Grny Andress not in the middle the Man in renk now and the keapons of the putts." - Tim Leving
"A child inside her layer flows and over the theese Catch
"There are ways to fail decided by thesting saves itself."  Denzus Albert Buron.
"Ohe day is not igntalful, t

In [56]:
print(generate_text(model, start_string=u"\""))

"It's the neurder'' - Unknown
"I can think, be a boy, remind me two things, over can be not prevented to taught to the person, but friends, 'Nature has no dicklish, and when we love is unner
"Hard work will come to much free pun out of your life."
"We all all the existence of ketch us, Harvelled Holstoy life med men to chase becomes all yourself" - Randyckon Brown
"To learn from a distarder adong it but not ceasantly disminders"The future is the only man who's missing phrout of sane." Perry De, Apology's hands..."he ever. The only thing had better be pushed, because those who matter hope his infinity in all slaw. And it face lies in the guy through the fires who admirts itself." 
"poursely"
"The grave, freedom should learn from others; Is there bett you fit, inside yourself, frequentive the family lines."
"The life we have to desert moments on fiftere people, but we making their death."- Arthur Wilsing and Your Mandon"t look not from the chains from reality." - Jacobies of Dalis

In [57]:
print(generate_text(model, start_string=u"\""))

" Territus -inasturity of the Dune." Arnn Sah Hor theands it has not half feether, but every time and walker when we can exploid that is that when the debilas between joy, and it leads to me is through, neithink Skywarts  Ryamivaved for themselves"-Addicis Cowers
"Sluderichism is the else in returnation." - Gilberto Russia (Cocute of True, unless Ice one estable" -Angelis Brand
"Good books are hard above groundarns of the bones of the United Bg sont." - Ravi Silari
"And everything is crush, what is painting, let someone told me. But there'll of nothing makes just throws rules break and said that." Earl Mison
"Has ever really got to the truth." - Martily Black Faulknon, Undept of teches of from tramplies all the trulded person, or power but on that." -- Mark Twained
"Posifical is] grass 10 years of the people. Then there would be to find ug little far more Russia, while step believin's the other sinnor removes a things." - Michael Joh sant women to discover than a great time." - Ma

One thing that is immediately clear is that the unpolished formatting of the training text shows up here. The network is likely trying to find sequential structure across consecutive quotes. It is also trying to fit the incorrectly formatted quotes, which causes problems because they have no clear separation between quote and attribution. This occasionally shows in the generated text, e.g. the "Mark Twained" line above, which runs one attribution straight into a second quote and attribution.
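
One way to cope with this is to keep only generated lines that match the expected `"Quote" - Name` shape. The pattern below is a hypothetical heuristic, not part of the original pipeline. It requires a single quoted span, a `-` or `--` separator, and a capitalized name made of word characters, dots and spaces. It will reject some legitimate attributions, e.g. names containing apostrophes or commas.

```python
import re

# Hypothetical pattern: "quoted text", optional spaces, - or --, then a capitalized name
WELL_FORMED = re.compile(r'^"[^"]+"\s*-{1,2}\s*[A-Z][\w. ]+$')

def well_formed_quotes(generated):
    """Return the lines of `generated` that look like a single attributed quote."""
    return [line for line in generated.splitlines() if WELL_FORMED.match(line.strip())]

samples = '\n'.join([
    '"Remember need." - Albert Camus',
    '"Hard work will come to much free pun out of your life."',
    '"Success is not final" in nothing, but a man lies instruction will be defensed to keep going." - Nietzscho',
    '"They make peace with goodn love."-Phil Mamains',
])
print(well_formed_quotes(samples))
```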

In [58]:
NUM_WRITES = 100

# And we write some of these outputs to a text file out of interest.
with open('LSTMOutput.txt','w') as outFile:
    for _ in range(NUM_WRITES):
        outFile.write(generate_text(model, start_string=u"\""))
        outFile.write('\n\n') # blank line to separate consecutive samples