# Project: Literature Analysis

### Reading is great. There are so many amazing books out there, and with them come great movies, reviews, and summaries. Reading those reviews and watching those films often gives us only a partial picture of what the book is actually like, though. With data science and natural language processing, I can bring another dimension to how we understand literature.

For this project, I am looking at the following eight works:
* **Foundation by Isaac Asimov** - a book I am currently reading, by my favorite sci-fi writer
* **A Clockwork Orange by Anthony Burgess** - the novel behind a famous, extravagant horror movie by Stanley Kubrick, a book with a unique writing style and vocabulary
* **Comments on the Society of the Spectacle by Guy Debord** - a continuation of a book I was taught at university about the influence of capitalist media on society
* **A Brief History of Time by Stephen Hawking** - a book that excited millions about the workings of our universe
* **For Whom the Bell Tolls by Ernest Hemingway** - a novel with a distinctive writing style and themes specific to American writers
* **Carrie by Stephen King** - one of the most well-known horror novels out there
* **The Hobbit by J.R.R. Tolkien** - a very long journey by very short people, one that so many people and communities hold dear to their hearts
* **Slaughterhouse Five by Kurt Vonnegut** - a book highly recommended to me

# Text Generation

Here, we will use Markov chains for very basic text generation. 

We will make the basic assumption of Markov chains - that the next word depends only on the current word, not on any of the words before it.
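
Written out, with $w_t$ standing for the word at position $t$ (notation introduced here for illustration only), this first-order assumption reads:

$$
P(w_{t+1} \mid w_1, w_2, \ldots, w_t) = P(w_{t+1} \mid w_t)
$$

In other words, everything the model knows about what comes next is contained in the single word it is currently on.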

*For this task, I chose `defaultdict` from the `collections` module because it lets us build the dictionary of words and their followers on the go (in a loop) without running into KeyErrors.*
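
As a minimal sketch of why `defaultdict` is convenient here, the snippet below builds the same word-to-followers mapping twice: once with a plain `dict` and once with `defaultdict(list)`. The `pairs` list and the variable names `plain` and `auto` are made up for illustration and are not taken from the corpus.

```python
from collections import defaultdict

# Toy word pairs (illustrative assumption, not real corpus data)
pairs = [('the', 'ring'), ('the', 'hobbit'), ('ring', 'of')]

# With a plain dict, each new key has to be initialized before appending
plain = {}
for current_word, next_word in pairs:
    if current_word not in plain:
        plain[current_word] = []
    plain[current_word].append(next_word)

# With defaultdict(list), a missing key is created as an empty list on first access
auto = defaultdict(list)
for current_word, next_word in pairs:
    auto[current_word].append(next_word)

print(dict(auto))  # {'the': ['ring', 'hobbit'], 'ring': ['of']}
```

Both versions end up with the same contents; the `defaultdict` version simply drops the explicit membership check.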

1. Select text to imitate


2. **Create a Markov Chain function** and apply it to the author of your choice
    - Import `defaultdict` from `collections`
    - Input a string of text
    - Tokenize the text by word, but include punctuation
    - Initialize a default dictionary to hold all of the words and their connections
    - Create a zipped list of all of the word pairs and put them in a dictionary format (word : list of connecting words)
    - Convert the default dictionary back into a regular dictionary and return it

    
    
3. Create a **Sentence Generator function**
    - Input the chosen author's Markov Chain and desired length of sentence
    - Capitalize the first word to create a more sentence-like formatting
    - Follow the Markov Chain text generation method by generating the second word from the value list, then third, and repeating until the end of sentence 
    - End with a period to complete the sentence formatting


4. **Generate sentences** based on the writer of your choice

## Select text to imitate

In [None]:
# Read in the corpus, including punctuation!
import pandas as pd

data = pd.read_pickle('corpus.pkl')
data

Now choose the writer to imitate: copy `data.writing.loc['name_of_author']`, replacing `name_of_author` with the author's index label, and use it in Step 1 shown below.

## Build a Markov Chain Function

We are going to build a simple Markov chain function that creates a dictionary:
* The keys should be all of the words in the corpus
* The values should be a list of the words that follow the keys

In [None]:
from collections import defaultdict

def markov_chain(text):
    '''The input is a string of text and the output will be a dictionary with each word as
       a key and each value as the list of words that come after the key in the text.'''
    
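    # Tokenize the text by word, keeping punctuation attached to the words.
    # split() with no argument splits on any run of whitespace, so newlines
    # and repeated spaces do not produce empty or multi-line tokens.
    words = text.split()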
    
    # Initialize a default dictionary to hold all of the words and next words
    m_dict = defaultdict(list)
    
    # Pair each word with the word that follows it and store it in word: list of next words format
    for current_word, next_word in zip(words, words[1:]):
        m_dict[current_word].append(next_word)

    # Convert the default dict back into a dictionary
    m_dict = dict(m_dict)
    return m_dict
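
One property of this format is worth spelling out: followers are stored in a list with repeats, so a word that follows the key more often appears more often in the list. When a follower is later drawn uniformly at random from that list, it is therefore picked in proportion to how often it occurred. The sketch below makes those implied probabilities explicit; `toy_chain` is hand-written and `transition_probabilities` is a hypothetical helper, neither is part of the project code.

```python
from collections import Counter

# Hand-written toy chain in the same key -> list-of-followers format (illustrative only)
toy_chain = {'the': ['ring', 'hobbit', 'ring', 'ring'], 'ring': ['of']}

def transition_probabilities(chain, word):
    '''Hypothetical helper: relative frequency of each follower of `word`.'''
    followers = chain[word]
    counts = Counter(followers)
    return {next_word: n / len(followers) for next_word, n in counts.items()}

print(transition_probabilities(toy_chain, 'the'))  # {'ring': 0.75, 'hobbit': 0.25}
```

Keeping the raw lists instead of precomputed probabilities trades some memory for a very simple sampling step.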

## Create a Text Generator

We're going to create a function that generates sentences. It will take two things as inputs:
* The dictionary we just created
* The number of words we want generated

Here are some of my favorite examples of generated sentences:

> Stephen King: 'Comment. Nobody was overcast and sanitary napkins, chanting, laughing, shrieking of course, forbade her heart..'

> Guy Debord: 'Erasure of the century language of goods is insignificant or in this mystery is genuine.'

> Stephen Hawking: 'Suggests, cosmic censorship hypothesis tells us the inner edge of one ignored the very close.'

> J.R.R. Tolkien: 'Loosened his door, and giving his ring of the balance as if he had baked.'

> J.R.R. Tolkien: 'Grandmother, teaching his legs with them hoping against the chill flame, beating wigs bear to.'
    
> J.R.R. Tolkien: 'Saucer, and Bilbo had lost the foot of the foot right away.'
    
> J.R.R. Tolkien: '"Try," said a gathering together! There was not forbear to sniff.'
    
> J.R.R. Tolkien: 'What about the hobbit? He slashed the current market value'

> Isaac Asimov: 'Sake, as to admit you speak, Hardin. And two; after Emperor would one of Imperial.'

> Ernest Hemingway: 'Material and headed toward Segovia at the cup out and the gipsy’s voice thickening.'

> Kurt Vonnegut: 'Architect. The atmosphere now, big kiss. She had been stolen from the opinion of his.'

In [1]:
import random

def generate_sentence(chain, count=15):
    '''Input a dictionary in the format of key = current word, value = list of next words
       along with the number of words you would like to see in your generated sentence.'''

    # Pick a random starting word from the dictionary keys and capitalize it
    word1 = random.choice(list(chain.keys()))
    sentence = word1.capitalize()

    # Generate the next word from the value list. Set the new word as the current word. Repeat.
    for _ in range(count - 1):
        # The last word of the corpus may have no followers (and so no key); stop early in that case
        if word1 not in chain:
            break
        word2 = random.choice(chain[word1])  # choose a random word from the list of next words
        word1 = word2  # update word1 to continue the loop
        sentence += ' ' + word2  # add a space between the words

    # End it with a period
    sentence += '.'
    return sentence
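
To trace the walk by hand, here is a minimal sketch on a toy chain in which every word has exactly one follower, so the result is deterministic. It also shows one edge case: a word that appears only at the very end of a text has no followers and never becomes a key, so the sketch stops there instead of looking it up. Both `toy_chain` and the helper `walk` are assumptions made up for this illustration.

```python
import random

# Hand-written toy chain (an assumption for illustration, not taken from the corpus).
# Every key has exactly one follower, so the walk is deterministic.
toy_chain = {'the': ['hobbit'], 'hobbit': ['sat'], 'sat': ['down']}

def walk(chain, start, count):
    '''Hypothetical helper: follow the chain from a fixed start word for up to count words.'''
    words = [start]
    current = start
    for _ in range(count - 1):
        followers = chain.get(current)
        if not followers:  # dead end: 'down' is never a key in toy_chain
            break
        current = random.choice(followers)
        words.append(current)
    return ' '.join(words).capitalize() + '.'

print(walk(toy_chain, 'the', 3))   # The hobbit sat.
print(walk(toy_chain, 'the', 10))  # The hobbit sat down.
```

The second call asks for ten words but returns four, because the walk runs out of followers after `down`.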

## Steps for generating sentences from an author

**Step 1** - Select the text to imitate: `data.writing.loc['name_of_author']`


**Step 2** - Create the Markov chain dictionary from the selected text: `markov_chain(data.writing.loc['name_of_author'])`


**Step 3** - Generate a sentence using the Markov chain dictionary you created: `generate_sentence(markov_chain(data.writing.loc['name_of_author']))`

# Isaac Asimov

In [None]:
# Step 1 - innermost call - Select the text to imitate
# Step 2 - middle call - Create the Markov chain dictionary from the selected text
# Step 3 - outermost call - Generate a sentence

generate_sentence(markov_chain(data.writing.loc['asimov']))

# J.R.R. Tolkien

In [None]:
generate_sentence(markov_chain(data.writing.loc['tolkien']))

# Guy Debord

In [None]:
generate_sentence(markov_chain(data.writing.loc['debord']))

# Anthony Burgess

In [None]:
generate_sentence(markov_chain(data.writing.loc['burgess']))

# Stephen Hawking

In [None]:
generate_sentence(markov_chain(data.writing.loc['hawking']))

# Ernest Hemingway

In [None]:
generate_sentence(markov_chain(data.writing.loc['hemingway']))

# Stephen King

In [None]:
generate_sentence(markov_chain(data.writing.loc['king']))

# Kurt Vonnegut

In [None]:
generate_sentence(markov_chain(data.writing.loc['vonnegut']))