# Speech Maker

This notebook uses the [markovify](https://github.com/jsvine/markovify) library to build Markov chain models from a [dataset](https://floydhub.com/whatrocks/datasets/commencement) of popular commencement speeches, and then generates new sentences from those models. Among other things, this simple, extensible library powers plenty of Twitter bots, such as [MarkovPicard](https://twitter.com/MarkovPicard) and [mashomatic](https://twitter.com/mashomatic).
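If Markov chains are new to you, here is a minimal sketch of the idea in plain Python. It is not markovify's implementation: the function names and the toy text are made up for illustration, and the real library also handles sentence boundaries and filters out sentences that copy too much of the source. The sketch only shows the core trick, which is to record which word follows each pair of words and then walk those records at random.

```python
import random
from collections import defaultdict

def build_chain(text, state_size=2):
    """Map each run of `state_size` words to the words observed after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - state_size):
        state = tuple(words[i:i + state_size])
        chain[state].append(words[i + state_size])
    return chain

def generate(chain, start, max_words=20, rng=random):
    """Walk the chain from `start` until a dead end or `max_words` is reached."""
    output = list(start)
    state = tuple(start)
    while len(output) < max_words and state in chain:
        # Repeated followers appear more than once, so they are picked more often
        next_word = rng.choice(chain[state])
        output.append(next_word)
        state = tuple(output[-len(start):])
    return ' '.join(output)

# Toy text, invented for this example
toy_text = 'follow your heart and follow your dreams and follow your heart'
toy_chain = build_chain(toy_text)
print(generate(toy_chain, ('follow', 'your')))
```

The `state_size=2` default here mirrors markovify's default of two-word states. A larger state produces more coherent but less surprising text.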

### Part 1: Entire Dataset

Let's first generate some sentences from our entire dataset of speeches.
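The cells that follow build one model per speech and then merge them with `markovify.combine`. As a rough mental model (an assumption about the approach, not a copy of the library's code), merging amounts to adding up the "which word follows which" counts from every speech, optionally scaled by a weight per model. The `combine_counts` helper and the toy counts below are hypothetical.

```python
from collections import Counter

def combine_counts(count_tables, weights=None):
    """Merge {state: Counter(next_word)} tables by adding weighted counts."""
    weights = weights or [1] * len(count_tables)
    combined = {}
    for table, weight in zip(count_tables, weights):
        for state, followers in table.items():
            target = combined.setdefault(state, Counter())
            for word, count in followers.items():
                target[word] += count * weight
    return combined

# Invented follower counts for two imaginary speeches
speech_a = {('thank', 'you'): Counter({'all': 2, 'graduates': 1})}
speech_b = {
    ('thank', 'you'): Counter({'all': 1}),
    ('go', 'forth'): Counter({'and': 1}),
}

merged = combine_counts([speech_a, speech_b])
print(merged[('thank', 'you')])
```

Because counts are summed, longer speeches contribute more transitions and so have more influence on the combined model unless you pass weights to even things out.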

In [None]:
import os
import markovify

SPEECH_PATH = '/floyd/input/speeches/'

speech_dict = {}
for speech_file in os.listdir(SPEECH_PATH):
    with open(os.path.join(SPEECH_PATH, speech_file)) as speech:
        contents = speech.read()
    # Create a Markov model for each speech in our dataset
    speech_dict[speech_file] = markovify.Text(contents)

In [None]:
models = list(speech_dict.values())
print(f'There are {len(models)} speeches in our dataset.')

In [None]:
# Combine the Markov models
model_combination = markovify.combine(models)

In [None]:
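# Generate 3 sentences. make_sentence() returns None when it cannot build
# a sufficiently original sentence, so allow more attempts than the default.
for i in range(3):
    print(f'{i}: {model_combination.make_sentence(tries=100)}\n')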

### Part 2: Top 10 Schools by Speech Count

Let's be elitist for a moment, shall we? Let's take a look at the top ten schools by speech count, and generate some sentences from these paragons of higher education.
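If you'd like to see what "top schools by speech count" boils down to without pandas, here is a small sketch using `collections.Counter`. The rows are made-up stand-ins for the metadata file (only the `school` and `filename` column names come from this notebook), and `top_schools` is a hypothetical helper.

```python
from collections import Counter

# Made-up rows standing in for the speech metadata
toy_rows = [
    {'school': 'Stanford University', 'filename': 'speech_1.txt'},
    {'school': 'MIT', 'filename': 'speech_2.txt'},
    {'school': 'Stanford University', 'filename': 'speech_3.txt'},
    {'school': 'Example College', 'filename': 'speech_4.txt'},
    {'school': 'MIT', 'filename': 'speech_5.txt'},
    {'school': 'Stanford University', 'filename': 'speech_6.txt'},
]

def top_schools(rows, n=10):
    """Return the n schools with the most speeches, most frequent first."""
    counts = Counter(row['school'] for row in rows)
    return [school for school, _ in counts.most_common(n)]

print(top_schools(toy_rows, n=2))
```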

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_csv('speech_metadata.csv')

# Top 10 schools by speech count
top_ten_schools = df.school.value_counts().head(10)
top_ten_schools.plot.bar()

# Make it a Python list
top_ten_schools_list = top_ten_schools.index.tolist()

# List the speech filenames for these top 10 schools
filtered_speech_filenames = df.loc[df['school'].isin(top_ten_schools_list), 'filename'].tolist()

# Filter our { filename: markov_model } dict by these filenames
filtered_speech_dict = { speech_file: speech_dict[speech_file] for speech_file in filtered_speech_filenames }

In [None]:
filtered_models = list(filtered_speech_dict.values())
print(f'Using {len(filtered_models)} speeches!\n')

# Combine these models and print some new sentences
filtered_model_combination = markovify.combine(filtered_models)
# make_sentence() can return None, so allow more attempts than the default
for i in range(3):
    print(f'{i}: {filtered_model_combination.make_sentence(tries=100)}\n')

### Part 3: One School at a Time

In this last section, let's play with a super fun [Jupyter widget extension](http://ipywidgets.readthedocs.io/en/latest/index.html) that lets us filter the speeches by each of the top ten schools. 

Why would you want to do this? Well, sometimes you really just want to generate a sentence from commencement speeches given at Stanford University. Or was it MIT? That's entirely up to you!
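One way to make per-school lookups cheap is to group the filenames by school once and reuse that grouping every time the selection changes. The sketch below is only an illustration under stated assumptions: the rows and the placeholder "models" are invented, and both helper functions are hypothetical. It also skips any filename that has no model, which avoids a `KeyError` if the metadata lists a speech that is missing from the dataset.

```python
from collections import defaultdict

# Made-up rows standing in for the speech metadata
toy_rows = [
    {'school': 'Stanford University', 'filename': 'speech_1.txt'},
    {'school': 'MIT', 'filename': 'speech_2.txt'},
    {'school': 'Stanford University', 'filename': 'speech_3.txt'},
]

def group_filenames_by_school(rows):
    """Build a {school: [filename, ...]} lookup from metadata rows."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row['school']].append(row['filename'])
    return dict(grouped)

def models_for_school(school, grouped, models):
    """Return the models for one school, skipping filenames with no model."""
    return [models[name] for name in grouped.get(school, []) if name in models]

toy_grouped = group_filenames_by_school(toy_rows)
# Placeholder strings instead of real Markov models; speech_3.txt is missing on purpose
toy_models = {'speech_1.txt': 'model 1', 'speech_2.txt': 'model 2'}
print(models_for_school('Stanford University', toy_grouped, toy_models))
```

In this notebook, the selected school would come from `toggle.value` and the models from `speech_dict`.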

In [None]:
# Actually, let's try these schools one at a time!
import ipywidgets as widgets

toggle = widgets.ToggleButtons(
    options=top_ten_schools_list,
    description='Schools: ',
    disabled=False,
    button_style='',
    tooltips=[]
)
display(toggle)

In [None]:
# List the speech filenames for the selected school
# (re-run this cell after picking a different button above)
filtered_speech_filenames = df.loc[df['school'] == toggle.value, 'filename'].tolist()

# Filter our { filename: markov_model } dict by these filenames
filtered_speech_dict = { speech_file: speech_dict[speech_file] for speech_file in filtered_speech_filenames }

filtered_models = list(filtered_speech_dict.values())
print(f'There are {len(filtered_models)} speeches from {toggle.value}!\n')

# Combine these models and print some new sentences
filtered_model_combination = markovify.combine(filtered_models)
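# With only a few speeches per school, make_sentence() returns None more often,
# so allow more attempts than the default
for i in range(3):
    print(f'{i}: {filtered_model_combination.make_sentence(tries=100)}\n')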