# **Project:** Ariel

# **Step:** Compiler

**Goal:** Create a tool to help amateur screenwriters better identify their voice and improve (sell) their screenplays.

**Process:** Using NLP, create an engine that takes a pilot screenplay and returns "sister scripts." Through these scripts, writers can identify similarities in tone, topic, and target distribution channels, and find the networks of "best fit."

**Code and Concept by:** Tyler Zencka

# Imports

In [3]:
import pandas as pd
import pickle
import numpy as np
import re
import string
import json
import os
import io

# Parsing
import sys
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import PDFPageAggregator
from pdfminer.layout import LTTextBoxHorizontal
from pdfminer.converter import HTMLConverter,TextConverter,XMLConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage

# Mapping/Display
%matplotlib inline
from matplotlib import pyplot as plt
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 500)
pd.set_option('display.float_format', lambda x: '%.3f' % x)

# logging for gensim (set to INFO)
import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
from gensim import corpora, models, similarities, matutils

# Load the parser module

Before loading the modules, which you can find in the project GitHub, you may want to check your system path.

In [None]:
# Double check which Python interpreter your notebook is running
sys.executable

In [None]:
# Double check where it's pulling imports from
sys.path
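If an import fails or picks up the wrong module, it can help to ask Python where a name would resolve from. Below is a minimal sketch using the standard library's `importlib`; the helper name `where_is` is made up for illustration and is not part of the project.

```python
import importlib.util

def where_is(module_name):
    """Return the file a top-level module would be loaded from, or None if it is not found."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin

# 'parser' is the package name used in this notebook; the result depends on your sys.path
print(where_is("parser"))
```

If this prints `None`, or a path outside the project folder, the project directory is probably not on `sys.path` yet.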

In [None]:
import parser.parse_pdf as p

# Parsing PDFs

Most scripts online are in PDF form, so our first task is to build an engine that takes a *PDF script* as input and turns it into text.

## Show Lists

After scraping screenplay PDFs from the internet, store them in a folder on your local drive.
List every filename here; the list is what we pass to the parser to process a whole collection of scripts at once.

Note: Each *titles_list* runs through the compiler one at a time, so our steps are:
1. Run BETA through to the end of the notebook.
2. When that runs well, run BATCH1 through to the end of the notebook.
3. Run BATCH2 through to the end of the notebook.
4. Run BATCH3 through to the end of the notebook.
5. Scrape for more screenplay PDFs online and create new titles_lists.

Since these are pulling PDFs from my local drive, you'll want to download the collection from GitHub.
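Before a long compile run, it may be worth checking that every filename in a list actually exists on disk. The sketch below assumes all PDFs sit in a single folder; the helper `find_missing` and the folder name in the usage comment are hypothetical, and the way the project's parser locates files may differ.

```python
from pathlib import Path

def find_missing(titles, folder):
    """Return the titles that have no file with exactly that name in folder."""
    on_disk = {p.name for p in Path(folder).iterdir() if p.is_file()}
    return [t for t in titles if t not in on_disk]

# Usage (hypothetical folder name):
# missing = find_missing(titles_list, "scripts")
# print(missing)
```

The match is exact, so a file saved as `.PDF` will not satisfy a list entry ending in `.pdf`.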

### ***BETA***

In [None]:
# This is our BETA group, a small collection to test out functions in a computationally inexpensive way.

titles_list = ['Beauty_and_the_Beast_1x01_-_Pilot_(Steinberg).pdf',
'Happy!_1x01_-_Pilot.pdf',
'Riverdale_1x01_-_The_Rivers_Edge.pdf',
'Sweet_Vicious_1x01_-_Pilot.pdf',
'The_Secret_Lives_Of_Husbands_And_Wives_1x01_-_Pilot.pdf',
'Warriors_1x01_-_Pilot.pdf']

### ***BATCH1***

In [None]:
# This is our first larger test sample of inputs to get an MVP version of our product.

titles_list = ['Brockmire_1x01_-_Pilot.pdf',
'Brooklyn_Nine-Nine_1x01_-_Pilot_(Mar_19_2013).pdf',
'Californication_1x01_-_Pilot.pdf',
'Casual_1x01_-_Pilot.pdf',
'Catastrophe_1x01_-_Pilot.pdf',
'Catch-22_1x01_-_Pilot.pdf',
'Champaign_ILL_1x01_-_Pilot.pdf',
'Community_1x01_-_Pilot.pdf',
'Cougar_Town_1x01_-_Pilot.pdf',
'Counterpart_1x01_-_Pilot.pdf',
'Crazy_Ex-Girlfriend_1x01_-_Pilot.pdf',
'Dark_Skies_1x01_-_Awakening.pdf',
'Defenders_1x01_-_Pilot.pdf',
'Dexter_1x01_-_Pilot.pdf',
'Dirk_Gently_1x01_-_Pilot.pdf',
'Doctor_Who_001_-_An_Unearthly_Child.pdf',
'Drop_Dead_Diva_1x01_-_Pilot.pdf',
'Empire_1x01_-_Pilot.pdf',
'Eleventh_Hour_1x01_-_Man_Without_a_Shadow.pdf',
'Emerald_City_1x01_-_Pilot.pdf',
'Emily_Owens,_MD_1x01_-_Pilot.pdf',
'Empire_State_1x01_-_Pilot.pdf',
'Episodes_1x01_-_Pilot.pdf',
'Escape_at_Dannemora_1x01_-_Chapter_One.pdf',
'Everything_Sucks_1x01_-_Pilot.pdf',
'Everythings_Gonna_Be_OK_1x01_-_Pilot.pdf',
'Everwood_1x01_-_Pilot.pdf',
'Exile_ep1.pdf',
'Fargo_1x01_-_Pilot.pdf',
'Fear_The_Walking_Dead_1x01_-_Pilot.pdf',
'Football_Wives_1x01_-_Pilot.pdf',
'Franklin_&_Bash_1x01_-_Bro-Bono.pdf',
'Future_Man_1x01_-_Pilot.PDF',
'Gilmore_Girls_1x01_-_Pilot.pdf',
'Godless.pdf',
'Gossip_Girl_1x01_-Pilot.pdf',
'Gotham_1x01_-_Pilot.pdf',
'Grace_and_Frankie_1x01_-_The_End.pdf',
'Great_News_1x01_-_Pilot.pdf',
'Halt_&_Catch_Fire_1x01_-_Breaking_Big_Blue.pdf',
'Happy_Valley_1x01.pdf',
'Highston_1x01_-_Pilot.pdf',
'How_To_Get_Away_With_Murder_1x01_-_Pilot.pdf',
'Humans_1x01_-_Pilot.pdf',
'iZombie_1x01_-_Pilot.pdf',
'Jack_Ryan_1x01_-_Pilot.pdf',
'Jane_the_Virgin_1x01_-_Pilot.pdf',
'Jericho_1x01_-_Pilot.pdf',
'Jon_Lovitz_Show_1x01_-_Pilot.pdf',
'Justified_1x01_-_Pilot.pdf',
'Last_Man_on_Earth_1x01_-_Pilot_(As_Broadcast).pdf',
'Life_In_Pieces_1x01_-_Pilot.pdf',
'Life_on_Mars_1x01.pdf',
'Locke_And_Key_1x01_-_Ghost_Key.pdf',
'Longmire_1x01_-_Pilot.pdf',
'Lost_1x01_-_Pilot.pdf',
'Magicians_1x01_-_Unauthorized_Magic.pdf',
'Man_Seeking_Woman_1x01_-_Pilot.pdf',
'Masters_Of_Sex_1x01_-_Pilot.pdf',
'Mozart_in_the_Jungle_1x01_-_Pilot.pdf',
'Narcos_1x01_-_Descenso.pdf',
'NCIS_6x22_-_Legend_pt1.pdf',
'One_Mississippi_1x01_-_Pilot.pdf',
'Ozark_1x01_-_Pilot.pdf',
'Party_Down_1x01_-_Pilot.pdf',
'Peaky_Blinders_1x01.pdf',
'Political_Animals_1x01_-_Pilot.pdf',
'Pretty_Little_Liars_1x01_-_Pilot.pdf',
'Prison_Break_2016_1x01_-_Pilot.pdf',
'Ray_Donovan_1x01_-_Pilot.pdf',
'Revolution_1x01_-_Pilot.pdf',
'Rizzoli_and_Isles_1x01_-_Pilot.pdf',
'Royal_Pains_1x01_-_Pilot.pdf',
'Ryan_Hansen_Solves_Crimes_On_Television_1x01_-_Pilot.pdf',
'Samantha_Who_1x01_-_Pilot.pdf',
'Santa_Clarita_Diet_1x01_-_Pilot.pdf',
'Smallville_1x01_-_Pilot.pdf',
'Sons_of_Anarchy_1x01_-_Pilot.pdf',
'Spellbound_1x01_-_Pilot.pdf',
'Star_Trek_-_The_Next_Generation_1x01-102_-_Encounter_at_Farpoint.pdf',
'Stranger_Things_1x01_-_Pilot.pdf',
'Surviving_Jack_1x01_-_Pilot.pdf',
'Taboo_1x01_-_Shovels_and_Keys.pdf',
'The_Americans_1x01_-_Pilot.pdf',
'The_Big_Bang_Theory_1x01_-_Pilot.pdf',
'The_Blacklist_1x01_-_Pilot.pdf',
'The_Chilling_Adventures_Of_Sabrina_1x01_-_October_Country.pdf',
'The_Good_Wife_1x01_-_Pilot.pdf',
'The_Killing_1x01_-_Pilot.pdf',
'The_Last_Ship_1x01_-_Pilot.pdf',
'The_Real_ONeils_1x01_-_Pilot.pdf',
'The_Sinner_1x01_-_Pilot.pdf',
'The_Strain_1x01_-_Night_Zero.pdf',
'Tyrant_1x01_-_Pilot.pdf',
'Unbreakable_Kimmy_Schmidt_1x01_-_Pilot.pdf',
'Vampire_Diaries_1x01_-_Pilot.pdf',
'Wayward_Pines_1x01_-_Pilot.pdf',
'Witches_of_East_End_1x01_-_Pilot.pdf',
'Wrecked_1x01_-_Pilot.pdf',
'Yellowstone_1x01_-_Pilot.pdf',]

### ***BATCH2***

In [None]:
# This is our second larger test sample of inputs to get an MVP version of our product.

titles_list = ['Castle_1x01_-_Chapter_One.pdf',
'Charlies_Angels_1x01_-_Pilot.pdf',
'Charmed_1x01_-_Pilot.pdf',
'Cheerleader_Death_Squad_1x01_-_Pilot.pdf',
'Chuck_1x01_-_Pilot.pdf',
'Constantine_1x01_-_Pilot.pdf',
'Criminal_Minds_Suspect_Behavior_1x01_-_Pilot.pdf',
'Cruel_Intentions_1x01_-_Pilot.pdf',
'Dallas_1x01_-_Changing_Of_The_Guard.pdf',
'Dark_Matter_1x01.pdf',
'Designated_Survivor_1x01_-_Pilot.pdf',
'Dietland_1x01_-_Pilot.pdf',
'Dirt_1x01_-_Pilot.pdf',
'Dirty_Sexy_Money_(aka_The_Darlings)_1x01_Pilot.pdf',
'Divorce_-_A_Love_Story_1x01_-_Pilot.pdf',
'Dr._Ken_1x01_-_Pilot.pdf',
'Feud_1x01_-_Pilot.pdf',
'Flashpoint_1x01.pdf',
'Friends_With_Benefits_1x01_-_Pilot.pdf',
'Gaffigan_1x01_-_Pilot.pdf',
'Ghosted_1x01_-_Pilot.pdf',
'Girlboss_1x01_-_Pilot.pdf',
'Girlfriends_Guide_To_Divorce_1x01_-_Pilot.pdf',
'Glow_1x01_-_Pilot.pdf',
'Goliath_1x01_-_Pilot.pdf',
'Good_Girls_1x01_-_Pilot.pdf',
'Grandfathered_1x01_-_Pilot.pdf',
'Hand_Of_God_1x01_-_Pilot.pdf',
'Hannibal_1x01_-_Pilot.pdf',
'Happy_Endings_1x01_-_Pilot.pdf',
'Happyish_1x01_-_Starring_Samuel_Beckett,_Albert_Camus_and_Alois_Alzheimer.pdf',
'Harlots_1x01_-_Pilot.pdf',
'Hart_Of_Dixie_1x01_-_Pilot.pdf',
'Hell_on_Wheels_1x01_-_Pilot.pdf',
'Hot_In_Cleveland_1x01_-_Dead_Is_The_New_90.pdf',
'How_I_Met_Your_Dad_1x01_-_Pilot.pdf',
'I_Love_Dick_1x01_-_Pilot.pdf',
'Impastor_1x01_-_Pilot.pdf',
'Jean-Claude_Van_Johnson_1x01_-_Pilot.pdf',
'Kevin_(Probably)_Saves_the_World_1x01_-_Pilot.pdf',
'Killing_Eve_1x01_-_Pilot.pdf',
'Killjoys_1x01_-_Bangarang.pdf',
'Kingdom_1x01_-_Pilot.pdf',
'Lady_Dynamite_1x01_-_Pilot.pdf',
'Last_Man_Standing_1x01_-_Pilot.pdf',
'Limitless_1x01_-_Pilot.pdf',
'Line_of_Duty_Episode_1.pdf',
'Living_Biblically_1x01_-_Pilot.pdf',
'Love_1x01_-_Pilot.pdf',
'Lucifer_1x01_-_Pilot.pdf',
'Luther_1x01_-_Pilot.pdf',
'Madam_Secretary_1x01_-_Pilot.pdf',
'MadMenPilot.pdf',
'Manhunt_-_Unabomber_1x01_-_Pilot.pdf',
'Melissa_and_Joey_1x01_-_Pilot.pdf',
'Merlin_1x01_-_Pilot.pdf',
'Merry_Happy_Whatever_1x01_-_December_21_-_Welcome_Matt.pdf',
'Mike_Berbiglias_Secret_Public_Journal_1x01.pdf',
'Minority_Report_1x01_-_Pilot.pdf',
'Mixology_1x01_-_Pilot.pdf',
'Mob_City_1x01_-_Pilot.pdf',
'Mr_Robinson_1x01_-_Pilot.pdf',
'Mr_Robot_1x01_-_Pilot.pdf',
'Mr._Sunshine_1x01_-_Pilot.pdf',
'Mulaney_1x01_-_Pilot.pdf',
'New_Girl_1x01_-_Pilot.pdf',
'No_Ordinary_Family_1x01_-_Pilot.pdf',
'Once_Upon_A_Time_1x01_-_Pilot.pdf',
'One_Big_Happy_1x01_-_Pilot.pdf',
'OrangeIsTheNewBlack.pdf',
'Orphan_Black_1x01_-_Pilot.pdf',
'Orville_1x01_-_Pilot.pdf',
'Our_Town_1x01_-_Pilot.pdf',
'Outlander_1x01_-_Sassenach.pdf',
'Parenthood_1x01_-_Pilot.pdf',
'Pen15_1x01_-_First_Day.pdf',
'Phys_Ed_1x01_-_Pilot.pdf',
'Pitch_1x01_-_Pilot.pdf',
'Pose_1x01_-_A_House_is_Not_a_Home.PDF',
'Pushing_Daisies_1x01_-_Pilot.pdf',
'Quantico_1x01_-_Pilot.pdf',
'Rectify_1x01_-_Pilot_1x01.pdf',
'Red_Band_Society_1x01_-_Pilot.pdf',
'Red_Oaks_1x01_-_Pilot.pdf',
'Rocky_Horror_Picture_Show_Lets_Do_The_Timewarp_Again.pdf',
'Russian_Doll_1x01_-_Nothing_in_this_World_is_Easy.pdf',
'Scandal_1x01_-_Pilot.pdf',
'Schooled_1x01_-_Pilot.pdf',
'Sean_Saves_The_World_1x01_-_Pilot.pdf',
'Selfie_1x01_-_Pilot.pdf',
'Shameless_1x01_-_Pilot.pdf',
'Shit_My_Dad_Says_1x01_-_Pilot.pdf',
'Shooter_1x01_-_Pilot.pdf',
'Sleepy_Hollow_1x01_-_Come_and_See.pdf',
'Sneaky_Pete_1x01_-_Pilot.pdf',
'Son_of_Zorn_1x01_-_Pilot.pdf',
'Sons_Of_Tucson_1x01_-_Pilot.pdf',
'Sorry_for_Your_Loss_1x01_-_Pilot.pdf',
'Southland_1x01_-_Pilot.pdf',
'Stan_Against_Evil_1x01_-_Eccles_and_the_172.pdf',
'Star_Crossed_1x01_-_Pilot.pdf',
'Suits_1x01_-_Pilot.pdf',
'Terriers_1x01_-_Pilot.pdf',
'The_100_1x01_-_Pilot.pdf',
'The_Affair_1x01_-_Pilot.pdf',
'The_Boys_1x01_-_The_Name_of_the_Game.pdf',
'The_C_Word_1x01_-_Pilot.pdf',
'The_Client_List_1x01_-_Pilot.pdf',
'The_Comedians_1x01_-_Pilot.pdf',
'The_Crown_1x01_-_Pilot.pdf',
'The_Cure_1x01_-_Pilot.pdf',
'The_Expanse_1x01_-_Pilot.pdf',
'The_Following_1x01_-_Pilot.pdf',
'The_Fosters_1x01_-_Pilot.pdf',
'The_Glades_1x01_-_Pilot.pdf',
'The_Good_Doctor_1x01_-_Pilot.pdf',
'The_Handmaids_Tale.pdf',
'The_Hatfields_and_Mccoys_1x01_-_Pilot.pdf',
'The_Knick_1x01_-_For_Headaches_and_Exhaustion.pdf',
'The_Last_O.G_1x01_-_Pilot.pdf',
'The_Librarians_1x01_-_And_The_Crown_of_King_Arthur.pdf',
'The_Lizzie_Borden_Chronicles_1x01.pdf',
'The_Lost_Girls_1x01_-_Pilot.pdf',
'The_Man_in_the_High_Castle_1x01_-_Pilot.pdf',
'The_Marvelous_Mrs_Maisel_1x01_-_Pilot.pdf',
'The_Mick_1x01_-_Pilot.pdf',
'The_Middle_1x01_-_Pilot.pdf',
'The_Millers_1x01_-_Pilot.pdf',
'The_Mindy_Project_1x01_-_Pilot.pdf',
'The_Mist_1x01_-_Pilot.pdf',
'The_Mob_Doctor_1x01_-_Pilot.pdf',
'The_Mysteries_Of_Laura_1x01_-_Pilot.pdf',
'The_Night_Manager_1x01.pdf',
'The_OA_1x01_-_Pilot.pdf',
'The_Odd_Couple_1x01_-_Pilot.pdf',
'The_Resident_1x01_-_Pilot.pdf',
'The_Royals_1x01_-_Pilot.pdf',
'The_Slap_1x01_-_Hector.pdf',
'The_Terror_1x01_-_Go_For_Broke.pdf',
'The_Tick_1x01_-_Pilot_(2000).pdf',
'The_Tomorrow_People_1x01_-_Pilot.pdf',
'The_Tower_1x01.pdf',
'The_Vikings_1x01_Pilot.pdf',
'The_Watch__1x01_-_Pilot.pdf',
'ThisIsUs.pdf',
'Transparent_1x01_-_Pilot.pdf',
'Travellers_1x01_-_Pilot.pdf',
'Turn_1x01_-_Pilot.pdf',
'Tut_1x01_-_Choice.pdf',
'Uncle_Buck_1x01_-_Pilot.pdf',
'Underemployed_1x01_-_Pilot.pdf',
'United_States_of_Tara,_The_1x01_-_Pilot.pdf',
'UnReal_1x01_-_Return.pdf',
'Up_All_Night_1x01_-_Pilot.pdf',
'We_Are_Men_1x01_-_Pilot.pdf',
'Weekends_at_Bellevue_1x01_-_Pilot.pdf',
'White_Famous_1x01_-_Pilot.pdf',
'Whitney_1x01_-_Pilot.pdf',
'Workaholics_1x01_-_Piss_&_S__t.pdf',
'You_Me_and_the_Apocalypse_1x01_-_Pilot.pdf',
'Young_Sheldon_1x01_-_Pilot.pdf',
'Youre_the_Worst_1x01_-_Pilot.pdf']

### **BATCH3**

In [None]:
# This is our 3rd large batch of scripts.

titles_list = ['Chambers_1x01_-_Into_the_Void.pdf',
'All_Rise_1x01_-_Pilot.pdf',
'American_Princess_1x01_-_Pilot-1.pdf',
'Awkwafina_1x01_-_Pilot.pdf',
'Blood__Treasure_1x01_-_The_Curse_of_Cleopatra.pdf',
'Bluff_City_Law_1x01_-_Pilot.pdf',
'Briarpatch_1x01_-_Breadknife_Weather.pdf',
'Carnival_Row_1x01_-_Pilot.pdf',
'Catch-22_1x01_-_Pilot.pdf',
'Chernobyl_Episode-11_23_45.pdf',
'City_On_A_Hill_1x01_-_The_Night_Flynn_Sent_the_Cops_on_the_Ice.pdf',
'Defending_Jacob_1x01_-_Pilot.pdf',
'Deputy_1x01_-_Graduation_Day.pdf',
'Devs_1x01_-_Pilot.pdf',
'Dispatches_From_Elsewhere_1x01_-_Peter.pdf',
'Dollface_1x01_-_Pilot.pdf',
'Emergence_1x01_-_Pilot.pdf',
'Escape_at_Dannemora_1x01_-_Chapter_One.pdf',
'EUPHORIA-PILOT.pdf',
'Everythings_Gonna_Be_OK_1x01_-_Pilot.pdf',
'Evil_1x01_-_Pilot.pdf',
'First_Wives_Club_1x01_-_Pilot.pdf',
'Freaks_And_Geeks_1x01_-_Pilot-1.pdf',
'good-omens-101-in-the-beginning-2019.pdf',
'Grand_Hotel_1x01_-_Pilot.pdf',
'Hot_Zone_1x01_-_053.pdf',
'I_Am_the_Night_1x01_-_Pilot.pdf',
'In_Between_Lives_1x01_-_Pilot.pdf',
'In_The_Dark_1x01_-_Pilot.pdf',
'Looking_For_Alaska_1x01_-_Famous_Last_Words.pdf',
'Miracle_Workers_1x01_-_Pilot.pdf',
'Nancy_1x01_-_In_Dreams_Begin_Responsibility.pdf',
'Never_Have_I_Ever_1x01_-_Pilot.pdf',
'NOS4A2_1x01_-_The_Shorter_Way.pdf',
'On_Becoming_A_God_In_Central_Florida_1x01_-_Pilot.pdf',
'Paradise_Lost_1x01_-_Pilot.pdf',
'Perfect_Harmony_1x01_-_Hallelujah.pdf',
'Pretty_Little_Liars_-_The_Perfectionists_1x01_-_Pilot.pdf',
'Prodigal_Son_1x01_-_Pilot.pdf',
'Project_Blue_Book_1x01_-_Pilot.pdf',
'Proven_Innocent_1x01_-_Pilot.pdf',
'Red_Line_1x01_-_Pilot.pdf',
'Roswell_New_Mexico_1x01_-_Pilot.pdf',
'Russian_Doll_1x01_-_Pilot.pdf',
'Tales_from_the_Loop_1x01_-_Loop.pdf',
'The_Boys_1x01_-_The_Name_of_the_Game.pdf',
'The_Enemy_Within_1x01_-_Pilot.pdf',
'The_Passage_1x01_-_Pilot.pdf',
'The_Rook_1x01_-_Pilot.pdf',
'The_Umbrella_Academy_1x01_-_We_Only_See_Each_Other_at_Weddings_and_Funerals.pdf',
'The_Unicorn_1x01_-_Pilot.pdf',
'The_Village_1x01_-_Pilot.pdf',
'Too_Old_to_Die_Young_1x01_-_Pilot.pdf',
'Twin_Peaks_1x01_-_Traces_to_Nowhere.pdf',
'Unbelievable_Story_Of_Rape_1x01_-_Pilot.pdf',
'Uninsured_1x01_-_Pilot.pdf',
'Unt_Hank_Steinberg_1x01_-_Pilot.pdf',
'Upload_1x01_-_Pilot.pdf',
'Whiskey_Cavalier_1x01_-_Pilot.pdf',
'Why_Women_Kill_1x01_-_Pilot.pdf',
'Zoeys_Extraordinary_Playlist_1x01_-_Pilot.pdf']
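A few filenames appear in more than one batch (for example, `Catch-22_1x01_-_Pilot.pdf` is listed in both BATCH1 and BATCH3), which can put the same script into the merged collection twice. A small sketch for spotting repeats follows; it assumes you keep each batch under its own variable name, which the cells above do not do, and the helper `repeated_titles` is made up for illustration.

```python
from collections import Counter

def repeated_titles(*title_lists):
    """Return, sorted, the filenames that occur more than once across the given lists."""
    counts = Counter(name for titles in title_lists for name in titles)
    return sorted(name for name, n in counts.items() if n > 1)

# Shortened stand-ins for two of the batches above
batch1_sample = ['Catch-22_1x01_-_Pilot.pdf', 'Community_1x01_-_Pilot.pdf']
batch3_sample = ['Catch-22_1x01_-_Pilot.pdf', 'Devs_1x01_-_Pilot.pdf']
print(repeated_titles(batch1_sample, batch3_sample))
```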

## Compiler

**Step:** This module is contained in the GitHub Project files.

**Process:** This Compiler turns PDFs into text and compiles them for further parsing.

*Note:* Depending on how many scripts you're compiling, this may take some time. 100 scripts = ~10 min.

In [None]:
import parser.parse_pdf as p

screenplays = p.pathfinder(titles_list)

In [None]:
import copy
backup = copy.deepcopy(screenplays)  # deep copy, so the in-place cleaning below leaves the backup untouched
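Plain assignment only binds a second name to the same list, so a backup that should survive in-place cleaning needs a deep copy. Here is a minimal illustration with made-up data, assuming the compiled output is a list of lists of lines, as the indexing in the cleaning cells suggests.

```python
import copy

scripts = [["TITLE", "Written by", "Someone"], ["OTHER TITLE"]]  # made-up stand-in data

alias = scripts                      # same object under a second name
snapshot = copy.deepcopy(scripts)    # independent copy of the outer and inner lists

del scripts[0][:1]                   # in-place edit, like the cleaning cells

print(alias[0])      # the alias sees the edit
print(snapshot[0])   # the deep copy does not
```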

## Cleaning

Because screenplays have so many different ways of being formatted, there may be further cleaning required to make sure each entry in the dictionary matches a basic format.

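The cleaning steps below, each labeled with a `#` comment, apply to the exact titles listed above. Each `screenplays[n]` index refers to a script's position in the compiled list, so the steps only fit if the list order is unchanged. For new lists of titles that you find on your own, you may need to do additional cleaning.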
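Most of the cleaning follows one pattern: drop some leading lines, then put a clean title at the top. It could be wrapped in a helper, as in the sketch below. Both helper names are hypothetical, the sample lines are made up, and looking up a script by filename only works if the compiled list keeps the same order as `titles_list`.

```python
def retitle(lines, n_drop, title):
    """Drop the first n_drop lines of one parsed script, then put title at the top (in place)."""
    del lines[:n_drop]
    lines.insert(0, title)
    return lines

def index_of(titles, show):
    """Return the position of the first filename in titles that starts with show."""
    for i, name in enumerate(titles):
        if name.startswith(show):
            return i
    raise ValueError(f"{show!r} not found")

titles = ['Riverdale_1x01_-_The_Rivers_Edge.pdf', 'Warriors_1x01_-_Pilot.pdf']
parsed = [["junk", "junk", "junk", "FADE IN:"], ["WARRIORS", "FADE IN:"]]  # made-up lines

retitle(parsed[index_of(titles, 'Riverdale')], 3, "RIVERDALE")
print(parsed[0])
```

Looking scripts up by name instead of by a hard-coded number keeps the cleaning valid when a title is added to or removed from the list.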

### BETA (cleaning)

In [None]:
# WARRIORS
del screenplays[5][:1]
del screenplays[5][5:14]
del screenplays[5][15:33]
del screenplays[5][0]

# THE SECRET LIVES OF HUSBANDS AND WIVES
del screenplays[4][:5]
del screenplays[4][9:20]

# LITTLE DARLINGS
del screenplays[3][:1]
screenplays[3].insert(0, "LITTLE DARLINGS")

# RIVERDALE
del screenplays[2][:3]
del screenplays[2][7:10]
screenplays[2].insert(0, "RIVERDALE")
del screenplays[2][5:7]
del screenplays[2][5]
screenplays[2].insert(5, "ACT ONE")

### BATCH1 (cleaning)

In [None]:
# Life on Mars
screenplays[52].insert(0, "Life On Mars")
screenplays[52].insert(1, "Written by")
screenplays[52].insert(2, "Matthew Graham, Tony Jordan, Ashley Pharoah")
# Political Animals
del screenplays[56][:28]
# Jon Lovitz
del screenplays[48][0]

In [None]:
# Surviving Jack
del screenplays[70][:2]
# Humans
del screenplays[43][:52]
# Grace and Frankie
del screenplays[37][:69]
# fear the walking dead
del screenplays[29][:1]
# doctor who
del screenplays[15][:1]
del screenplays[15][:2]
screenplays[15].insert(0, 'Doctor Who')
# last man on earth
del screenplays[50][:22]
# life on mars
del screenplays[52][0]
# Lost
screenplays[55].insert(0, 'Lost')
del screenplays[60][:11]
del screenplays[62][:2]
del screenplays[66][:28]

In [None]:
del screenplays[69][:9]
del screenplays[76][0]
screenplays[76].insert(0, "Smallville")
del screenplays[79][:24]
del screenplays[81][:2]
del screenplays[84][:27]
# The Defenders
screenplays[12].insert(4,"TEASER")
del screenplays[12][5]

### BATCH2 (cleaning)

In [None]:
del screenplays[151][:6]
del screenplays[153][0]
screenplays[153].insert(0, "UP ALL NIGHT")
del screenplays[154][0]
screenplays[154].insert(0, "WE ARE MEN")
del screenplays[156][0]
screenplays[156].insert(0, "WHITE FAMOUS")
screenplays[42].insert(0, "KINGDOM")
del screenplays[42][1:30]
del screenplays[62][:27]
del screenplays[68][:16]
del screenplays[69][0]
del screenplays[71][:6]
screenplays[71].insert(0, "ORVILLE")
del screenplays[75][:12]
del screenplays[76][0]
del screenplays[81][:3]
del screenplays[82][0]
del screenplays[87][0]
screenplays[86].insert(0, "SCANDAL")
del screenplays[87][0]
screenplays[87].insert(0, "SCHOOLED")
del screenplays[88][0]
screenplays[88].insert(0, "SEAN SAVES THE WORLD")
del screenplays[89][0:1]
screenplays[89].insert(0, "SELFIE")
screenplays[90].insert(0, "SHAMELESS")
del screenplays[91][0]
screenplays[91].insert(0, "SHIT MY DAD SAYS")
del screenplays[93][:21]
del screenplays[93][:1]
screenplays[93].insert(0, "SLEEPY HOLLOW")
del screenplays[94][0]
screenplays[94].insert(0, "SNEAKY PETE")
del screenplays[97][0]
screenplays[97].insert(0, "SORRY FOR YOUR LOSS")
del screenplays[98][0]
screenplays[98].insert(0, "SOUTHLAND")
del screenplays[99][0]
screenplays[99].insert(0, "STAN AGAINST EVIL")
del screenplays[100][0]
screenplays[100].insert(0, "STAR CROSSED")
del screenplays[101][0]
screenplays[101].insert(0, "SUITS")
del screenplays[108][:10]
del screenplays[110][0]
screenplays[110].insert(0, "THE CURE")
del screenplays[111][0]
screenplays[111].insert(0, "THE EXPANSE")
del screenplays[112][0]
screenplays[112].insert(0, "THE FOLLOWING")
del screenplays[102][:1]
screenplays[102].insert(0, "TERRIERS")
del screenplays[114][0]
screenplays[114].insert(0,"THE GLADES")
del screenplays[119][:1]
screenplays[119].insert(0, "THE LAST OG")
del screenplays[120][0]
screenplays[120].insert(0, "THE LIBRARIANS")
del screenplays[125][0]
screenplays[125].insert(0, "THE MICK")
del screenplays[124][0]
screenplays[124].insert(0, "THE MARVELOUS MRS MAISEL")
screenplays[126].insert(0, "THE MIDDLE")
del screenplays[140][:1]
del screenplays[141][:24]
del screenplays[142][:26]
del screenplays[144][0]
screenplays[144].insert(0, "THIS IS US")
screenplays[147].insert(0, "TURN")
del screenplays[148][0]
screenplays[148].insert(0, "TUT")
del screenplays[149][0]
screenplays[149].insert(0, "TUT")
del screenplays[150][0]
screenplays[150].insert(0, "UNCLE BUCK")

In [None]:
# CHARMED
del screenplays[2][2]
#CHEERLEADER DEATH SQUAD
del screenplays[3][:8]
# YOUNG SHELDON
del screenplays[161][0]
screenplays[161].insert(0,'YOUNG SHELDON')
# YOU, ME, AND THE APOCALYPSE
del screenplays[160][0]
screenplays[160].insert(0,"YOU, ME, AND THE APOCALYPSE")
# CONSTANTINE
del screenplays[5][:39]
screenplays[5].insert(0,"CONSTANTINE")
# DIETLAND
del screenplays[11][:6]
screenplays[11].insert(1, "CREATED BY")
screenplays[11].insert(2, "Marty Noxon")
# The DARLINGS
# FEUD
del screenplays[16][:8]
del screenplays[22][:7]
del screenplays[22][:2]
screenplays[22].insert(0, "THE GIRLFRIENDS GUIDE TO DIVORCE")
del screenplays[24][:16]
screenplays[24].insert(0, "GOLIATH")
screenplays[30].insert(0, "HAPPYISH")
del screenplays[30][2:4]
del screenplays[34][:13]
screenplays[34].insert(0, "HOT IN CLEVELAND")
screenplays[37].insert(0, "IMPASTOR")
del screenplays[37][1:19]
del screenplays[37][1:4]
del screenplays[40][0]
screenplays[44].insert(0, "Last Man Standing")
screenplays[44].insert(1, "Written by")
screenplays[44].insert(2, "Jack Burditt")
del screenplays[44][3:44]
screenplays[46].insert(0, "LINE OF DUTY")
del screenplays[46][1:2]
del screenplays[49][0]
del screenplays[49][1:9]
del screenplays[54][:1]
screenplays[54].insert(0,"MELISSA AND JOEY")
screenplays[55].insert(0,"MERLIN")
del screenplays[55][1:14]
screenplays[56].insert(0, "MERRY HAPPY WHATEVER")
del screenplays[56][1]
# screenplays[57].insert(0, "MIKE BERBIGLIAS SECRET PUBLIC JOURNAL")
del screenplays[57][1:16]
del screenplays[63][:4]
screenplays[63].insert(0, "MISTER SUNSHINE")
screenplays[64].insert(0, "MULANEY")
screenplays[65].insert(0, "NEW GIRL")
screenplays[69].insert(0, "ONE BIG HAPPY")
del screenplays[6]
screenplays[82].insert(0, "RECTIFY")
del screenplays[87][0]

del screenplays[127][0]
screenplays[127].insert(0, "THE MILLERS")
del screenplays[129][0]
screenplays[129].insert(0, "THE MIST")
del screenplays[130][0]
screenplays[130].insert(0, "THE MOB DOCTOR")
del screenplays[131][0]
screenplays[131].insert(0, "THE MYSTERIES OF LAURA")
del screenplays[132][0]
screenplays[132].insert(0, "THE NIGHT MANAGER")

### BATCH3 (cleaning)

In [None]:
screenplays[4].insert(0, "BLOOD AND TREASURE")

# del screenplays[30][:14]
del screenplays[13][0]
screenplays[13].insert(0, "DEVS")
del screenplays[13][1:3]
screenplays[13].insert(1, "Written By Alex Garland")
del screenplays[24][:8]
del screenplays[26][:2]
del screenplays[30][1:3]
del screenplays[31][:2]
screenplays[31].insert(0, "NEVER HAVE I EVER")
del screenplays[36][:26]
screenplays[36].insert(0, "PRETTY LITTLE LIARS")
del screenplays[39][:6]
screenplays[39].insert(0, "PROJECT BLUE BOOK")
del screenplays[48][:2]
screenplays[48].insert(0, "THE UMBRELLA ACADEMY")
del screenplays[50][:12]
screenplays[50].insert(0, "THE VILLAGE")
del screenplays[53][0]
screenplays[53].insert(0, "UNBELIEVABLE")
del screenplays[55][0]
screenplays[55].insert(0, "FOR LIFE")
del screenplays[59][0]
screenplays[59].insert(0, "ZOEY'S EXTRAORDINARY PLAYLIST")

### **Nominal Dict**

**Step:** Send compiled scripts through this module to build a dictionary of specs from the cover page.

In [None]:
import parser.cull_cover_page as titles

In [None]:
nominal_dicts = titles.cull_cover_page(screenplays)

### **Counts Dict**

**Step:** Send compiled scripts through this module to attach the screenplay body in a new dictionary.

In [None]:
import parser.cull_script_body as scripts

In [None]:
counts_dicts = scripts.run_counts(screenplays)

### ***Combine_Dicts***

**Step:** Concatenate those two dictionaries together using the function below. This will result in a list of dictionaries.

In [None]:
def combine_dicts(dicts1, dicts2):
    # Merge the two lists pairwise by position; on duplicate keys, values from dicts2 win
    pilots = []
    for index in range(len(dicts1)):
        pilots.append({**dicts1[index], **dicts2[index]})
    return pilots

screenplays = combine_dicts(nominal_dicts, counts_dicts)
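To see what the merge does to a single pair of entries, here is a small variant with an added length check. The name `merge_pairs` and the dictionary keys are made up for illustration; the real keys come from `cull_cover_page` and `run_counts`.

```python
def merge_pairs(first, second):
    """Merge two equal-length lists of dicts pairwise; keys from second win on conflict."""
    if len(first) != len(second):
        raise ValueError(f"length mismatch: {len(first)} vs {len(second)}")
    return [{**a, **b} for a, b in zip(first, second)]

# Made-up entries standing in for one cover-page dict and one body dict
nominal = [{"title": "RIVERDALE", "writer": "unknown"}]
counts = [{"title": "RIVERDALE", "lines": 1200}]
print(merge_pairs(nominal, counts))
```

The length check raises a clear error if the two lists do not have the same number of entries, which a pairwise merge depends on.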

In [None]:
len(screenplays), type(screenplays), type(screenplays[1])

# Pickling

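Once the scripts are cleaned, feel free to pickle the list of dictionaries so you don't have to compile it again. Run only the cell that matches the batch you just compiled, since each cell writes whatever is currently in `screenplays`. The loading step further down also expects a `BETA.pickle`, saved the same way after the BETA run.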

In [None]:
with open('BATCH1.pickle', 'wb') as write_file:
    pickle.dump(screenplays, write_file)

In [None]:
with open('BATCH2.pickle', 'wb') as write_file:
    pickle.dump(screenplays, write_file)

In [None]:
with open('BATCH3.pickle', 'wb') as write_file:
    pickle.dump(screenplays, write_file)
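The save and load steps can be checked with a quick round trip. The sketch below uses a throwaway file in a temporary directory and a made-up entry; `save_batch` and `load_batch` are hypothetical helper names, not part of the project.

```python
import pickle
import tempfile
from pathlib import Path

def save_batch(data, path):
    """Write data to path with pickle."""
    with open(path, 'wb') as write_file:
        pickle.dump(data, write_file)

def load_batch(path):
    """Read back whatever save_batch wrote. Only unpickle files you created yourself."""
    with open(path, 'rb') as read_file:
        return pickle.load(read_file)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'DEMO.pickle'            # throwaway file, not one of the real batches
    demo = [{"title": "DEMO", "lines": 3}]      # made-up entry
    save_batch(demo, path)
    restored = load_batch(path)

print(restored == demo)
```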

# Pickle Open

Now we can reload those pickles once we have all the scripts we want and concatenate them into one large list of dictionaries.

In [4]:
# BETA (this list becomes the combined collection, MVP)
with open('BETA.pickle', 'rb') as read_file:
    MVP = pickle.load(read_file)

# BATCH1
with open('BATCH1.pickle', 'rb') as read_file:
    BATCH1 = pickle.load(read_file)

# BATCH2
with open('BATCH2.pickle', 'rb') as read_file:
    BATCH2 = pickle.load(read_file)

# BATCH3
with open('BATCH3.pickle', 'rb') as read_file:
    BATCH3 = pickle.load(read_file)

# ListB (left commented out)
# with open('ListB.pickle', 'rb') as read_file:
#     LISTB = pickle.load(read_file)

# Append every batch to the BETA list
MVP.extend(BATCH1)
MVP.extend(BATCH2)
MVP.extend(BATCH3)

At this point, we have our list of script dictionaries, one per script.
* Check the length to know the total number.
* The more scripts we have in the system, the better the end product will be.

In [5]:
print("We have " + str(len(MVP))+ " scripts. Good enough!")

We have 327 scripts. Good enough!


# ***END COMPILING!***