# Script Generator

📄,📄,📄 -> 📄

In this notebook, we'll compile individual 10-minute voice acting scripts for our voice actors to read. We'll select individual prompts from our Wikipedia, Edge Studio, and CMU Arctic sources and compile them into our script template. Therefore, before executing this notebook, it is important to populate `edge-studio/csv-source/` and `wikipedia/csv-source/` via the `Edge Studio Scraper.ipynb` and `Wikipedia Scraper.ipynb` notebooks.

A 10-minute script includes a total of around 1,500 words (we're targeting a 150 WPM reading pace).

To make things simple, we divide our 10-minute script into four 2.5-minute scripts, which are auto-generated and appended together to form a full 10-minute script.

A 2.5 minute script should contain around 375 words and is structured as follows:

| 2.5 Minute Script | 
|------------|
|Edge Studio (131 words)|
|CMU Arctic (21 words)| 
|Wikipedia (223 words)|

They'll include a single Edge Studio prompt, 1-3 CMU prompts, and one or more Wikipedia prompts. An example 2.5 Minute Script might look like the following:

|Source|Title|Content|
|------|-----|-------|
|Edge Studio|Leadership Training|Becoming a leader in your organization isn't easy...|
|CMU |CMU |The boy ran quickly down the stairs.|
|CMU |CMU |He would remember this day.|
|Wikipedia |Business: Marketing |The marketing function of a business is responsible...|
|Wikipedia |Emotional intelligence: History |The study of emotional intelligence began...|
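
The word budget described above can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted in this notebook; the constant names are illustrative and do not appear elsewhere in the code.

```python
# Word budget for one 10-minute script, using the figures quoted above.
WORDS_PER_MINUTE = 150      # target reading pace
SCRIPT_MINUTES = 10         # length of a full script
SECTIONS_PER_SCRIPT = 4     # four 2.5-minute scripts per full script

total_words = WORDS_PER_MINUTE * SCRIPT_MINUTES          # words in a full script
words_per_section = total_words // SECTIONS_PER_SCRIPT   # words in a 2.5-minute script

# Approximate per-source allocation inside one 2.5-minute script.
allocation = {"Edge Studio": 131, "CMU Arctic": 21, "Wikipedia": 223}

# The three allocations should add up to the 2.5-minute budget.
assert sum(allocation.values()) == words_per_section

for source, words in allocation.items():
    print(f"{source}: {words} words ({words / words_per_section:.0%} of the budget)")
```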


The reasoning behind how we've structured these scripts is described in detail [here].

[here]: https://docs.google.com/document/d/19PJYGxjlljE1ByFKGLA3XIgwO_X0k_B7pJp5VbuC8fg/edit

In [1]:
from IPython.display import display, Markdown
import pandas as pd
import random, os, re

### Define Source Directories and Target Script Length

We keep the full, original copies of source material in `csv-source/` folders.

It won't be uncommon for us to have multiple source files (for example, each Wikipedia article is saved as an individual `.csv`), so, for simplicity, the first thing we'll do is combine all source files into a single `.csv`. This makes it much easier to understand the entire body of source text which we are pulling from.

When we load source data for the first time, it is imported from these directories into a `DataFrame`.

For our Edge Studio scripts, for example, we'll load the source file(s) in our `edge-studio/csv-source/` folder and save a copy (`Edge Studio Source.csv`) into the `edge-studio/` directory to serve as a reference point for what our original source looked like.

The file will be modified throughout the process of generating our script.

In [2]:
EDGE_STUDIO_SOURCE_NAME = "Edge Studio"
EDGE_STUDIO_SOURCE_DIRECTORY = "edge-studio/csv-source/"

CMU_SOURCE_NAME = "CMU"
CMU_SOURCE_DIRECTORY = "cmu-arctic/csv-source/"

WIKIPEDIA_SOURCE_NAME = "Wikipedia"
WIKIPEDIA_SOURCE_DIRECTORY = "wikipedia/csv-source/"

SCRIPT_EXPORT_DOC_DIRECTORY = "scripts/docx/"
SCRIPT_EXPORT_CSV_DIRECTORY = "scripts/csv/"

TARGET_SCRIPT_LENGTH = 375 # word count

### Define Helper Functions

In [3]:
def create_dataframe_from_source_directory(source_name, directory=""):
    """ Returns a DataFrame that contains the data in every .csv file in the specified directory. 
    
    Args:
        source_name (str): The name of the source. Ex. "Edge Studio".
        directory (str): A directory containing source data saved in 
                        our CSV format (['Index', 'Title', 'Content'])
    Returns:
        source_df (DataFrame): A DataFrame containing all rows from each .csv in directory.
    """
    # TODO: Avoid using constants within functions, abstract out into a global constant or keyword default
    source_df  = pd.DataFrame(columns=['Source', 'Title', 'Content'])
    
    for file in os.listdir(directory):
        file_path = os.path.join(directory, file)
        if os.path.isfile(file_path) and file_path.lower().endswith('.csv'):
            file_as_df = pd.read_csv(file_path, sep="\t", index_col=0)
            file_as_df['Source'] = source_name # add "Source" column
            source_df  = source_df.append(file_as_df, ignore_index=True, sort=False)
        else:
            print('Did not import %s' % file_path)
            
    return source_df


def create_master_dataframe_from_sources(sources_and_directories):
    """ Returns a DataFrame that contains the data in every .csv file in the specified directories. 
    
    Args:
        sources_and_directories (dict): Dictionary containing names of sources and
        corresponding directory. Ex. {"Wikipedia": "wikipedia/csv-source/"}.
    Returns:
        master_df (DataFrame): A DataFrame containing all rows from each .csv in directory.
    """
    # TODO: Avoid using constants within functions, abstract out into a global constant or keyword default
    master_df = pd.DataFrame(columns=['Source', 'Title', 'Content'])
    
    for key, value in sources_and_directories.items():
        new_df = create_dataframe_from_source_directory(source_name=key,
                                                        directory=value)
        master_df = master_df.append(new_df, ignore_index=True, sort=False)
    
    return master_df


def load_csv_as_dataframe(path):
    return pd.read_csv(path, sep="\t", index_col=0)


def count_words(s):
    """ Returns the number of whitespace-separated words in s. """
    words = re.findall(r"\S+", s) # Finds every run of non-whitespace characters
    return len(words)
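
To make the counting rule concrete, here is a small, self-contained sketch of how `count_words` treats punctuation, hyphens, and line breaks. The function is repeated so the snippet runs on its own; the sample strings are made up for illustration.

```python
import re


def count_words(s):
    # Same rule as the helper above: every run of non-whitespace characters is one word.
    return len(re.findall(r"\S+", s))


samples = [
    "The boy ran quickly down the stairs.",  # trailing punctuation stays attached to its word
    "state-of-the-art   text-to-speech",     # each hyphenated term counts as a single word
    "Line one.\n\nLine two.",                # line breaks are whitespace, just like spaces
    "",                                      # an empty prompt has no words
]

counts = [count_words(sample) for sample in samples]
for sample, count in zip(samples, counts):
    print(repr(sample), "->", count)
```

Because words are split on whitespace only, the totals are approximate reading lengths rather than exact linguistic word counts.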

## 1. Load All Prompts Into Single Source

Below, we'll load all of our Edge Studio, CMU, and Wikipedia source files into a single, master `DataFrame` named `master_df`.

`master_df` serves as a sort of central database containing our entire corpus of prompts across all sources.

In [4]:
# Map each source name to the directory that holds its .csv files
SOURCES_AND_DIRECTORIES = {EDGE_STUDIO_SOURCE_NAME: EDGE_STUDIO_SOURCE_DIRECTORY,
                           CMU_SOURCE_NAME: CMU_SOURCE_DIRECTORY,
                           WIKIPEDIA_SOURCE_NAME: WIKIPEDIA_SOURCE_DIRECTORY}

master_df = create_master_dataframe_from_sources(SOURCES_AND_DIRECTORIES)

display(master_df)

| |Source|Title|Content|
|---|------|-----|-------|
|0|Edge Studio|2 Virtual Tour/E-Learning Scripts|Morton Arboretum\n\nWelcome to Morton Arboretu...|
|1|Edge Studio|2 Virtual Tours/Morton Arboretum|2 Virtual Tour/E-Learning Scripts\nMorton Arbo...|
|2|Edge Studio|A Haunting Title|In this world, there is real evil.\nIn the dar...|
|3|Edge Studio|A story|Every moment has a story. And every story matt...|
|4|Edge Studio|A Story|Every moment has a story. And every story matt...|
|5|Edge Studio|A&E Fashion Special|If you're in the market to liven up your décor...|
|6|Edge Studio|Abiogenic Theory|The abiogenic theory holds that hydrocarbons w...|
|7|Edge Studio|Adopted Children Seeking Biological Parents|Adopted Children Seeking Biological Parents\nI...|
|8|Edge Studio|Adopted Children Seeking Biological Parents|It is natural for an adopted child to be curio...|
|9|Edge Studio|Adopted Children Seeking Birth Parents|Adopted children are naturally curious about t...|


Below, we can see that `master_df` contains all of the prompts found in our Edge Studio, CMU, and Wikipedia sources.

In [5]:
# edge_df = master_df.loc[master_df['Source'] == 'Edge Studio']
# display(edge_df.head(3))

# cmu_df = master_df.loc[master_df['Source'] == 'CMU'] 
# display(cmu_df.head(3))

# wiki_df = master_df.loc[master_df['Source'] == 'Wikipedia'] 
# display(wiki_df.head(3))

### Save Master `DataFrame` as  `.csv`

It will be helpful to store a single `.csv` that contains the prompts from our combined sources. This can serve as a source of truth later if we're curious where our prompts were extracted from.

We can also use this file as a starting point for future script generation. If we only want to generate a small number of scripts now, we can update the file to remove the prompts we've used, then reload it and start another session later.

The `.csv` format also allows us to share and explore our master source outside of `pandas`.

In [6]:
# TODO: "Master Source" doesn't mean much. save_formatted_dataframe() might be simpler/more straightforward
# TODO: Create a util.py file that contains commonly-used functions like saving to csv. Then, import into notebooks.
def save_dataframe_as_master_source(df, directory="", filename="Master Source.csv"):
    df.to_csv(path_or_buf= "%s%s" % (directory, filename),
              sep="\t",
              index=True,
              index_label="Index")

In [7]:
save_dataframe_as_master_source(master_df)

## 2. Generate Single 2.5-Minute Script

First, we'll define a function that searches for prompts of a specific length.

This will be particularly useful as we search for Wikipedia prompts of varying lengths to ensure our full scripts are within a target word count.

In [8]:
def find_content_of_length(source_df, target_word_count, source=None, margin=.5):
    """ Returns the first row of a DataFrame whose 'Content' column is within a desired word count.
    
    Args:
        source_df (DataFrame): DataFrame containing prompts.
        target_word_count (int): Desired length of 'Content'.
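        source (str): Name of a source, if we only want to search that source's prompts. Ex. "Edge Studio".
        margin (float): Fraction of target_word_count that sets the upper/lower boundaries of word count.
    Returns:
        row (Series or DataFrame): The first row in source_df that fits the criteria. If no single
        row fits, a DataFrame of the first two rows that each fit half the target range.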
    """    
    lower_b = round(target_word_count * (1-margin))
    upper_b = round(target_word_count * (1+margin))
    
    if source:
        source_df = source_df.loc[source_df['Source'] == source]
        
    for index, row in source_df.iterrows():
        if count_words(row['Content']) in range(lower_b, upper_b):
            return row
    
    # adjust search and look for 2 prompts of half our target length...
    lower_b = round(lower_b/2)
    upper_b = round(upper_b/2)
    
    # create a DataFrame to store multiple prompts
    results_df = pd.DataFrame(columns=['Source', 'Title', 'Content'])
    for index, row in source_df.iterrows():
        if count_words(row['Content']) in range(lower_b, upper_b):
            results_df = results_df.append(row, ignore_index=True, sort=False)            
        if len(results_df.index) == 2:
            return results_df
        
    raise Exception("Unable to find a %s prompt within %s and %s words..." % (source, lower_b, upper_b))
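
To see which word counts the search actually accepts, the bounds can be computed on their own. This is a sketch that copies the arithmetic from `find_content_of_length`; the helper name `search_bounds` is hypothetical and is not part of the notebook.

```python
def search_bounds(target_word_count, margin=0.5):
    """Hypothetical helper mirroring the bounds computed in find_content_of_length."""
    # First pass: a single prompt within +/- margin of the target.
    lower_b = round(target_word_count * (1 - margin))
    upper_b = round(target_word_count * (1 + margin))
    # Second pass: two prompts, each within half of the first-pass bounds.
    half_lower_b = round(lower_b / 2)
    half_upper_b = round(upper_b / 2)
    return (lower_b, upper_b), (half_lower_b, half_upper_b)


# 223 is the Wikipedia share of a 2.5-minute script.
first_pass, second_pass = search_bounds(223)
print("single prompt:", first_pass)
print("two prompts, each:", second_pass)

# The function tests membership with range(lower_b, upper_b),
# so the upper bound itself is excluded.
accepted = range(*first_pass)
print(first_pass[1] in accepted)
```

Two details are worth knowing: Python's `round` rounds exact halves to the nearest even integer, and the upper bound is exclusive, so a prompt with exactly `upper_b` words is skipped.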

Next, we'll define a function that generates a 2.5-minute script. As a reminder, each 2.5-minute script should contain around 375 words and is structured as follows:

| 2.5 Minute Script | 
|------------|
|Edge Studio Prompt (131 words)|
|1-3 CMU Arctic Prompt(s) (21 words)| 
|Wikipedia Prompt(s) (223 words)|

Given a `DataFrame` containing all of our source material and a target word count for our 2.5-minute script, we execute the following steps to generate a 2.5-minute script:
1. The first available Edge Studio prompt is selected.
2. The first available 1-3 CMU prompts are selected.
3. The remaining word count for the script is calculated.
4. Wikipedia prompt(s) that fill the remaining word count are selected. 

In [9]:
def generate_script(source_df, word_count):
    """ Returns a DataFrame containing a 2.5-minute script containing prompts from a given 
        source DataFrame.
    
    Args:
        source_df (DataFrame): DataFrame containing Edge Studio, CMU, and Wikipedia prompts.
        word_count(int): Desired length (number of words) of resulting script.
    Returns:
        short_script_df (DataFrame): A DataFrame containing prompts for a 2.5-minute script. 
    """
    # TODO: Use constants for source names
    short_script_df = pd.DataFrame(columns=['Source', 'Title', 'Content'])
    
    # 1. Edge Studio
    edge_df = source_df.loc[source_df['Source'] == 'Edge Studio'] # get all Edge Studio prompts
    selected_edge_prompt = edge_df.iloc[0] # select single row
    short_script_df = short_script_df.append(selected_edge_prompt, ignore_index=True) # append
    
    # 2. CMU
    cmu_df = source_df.loc[source_df['Source'] == 'CMU']
    # TODO: Use constants for range values. Ex. CMU_ROW_MIN, CMU_ROW_MAX
    num_rows = random.randint(1,3)
    selected_cmu_prompts = cmu_df.iloc[0:num_rows]
    short_script_df = short_script_df.append(selected_cmu_prompts, ignore_index=True)
    
    # 3. Determine desired word count of Wikipedia prompt(s)
    word_count_so_far = 0
    for index, row in short_script_df.iterrows():
        words_in_prompt = count_words(row['Content'])
        word_count_so_far += words_in_prompt
    remaining_words = word_count - word_count_so_far
    
    # 4. Wikipedia
    if remaining_words > 10: # only search for Wiki prompts if we need them....
        selected_wiki_prompt = find_content_of_length(source_df, remaining_words, source="Wikipedia")
        short_script_df = short_script_df.append(selected_wiki_prompt, ignore_index=True)

    return short_script_df

In [10]:
single_script_df = generate_script(source_df=master_df,
                                   word_count=TARGET_SCRIPT_LENGTH)

In [11]:
display(Markdown("### Sample 2.5 Minute Script"))
display(single_script_df)

### Sample 2.5 Minute Script

| |Source|Title|Content|
|---|------|-----|-------|
|0|Edge Studio|2 Virtual Tour/E-Learning Scripts|Morton Arboretum\n\nWelcome to Morton Arboretu...|
|1|CMU|CMU|Author of the danger trail, Philip Steels, etc.|
|2|Wikipedia|Problem solving: Definition|The term problem solving is used in numerous d...|


## 3. Generate Multiple Scripts

2.5-minute scripts are our building blocks for longer scripts. They're the smallest unit of script we'll build from.

To generate a 10-minute script, for example, we'll generate and append four 2.5-minute scripts.

To generate a 1-hour script, we'll generate and append a total of 24 (4 * 6) 2.5-minute scripts. And so on...
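
The number of 2.5-minute scripts needed for a given duration follows directly from this arithmetic. Below is a minimal sketch; `short_scripts_needed` and `SHORT_SCRIPT_MINUTES` are hypothetical names introduced for illustration.

```python
SHORT_SCRIPT_MINUTES = 2.5  # length of our smallest building block


def short_scripts_needed(total_minutes):
    """Hypothetical helper: how many 2.5-minute scripts make up total_minutes."""
    count, remainder = divmod(total_minutes, SHORT_SCRIPT_MINUTES)
    if remainder:
        raise ValueError("total_minutes must be a multiple of 2.5")
    return int(count)


for minutes in (10, 60, 20 * 60):
    print(f"{minutes} minutes -> {short_scripts_needed(minutes)} short scripts")
```

The result could be passed as the `num_of_short_scripts` argument of the `generate_scripts` function defined later in this notebook.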

### Define Helper Function(s) To Help Generate Multiple Scripts

Whenever we select a prompt to use in a 2.5-minute script, it's important we remove it from our `source_df` so that it's not re-used in a later script.

Let's define a function that removes selected prompts from a given `source_df`.

**Bonus points:** if our source happens to have multiple instances of a prompt we'd like to remove, this function will remove all of them.

In [12]:
def remove_script_prompts_from_source(source_df, script_df):
    """ Returns a copy of source_df with the rows contained in script_df removed.
    
    Args:
        source_df (DataFrame): DataFrame containing all source prompts.
        script_df (DataFrame): DataFrame containing prompts that will be used in a script.
    Returns:
        source_df (DataFrame): A copy of source_df with any prompts found in script_df removed.
    """
    for index, row in script_df.iterrows():
        content = row['Content']
        row_to_remove = source_df.loc[source_df['Content'] == content]
        source_df = source_df.drop(row_to_remove.index)
    return source_df
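
The matching rule used above (rows are removed when their `Content` matches, so duplicates disappear together) can be illustrated without pandas. This is a simplified sketch over plain dictionaries with made-up rows, not the notebook's implementation.

```python
# Two source rows share the same content under slightly different titles.
source_prompts = [
    {"Source": "Edge Studio", "Title": "A story", "Content": "Every moment has a story."},
    {"Source": "Edge Studio", "Title": "A Story", "Content": "Every moment has a story."},
    {"Source": "CMU", "Title": "CMU", "Content": "Will we ever forget it."},
]

# Suppose only the first row was selected for a script.
script_prompts = [source_prompts[0]]

# Collect the content we've used, then keep only rows whose content is unused.
used_content = {prompt["Content"] for prompt in script_prompts}
remaining = [prompt for prompt in source_prompts if prompt["Content"] not in used_content]

for prompt in remaining:
    print(prompt["Source"], "|", prompt["Title"])
```

Both Edge Studio rows are dropped even though only one was selected. In pandas, a similar filter could likely be written in one step as `source_df[~source_df['Content'].isin(script_df['Content'])]`, which would avoid the row-by-row loop.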

### Generate Multiple Scripts

Finally, let's put all of this together and define a function that generates and returns a script containing *multiple* 2.5-minute scripts.

Our function will default to generating a 10-minute script (i.e. it will generate and append four 2.5-minute scripts).

In [13]:
def generate_scripts(source_df, num_of_short_scripts=4):
    """ Generates multiple short scripts, appends them, and returns the result as
        a single DataFrame.
    
    Args:
        source_df (DataFrame): DataFrame containing source prompts.
        num_of_short_scripts (int): Int specifying how many short (2.5-minute) scripts should be generated.
    Returns:
        long_script (DataFrame): DataFrame containing all requested short_scripts.
    """
    long_script = pd.DataFrame(columns=['Source', 'Title', 'Content'])
    
    for i in range(num_of_short_scripts):
        short_script = generate_script(source_df, TARGET_SCRIPT_LENGTH)
        long_script = long_script.append(short_script, ignore_index=True)
        source_df = remove_script_prompts_from_source(source_df, short_script)
        
    return long_script

In [14]:
long_script_df = generate_scripts(master_df, num_of_short_scripts=4)

display(Markdown("### Long Script"))
display(long_script_df)

### Long Script

| |Source|Title|Content|
|---|------|-----|-------|
|0|Edge Studio|2 Virtual Tour/E-Learning Scripts|Morton Arboretum\n\nWelcome to Morton Arboretu...|
|1|CMU|CMU|Author of the danger trail, Philip Steels, etc.|
|2|CMU|CMU|Not at this particular case, Tom, apologized W...|
|3|Wikipedia|Problem solving: Definition|The term problem solving is used in numerous d...|
|4|Edge Studio|2 Virtual Tours/Morton Arboretum|2 Virtual Tour/E-Learning Scripts\nMorton Arbo...|
|5|CMU|CMU|For the twentieth time that evening the two me...|
|6|CMU|CMU|Lord, but I'm glad to see you again, Phil.|
|7|CMU|CMU|Will we ever forget it.|
|8|Wikipedia|Problem solving: Cognitive sciences|The early experimental work of the Gestaltists...|
|9|Edge Studio|A Haunting Title|In this world, there is real evil.\nIn the dar...|


## 4. Export Script As Readable `.docx` File

We'll use the `python-docx` [library] to export a script as a readable `.docx` file for our voice actor.

It will also be helpful to save the script in a corresponding csv file in the (likely) event that we'll want to easily extract the prompts contained in the script later (or re-produce the docx file).

[library]: http://python-docx.readthedocs.io/en/latest/

In [15]:
from docx import Document
from docx.shared import Inches, RGBColor, Pt

In [16]:
def export_script_as_docx(script_df, doc_title, doc_path='Script.docx', csv_path="Script.csv"):
    
    document = Document()    
   
    # page title
    p = document.add_heading(level=1)
    r = p.add_run()
    r.add_picture('ws_watermark.png', width=Inches(1))
    r.add_break()
    r.add_text(doc_title)
    r.font.size = Pt(15)
    r.font.name = 'Helvetica Neue'
    r.font.bold = False
    r.font.color.rgb = RGBColor(142, 142, 142)
    r.add_break()
    
    # instructions
    p = document.add_paragraph()
    p.add_run('Style: ', 'Strong').font.name = 'Helvetica Neue'
    p.add_run('E-Learning').font.name = 'Helvetica Neue'
    p.add_run().add_break()
    p.add_run('Instructions: ', 'Strong').font.name = 'Helvetica Neue'
    p.add_run('Read black text, ignore green titles.').font.name = 'Helvetica Neue'
    
    # prompts
    for index, row in script_df.iterrows():
        # prompt title
        p = document.add_heading(level=1)
        r = p.add_run(row['Title'])
        r.font.size = Pt(14)
        r.font.name = 'Helvetica Neue'
        r.font.color.rgb = RGBColor(51, 238, 114)
        
        # prompt body
        p = document.add_paragraph()
        r = p.add_run(row['Content'])
        r.font.size = Pt(12)
        r.font.name = 'Helvetica Neue'
        r.font.color.rgb = RGBColor(0, 0, 0)
        
    try:
        document.save(doc_path) # write docx
        script_df.to_csv(path_or_buf=csv_path, sep="\t",index=True, index_label="Index") # write csv
    except Exception as e:
        print("Something went wrong while trying to export script!")
        print(e)

In [17]:
script_name = "Sample Script"

export_doc_path = "%s%s.docx" % (SCRIPT_EXPORT_DOC_DIRECTORY, script_name)
export_csv_filename = "%s%s.csv" % (SCRIPT_EXPORT_CSV_DIRECTORY, script_name)

export_script_as_docx(long_script_df, doc_title=script_name, doc_path=export_doc_path, csv_path=export_csv_filename)

## 5. Drop Prompts From Master Source

Now that we've selected some prompts to use in a voice acting script, we'll remove them from our source so that we avoid re-using them later.

1. Remove selected prompts from source `DataFrame`
2. Save new source `DataFrame` as our source `.csv`

In [18]:
master_df = remove_script_prompts_from_source(master_df, long_script_df)

save_dataframe_as_master_source(master_df)

# display(master_df)

## 6. Add Final Helper Functions...
If we need to generate a total of 20 hours of voice acting scripts, it'd be cumbersome to generate 120 (20\*6) 10-minute scripts one at a time. Soon, we'll write a loop that generates multiple, individual 10-minute scripts.

But, first, there's one final set of helper functions we'll want to write.

Some of the Wikipedia prompts in our source data can be too long to use in their entirety (300+ words), but we'd be able to use them if we split them up into smaller pieces.

We'll write the functionality to locate Wikipedia prompts in our source data, split them up by line break, and save each new, shorter prompt as a new row in our source data.

### Wiki Helper Functions

In [19]:
def split_string(s, sep="\n"):
    """Splits a string on a separator, drops any pieces that are empty
    or contain only spaces, and returns a list of the remaining string(s)."""
    result = []
    for line in s.split(sep):
        if line.replace(" ", ""):
            result.append(line)
    return result

**Before:** `'Business is so awesome.\n\nMaking money is good.\nYay, business!'`

**After:** `['Business is so awesome.', 'Making money is good.', 'Yay, business!']`

In [20]:
def split_wiki_row(wiki_row, sep="\n"):
    """Splits a single row into multiple rows using a separator on the value
    found in the 'Content' column. Returns the row unchanged if it doesn't split."""
    
    df = pd.DataFrame(columns=['Source', 'Title', 'Content'])
    
    source = wiki_row['Source']
    title = wiki_row['Title']
    content = wiki_row['Content']
    
    content_split = split_string(content, sep=sep)
    len_of_split_content = len(content_split)
    
    if len_of_split_content == 1:
        return wiki_row
    else:
        i = 1
        for line in content_split:
            row_title = '%s (%s/%s)' % (title, i, len_of_split_content)
            formatted_row = {'Source': source, 'Title': row_title, 'Content': line}
            df = df.append(formatted_row, ignore_index=True, sort=False)
            i += 1
        return df

**Before:**

| Source | Title | Content | 
|--------|-------|---------|
|Wikipedia|Business: Summary|Business is so awesome.\n\nMaking money is good.\nYay, business!|

**After:** 

| Source | Title | Content | 
|--------|-------|---------|
|Wikipedia|Business: Summary (1/3)|Business is so awesome.|
|Wikipedia|Business: Summary (2/3)|Making money is good.|
|Wikipedia|Business: Summary (3/3)|Yay, business!|
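
The splitting and title numbering shown in the tables above can be reproduced with plain Python. This is a sketch on tuples rather than `DataFrame` rows; `split_prompt` is a hypothetical name, and it uses `str.strip()` to detect blank lines, which is a slight simplification of the helper above.

```python
def split_prompt(title, content, sep="\n"):
    """Hypothetical sketch: split one prompt into (title, content) pairs."""
    # Drop pieces that are empty or whitespace-only (e.g. from double line breaks).
    pieces = [piece for piece in content.split(sep) if piece.strip()]
    if len(pieces) == 1:
        # Nothing to split, so keep the original title.
        return [(title, pieces[0])]
    total = len(pieces)
    # Number each piece as "(i/total)", starting from 1.
    return [(f"{title} ({i}/{total})", piece) for i, piece in enumerate(pieces, start=1)]


rows = split_prompt(
    "Business: Summary",
    "Business is so awesome.\n\nMaking money is good.\nYay, business!",
)
for row_title, row_content in rows:
    print(f"{row_title} | {row_content}")
```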

In [21]:
def split_wiki_prompts(source_df, sep="\n"):
    """ Returns a copy of source_df with the Wikipedia prompts split up by line breaks (sep).
    
    Args:
        source_df (DataFrame): DataFrame containing all source prompts.
        sep (str): The value to split Wikipedia prompts by. Defaults to a line break.
    Returns:
        source_df (DataFrame): A copy of source_df with Wikipedia prompts that have been broken
        up by the sep value.
    """
    
    # 1. extract edge, wiki, and cmu prompts
    edge_df = source_df.loc[source_df['Source'] == 'Edge Studio']
    wiki_df = source_df.loc[source_df['Source'] == 'Wikipedia']
    cmu_df = source_df.loc[source_df['Source'] == 'CMU']
    
    result_df = edge_df
    
    # 2. split up wiki prompts
    wiki_df_split = pd.DataFrame(columns=['Source', 'Title', 'Content'])
    for index, row in wiki_df.iterrows():
        # print("Splitting row: %s" % row)
        row_split = split_wiki_row(row, sep=sep) # pass the separator through
        wiki_df_split = wiki_df_split.append(row_split, ignore_index=True, sort=False)

    # 3. append wiki
    result_df = result_df.append(wiki_df_split, ignore_index=True, sort=False)
    
    # 4. append CMU
    result_df = result_df.append(cmu_df, ignore_index=True, sort=False)
    
    return result_df

## Putting It All Together: Generating Multiple 10-Minute Scripts

And, finally, we'll write a loop that generates multiple, individual 10-minute scripts. The process is straightforward:

1. The source prompts are loaded
2. A 10-minute script is generated
3. The script is exported as a `.csv` and `.docx`
4. The prompts from the script are removed from the source prompts
5. The revised source prompts are saved (and will be used as the source in the next run)

We set the desired number of scripts to generate with the `TOTAL` variable.

After each script is generated, the prompts are removed from the source data and the new source data is saved as a `.csv`. This serves as a checkpoint to which we can later return if needed.

In the run below, we'll generate six 10-minute scripts.

In [22]:
STARTING_SCRIPT_NUM = 1
TOTAL = 6
VO_ACTOR = "Alicia"
CURRENT_SCRIPT_NUM = STARTING_SCRIPT_NUM

for i in range(0,TOTAL):

    # 1. load source prompts
    if CURRENT_SCRIPT_NUM == 1:
        # if this is the first-ever script for a data source, generate source data from source directories
        master_source_df = create_master_dataframe_from_sources(SOURCES_AND_DIRECTORIES)
        save_dataframe_as_master_source(master_source_df, 
                                        filename="Master Source - %s %s.csv" % (VO_ACTOR, CURRENT_SCRIPT_NUM - 1))
    else:
        # otherwise, load the data from the last checkpoint
        master_source_df = load_csv_as_dataframe('Master Source - %s %s.csv' % (VO_ACTOR, (CURRENT_SCRIPT_NUM -1)))

    # (optional) if we're unable to find wikipedia prompts, try splitting the source DF wiki rows
    # master_source_df = split_wiki_prompts(source_df=master_source_df)

    # 2. generate scripts
    scripts_df = generate_scripts(master_source_df, num_of_short_scripts=4)

    # 3. save scripts to docx, csv
    script_name = "Script %s - %s" % (CURRENT_SCRIPT_NUM, VO_ACTOR)
    export_doc_path = "%s%s.docx" % (SCRIPT_EXPORT_DOC_DIRECTORY, script_name)
    export_csv_filename = "%s%s.csv" % (SCRIPT_EXPORT_CSV_DIRECTORY, script_name)
    export_script_as_docx(scripts_df, 
                          doc_title="Script %s" % CURRENT_SCRIPT_NUM, 
                          doc_path=export_doc_path,
                          csv_path=export_csv_filename)

    # 4. remove prompts from master prompts source
    master_df = remove_script_prompts_from_source(master_source_df, scripts_df)

    # 5. save revised master source as csv
    save_dataframe_as_master_source(master_df, filename="Master Source - %s %s.csv" % (VO_ACTOR, CURRENT_SCRIPT_NUM))
    
    CURRENT_SCRIPT_NUM += 1
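
The checkpoint naming used in the loop above can be sketched in isolation. The helper name `checkpoint_names` is hypothetical; the filename template is copied from the loop.

```python
CHECKPOINT_TEMPLATE = "Master Source - %s %s.csv"


def checkpoint_names(script_num, vo_actor):
    """Hypothetical helper: the checkpoint read before, and written after, a given script."""
    # For script 1 the "before" file is not loaded; it is created from the
    # source directories and saved as checkpoint 0.
    before = CHECKPOINT_TEMPLATE % (vo_actor, script_num - 1)
    after = CHECKPOINT_TEMPLATE % (vo_actor, script_num)
    return before, after


for script_num in range(1, 4):
    before, after = checkpoint_names(script_num, "Alicia")
    print(f"Script {script_num}: starts from '{before}', saves '{after}'")
```

Each script therefore starts from the checkpoint numbered one lower than itself and leaves behind a checkpoint carrying its own number.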

If we wanted to generate more scripts starting from where we left off, we'd change `STARTING_SCRIPT_NUM` to one more than the number of the last script we generated (after the loop above, this would be `7`). The loop loads the checkpoint numbered `STARTING_SCRIPT_NUM - 1`, so this picks up the data source as it was saved after script `6` and starts generating scripts from there.

The generated scripts can be found in the `scripts/` directory (`.docx` files in `scripts/docx/` and `.csv` files in `scripts/csv/`).


`TODO (Improvements)`:
- Wrap the loop above into a single function.
- Automate the split_wiki_prompts() function.
- Create a less intensive checkpointing process. Instead of saving a new `.csv` of the entire source data after each script is generated, save a single copy at the beginning of the run and then write a function that takes all of the individual script `.csv` files generated during a run and removes their prompts from the original master source `.csv`.