# Exploring Campaign 1

Let's take a look at campaign 1 of Critical Role: Vox Machina! This was the inaugural campaign: it picks up in the middle of an existing home game, and it was the cast's first time playing in an actual-play format. We might therefore expect to see some major changes as the campaign goes along. Some things to look at:

- Do the different actors have different profiles in terms of how much they speak and how long their dialogue is?
- Do these profiles change over time? That is, do the actors become more or less active or verbose as the campaign goes on?

## Loading the Data

First, we need to load the data for every episode of campaign 1. We have a database with one row of information per episode, plus a pre-processed transcript file for each episode.

In [1]:
import pandas as pd

In [29]:
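data_dir = '../../data'
db_file = f'{data_dir}/transcript_database.csv'

# Load the episode database (one row per episode)
db = pd.read_csv(
    db_file,
    parse_dates = ['download_date']
)
# Keep only campaign 1 (section_no == 1) and drop the columns we don't need.
# reset_index() keeps the original row labels as an 'index' column.
drop_cols = ['section', 'section_no', 'link', 'download_date']
campaign1_db = db.loc[db['section_no'] == 1].drop(drop_cols, axis = 1).reset_index()
campaign1_db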

Unnamed: 0,index,subsection_no,episode_no,subsection,episode,transcript_file
0,0,1,1,Arc 1: Kraghammer and Vasselheim,Arrival_at_Kraghammer,section001/subsection001/episode001.csv
1,1,1,2,Arc 1: Kraghammer and Vasselheim,Into_the_Greyspine_Mines,section001/subsection001/episode002.csv
2,2,1,3,Arc 1: Kraghammer and Vasselheim,Strange_Bedfellows,section001/subsection001/episode003.csv
3,3,1,4,Arc 1: Kraghammer and Vasselheim,Attack_on_the_Duergar_Warcamp,section001/subsection001/episode004.csv
4,4,1,5,Arc 1: Kraghammer and Vasselheim,The_Trick_about_Falling,section001/subsection001/episode005.csv
...,...,...,...,...,...,...
110,110,5,12,Arc 5: Vecna,Shadows_of_Thomara,section001/subsection005/episode012.csv
111,111,5,13,Arc 5: Vecna,Dark_Dealings,section001/subsection005/episode013.csv
112,112,5,14,Arc 5: Vecna,The_Final_Ascent,section001/subsection005/episode014.csv
113,113,5,15,Arc 5: Vecna,"Vecna,_the_Ascended",section001/subsection005/episode015.csv


There are 115 episodes in campaign 1, split into 5 arcs.

In [20]:
print(campaign1_db['subsection'].value_counts())

subsection
Arc 3: The Chroma Conclave          46
Arc 1: Kraghammer and Vasselheim    23
Arc 5: Vecna                        16
Arc 2: The Briarwoods               15
Arc 4: Taryon Darrington            15
Name: count, dtype: int64
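
Note that `value_counts()` sorts by count, so the arcs above are not in story order. As a quick sanity check, here is a minimal sketch that rebuilds the counts by hand (copied from the output above, rather than read from `campaign1_db`), puts the arcs back in order, and confirms that they add up to the episode total. The names `arc_counts`, `arcs_in_order`, and `total_episodes` are only for illustration.

```python
import pandas as pd

# Episode counts per arc, copied by hand from the value_counts() output above.
arc_counts = pd.Series(
    {
        'Arc 3: The Chroma Conclave': 46,
        'Arc 1: Kraghammer and Vasselheim': 23,
        'Arc 5: Vecna': 16,
        'Arc 2: The Briarwoods': 15,
        'Arc 4: Taryon Darrington': 15,
    },
    name='count',
)

# Sorting by label puts the arcs back in story order ("Arc 1" through "Arc 5").
arcs_in_order = arc_counts.sort_index()
print(arcs_in_order)

# The per-arc counts should add up to the total number of episodes.
total_episodes = int(arc_counts.sum())
print(total_episodes)
```

On the real data, `campaign1_db['subsection'].value_counts().sort_index()` should give the same ordering, since the arc labels sort alphabetically.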


Now, let's get all the transcripts into a single giant `DataFrame`.

In [28]:
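def GetTranscript(arc_no, ep_no, transcript_file):
    """Load one episode transcript and tag each line with its arc and episode."""
    df = pd.read_csv(f'{data_dir}/{transcript_file}')
    df.insert(0, 'arc_no', arc_no)
    df.insert(1, 'episode_no', ep_no)
    return df

# Load every episode first, then concatenate once; calling pd.concat inside
# the loop would re-copy the growing DataFrame on every iteration.
episode_frames = [
    GetTranscript(row['subsection_no'], row['episode_no'], row['transcript_file'])
    for _, row in campaign1_db.iterrows()
]
transcripts = pd.concat(episode_frames)
# reset_index() keeps each line's position within its episode as an 'index' column.
transcripts.reset_index(inplace = True)
transcripts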

Unnamed: 0,index,arc_no,episode_no,section_no,line_no,section,speaker,line
0,0,1,1,1,1,Pre-Show,MATT,"Hello everyone. My name is Matthew Mercer, voi..."
1,1,1,1,1,2,Pre-Show,TRAVIS,"Right, listen up! If you have ale, then you ha..."
2,2,1,1,1,3,Pre-Show,NOSPEAKER,[record scratch] Wait.\n
3,3,1,1,1,4,Pre-Show,TRAVIS (CONT'D),"Easily the brains of the group, Grog is often ..."
4,4,1,1,1,5,Pre-Show,MARISHA,A first impression of Keyleth would leave you ...
...,...,...,...,...,...,...,...,...
291615,2271,5,16,4,1236,Part II,MARISHA,That was beautiful.
291616,2272,5,16,4,1237,Part II,SAM,"Thanks, Matt. That was really nice."
291617,2273,5,16,4,1238,Part II,MATT,"Love you, guys."
291618,2274,5,16,4,1239,Part II,LAURA,"Love you, Matthew."


That gives us nearly 300,000 rows, which is a good-sized dataset to work with.
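
To make the opening questions concrete, here is a rough sketch of how a per-speaker profile might be computed. It runs on a toy stand-in for `transcripts` built from the complete rows visible in the preview above, plus one made-up `(CONT'D)` row. It rests on two assumptions that are not confirmed by the data: that `(CONT'D)` marks the same speaker continuing, and that `NOSPEAKER` rows are sound cues rather than dialogue. The names `toy`, `spoken`, `n_words`, and `profile` are hypothetical.

```python
import pandas as pd

# Toy stand-in for `transcripts`: complete rows from the preview above, plus
# one made-up "(CONT'D)" row (hypothetical text) to exercise the clean-up.
toy = pd.DataFrame({
    'speaker': ['NOSPEAKER', 'MARISHA', 'SAM', 'MATT', 'LAURA', "MATT (CONT'D)"],
    'line': [
        '[record scratch] Wait.\n',
        'That was beautiful.',
        'Thanks, Matt. That was really nice.',
        'Love you, guys.',
        'Love you, Matthew.',
        'Hypothetical continued line.',
    ],
})

# Assumption: "(CONT'D)" marks the same speaker continuing, so fold it
# into the base speaker name.
toy['speaker'] = toy['speaker'].str.replace(r"\s*\(CONT'D\)$", '', regex=True)

# Assumption: NOSPEAKER rows are sound cues, not dialogue, so leave them out.
spoken = toy.loc[toy['speaker'] != 'NOSPEAKER'].copy()

# Verbosity proxy: number of whitespace-separated words in each line.
spoken['n_words'] = spoken['line'].str.split().str.len()

# Activity (number of lines) and verbosity (mean words per line) per speaker.
profile = spoken.groupby('speaker')['n_words'].agg(
    n_lines='count',
    mean_words='mean',
)
print(profile)
```

Grouping by `['arc_no', 'speaker']` instead of `'speaker'` alone would be one way to see how these profiles shift over the campaign.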