# Exploring and Visualizing Dialogue in the Fallout 4 Universe
This notebook creates `.csv` files containing dialogue information for Fallout 4 characters. From these files, each character's response text and script notes are used to visualize the most common words and sentiments in the Fallout 4 universe.

In [1]:
from generate_dialogue_table import create_table
import pandas as pd

characters = pd.read_csv('character_names.csv')
characters.head()

Unnamed: 0,Character,Location
0,Sole Survivor,"Sanctuary Hills, Vault 111"
1,Blake Abernathy,Abernathy farm
2,Connie Abernathy,Abernathy farm
3,Lucy Abernathy,Abernathy farm
4,Chancer,Andrew station


There's a subfolder that will hold all the `.csv` files of each character's dialogue. The `os` module will allow the `to_csv()` method to conveniently save to this subfolder.

In [2]:
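import os
dir_path = os.getcwd()
# The trailing separator lets file names be appended directly.
subfolder = os.path.join(dir_path, 'storage', '')
# Create the subfolder if needed; otherwise to_csv() would fail.
os.makedirs(subfolder, exist_ok=True)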

Iterate through each character in the list and run the `create_table` function from `generate_dialogue_table.py`, then stow the created `.csv` in the subfolder.

It's unlikely that a table will successfully be made for each character, but it doesn't hurt to iterate through each one regardless.

In [3]:
from tqdm.notebook import tqdm

for character in tqdm(characters['Character']):
    try:
        file_path = subfolder + character.lower().replace(' ', '_') + '.csv'
        entry = create_table(character)
        temp = pd.DataFrame(entry)
        temp.to_csv(file_path, index=False)
    except Exception:
        # Skip characters whose dialogue table could not be created.
        pass

  0%|          | 0/483 [00:00<?, ?it/s]

Captain Zao has been arbitrarily selected for exploration. Once there is a decent foundation for exploring the dialogue of a single character, the whole process can be generalized to any or all characters.

In [4]:
zao = pd.read_csv(subfolder + 'captain_zao.csv')
zao.head()

Unnamed: 0,CATEGORY,TYPE,SUBTYPE,PROMPT,DIALOGUE BEFORE,RESPONSE TEXT,DIALOGUE AFTER,SCRIPT NOTES,SCENE
0,Scene,SceneDialogue,Custom,,,No! You fool! I have not prepared the reactor ...,,Afraid,MS02WarheadFirst
1,Scene,SceneDialogue,Custom,,,"No! You fool! Without dampening coil, the reac...",,Afraid,MS02WarheadFirst
2,Scene,SceneDialogue,Custom,,NPCMCaptainZao: Was not sure my beautiful Yang...,Here. Payment Zao promised you.,,Happy,MS02SubReady
3,Scene,SceneDialogue,Custom,,,"Plus, to show my thanks. I give you these, ah....",Captain: Each one acts as a homing signal for ...,struggles a bit recalling the English word / N...,MS02SubReady
4,Scene,SceneDialogue,Custom,,NPCMCaptainZao: Was not sure my beautiful Yang...,You are a capitalist. I know you expect paymen...,,Amused,MS02SubReady


In [5]:
for i in range(4, 8):
    print(zao.columns[i], ': ', zao[zao.columns[i]].nunique())

DIALOGUE BEFORE :  47
RESPONSE TEXT :  126
DIALOGUE AFTER :  20
SCRIPT NOTES :  70


For `spaCy` to work with the dialogue data, all of the entries must be concatenated into a single string. Additionally, the entries must be converted to lowercase so that capitalization doesn't affect counting.

In [6]:
response_doc = ' '.join([str(response).lower() for response in zao['RESPONSE TEXT']])
response_doc[:int(len(response_doc) / 6)]

'no! you fool! i have not prepared the reactor yet! no! you fool! without dampening coil, the reactor will explode! baozha! here. payment zao promised you. plus, to show my thanks. i give you these, ah... transponders. you are a capitalist. i know you expect payment for services. instead i would offer you service in exchange for your service. it is our way. take these, ah... transponders. many thanks. i look forward to seeing my zhongguo soon. the journey is long. sadly, that is possible. it is fortunate zao is man of honor. a lesser man insulted would not keep his promise. i understand. there is no love for zao. or zhongguo. i have no wish to bring weapons back home. i leave the fire here. and... i owe you. zao pays his debts. besides, my missiles just might... *chuckle*  "save your bacon." soon, i will set sail. you should get ashore. farewell, american. i must admit. i have met...  worse capitalists. each one acts as a homing signal for missile i can fire from my yangtze. only small
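To see why lowercasing matters for counting, here is a minimal sketch on two made-up responses. The sample strings and the regex tokenizer are illustrative assumptions and stand in for the notebook's `spaCy` pipeline.

```python
import re
from collections import Counter

# Hypothetical sample responses, loosely modeled on the dialogue above.
responses = ['No! You fool!', 'no reactor. No coil.']
text = ' '.join(responses)

# A crude regex tokenizer stands in for spaCy here.
raw_counts = Counter(re.findall(r"[A-Za-z']+", text))
lowered_counts = Counter(re.findall(r"[a-z']+", text.lower()))

print(raw_counts['No'], raw_counts['no'])  # 'No' and 'no' are counted separately
print(lowered_counts['no'])                # all three occurrences are counted together
```

Note that `str()` turns a missing value into the literal string `'nan'`, so rows without response text may contribute a stray `nan` token. Dropping them first with `dropna()` is one option.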

The `spaCy` library can be used alongside `Counter` to get the most-frequent words from Captain Zao. Take the string above and tokenize it, excluding stopwords and punctuation.

In [7]:
import spacy
from collections import Counter

nlp = spacy.load('en_core_web_sm')
doc = nlp(response_doc)

words = [token.text
         for token in doc
         if not token.is_stop and not token.is_punct]

verbs = [token.text
         for token in doc
         if (not token.is_stop and
             not token.is_punct and
             token.pos_ == 'VERB')]

Return the ten most common words.

In [8]:
word_freq = Counter(words)
print(word_freq.most_common(10))

[('yangtze', 19), ('zao', 13), ('dampening', 12), ('coil', 12), ('reactor', 9), ('ah', 8), ('need', 8), ('like', 7), ('warhead', 7), ('years', 7)]


Make a DataFrame of all the common words sorted from most frequent to least frequent.

In [9]:
word_freq_df = pd.DataFrame(word_freq.most_common(), columns=['Word', 'Count'])
word_freq_df.head()

Unnamed: 0,Word,Count
0,yangtze,19
1,zao,13
2,dampening,12
3,coil,12
4,reactor,9


Plot the top ten words as a bar chart.

In [10]:
import plotly.express as px
import plotly.io as pio

pd.options.plotting.backend = 'plotly'
pio.templates.default = 'plotly_dark'

fig = px.bar(word_freq_df[:10], x='Word', y='Count')
fig.show()

Make a list of the sentiment associated with each sample of dialogue. When a script note contains several parts separated by `/`, only the last part is kept. `NaN` values are replaced with 'Neutral' to indicate that no significant emotion is associated with the dialogue.

In [11]:
sentiment = [str(entry).split('/')[-1].strip().lower() 
             for entry in zao['SCRIPT NOTES'].fillna('Neutral')]
sentiment[:5]

['afraid', 'afraid', 'happy', 'nervous', 'amused']
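The parsing rule in the cell above can also be written as a small helper. This is a sketch only: `parse_sentiment` and the sample notes are hypothetical names and values, and it assumes the emotion is always the last `/`-separated part of a note.

```python
import math

def parse_sentiment(note, default='neutral'):
    """Return the last '/'-separated part of a script note, lowercased."""
    # pandas represents missing cells as float NaN.
    if note is None or (isinstance(note, float) and math.isnan(note)):
        return default
    return str(note).split('/')[-1].strip().lower()

# Hypothetical notes for illustration.
print(parse_sentiment('speaking slowly / Nervous'))
print(parse_sentiment('Afraid'))
print(parse_sentiment(float('nan')))
```

Handling the missing value inside the helper keeps the rule in one place, which may be convenient when the same parsing is later applied to every character.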

In [12]:
Counter(sentiment).most_common(10)

[('neutral', 14),
 ('confident', 13),
 ('irritated', 10),
 ('amused', 9),
 ('happy', 8),
 ('apologetic', 7),
 ('sad', 6),
 ('nervous', 5),
 ('grateful', 5),
 ('stern', 5)]

In [13]:
sentiment_df = pd.DataFrame(Counter(sentiment).most_common(), columns=['Sentiment', 'Count'])

fig = px.bar(sentiment_df[:10], x='Sentiment', y='Count')
fig.show()

Get all of the file names in the `storage` subfolder in order to iterate over every available character.

In [14]:
csv_files = os.listdir(subfolder)
csv_files[:5]

['wilhelmina_cabot.csv',
 'andre_michaud.csv',
 'cricket.csv',
 'alan_binet.csv',
 'supervisor_white.csv']

Check how many characters had their dialogue successfully recorded.

In [15]:
print(
    '''
    There are {samples} characters who have recorded dialogue.
    This is {percentage: .2f}% of all recorded characters.
    '''.format(samples=len(csv_files), percentage=(100*len(csv_files)/len(characters)))
)


    There are 178 characters who have recorded dialogue.
    This is  36.85% of all recorded characters.
    


It will prove convenient later on to recover the name of a given character from their associated `.csv` file.

In [16]:
for file in csv_files[:5]:
    print(' '.join([str(part).capitalize() for part in file.split('_')]).replace('.csv', ''))

Wilhelmina Cabot
Andre Michaud
Cricket
Alan Binet
Supervisor White


A `map` with `lambda` also works. In general, such a combination can be replaced with a list comprehension. It boils down to preference and readability.

In [17]:
' '.join(map(lambda x: str(x).capitalize(), csv_files[1].split('_'))).replace('.csv', '')

'Andre Michaud'
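A third variant, sketched here with `pathlib`, removes the extension before splitting. The helper name `name_from_filename` is hypothetical; it assumes file names follow the `first_last.csv` pattern used when the files were written.

```python
from pathlib import Path

def name_from_filename(file_name: str) -> str:
    """Turn 'andre_michaud.csv' into 'Andre Michaud'."""
    stem = Path(file_name).stem  # drops the '.csv' extension
    return ' '.join(part.capitalize() for part in stem.split('_'))

print(name_from_filename('andre_michaud.csv'))
print(name_from_filename('cricket.csv'))
```

Since the original names were lowercased when the files were saved, any capitalization inside a name part cannot be recovered this way.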

Dump the previous few cells into a function, then iterate over each file in the subfolder to get the relevant information.

In [18]:
def get_dialogue_info(character_csv: str) -> tuple:
    '''
    Given a file name from the "storage" subfolder, return the character's
    name, their responses, a pandas DataFrame for each of the most common
    words and most common sentiments, and a spaCy Doc of the sentiments.

    Parameters:
        character_csv (str):
            The name of the file found in the "storage" subfolder.

    Returns:
        character_name, csv_doc, csv_word_freq, csv_sentiment_freq, sentiment_doc (tuple):
            The character's name, a spaCy Doc of their lowercased responses,
            their word counts, their sentiment counts, and a spaCy Doc of
            their lowercased sentiments.
    '''
    character_name = ' '.join(map(lambda x: str(x).capitalize(), character_csv.split('_'))).replace('.csv', '')
    character_csv = pd.read_csv(subfolder + character_csv)
    csv_doc = nlp(' '.join(map(lambda x: str(x).lower(), character_csv['RESPONSE TEXT'])))
    
    csv_words = [token.text
                 for token in csv_doc
                 if not token.is_stop and not token.is_punct]

    csv_word_freq = pd.DataFrame(Counter(csv_words).most_common(), columns=['Word', 'Count'])

    csv_sentiment = [str(entry).split('/')[-1].strip() 
                     for entry in character_csv['SCRIPT NOTES'].fillna('Neutral')]
    csv_sentiment_freq = pd.DataFrame(Counter(csv_sentiment).most_common(), columns=['Sentiment', 'Count'])
    # Also make a sentiment doc, this will come in handy later on
    sentiment_doc = nlp(' '.join(map(lambda x: str(x).lower(), csv_sentiment)))

    return character_name, csv_doc, csv_word_freq, csv_sentiment_freq, sentiment_doc

A quick preview of the function before iteration begins. The second and last entries will be a bit long, but that's expected since they are concatenations of *all* of a character's responses and sentiments, respectively.

In [19]:
get_dialogue_info(csv_files[1])

('Andre Michaud',
 i owe my very life to the stranger. we should listen now. he says this is your doing? but... i do not know you. *dry chuckle* you're obviously not from here. gratitude... it's a hard thing out here. so, yeah. of course. they did not find much on me, but you're welcome to it. well, i might beg to differ about that. andre michaud. with my husband john.... we hunt mirelurks, for their meat, their shells. but... the last hunt did not go well. as bad as it can get. a mirelurk hunter, he blindsided john. i managed to pull him out, but i had to run blindly. into the fog. john was the tracker, he was the one who knows this island. but me, i got turned around. i don't remember what happened. someone, i don't even know who, found me... brought me here. this island... it... it has taken my husband. it has taken everything. this island is a sickness. don't you see, it's evil? as soon as i am well, i am going to grab as many guns as i can carry. and i will slaughter everything in

In [20]:
from tqdm.notebook import tqdm

dialogue_superlist = []
exceptions = []

for file in tqdm(csv_files):
    try:
        dialogue_superlist.append(get_dialogue_info(file))
    except Exception:
        # Record the file so it can be inspected afterwards.
        exceptions.append(file)

dialogue_dataframe = pd.DataFrame(dialogue_superlist, columns=['Character', 'Dialogue', 'Word Frequency DataFrame', 'Sentiment Frequency DataFrame', 'Sentiment'])
dialogue_dataframe.head()

  0%|          | 0/178 [00:00<?, ?it/s]

Unnamed: 0,Character,Dialogue,Word Frequency DataFrame,Sentiment Frequency DataFrame,Sentiment
0,Wilhelmina Cabot,"(jack, !, are, you, sure, it, 's, wise, ,, tel...",Word Count 0 jack 22 1 ...,Sentiment Count 0 Concerned 20 1 ...,"(nervous, nervous, relieved, grateful, disgust..."
1,Andre Michaud,"(i, owe, my, very, life, to, the, stranger, .,...",Word Count 0 john 5 1 isl...,Sentiment Count 0 Neutral 15 1 ...,"(neutral, puzzled, sad, grateful, sad, tired, ..."
2,Cricket,"(what, this, place, just, has, farmers, ?, shi...",Word Count 0 guns 8 1 ...,Se...,"(neutral, giddy, sell, it, for, a, laugh, gidd..."
3,Alan Binet,"(i, 'd, just, like, to, apologize, for, any, ....",Word Count 0 synths 13 1 ...,Se...,"(genuinely, apologetic, ,, abashed, sincerely,..."
4,Supervisor White,"(of, course, ,, darling, ., our, unique, hiera...",Word Count 0 darling 18 1 ...,Sen...,"(neutral, neutral, neutral, neutral, neutral, ..."


Check what files, if any, threw exceptions.

In [21]:
exceptions

['nate.csv']

In [22]:
nate = pd.read_csv(subfolder + 'nate.csv')
nate.head()

Unnamed: 0,Topic,PlayerDialogue,Custom,Unnamed: 3,Unnamed: 4,Lemme just slide in here...,Unnamed: 6,"playful banter, moving your spouse out of the way to use the mirror",Unnamed: 8
0,Topic,PlayerDialogue,Custom,,,Just another minute. I swear...,,"playful banter, moving your spouse out of the ...",
1,Topic,PlayerDialogue,Custom,,,Such a cute nose.,,"playful banter, lovingly married for years",
2,Topic,PlayerDialogue,Custom,,,I think Shaun has your nose.,,"playful banter, lovingly married for years",
3,Topic,PlayerDialogue,Custom,,,As beautiful as the day we met...,,"playful banter, lovingly married for years",
4,Topic,PlayerDialogue,Custom,,,Lookin' good.,,"playful banter, lovingly married for years",


Nate is the male playable character. Perhaps the status of playable character yielded additional information on his Wiki page that led to erroneous scraping: the first row of data ended up as the column names. A simple fix is to sacrifice that row and borrow the column names from Captain Zao's DataFrame.

In [23]:
nate.columns = zao.columns.copy()
nate.head()

Unnamed: 0,CATEGORY,TYPE,SUBTYPE,PROMPT,DIALOGUE BEFORE,RESPONSE TEXT,DIALOGUE AFTER,SCRIPT NOTES,SCENE
0,Topic,PlayerDialogue,Custom,,,Just another minute. I swear...,,"playful banter, moving your spouse out of the ...",
1,Topic,PlayerDialogue,Custom,,,Such a cute nose.,,"playful banter, lovingly married for years",
2,Topic,PlayerDialogue,Custom,,,I think Shaun has your nose.,,"playful banter, lovingly married for years",
3,Topic,PlayerDialogue,Custom,,,As beautiful as the day we met...,,"playful banter, lovingly married for years",
4,Topic,PlayerDialogue,Custom,,,Lookin' good.,,"playful banter, lovingly married for years",
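An alternative that keeps the first row is to tell `pd.read_csv` that the file has no header. The sketch below uses an in-memory stand-in for `nate.csv` with shortened, made-up script notes, and it assumes the first line of the real file is a genuine dialogue row.

```python
import io
import pandas as pd

# Column names as they appear in Captain Zao's table.
columns = ['CATEGORY', 'TYPE', 'SUBTYPE', 'PROMPT', 'DIALOGUE BEFORE',
           'RESPONSE TEXT', 'DIALOGUE AFTER', 'SCRIPT NOTES', 'SCENE']

# Hypothetical two-row stand-in for a headerless file.
raw = io.StringIO(
    'Topic,PlayerDialogue,Custom,,,Lemme just slide in here...,,"playful banter, example",\n'
    'Topic,PlayerDialogue,Custom,,,Such a cute nose.,,"playful banter, example",\n'
)

# header=None treats every line as data; names supplies the column labels.
nate_fixed = pd.read_csv(raw, header=None, names=columns)
print(nate_fixed['RESPONSE TEXT'].tolist())
```

With the real file, the same call would take `subfolder + 'nate.csv'` in place of the in-memory buffer.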


It doesn't appear that this specific set of `'SCRIPT NOTES'` will be entirely helpful. For this reason, Nate gets omitted in this notebook (of course, all the necessary code is already within this notebook for those who are interested in exploring Nate's dialogue). 

For now, avoid concatenating all of the dialogue in `dialogue_dataframe` together, lest it become too large for `spaCy` to handle. Fortunately, all of the words fit easily inside a pandas Series or DataFrame for `value_counts()` to do its magic on, with one caveat to be revealed later.

In [24]:
all_words = []

for words in dialogue_dataframe['Word Frequency DataFrame']:
    for word in words['Word']:
        all_words.append(word)

In [25]:
pd.Series(all_words).value_counts()[:10]

like     163
know     160
time     154
right    152
want     151
think    150
good     148
let      145
got      142
need     141
dtype: int64

Admittedly, these are pretty boring words. Perhaps the results will be more interesting when filtering out to only nouns and/or verbs. Before doing that, though, check the most common sentiments in the Fallout 4 universe.

In [26]:
all_sentiments = []

for sentiments in dialogue_dataframe['Sentiment Frequency DataFrame']:
    for sentiment in sentiments['Sentiment']:
        all_sentiments.append(sentiment)

pd.Series(all_sentiments).value_counts()[:10]

Neutral       169
Irritated     116
Friendly      106
Concerned      94
Surprised      93
Angry          90
Puzzled        86
Happy          85
Apologetic     85
Stern          84
dtype: int64

Roll every entry of dialogue into a single list, and then join the elements of that list together into one string for `spaCy`.

In [27]:
# This is different from dialogue_superlist: dialogue_superlist shares the same columns
# as dialogue_dataframe. dialogue_list is a temporary holder.
dialogue_list = []

for entry in dialogue_dataframe['Dialogue']:
    dialogue_list.append(entry)

dialogue_doc = ' '.join(map(str, dialogue_list))

By default, `spaCy` can handle documents with up to 1,000,000 characters. As mentioned earlier, this is too small for `dialogue_doc`:

In [28]:
print('dialogue_doc has a total of {characters: ,} characters.'.format(characters=len(dialogue_doc)))

dialogue_doc has a total of  2,712,428 characters.


One workaround would be to split the data into manageable chunks. Another, "livin'-like-Larry" approach would be to bravely increase the value of `nlp.max_length`.

In [29]:
nlp.max_length = 2 * len(dialogue_doc) + 1
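For comparison, the chunking workaround could look roughly like the sketch below. `chunk_text` is a hypothetical helper, not part of `spaCy`; it splits on whitespace so that no word is cut in half.

```python
def chunk_text(text: str, max_chars: int) -> list:
    """Split text into whitespace-joined chunks of at most max_chars characters.

    A single word longer than max_chars becomes its own oversized chunk.
    """
    chunks, current, length = [], [], 0
    for word in text.split():
        # Account for the joining space when the chunk already has words.
        extra = len(word) + (1 if current else 0)
        if current and length + extra > max_chars:
            chunks.append(' '.join(current))
            current, length = [word], len(word)
        else:
            current.append(word)
            length += extra
    if current:
        chunks.append(' '.join(current))
    return chunks

chunks = chunk_text('no you fool i have not prepared the reactor yet', 12)
print(chunks)
```

Each chunk could then be passed through `nlp.pipe(chunks)` and the token counts accumulated with `Counter.update`, which avoids raising `nlp.max_length` at the cost of a little extra bookkeeping.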

Now words can easily be filtered by noun, verb, or any other part of speech.

In [30]:
final_doc = nlp(dialogue_doc)

final_words = [token.text
               for token in final_doc
               if not token.is_stop and not token.is_punct]

final_verbs = [token.text
               for token in final_doc
               if (not token.is_stop and
                   not token.is_punct and
                   token.pos_ == 'VERB')]

final_nouns = [token.text
               for token in final_doc
               if (not token.is_stop and
                   not token.is_punct and
                   token.pos_ == 'NOUN')]

final_adjs = [token.text
              for token in final_doc
              if (not token.is_stop and
                  not token.is_punct and
                  token.pos_ == 'ADJ')]
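The four comprehensions above differ only in the part-of-speech tag, so they could be folded into one helper. This is a sketch: `filter_tokens` is a hypothetical name, and the stand-in tokens below only mimic the attributes (`text`, `is_stop`, `is_punct`, `pos_`) that a `spaCy` token exposes.

```python
from types import SimpleNamespace

def filter_tokens(doc, pos=None):
    """Return token texts, skipping stopwords and punctuation.

    If pos is given (e.g. 'VERB'), keep only tokens with that coarse POS tag.
    """
    return [token.text
            for token in doc
            if not token.is_stop
            and not token.is_punct
            and (pos is None or token.pos_ == pos)]

# Stand-in tokens so the sketch runs without a spaCy model.
fake_doc = [
    SimpleNamespace(text='i', is_stop=True, is_punct=False, pos_='PRON'),
    SimpleNamespace(text='need', is_stop=False, is_punct=False, pos_='VERB'),
    SimpleNamespace(text='coil', is_stop=False, is_punct=False, pos_='NOUN'),
    SimpleNamespace(text='!', is_stop=False, is_punct=True, pos_='PUNCT'),
]

print(filter_tokens(fake_doc))
print(filter_tokens(fake_doc, 'NOUN'))
```

With the real document, `filter_tokens(final_doc, 'VERB')` would play the role of `final_verbs`.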

How do the most common words compare against the DataFrame of most common words?

In [31]:
Counter(final_words).most_common(10)

[('know', 2488),
 ('like', 2460),
 ('got', 2045),
 ('good', 2041),
 ('right', 1741),
 ('need', 1714),
 ('time', 1600),
 ('think', 1485),
 ('let', 1381),
 ('going', 1266)]

The most common words are generally the same; however, the counts differ by roughly an order of magnitude. This is expected: `all_words` was built from each character's list of distinct words, without the associated counts, so the counts from `value_counts()` represent how many characters said a word at least once. For total word frequency, the approach using `Counter` and `most_common` on the full document is therefore the more accurate method.
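The difference between the two kinds of counts can be seen on a toy example. The token lists below are made up for illustration.

```python
from collections import Counter

# Hypothetical token lists for two characters.
tokens_by_character = {
    'captain zao': ['yangtze', 'yangtze', 'need'],
    'cricket': ['guns', 'need', 'need'],
}

# How many characters used each word at least once.
character_counts = Counter(
    word for tokens in tokens_by_character.values() for word in set(tokens)
)

# How many times each word occurs across all characters.
total_counts = Counter(
    word for tokens in tokens_by_character.values() for word in tokens
)

print(character_counts['need'], total_counts['need'])
print(character_counts['yangtze'], total_counts['yangtze'])
```

The first counter can never exceed the number of characters, which matches the ceiling seen in the `value_counts()` output.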

The approach can also be repeated for the most common sentiments.

In [32]:
sentiment_list = []

for entry in dialogue_dataframe['Sentiment']:
    sentiment_list.append(entry)

sentiment_doc = ' '.join(map(str, sentiment_list))

final_sentiment = [token.text
                   for token in nlp(sentiment_doc)
                   if not token.is_stop and not token.is_punct]

Counter(final_sentiment).most_common(10)

[('neutral', 17524),
 ('friendly', 1929),
 ('irritated', 1879),
 ('concerned', 1572),
 ('angry', 1245),
 ('confident', 1231),
 ('player', 1206),
 ('amused', 1194),
 ('happy', 1109),
 ('stern', 1033)]

Obviously, "player" shouldn't count as a sentiment. A quick update can fix that.

Keep in mind that most of the sentiments come from "SCRIPT NOTES", so not every entry will be an emotion, though it may still offer insight into the associated dialogue.

In [33]:
final_sentiment = [sentiment for sentiment in final_sentiment if sentiment != 'player']
Counter(final_sentiment).most_common(10)

[('neutral', 17524),
 ('friendly', 1929),
 ('irritated', 1879),
 ('concerned', 1572),
 ('angry', 1245),
 ('confident', 1231),
 ('amused', 1194),
 ('happy', 1109),
 ('stern', 1033),
 ('voice', 811)]
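Removing "player" let "voice" surface in its place, which is not an emotion either. One way to handle several such tokens at once is an exclusion set, sketched below. `NON_SENTIMENTS` and the sample list are hypothetical; the set would likely need to grow as more non-emotions turn up.

```python
from collections import Counter

# Hypothetical exclusion set based on the two non-emotions seen so far.
NON_SENTIMENTS = {'player', 'voice'}

# Made-up sample standing in for final_sentiment.
sample = ['neutral', 'player', 'voice', 'happy', 'neutral', 'voice']

filtered = [s for s in sample if s not in NON_SENTIMENTS]
print(Counter(filtered).most_common())
```

A set keeps each membership check cheap and leaves a single place to edit when another stray token appears.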

A graph can be made for the most common words and types of words.

In [34]:
import plotly.graph_objects as go

fig = go.Figure()

final_cols = [final_words, final_verbs, final_nouns, final_adjs]
final_names = ['All', 'Verb', 'Noun', 'Adjective']

for column, column_name in zip(final_cols, final_names):
    temp = pd.DataFrame(Counter(column).most_common(10))
    fig.add_trace(
        go.Bar(
            x=temp[0],
            y=temp[1],
            name=column_name
        )
    )

fig.update_layout(
    title={'text': 'Most Common Words in Fallout 4',
           'xanchor': 'center',
           'x': 0.5},
    xaxis_title='Word',
    yaxis_title='Count'
)

fig.show()

The above graph isn't the most beautiful visualization, and can be a bit hard to digest. Perhaps a graph with a dropdown menu will be better, although more effort will be required.

In [35]:
fig = go.Figure()

# 'All' keeps Plotly's default color (None); the other traces start hidden.
colors = [None, 'green', 'yellow', 'red']

for column, column_name, color in zip(final_cols, final_names, colors):
    temp = pd.DataFrame(Counter(column).most_common(10))
    fig.add_trace(
        go.Bar(
            x=temp[0],
            y=temp[1],
            visible=(column_name == 'All'),
            marker_color=color
        )
    )

# Dropdown
fig.update_layout(
    updatemenus=[
        dict(
            active=0,
            buttons=list([
                dict(label='All',
                     method='update',
                     args=[{'visible': [True, False, False, False]}
                            ]),
                dict(label='Verbs',
                     method='update',
                     args=[{'visible': [False, True, False, False]},
                            ]),
                dict(label='Nouns',
                     method='update',
                     args=[{'visible': [False, False, True, False]},
                            ]),
                dict(label='Adjectives',
                     method='update',
                     args=[{'visible': [False, False, False, True]},
                            ]),
            ])
        )
    ]
)

fig.show()

Keep in mind that this graph covers *all* of the dialogue across the Fallout 4 universe, at least for the characters whose dialogue was available and successfully gathered. A subjectively better version of this graph could depend on the character selected and show the most common sentiments as a subplot.

Adding the subplot isn't too bad: just assign the row and column.

In [36]:
from plotly.subplots import make_subplots

fig = make_subplots(rows=2, cols=1, subplot_titles=['Common Words', 'Sentiment'])

# 'All' keeps Plotly's default color (None); the other traces start hidden.
for column, column_name, color in zip(final_cols, final_names, [None, 'green', 'yellow', 'red']):
    temp = pd.DataFrame(Counter(column).most_common(10))
    fig.add_trace(
        go.Bar(
            x=temp[0],
            y=temp[1],
            visible=(column_name == 'All'),
            marker_color=color
        ),
        row=1,
        col=1
    )
fig.add_trace(
    go.Bar(
        x=pd.DataFrame(Counter(final_sentiment).most_common(10))[0],
        y=pd.DataFrame(Counter(final_sentiment).most_common(10))[1]
    ),
    row=2,
    col=1
)

# Hide the legend since the dropdown menu will show what's selected anyway
fig.update(layout_showlegend=False)

# Dropdown
fig.update_layout(
    updatemenus=[
        dict(
            active=0,
            buttons=list([
                dict(label='All',
                     method='update',
                     args=[{'visible': [True, False, False, False, True]}
                            ]),
                dict(label='Verbs',
                     method='update',
                     args=[{'visible': [False, True, False, False, True]},
                            ]),
                dict(label='Nouns',
                     method='update',
                     args=[{'visible': [False, False, True, False, True]},
                            ]),
                dict(label='Adjectives',
                     method='update',
                     args=[{'visible': [False, False, False, True, True]},
                            ]),
            ])
        )
    ]
)

fig.show()

### Words Commonly Associated With Fallout
Since the most common words aren't specific to the Fallout universe or all that interesting, it could be worthwhile to count the occurrences of words that are commonly associated with Fallout. These include, but are not limited to: raider, bottlecaps (or caps), radroach, vault, mutant, deathclaw, and synth.

In [37]:
print('Word \t\t Count')

fallout_words = ['raider', 'radroach', 'caps', 'vault', 'mutant', 'deathclaw', 'synth']

for word in fallout_words:
    print(word, ':  \t', final_words.count(word))

Word 		 Count
raider :  	 75
radroach :  	 7
caps :  	 388
vault :  	 228
mutant :  	 57
deathclaw :  	 63
synth :  	 359
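These counts match exact tokens only, so "raider" does not include "raiders". A rough sketch of a plural-aware count follows. `count_with_plurals` is a hypothetical helper that only handles a trailing "s", and the sample tokens are made up.

```python
from collections import Counter

def count_with_plurals(tokens, words):
    """Count each word together with its simple '-s' plural."""
    counts = Counter(tokens)  # one pass instead of list.count() per word
    # Counter returns 0 for missing keys, so absent plurals add nothing.
    return {word: counts[word] + counts[word + 's'] for word in words}

# Made-up sample standing in for final_words.
sample_tokens = ['raider', 'raiders', 'caps', 'vault', 'raiders', 'synths']
print(count_with_plurals(sample_tokens, ['raider', 'caps', 'synth']))
```

Using each token's `lemma_` attribute from `spaCy` would be a more general alternative, since it also covers irregular forms.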


In [None]:
fig = go.Figure()

# This is to help with sorting the words and reducing clutter inside go.Bar
fallout_words_df = pd.DataFrame(fallout_words, columns=['Word'])
fallout_words_df['Count'] = [final_words.count(word) for word in fallout_words]
fallout_words_df = fallout_words_df.sort_values('Count', ascending=False)

fig.add_trace(
    go.Bar(
        x=fallout_words_df['Word'],
        y=fallout_words_df['Count'],
        text=fallout_words_df['Count']
    )
)

fig.update_traces(textposition='outside')
fig.update_layout(title={'text': 'Words Commonly Associated with Fallout 4',
                         'xanchor': 'center',
                         'x': 0.5})

fig.show()

### Conclusion

This notebook explored and visualized dialogue in the Fallout 4 universe for characters who had dialogue available to scrape. Within that subset, the `Counter` class was used to find the most common words and sentiments for all available characters in Fallout 4.

Apparently, words commonly associated with Fallout (such as those in the few cells above) were far less frequent than imagined, with most of the common words also being common in real, everyday language. Perhaps it is good that the language of the Fallout 4 universe is relatively indicative of everyday language, lest the dialogue be immersion-breaking. An overwhelming majority of the dialogue is neutral, with the next two most common sentiments being "friendly" and "irritated", respectively. This is also not unlike everyday life, to some degree.

Word frequency and sentiment frequency may serve as principal features for a model that learns from character dialogue data. The Fallout 4 data presented here may be presented in a future Jupyter notebook alongside a TensorFlow model exploring that idea.