## Analyzing the Anatomy of "Bob's Burgers" Episodes

As an avid "Bob's Burgers" fan, I was intrigued to stumble upon a series of data visualizations, built for a TidyTuesday challenge, that provide a unique look under the hood of this beloved animated sitcom. Using analytical metrics like sentiment variance, dialogue density, and exclamation ratio, we can glean insights into how the writing, pacing, and overall character of the show evolve across seasons and individual episodes.

The radar charts in this notebook offer a comparative analysis of key episode attributes. The first chart profiles a single episode across metrics like average dialogue length and unique word count. But it's the second chart, which overlays several color-coded episodes from different seasons, that really allows us to dive deeper.

For instance, in the chart, Season 1, Episode 1 appears to stand out for its high sentiment variance, which would suggest a broader emotional range than other installments. Meanwhile, Season 2, Episode 2 shows an unusually high dialogue density, hinting at a more rapid-fire, quippy style of humor.
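
To make "sentiment variance" a bit more concrete, here is a minimal sketch of one way such a number could be computed: score each dialogue line, then take the variance of those scores. The dataset does not document its exact method here, so the toy lexicon, the sample lines, and the function names below are all my own assumptions for illustration.

```python
from statistics import pvariance

# Hypothetical toy lexicon (word -> sentiment score); not taken from the dataset
TOY_LEXICON = {"love": 2, "great": 2, "good": 1, "bad": -1, "hate": -2, "terrible": -2}

def line_sentiment(line):
    """Sum the toy sentiment scores of the words in one dialogue line."""
    words = (w.strip(".,!?\"'").lower() for w in line.split())
    return sum(TOY_LEXICON.get(w, 0) for w in words)

def sentiment_variance(lines):
    """Population variance of per-line sentiment scores (0.0 when there are no lines)."""
    scores = [line_sentiment(line) for line in lines]
    return pvariance(scores) if scores else 0.0

# Made-up lines: one emotionally flat scene, one that swings between extremes
flat = ["The grill is on.", "We open at noon.", "Pass the buns."]
swingy = ["I love this burger!", "This is terrible.", "Pass the buns."]

print(sentiment_variance(flat), sentiment_variance(swingy))
```

The flat scene scores zero on every line, so its variance is zero, while the second scene mixes positive and negative lines and scores higher. That is the sense in which a high value hints at a wider emotional range.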

Further, the question ratio and exclamation ratio metrics reveal differences in how the characters communicate. Some episodes lean on questions to drive the comedy, while others favor emphatic exclamations.
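
As a rough illustration of these two ratios, the sketch below counts the share of dialogue lines that end in a question mark or an exclamation mark. This definition is an assumption on my part (the dataset may compute the ratios differently), and the sample lines are made up.

```python
def punctuation_ratios(lines):
    """Return (question_ratio, exclamation_ratio) for a list of dialogue lines."""
    spoken = [line.strip() for line in lines if line.strip()]
    if not spoken:
        return 0.0, 0.0
    questions = sum(line.endswith("?") for line in spoken)
    exclamations = sum(line.endswith("!") for line in spoken)
    return questions / len(spoken), exclamations / len(spoken)

# Made-up dialogue: two questions and one exclamation out of four lines
sample = ["Is the grill on?", "Yes!", "We open at noon.", "Who ordered this?"]
question_ratio, exclamation_ratio = punctuation_ratios(sample)
print(question_ratio, exclamation_ratio)
```

Because both values are fractions of the same line count, they can be compared across episodes of different lengths.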

Analyzing these visualizations, it is tempting to conclude that the creative team behind "Bob's Burgers" is meticulous in crafting each episode. The data is at least consistent with deliberate choices about pacing, tone, and language that deliver the blend of heart and humor that keeps fans coming back season after season.


In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from plotly.subplots import make_subplots
from sklearn.preprocessing import StandardScaler
import re
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import plotly.graph_objects as go

In [None]:
# Load data from URL
url = 'https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-11-19/episode_metrics.csv'
columns = ["season", "episode", "dialogue_density", "avg_length", "sentiment_variance",
           "unique_words", "question_ratio", "exclamation_ratio"]
bobs_burgers_data = pd.read_csv(url, names=columns)

In [None]:
# Drop row 0: because `names=` was passed to read_csv, the file's own header line was read in as data
bobs_burgers_data.drop(axis=0, index=0, inplace=True)

In [None]:
# Display the first few rows after dropping
bobs_burgers_data.head()

Unnamed: 0,season,episode,dialogue_density,avg_length,sentiment_variance,unique_words,question_ratio,exclamation_ratio
1,1,1,0.9300411522633744,37.53097345132744,3.3244153244153245,960,0.2079646017699115,0.1924778761061946
2,1,2,0.993975903614458,33.81414141414142,3.986798964624677,950,0.1838383838383838,0.1191919191919192
3,1,3,0.992,31.06653225806452,4.075439075439076,915,0.1754032258064516,0.0987903225806451
4,1,4,0.9943714821763602,32.24528301886792,3.708638374699919,892,0.1415094339622641,0.1113207547169811
5,1,5,0.9935622317596566,34.062634989200866,3.778036295951605,888,0.1641468682505399,0.0626349892008639


In [None]:
# Display information about the dataset
bobs_burgers_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 272 entries, 1 to 272
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   season              272 non-null    object
 1   episode             272 non-null    object
 2   dialogue_density    272 non-null    object
 3   avg_length          272 non-null    object
 4   sentiment_variance  272 non-null    object
 5   unique_words        272 non-null    object
 6   question_ratio      272 non-null    object
 7   exclamation_ratio   272 non-null    object
dtypes: object(8)
memory usage: 17.1+ KB


In [None]:
# Fill NaN values with 0
bobs_burgers_data = bobs_burgers_data.fillna(0)

In [None]:
# Convert columns to appropriate data types
bobs_burgers_data['season'] = bobs_burgers_data['season'].str.extract(r'(\d+)', expand=False).astype(int)
bobs_burgers_data['episode'] = bobs_burgers_data['episode'].str.extract(r'(\d+)', expand=False).astype(int)
bobs_burgers_data['dialogue_density'] = bobs_burgers_data['dialogue_density'].astype(float)
bobs_burgers_data['avg_length'] = bobs_burgers_data['avg_length'].astype(float)
bobs_burgers_data['sentiment_variance'] = bobs_burgers_data['sentiment_variance'].astype(float)
bobs_burgers_data['unique_words'] = bobs_burgers_data['unique_words'].astype(int)
bobs_burgers_data['question_ratio'] = bobs_burgers_data['question_ratio'].astype(float)
bobs_burgers_data['exclamation_ratio'] = bobs_burgers_data['exclamation_ratio'].astype(float)

In [None]:
# Copy data for further processing
transcript_data = bobs_burgers_data.copy()

# Display the first few rows of the copied data
transcript_data.head()

Unnamed: 0,season,episode,dialogue_density,avg_length,sentiment_variance,unique_words,question_ratio,exclamation_ratio
1,1,1,0.930041,37.530973,3.324415,960,0.207965,0.192478
2,1,2,0.993976,33.814141,3.986799,950,0.183838,0.119192
3,1,3,0.992,31.066532,4.075439,915,0.175403,0.09879
4,1,4,0.994371,32.245283,3.708638,892,0.141509,0.111321
5,1,5,0.993562,34.062635,3.778036,888,0.164147,0.062635


In [None]:
# Ensure 'dialogue_density' and 'avg_length' are numeric
transcript_data['dialogue_density'] = pd.to_numeric(transcript_data['dialogue_density'], errors='coerce')
transcript_data['avg_length'] = pd.to_numeric(transcript_data['avg_length'], errors='coerce')

In [None]:
# Placeholder for a sentiment-variance calculation (not used below)
def calculate_sentiment_variance(text):
    # NOTE: this measures the variance of word lengths, not sentiment;
    # replace it with actual sentiment scores if this function is ever needed
    word_lengths = [len(word) for word in text.split()]
    return np.var(word_lengths) if word_lengths else np.nan

In [None]:
# Aggregate per episode. The file appears to hold one row per (season, episode),
# so each aggregate below simply returns that row's value.
episode_metrics = transcript_data.dropna(subset=['dialogue_density']).groupby(['season', 'episode'], as_index=False).agg({
    'dialogue_density': 'mean',   # 'count' would give 1 for every episode and flatten this axis
    'avg_length': 'mean',         # Average length of dialogue
    'sentiment_variance': 'mean', # Use mean instead of custom function
    'unique_words': 'sum',        # Sum unique words
    'question_ratio': 'mean',     # Mean of question ratio
    'exclamation_ratio': 'mean'   # Mean of exclamation ratio
})
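
Grouping by season and episode only changes anything if a pair appears more than once, so a quick sanity check is worthwhile before trusting the aggregates. The helper below is a hypothetical sketch in plain Python; with the DataFrame above, the rows could come from something like `transcript_data[["season", "episode"]].to_dict("records")`. The sample rows here are made up.

```python
from collections import Counter

def duplicated_keys(rows):
    """Return the (season, episode) pairs that occur more than once."""
    counts = Counter((row["season"], row["episode"]) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)

# Made-up rows in which S1E2 appears twice
rows = [
    {"season": 1, "episode": 1},
    {"season": 1, "episode": 2},
    {"season": 1, "episode": 2},
]
print(duplicated_keys(rows))
```

If the list comes back empty, every group holds exactly one row: `mean` and `sum` then return that row's value unchanged, and `count` returns 1 for every episode.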


In [None]:
# Standardize each metric (zero mean, unit variance) so they can share one radar axis
scaler = StandardScaler()
episode_metrics[['dialogue_density', 'avg_length', 'sentiment_variance',
                 'unique_words', 'question_ratio', 'exclamation_ratio']] = scaler.fit_transform(
    episode_metrics[['dialogue_density', 'avg_length', 'sentiment_variance',
                     'unique_words', 'question_ratio', 'exclamation_ratio']]
)
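
For readers who want to see what the scaling step does to a single column, here is a small stdlib-only sketch of the same idea: subtract the column mean and divide by the population standard deviation, which is what `StandardScaler` uses by default. The `standardize` helper is my own name, and the sample uses only the five `sentiment_variance` values printed earlier, so the numbers will differ from what the scaler produces when fit on all episodes.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Z-score a column: subtract the mean, divide by the population std."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        std = 1.0  # constant column: avoid dividing by zero, every z-score is 0
    return [(v - mean) / std for v in values]

# sentiment_variance for the first five episodes shown in the table above
sample = [3.324415, 3.986799, 4.075439, 3.708638, 3.778036]
z_scores = standardize(sample)
print([round(z, 3) for z in z_scores])
```

After this transformation a value of 0 means "average for this metric", and positive or negative values measure distance from average in standard deviations. A column with no variation at all maps to all zeros.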


In [None]:


## Circle Plot Function

def create_circle_plot(data, season, episode):
    metrics = ['dialogue_density', 'avg_length', 'sentiment_variance',
               'unique_words', 'question_ratio', 'exclamation_ratio']

    episode_data = data[(data['season'] == season) & (data['episode'] == episode)]
    if episode_data.empty:
        raise ValueError(f"No data for season {season}, episode {episode}")

    fig = go.Figure()

    fig.add_trace(go.Scatterpolar(
        r=episode_data[metrics].values[0],
        theta=metrics,
        fill='toself',
        name=f'Season {season}, Episode {episode}'
    ))

    fig.update_layout(
        polar=dict(
            radialaxis=dict(
                visible=True,
                range=[-2, 2]
            )),
        showlegend=True,
        title=f"Metrics for Bob's Burgers S{season}E{episode}"
    )

    return fig

## Plot for a Specific Episode

season = 1
episode = 1
fig = create_circle_plot(episode_metrics, season, episode)
fig.show()

## Plot Multiple Episodes

metrics = ['dialogue_density', 'avg_length', 'sentiment_variance',
           'unique_words', 'question_ratio', 'exclamation_ratio']

fig = go.Figure()

for season in [1, 2, 3]:
    for episode in [1, 2]:
        episode_data = episode_metrics[(episode_metrics['season'] == season) &
                                       (episode_metrics['episode'] == episode)]
        if episode_data.empty:
            continue  # skip season/episode combinations missing from the data

        # One filled polygon per episode, all drawn on the same polar axes
        fig.add_trace(go.Scatterpolar(
            r=episode_data[metrics].values[0],
            theta=metrics,
            fill='toself',
            name=f'S{season}E{episode}'
        ))

fig.update_layout(
    polar=dict(
        radialaxis=dict(
            visible=True,
            range=[-2, 2]  # Adjust based on your scaled data
        )),
    showlegend=True,
    title="Comparison of Bob's Burgers Episodes"
)

fig.show()
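
One cosmetic detail with radar charts: as far as I can tell, `Scatterpolar` draws the outline only between the points it is given, so the last edge back to the first metric can be left without a stroke even when the fill closes. A common workaround is to repeat the first point at the end. The `close_loop` helper and the sample values below are hypothetical, a sketch rather than part of the original notebook.

```python
def close_loop(values, labels):
    """Repeat the first point at the end so the outline returns to its start."""
    values, labels = list(values), list(labels)
    if not values:
        return values, labels
    return values + values[:1], labels + labels[:1]

# Made-up standardized values for three metrics
r, theta = close_loop([0.5, -1.2, 0.3], ["avg_length", "unique_words", "question_ratio"])
print(r)
print(theta)
```

The returned lists could then be passed as `r` and `theta` when building each trace.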
