In [None]:
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import streamlit as st

# load datasets 
seasons = pd.read_csv('datasets/seasons.csv')
episodes = pd.read_csv('datasets/episodes.csv')

Empty DataFrame
Columns: [season_title, season_num, number_of_episodes, air_date_first_ep, air_date_last_ep, producer, IMDB_rating]
Index: []


## Data Preparation
With the datasets loaded, the next step is to prepare the data for analysis and visualization.
### Merging Datasets
The two datasets were merged on the shared `season_title` column. Because both contain an `IMDB_rating` column, pandas suffixed them as `IMDB_rating_x` (from `seasons`) and `IMDB_rating_y` (from `episodes`). These were renamed to `IMDB_rating_season` and `IMDB_rating_episode`, respectively, to clearly distinguish the season-level rating from the rating of each individual episode.

Following the merge, the data was inspected for any null or missing (`NA`) values. The merge was successful, and no null or missing values were identified.

Additionally, the merged dataset was checked for duplicates on the combination of `season_title` and `episode_num` to ensure that each episode appears only once per season.
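
The merge, missing-value check, and duplicate check described above can be sketched on a small toy example. The frames `seasons_demo` and `episodes_demo` and their values are hypothetical stand-ins, not the real data. The `suffixes` and `validate` arguments are an optional alternative to renaming after the merge, not what this notebook does.

```python
import pandas as pd

# Hypothetical toy stand-ins for seasons.csv and episodes.csv (values are illustrative only)
seasons_demo = pd.DataFrame({
    "season_title": ["Season A", "Season B"],
    "IMDB_rating": [6.5, 7.0],
})
episodes_demo = pd.DataFrame({
    "season_title": ["Season A", "Season A", "Season B"],
    "episode_num": [1, 2, 1],
    "IMDB_rating": [7.4, 6.9, 7.1],
})

# suffixes names the overlapping IMDB_rating columns directly instead of _x / _y;
# validate raises a MergeError if a season_title repeats in seasons_demo
merged_demo = seasons_demo.merge(
    episodes_demo,
    on="season_title",
    suffixes=("_season", "_episode"),
    validate="one_to_many",
)

# Count missing values per column
na_counts = merged_demo.isna().sum()

# Rows that share the same (season_title, episode_num) pair
dupes = merged_demo[
    merged_demo.duplicated(subset=["season_title", "episode_num"], keep=False)
]

print(merged_demo.columns.tolist())
print("missing values:", int(na_counts.sum()), "| duplicate rows:", len(dupes))
```

An empty `dupes` frame and a zero missing-value total correspond to the clean result reported above.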
### Standardizing Date Formats
To ensure consistency in date representation, all date columns were brought into a uniform format. In the `seasons.csv` dataset, the `air_date_first_ep` and `air_date_last_ep` columns were originally formatted as `YYYY-MM-DD`. In contrast, the `episodes.csv` dataset used the `MM/DD/YYYY` format for its `air_date` column. These discrepancies were resolved by converting all three columns to pandas datetime values, which display in the `YYYY-MM-DD` format.
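
As a minimal sketch of this standardization, the two source formats can be parsed with explicit format strings. The frame `dates_demo` is a hypothetical example. Passing `format=` explicitly is an assumption made here for clarity; the notebook itself lets `pd.to_datetime` infer the format.

```python
import pandas as pd

# Hypothetical one-row example with one date in each source format
dates_demo = pd.DataFrame({
    "air_date_first_ep": ["1993-08-28"],  # YYYY-MM-DD, as in seasons.csv
    "air_date": ["09/04/1993"],           # MM/DD/YYYY, as in episodes.csv
})

# Parse each column with its own format; values that do not match become NaT
dates_demo["air_date_first_ep"] = pd.to_datetime(
    dates_demo["air_date_first_ep"], format="%Y-%m-%d", errors="coerce"
)
dates_demo["air_date"] = pd.to_datetime(
    dates_demo["air_date"], format="%m/%d/%Y", errors="coerce"
)

# Both columns now share the datetime64 dtype and display as YYYY-MM-DD
print(dates_demo.dtypes)
print(dates_demo)
```

With `errors='coerce'`, an unparseable date silently becomes `NaT`, so it is worth re-running the missing-value check after the conversion.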

In [76]:
# Merge the datasets 
pr_merged_data = seasons.merge(episodes, on='season_title')

# Rename IMDB_rating_x and IMDB_rating_y
pr_merged_data.rename(columns={'IMDB_rating_x':'IMDB_rating_season', 'IMDB_rating_y':'IMDB_rating_episode'}, inplace=True)

# Count missing values per column (pd.isnull is an alias of pd.isna, so one call is enough)
pr_merged_data.isna().sum()

# Check for any duplicates
pr_merged_duplicates = pr_merged_data[pr_merged_data.duplicated(subset=['season_title', 'episode_num'], keep=False)]

# Create consistent airtime data
pr_merged_data[['air_date', 'air_date_first_ep', 'air_date_last_ep']] = pr_merged_data[['air_date', 'air_date_first_ep', 'air_date_last_ep']].apply(pd.to_datetime, errors='coerce')

pr_merged_data.head()

Unnamed: 0,season_title,season_num,number_of_episodes,air_date_first_ep,air_date_last_ep,producer,IMDB_rating_season,episode_num,episode_title,air_date,IMDB_rating_episode,total_votes,desc
0,Mighty Morphin (Season 1),1,60,1993-08-28,1994-05-23,Saban,6.5,0,The Lost Episode,1999-05-22,6.7,113,Original version of the premiere episode.
1,Mighty Morphin (Season 1),1,60,1993-08-28,1994-05-23,Saban,6.5,1,Day of the Dumpster,1993-08-28,7.4,687,Following the accidental release of long-impri...
2,Mighty Morphin (Season 1),1,60,1993-08-28,1994-05-23,Saban,6.5,2,High Five,1993-09-04,6.9,564,Rita plans to trap the Rangers in a time trap ...
3,Mighty Morphin (Season 1),1,60,1993-08-28,1994-05-23,Saban,6.5,3,Teamwork,1993-09-08,7.3,546,Trini and Kimberly set up a petition to clean ...
4,Mighty Morphin (Season 1),1,60,1993-08-28,1994-05-23,Saban,6.5,4,A Pressing Engagement,1993-09-09,6.9,535,Jason is trying to break the bench press recor...


Conduct EDA

