Data Source - [Netflix Data](https://www.kaggle.com/datasets/shivamb/netflix-shows)

This analysis covers the following steps:
1. Data Cleaning & Prep
- Handle missing values (e.g., director, cast, country).
- Convert date_added from string to datetime format.
- Extract useful features (e.g., month/year added, duration in minutes).

2. Exploratory Data Analysis (EDA)
- Content Distribution: Movies vs. TV shows over time.
- Release Trends: When were most shows/movies added to Netflix?
- Country Analysis: Which countries produce the most content?
- Ratings Analysis: What’s the most common rating (TV-MA, PG-13, etc.)?

3. Visualizations (Use Matplotlib/Seaborn or Plotly)
- 📈 Bar Chart: Number of Movies vs. TV Shows by year.
- 🌍 Map Visualization: Countries producing the most content (using geopandas or Plotly).
- 📅 Time Series Plot: Monthly additions of content over the years.
- 📊 Pie Chart: Distribution of ratings (TV-MA, PG-13, etc.).

4. Bonus (If You Want More Challenge)
- Text Analysis: Analyze the description column for common keywords.
- Recommendation System (Basic): Suggest similar content based on genre/director.

In [None]:
# Packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
print('Happy Coding 😊')

In [None]:
data = pd.read_csv('./netflix_titles.csv') # importing the dataset

## Data Cleaning

In [None]:
data.head() # Displaying the first 5 rows of the dataset

In [None]:
data.sample(15) # Displaying 15 random rows of the dataset

In [None]:
data.isnull().sum() # checking for null values
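
Raw null counts are easier to judge next to the share of rows they represent. The following is a minimal sketch on a small made-up frame (the `toy` data and the `summary` name are illustrative assumptions, not part of the dataset); the same pattern should apply to `data`.

```python
import pandas as pd

# Hypothetical toy frame standing in for the Netflix data.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": ["Dir 1", None, None, "Dir 2"],
    "country": ["United States", "India", None, "United Kingdom"],
})

# Count nulls per column and express them as a percentage of all rows.
missing = toy.isnull().sum()
summary = pd.DataFrame({
    "missing": missing,
    "percent": (missing / len(toy) * 100).round(1),
})

# Keep only columns that actually have gaps, worst first.
summary = summary[summary["missing"] > 0].sort_values("missing", ascending=False)
print(summary)
```

Columns with a high percentage (such as `director` in this dataset) are usually better filled with a placeholder than dropped.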

In [None]:
data[(data['director'].isnull()) & (data['type'] != 'TV Show')] # movies (rows that are not TV shows) with a missing director

In [None]:
print(data['type'].unique()) # Displaying unique values in the type column

In [None]:
data.loc[data['duration'].isnull(), 'rating'] = 'TV-MA' # Rows with a missing duration are movies whose runtime was entered in the rating column; set their rating to TV-MA

In [None]:
data.loc[data['title'] == 'Louis C.K. 2017', 'duration'] = '74 min' # The duration of the movie is set to 74 min
data.loc[data['title'] == 'Louis C.K.: Hilarious', 'duration'] = '84 min' # The duration of the movie is set to 84 min
data.loc[data['title'] == 'Louis C.K.: Live at the Comedy Store', 'duration'] = '66 min' # The duration of the movie is set to 66 min
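
Hard-coding each title works for three rows, but the same repair can be done by pattern. The sketch below assumes, as appears to be the case for these titles, that the runtime was entered in `rating` while `duration` was left empty. It uses a made-up `toy` frame, and it would have to run before any step that overwrites `rating`.

```python
import pandas as pd

# Hypothetical toy frame: the first row has its runtime in the wrong column.
toy = pd.DataFrame({
    "title": ["Example A", "Example B", "Example C"],
    "rating": ["74 min", "PG-13", "TV-MA"],
    "duration": [None, "90 min", "2 Seasons"],
})

# Rows where duration is missing and rating looks like "<number> min".
misplaced = toy["duration"].isnull() & toy["rating"].str.fullmatch(r"\d+ min", na=False)

# Move the runtime into duration, then clear the bogus rating
# so it can be filled with a real value afterwards.
toy.loc[misplaced, "duration"] = toy.loc[misplaced, "rating"]
toy.loc[misplaced, "rating"] = pd.NA

print(toy)
```

The cleared ratings can then be set explicitly (for example to `'TV-MA'`) or left for a later `fillna`.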

In [None]:
data['director'] = data['director'].fillna('Unknown') # filling the null values with 'Unknown'
data['country'] = data['country'].fillna('Unknown') # filling the null values with 'Unknown'
data['cast'] = data['cast'].fillna('Unknown') # filling the null values with 'Unknown'

In [None]:
data['rating'] = data['rating'].fillna('Not Rated') # filling the null values with 'Not Rated'

In [None]:
data[data['date_added'].isnull()] # checking for null values in the date_added column

In [None]:
data['date_added'] = pd.to_datetime(data['date_added'].str.strip(), format="%B %d, %Y", errors='coerce') # converting the date_added column to datetime format
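
Because `errors='coerce'` silently turns unparseable strings into `NaT`, it is worth checking how many values were lost in the conversion. A minimal sketch on made-up strings (the `raw` series is an illustrative assumption) shows the effect of `str.strip()` and of coercion:

```python
import pandas as pd

# Hypothetical samples: a clean date, one with leading whitespace,
# one that does not match the format, and a missing value.
raw = pd.Series(["September 25, 2021", " August 4, 2017", "not a date", None])

# Strip whitespace first so the explicit format can match, then coerce failures to NaT.
parsed = pd.to_datetime(raw.str.strip(), format="%B %d, %Y", errors="coerce")

# Values that were present but could not be parsed.
failed = raw[parsed.isna() & raw.notna()]

print(parsed)
print("Unparseable values:", failed.tolist())
```

If `failed` is non-empty on the real column, those rows likely need a different format or manual correction.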

In [None]:
data['date_added'].dtype # checking the data type of the date_added column

In [None]:
data.isnull().sum() # checking for null values
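
The plan at the top also calls for extracting features such as the month/year added and the duration in minutes. One possible sketch, shown on a made-up `toy` frame; the column names `year_added`, `month_added`, and `duration_min` are assumptions chosen for illustration:

```python
import pandas as pd

# Hypothetical toy frame with an already-converted date_added column.
toy = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie"],
    "date_added": pd.to_datetime(["2021-09-25", "2017-08-04", None]),
    "duration": ["90 min", "2 Seasons", "125 min"],
})

# Nullable Int64 keeps whole numbers even when some dates are missing (NaT).
toy["year_added"] = toy["date_added"].dt.year.astype("Int64")
toy["month_added"] = toy["date_added"].dt.month.astype("Int64")

# Pull the number out of "<n> min"; TV shows ("<n> Seasons") do not match and stay missing.
minutes = toy["duration"].str.extract(r"(\d+)\s*min", expand=False)
toy["duration_min"] = pd.to_numeric(minutes).astype("Int64")

print(toy)
```

Keeping `duration_min` missing for TV shows avoids mixing minutes and season counts in a single numeric column.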

## Exploratory Data Analysis (EDA)