# **Netflix Content Analysis using Python and SQL**  

**Objective:** To analyze the Netflix content library using SQL queries and visualize insights using pandas and matplotlib. This project combines SQL and Python into a single narrative to draw actionable insights from data.


## Dataset Overview

The data has already been cleaned and prepared.  
- Source: `netflix_titles.csv` (output of **data_clean_and_prep.ipynb**)  
- Key columns: `title`, `type`, `release_year`, `rating`, `country`, `listed_in`, `date_added`, `duration`


## Let's import the required libraries and begin

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings

In [3]:
new_df=pd.read_csv('netflix_titles.csv')

#### Now we will inspect the dataset with some basic commands

In [4]:
new_df.head()

,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...


In [5]:
new_df.isnull().sum()

show_id            0
type               0
title              0
director        2634
cast             825
country          831
date_added        10
release_year       0
rating             4
duration           3
listed_in          0
description        0
dtype: int64
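Raw null counts are easier to compare as a share of rows (for example, 2634 missing directors out of 8807 rows is about 29.9%). A minimal sketch on a small hypothetical frame standing in for `new_df`:

```python
import pandas as pd

# Hypothetical toy frame standing in for new_df (not real dataset values)
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": ["X", None, None, "Y"],
    "country": [None, "India", "Japan", "France"],
})

# Percentage of missing values per column, largest first
missing_pct = df.isnull().mean().mul(100).round(1).sort_values(ascending=False)
print(missing_pct)
```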

In [6]:
new_df.shape

(8807, 12)

In [7]:
new_df.dtypes

show_id         object
type            object
title           object
director        object
cast            object
country         object
date_added      object
release_year     int64
rating          object
duration        object
listed_in       object
description     object
dtype: object

### Next, we clean the dataset by handling messy or unusable data:

1. Null values (missing data)

2. Duplicates

3. Blank strings or inconsistent text

4. Unusable entries (e.g. entries without title/type)
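Items 3 and 4 above can be handled in pandas roughly as follows. This is a sketch on a hypothetical toy frame; it assumes empty or whitespace-only strings should count as missing, and that a row without a `title` or `type` is unusable.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a blank title and a missing type
df = pd.DataFrame({"title": ["Zoom", "  ", ""], "type": ["Movie", "TV Show", None]})

# Treat empty or whitespace-only strings as missing so isnull()/dropna() see them
df = df.replace(r"^\s*$", np.nan, regex=True)

# Drop rows that have no title or no type
df = df.dropna(subset=["title", "type"])
print(df)
```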

In [8]:
new_df.duplicated().sum()

np.int64(0)

In [9]:
new_df.drop_duplicates()  # returns a copy; with 0 duplicates found above, nothing is removed

,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...
...,...,...,...,...,...,...,...,...,...,...,...,...
8802,s8803,Movie,Zodiac,David Fincher,"Mark Ruffalo, Jake Gyllenhaal, Robert Downey J...",United States,"November 20, 2019",2007,R,158 min,"Cult Movies, Dramas, Thrillers","A political cartoonist, a crime reporter and a..."
8803,s8804,TV Show,Zombie Dumb,,,,"July 1, 2019",2018,TV-Y7,2 Seasons,"Kids' TV, Korean TV Shows, TV Comedies","While living alone in a spooky town, a young g..."
8804,s8805,Movie,Zombieland,Ruben Fleischer,"Jesse Eisenberg, Woody Harrelson, Emma Stone, ...",United States,"November 1, 2019",2009,R,88 min,"Comedies, Horror Movies",Looking to survive in a world taken over by zo...
8805,s8806,Movie,Zoom,Peter Hewitt,"Tim Allen, Courteney Cox, Chevy Chase, Kate Ma...",United States,"January 11, 2020",2006,PG,88 min,"Children & Family Movies, Comedies","Dragged from civilian life, a former superhero..."
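`duplicated()` compares entire rows, so a unique key such as `show_id` keeps every row distinct even when a title repeats. One way to look for near-duplicates, assuming `title` plus `type` is a reasonable key (it may also flag legitimate remakes, so inspect the rows before dropping anything), on hypothetical data:

```python
import pandas as pd

# Hypothetical rows: the same title appears twice under different show_ids
df = pd.DataFrame({
    "show_id": ["s1", "s2", "s3"],
    "title": ["Zoom", "Zoom", "Zodiac"],
    "type": ["Movie", "Movie", "Movie"],
})

# Full-row check: the differing show_id makes every row unique
full_row_dupes = df.duplicated().sum()

# Key-based check: keep=False marks every member of a repeated (title, type) pair
dupes = df[df.duplicated(subset=["title", "type"], keep=False)]
print(full_row_dupes)
print(dupes)
```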


In [10]:
new_df['director'].unique()

array(['Kirsten Johnson', nan, 'Julien Leclercq', ..., 'Majid Al Ansari',
       'Peter Hewitt', 'Mozez Singh'], shape=(4529,), dtype=object)

In [11]:
# Fill missing values safely
new_df['director'] = new_df['director'].replace([np.nan, '', None], 'unknown')
new_df['cast'] = new_df['cast'].replace([np.nan, '', None], 'Not listed')
new_df['country'] = new_df['country'].replace([np.nan, '', None], 'various')

# Handle rating
mode_rating = new_df['rating'].mode()
new_df['rating'] = new_df['rating'].fillna(mode_rating[0] if not mode_rating.empty else 'Unrated')

# Verifying changes
new_df[['director', 'cast', 'country', 'rating']]

,director,cast,country,rating
0,Kirsten Johnson,Not listed,United States,PG-13
1,unknown,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,TV-MA
2,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",various,TV-MA
3,unknown,Not listed,various,TV-MA
4,unknown,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,TV-MA
...,...,...,...,...
8802,David Fincher,"Mark Ruffalo, Jake Gyllenhaal, Robert Downey J...",United States,R
8803,unknown,Not listed,various,TV-Y7
8804,Ruben Fleischer,"Jesse Eisenberg, Woody Harrelson, Emma Stone, ...",United States,R
8805,Peter Hewitt,"Tim Allen, Courteney Cox, Chevy Chase, Kate Ma...",United States,PG
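The three `replace` calls above can also be written as a single `fillna` with a column-to-placeholder mapping. A sketch on hypothetical values, converting empty strings to `NaN` first so they get filled as well:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with NaN and empty-string gaps
df = pd.DataFrame({
    "director": ["Kirsten Johnson", np.nan],
    "cast": [np.nan, "Ama Qamata"],
    "country": ["United States", ""],
})

# Same placeholders as the cell above, applied in one call
placeholders = {"director": "unknown", "cast": "Not listed", "country": "various"}
df = df.replace("", np.nan).fillna(placeholders)
print(df)
```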


In [12]:
new_df = new_df.dropna(subset=['date_added', 'duration'])
new_df.isnull().sum()

show_id         0
type            0
title           0
director        0
cast            0
country         0
date_added      0
release_year    0
rating          0
duration        0
listed_in       0
description     0
dtype: int64

In [13]:
new_df.drop(columns=['show_id'])  # preview only: the result is not assigned, so show_id stays until it is dropped later

,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,Movie,Dick Johnson Is Dead,Kirsten Johnson,Not listed,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,TV Show,Blood & Water,unknown,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",various,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,TV Show,Jailbirds New Orleans,unknown,Not listed,various,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,TV Show,Kota Factory,unknown,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...
...,...,...,...,...,...,...,...,...,...,...,...
8802,Movie,Zodiac,David Fincher,"Mark Ruffalo, Jake Gyllenhaal, Robert Downey J...",United States,"November 20, 2019",2007,R,158 min,"Cult Movies, Dramas, Thrillers","A political cartoonist, a crime reporter and a..."
8803,TV Show,Zombie Dumb,unknown,Not listed,various,"July 1, 2019",2018,TV-Y7,2 Seasons,"Kids' TV, Korean TV Shows, TV Comedies","While living alone in a spooky town, a young g..."
8804,Movie,Zombieland,Ruben Fleischer,"Jesse Eisenberg, Woody Harrelson, Emma Stone, ...",United States,"November 1, 2019",2009,R,88 min,"Comedies, Horror Movies",Looking to survive in a world taken over by zo...
8805,Movie,Zoom,Peter Hewitt,"Tim Allen, Courteney Cox, Chevy Chase, Kate Ma...",United States,"January 11, 2020",2006,PG,88 min,"Children & Family Movies, Comedies","Dragged from civilian life, a former superhero..."


In [14]:
col_names=['title','director','cast','date_added','listed_in','rating']
for col in col_names:
    new_df[col]=new_df[col].str.strip()

new_df

,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,Not listed,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,unknown,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",various,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,unknown,Not listed,various,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,unknown,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...
...,...,...,...,...,...,...,...,...,...,...,...,...
8802,s8803,Movie,Zodiac,David Fincher,"Mark Ruffalo, Jake Gyllenhaal, Robert Downey J...",United States,"November 20, 2019",2007,R,158 min,"Cult Movies, Dramas, Thrillers","A political cartoonist, a crime reporter and a..."
8803,s8804,TV Show,Zombie Dumb,unknown,Not listed,various,"July 1, 2019",2018,TV-Y7,2 Seasons,"Kids' TV, Korean TV Shows, TV Comedies","While living alone in a spooky town, a young g..."
8804,s8805,Movie,Zombieland,Ruben Fleischer,"Jesse Eisenberg, Woody Harrelson, Emma Stone, ...",United States,"November 1, 2019",2009,R,88 min,"Comedies, Horror Movies",Looking to survive in a world taken over by zo...
8805,s8806,Movie,Zoom,Peter Hewitt,"Tim Allen, Courteney Cox, Chevy Chase, Kate Ma...",United States,"January 11, 2020",2006,PG,88 min,"Children & Family Movies, Comedies","Dragged from civilian life, a former superhero..."


In [15]:
new_df['date_added']=pd.to_datetime(new_df['date_added'],errors='coerce')
print(new_df['date_added'].dtype)
print(new_df['date_added'].isna().sum()) 

datetime64[ns]
0
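Here `pd.to_datetime` infers the date format. Since `date_added` values look like `September 25, 2021`, passing the format explicitly is one option that makes parsing stricter and predictable. This sketch assumes every valid value follows that pattern; anything else is coerced to `NaT`:

```python
import pandas as pd

# Hypothetical raw values, including one that cannot be parsed
raw = pd.Series(["September 25, 2021", "January 11, 2020", "not a date"])

# %B = full month name, %d = day, %Y = four-digit year
parsed = pd.to_datetime(raw.str.strip(), format="%B %d, %Y", errors="coerce")
print(parsed)
```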


In [17]:
new_df['year_added'] = new_df['date_added'].dt.year
new_df['month_added'] = new_df['date_added'].dt.month
new_df['day_of_week'] = new_df['date_added'].dt.day_name()
new_df.drop(columns=['date_added', 'show_id'], inplace=True)
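The `.dt` accessors used above can be checked on two dates from the sample output; this small sketch reproduces the `Saturday` and `Friday` values shown further down for September 25 and 24, 2021:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2021-09-25", "2021-09-24"]))

# Same derived columns as the cell above, built from the datetime Series
parts = pd.DataFrame({
    "year_added": dates.dt.year,
    "month_added": dates.dt.month,
    "day_of_week": dates.dt.day_name(),
})
print(parts)
```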

In [18]:
new_df['duration'].unique()  # find the unique values in duration

array(['90 min', '2 Seasons', '1 Season', '91 min', '125 min',
       '9 Seasons', '104 min', '127 min', '4 Seasons', '67 min', '94 min',
       '5 Seasons', '161 min', '61 min', '166 min', '147 min', '103 min',
       '97 min', '106 min', '111 min', '3 Seasons', '110 min', '105 min',
       '96 min', '124 min', '116 min', '98 min', '23 min', '115 min',
       '122 min', '99 min', '88 min', '100 min', '6 Seasons', '102 min',
       '93 min', '95 min', '85 min', '83 min', '113 min', '13 min',
       '182 min', '48 min', '145 min', '87 min', '92 min', '80 min',
       '117 min', '128 min', '119 min', '143 min', '114 min', '118 min',
       '108 min', '63 min', '121 min', '142 min', '154 min', '120 min',
       '82 min', '109 min', '101 min', '86 min', '229 min', '76 min',
       '89 min', '156 min', '112 min', '107 min', '129 min', '135 min',
       '136 min', '165 min', '150 min', '133 min', '70 min', '84 min',
       '140 min', '78 min', '7 Seasons', '64 min', '59 min', '139 min',
    

In [20]:
new_df['duration_type'] = new_df['duration'].str.extract(r'([a-zA-Z]+)')  #make duration type column
new_df['duration_int']=new_df['duration'].str.extract(r'(\d+)').astype(float)  #make duration int column
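The two `str.extract` calls above can be combined into one regex with named groups. The output above also keeps `Season` and `Seasons` as separate `duration_type` values, which splits TV shows into two groups; folding one into the other, as below, is an assumption about how you want to group them:

```python
import pandas as pd

duration = pd.Series(["90 min", "1 Season", "2 Seasons"])

# Named groups become the column names of the extracted DataFrame
parts = duration.str.extract(r"(?P<duration_int>\d+)\s*(?P<duration_type>[A-Za-z]+)")
parts["duration_int"] = parts["duration_int"].astype(int)

# Treat singular and plural seasons as one unit
parts["duration_type"] = parts["duration_type"].replace({"Season": "Seasons"})
print(parts)
```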

In [21]:
new_df.head(5)

,type,title,director,cast,country,release_year,rating,duration,listed_in,description,year_added,month_added,day_of_week,duration_type,duration_int
0,Movie,Dick Johnson Is Dead,Kirsten Johnson,Not listed,United States,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",2021,9,Saturday,min,90.0
1,TV Show,Blood & Water,unknown,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",2021,9,Friday,Seasons,2.0
2,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",various,2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,2021,9,Friday,Season,1.0
3,TV Show,Jailbirds New Orleans,unknown,Not listed,various,2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",2021,9,Friday,Season,1.0
4,TV Show,Kota Factory,unknown,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,2021,9,Friday,Seasons,2.0


In [22]:
new_df['type'].value_counts()

type
Movie      6128
TV Show    2666
Name: count, dtype: int64
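Counts are easier to compare as shares. `value_counts(normalize=True)` returns proportions; applied to the counts above, movies account for about 69.7% of the 8794 remaining titles. A sketch on a toy series:

```python
import pandas as pd

# Hypothetical toy series standing in for new_df['type']
types = pd.Series(["Movie", "Movie", "Movie", "TV Show"])

# Proportions converted to percentages
share = types.value_counts(normalize=True).mul(100).round(1)
print(share)
```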

In [23]:
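new_df['rating'].value_counts(dropna=False)  # not displayed: Jupyter only shows a cell's last expression
new_df['rating'] = new_df['rating'].fillna('Unknown')  # safety net; missing ratings were already filled with the mode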

In [24]:
new_df['country'] = new_df['country'].fillna('Unknown')  # safety net; missing countries were already set to 'various'

# Extract first country
new_df['main_country'] = new_df['country'].apply(lambda x: x.split(',')[0])
new_df.head()

,type,title,director,cast,country,release_year,rating,duration,listed_in,description,year_added,month_added,day_of_week,duration_type,duration_int,main_country
0,Movie,Dick Johnson Is Dead,Kirsten Johnson,Not listed,United States,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",2021,9,Saturday,min,90.0,United States
1,TV Show,Blood & Water,unknown,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",2021,9,Friday,Seasons,2.0,South Africa
2,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",various,2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,2021,9,Friday,Season,1.0,various
3,TV Show,Jailbirds New Orleans,unknown,Not listed,various,2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",2021,9,Friday,Season,1.0,various
4,TV Show,Kota Factory,unknown,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,2021,9,Friday,Seasons,2.0,India
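`main_country` keeps only the first listed country, so a co-production is credited to a single country. If you instead want every country counted, one option is to split and `explode` the full `country` string. This sketch also assumes the `various` placeholder should be excluded:

```python
import pandas as pd

# Hypothetical values standing in for new_df['country']
country = pd.Series(["United States", "India, United States", "various"])

# One row per country; strip the space that follows each comma
all_countries = country.str.split(",").explode().str.strip()
counts = all_countries[all_countries != "various"].value_counts()
print(counts)
```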


In [25]:
# top 10 cast members
actors_list=[]
cast_series = new_df[new_df['cast'] != 'unknown']['cast']  # note: missing cast was filled with 'Not listed', not 'unknown', so this filter keeps the placeholder and it tops the counts below
for i in cast_series:
    actors=i.split(', ')
    actors_list.extend(actors)

actor_freq=pd.Series(actors_list).value_counts()
actor_freq.head(10)

Not listed          825
Anupam Kher          43
Shah Rukh Khan       35
Julie Tejwani        33
Takahiro Sakurai     32
Naseeruddin Shah     32
Rupa Bhimani         31
Akshay Kumar         30
Om Puri              30
Yuki Kaji            29
Name: count, dtype: int64
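The loop above can be written with vectorized string methods, filtering out the `Not listed` placeholder before counting so that only real cast members are ranked. A sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical values standing in for new_df['cast']
cast = pd.Series(["Anupam Kher, Om Puri", "Not listed", "Anupam Kher"])

actor_freq = (
    cast[cast != "Not listed"]  # drop the placeholder used for missing cast
    .str.split(", ")            # one list of names per title
    .explode()                  # one row per name
    .value_counts()
)
print(actor_freq.head(10))
```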

In [26]:
new_df['main_genre'] = new_df['listed_in'].apply(lambda x: x.split(',')[0])
new_df.head(5)

,type,title,director,cast,country,release_year,rating,duration,listed_in,description,year_added,month_added,day_of_week,duration_type,duration_int,main_country,main_genre
0,Movie,Dick Johnson Is Dead,Kirsten Johnson,Not listed,United States,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",2021,9,Saturday,min,90.0,United States,Documentaries
1,TV Show,Blood & Water,unknown,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",2021,9,Friday,Seasons,2.0,South Africa,International TV Shows
2,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",various,2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,2021,9,Friday,Season,1.0,various,Crime TV Shows
3,TV Show,Jailbirds New Orleans,unknown,Not listed,various,2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",2021,9,Friday,Season,1.0,various,Docuseries
4,TV Show,Kota Factory,unknown,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,2021,9,Friday,Seasons,2.0,India,International TV Shows


In [28]:
cols_order=['title','type','main_genre','listed_in','rating','main_country','release_year','year_added','month_added','day_of_week','duration_int','duration_type','director','cast','description']
new_df=new_df[cols_order]    #reordering columns
new_df.head(10)

,title,type,main_genre,listed_in,rating,main_country,release_year,year_added,month_added,day_of_week,duration_int,duration_type,director,cast,description
0,Dick Johnson Is Dead,Movie,Documentaries,Documentaries,PG-13,United States,2020,2021,9,Saturday,90.0,min,Kirsten Johnson,Not listed,"As her father nears the end of his life, filmm..."
1,Blood & Water,TV Show,International TV Shows,"International TV Shows, TV Dramas, TV Mysteries",TV-MA,South Africa,2021,2021,9,Friday,2.0,Seasons,unknown,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...","After crossing paths at a party, a Cape Town t..."
2,Ganglands,TV Show,Crime TV Shows,"Crime TV Shows, International TV Shows, TV Act...",TV-MA,various,2021,2021,9,Friday,1.0,Season,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",To protect his family from a powerful drug lor...
3,Jailbirds New Orleans,TV Show,Docuseries,"Docuseries, Reality TV",TV-MA,various,2021,2021,9,Friday,1.0,Season,unknown,Not listed,"Feuds, flirtations and toilet talk go down amo..."
4,Kota Factory,TV Show,International TV Shows,"International TV Shows, Romantic TV Shows, TV ...",TV-MA,India,2021,2021,9,Friday,2.0,Seasons,unknown,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",In a city of coaching centers known to train I...
5,Midnight Mass,TV Show,TV Dramas,"TV Dramas, TV Horror, TV Mysteries",TV-MA,various,2021,2021,9,Friday,1.0,Season,Mike Flanagan,"Kate Siegel, Zach Gilford, Hamish Linklater, H...",The arrival of a charismatic young priest brin...
6,My Little Pony: A New Generation,Movie,Children & Family Movies,Children & Family Movies,PG,various,2021,2021,9,Friday,91.0,min,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",Equestria's divided. But a bright-eyed hero be...
7,Sankofa,Movie,Dramas,"Dramas, Independent Movies, International Movies",TV-MA,United States,1993,2021,9,Friday,125.0,min,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","On a photo shoot in Ghana, an American model s..."
8,The Great British Baking Show,TV Show,British TV Shows,"British TV Shows, Reality TV",TV-14,United Kingdom,2021,2021,9,Friday,9.0,Seasons,Andy Devonshire,"Mel Giedroyc, Sue Perkins, Mary Berry, Paul Ho...",A talented batch of amateur bakers face off in...
9,The Starling,Movie,Comedies,"Comedies, Dramas",PG-13,United States,2021,2021,9,Friday,104.0,min,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",A woman adjusting to life after a loss contend...


The cleaned dataset was saved to disk with pandas' `DataFrame.to_csv()` method:  
`new_df.to_csv('netflix_updated.csv')`  
This lets the SQL analysis start from the same cleaned data without rerunning the cleaning steps.
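By default `DataFrame.to_csv()` also writes the index. When such a file is read back with `pd.read_csv`, the index appears as an extra column named `Unnamed: 0`. Passing `index=False` (or reading with `index_col=0`) avoids that; this in-memory sketch shows the difference:

```python
import io

import pandas as pd

df = pd.DataFrame({"title": ["Zoom"], "type": ["Movie"]})

# Default: the index is written as an unnamed first column
buf = io.StringIO()
df.to_csv(buf)
cols_default = pd.read_csv(io.StringIO(buf.getvalue())).columns.tolist()

# index=False: only the data columns are written
buf = io.StringIO()
df.to_csv(buf, index=False)
cols_no_index = pd.read_csv(io.StringIO(buf.getvalue())).columns.tolist()

print(cols_default)
print(cols_no_index)
```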

### With the dataset cleaned, we can begin the analysis using SQL queries and visualizations
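The notebook does not show which SQL engine is used for the next step. One lightweight option, assumed here rather than taken from the source, is Python's built-in `sqlite3` module: load the DataFrame with `to_sql`, then query it with `pd.read_sql_query`. The table name `netflix` and the toy rows are hypothetical:

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for the cleaned new_df
df = pd.DataFrame({
    "title": ["Zodiac", "Zombie Dumb", "Zoom"],
    "type": ["Movie", "TV Show", "Movie"],
    "release_year": [2007, 2018, 2006],
})

# Load the DataFrame into an in-memory SQLite database as table "netflix"
conn = sqlite3.connect(":memory:")
df.to_sql("netflix", conn, index=False)

query = """
SELECT type, COUNT(*) AS n_titles
FROM netflix
GROUP BY type
ORDER BY n_titles DESC
"""
result = pd.read_sql_query(query, conn)
conn.close()
print(result)
```

An in-memory database keeps the sketch self-contained; for repeated analysis sessions a file path such as `netflix.db` could be passed to `sqlite3.connect` instead.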