**The objective of this Exploratory Data Analysis (EDA) is to analyze Netflix content data to understand:**

- The distribution of Movies vs TV Shows
- Trends in content release over the years
- Content ratings and duration patterns
- Missing values and data quality issues
- High-level characteristics of Netflix titles

This analysis will help uncover insights about Netflix’s content strategy and catalog composition.


In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# **Understanding the Dataset (Basic Inspection)**

In [6]:
# Load the dataset and preview the first 10 rows
netflix = pd.read_csv('Netflix.csv')
netflix.head(10)

,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...
5,s6,TV Show,Midnight Mass,Mike Flanagan,"Kate Siegel, Zach Gilford, Hamish Linklater, H...",,"September 24, 2021",2021,TV-MA,1 Season,"TV Dramas, TV Horror, TV Mysteries",The arrival of a charismatic young priest brin...
6,s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",,"September 24, 2021",2021,PG,91 min,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...
7,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...","September 24, 2021",1993,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s..."
8,s9,TV Show,The Great British Baking Show,Andy Devonshire,"Mel Giedroyc, Sue Perkins, Mary Berry, Paul Ho...",United Kingdom,"September 24, 2021",2021,TV-14,9 Seasons,"British TV Shows, Reality TV",A talented batch of amateur bakers face off in...
9,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,"September 24, 2021",2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...


# **Dataset Overview**

In [21]:
netflix.shape

(8807, 12)

Observation: The dataset contains 8,807 rows, one per Netflix title, and 12 columns describing their attributes.

In [22]:
netflix.columns

Index(['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added',
       'release_year', 'rating', 'duration', 'listed_in', 'description'],
      dtype='object')

Observation: The dataset includes information such as title, type, cast, country, rating, duration, and release year.

# **Data types & null values**

In [34]:
netflix.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8807 entries, 0 to 8806
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       8807 non-null   object
 1   type          8807 non-null   object
 2   title         8807 non-null   object
 3   director      6173 non-null   object
 4   cast          7982 non-null   object
 5   country       7976 non-null   object
 6   date_added    8797 non-null   object
 7   release_year  8807 non-null   int64 
 8   rating        8803 non-null   object
 9   duration      8804 non-null   object
 10  listed_in     8807 non-null   object
 11  description   8807 non-null   object
dtypes: int64(1), object(11)
memory usage: 825.8+ KB


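Observation
- 11 of the 12 columns have dtype object (text); release_year (int64) is the only numeric column.
- date_added is stored as text (e.g. "September 25, 2021") rather than as a datetime.
- director, cast and country contain the most missing values; date_added, rating and duration are missing in only a few rows.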
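Because date_added is text, date-based analysis (e.g. titles added per year) first needs a conversion. The sketch below is a minimal example on a toy Series built from two values in the head() output plus one missing value; it assumes every non-null entry follows the "Month D, YYYY" pattern seen there.

```python
import pandas as pd

# Toy sample: two date_added strings copied from the head() output above,
# plus one missing value (date_added has 10 nulls in the full data).
date_added = pd.Series(["September 25, 2021", "September 24, 2021", None])

# Strip stray whitespace defensively, then parse "Month D, YYYY" strings.
# errors="coerce" turns unparseable or missing entries into NaT.
parsed = pd.to_datetime(date_added.str.strip(), format="%B %d, %Y", errors="coerce")
print(parsed)
```

On the full data the same call would be applied to `netflix['date_added']`; any value not matching the assumed format becomes NaT and shows up in a subsequent `isnull()` check.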

**Missing value counts**

In [27]:
netflix.isnull().sum()

column,missing
show_id,0
type,0
title,0
director,2634
cast,825
country,831
date_added,10
release_year,0
rating,4
duration,3


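Observation
- director has by far the most missing values (2,634, about 30% of rows), followed by country (831) and cast (825), each about 9%.
- date_added, rating and duration are missing in only 10, 4 and 3 rows respectively.
- Missing values will be handled during the data cleaning stage.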
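Raw counts are easier to compare as a share of all rows. A minimal sketch on a toy frame built from the first four rows of the head() output (only title, director and country kept); on the real data the same expression would run on `netflix`.

```python
import numpy as np
import pandas as pd

# Toy frame: title/director/country of the first four head() rows.
df = pd.DataFrame({
    "title": ["Dick Johnson Is Dead", "Blood & Water", "Ganglands", "Jailbirds New Orleans"],
    "director": ["Kirsten Johnson", np.nan, "Julien Leclercq", np.nan],
    "country": ["United States", "South Africa", np.nan, np.nan],
})

# isnull().mean() is the fraction of missing cells per column.
missing = pd.DataFrame({
    "missing": df.isnull().sum(),
    "pct": df.isnull().mean().mul(100).round(1),
}).sort_values("pct", ascending=False)
print(missing)
```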

**Duplicate records**

In [30]:
print(netflix.duplicated().sum())

0


Observation: No row is an exact duplicate of another across all 12 columns.
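If show_id is unique per row (the s1, s2, ... pattern in head() suggests so, but that is an assumption here), a full-row `duplicated()` check can never flag anything. Comparing only the descriptive columns is a stricter test. The sketch uses hypothetical toy rows ("Title A", "Title B") in which the same title appears under two IDs.

```python
import pandas as pd

# Hypothetical toy rows: the same title listed twice under different show_ids.
df = pd.DataFrame({
    "show_id": ["s1", "s2", "s3"],
    "type": ["Movie", "Movie", "TV Show"],
    "title": ["Title A", "Title A", "Title B"],
    "release_year": [2020, 2020, 2021],
})

# Full-row check: show_id differs on every row, so nothing is flagged.
full_dups = df.duplicated().sum()

# Content check: ignore the identifier and compare the remaining columns.
content_cols = list(df.columns.drop("show_id"))
content_dups = df.duplicated(subset=content_cols).sum()
print(full_dups, content_dups)  # 0 1

# keep=False marks every member of a duplicate group, useful for inspection.
print(df[df.duplicated(subset=content_cols, keep=False)])
```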

**Distribution of content type: Movies vs. TV Shows**

In [35]:
netflix['type'].value_counts()

type,count
Movie,6131
TV Show,2676


Observation: Movies make up about 70% of the catalog (6,131 of 8,807 titles), compared with about 30% for TV Shows (2,676).
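The percentages follow from `value_counts(normalize=True)`. This sketch rebuilds a Series with exactly the counts reported above, so it reproduces the shares without needing the CSV.

```python
import pandas as pd

# Rebuild a Series with the counts reported above (6,131 movies, 2,676 TV shows).
types = pd.Series(["Movie"] * 6131 + ["TV Show"] * 2676, name="type")

# normalize=True returns proportions instead of raw counts.
shares = types.value_counts(normalize=True).mul(100).round(1)
print(shares)  # Movie 69.6, TV Show 30.4
```

On the real data, `netflix['type'].value_counts(normalize=True)` gives the same proportions directly.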

**Content release year summary**

In [36]:
netflix['release_year'].describe()

statistic,release_year
count,8807.0
mean,2014.180198
std,8.819312
min,1925.0
25%,2013.0
50%,2017.0
75%,2019.0
max,2021.0


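Observations:

- Half of all titles were released in 2017 or later (median 2017), and three quarters in 2013 or later.

- The oldest title dates from 1925; older titles form a long left tail that pulls the mean (about 2014) below the median, but the catalog is dominated by recent releases.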
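Grouping years into decades and measuring the share of recent titles makes the skew concrete. A minimal sketch on the release_year values of the ten head() rows; the 2017 cutoff simply mirrors the median above.

```python
import pandas as pd

# release_year values of the ten titles in the head() output above.
years = pd.Series([2020, 2021, 2021, 2021, 2021, 2021, 2021, 1993, 2021, 2021],
                  name="release_year")

# Integer-divide to the start of each decade, then count titles per decade.
decades = (years // 10 * 10).value_counts().sort_index()
print(decades)  # 1990: 1, 2020: 9

# Fraction of titles released in 2017 or later.
recent_share = (years >= 2017).mean()
print(recent_share)  # 0.9
```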

**Rating distribution**

In [37]:
netflix['rating'].value_counts()

rating,count
TV-MA,3207
TV-14,2160
TV-PG,863
R,799
PG-13,490
TV-Y7,334
TV-Y,307
PG,287
TV-G,220
NR,80


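Observation: TV-MA (3,207 titles) and TV-14 (2,160) are the most common ratings, together about 61% of the catalog, so the content skews toward adult and teen audiences rather than children.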
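TV-style ratings are not limited to TV Shows: in the head() output, Sankofa (a Movie) is rated TV-MA. A cross-tabulation of type against rating shows how ratings split between the two. The sketch uses the type and rating of the ten head() rows.

```python
import pandas as pd

# type and rating of the ten head() rows, in order.
df = pd.DataFrame({
    "type": ["Movie", "TV Show", "TV Show", "TV Show", "TV Show",
             "TV Show", "Movie", "Movie", "TV Show", "Movie"],
    "rating": ["PG-13", "TV-MA", "TV-MA", "TV-MA", "TV-MA",
               "TV-MA", "PG", "TV-MA", "TV-14", "PG-13"],
})

# Rows: content type; columns: rating; cells: number of titles.
ct = pd.crosstab(df["type"], df["rating"])
print(ct)
```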

**Country-wise content distribution**

In [38]:
netflix['country'].value_counts()

country,count
United States,2818
India,972
United Kingdom,419
Japan,245
South Korea,199
...,...
"Mexico, United States, Spain, Colombia",1
"Canada, Norway",1
"Finland, Germany, Belgium",1
"Argentina, United States, Mexico",1


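Observation: "United States" is the most frequent value (2,818 titles), followed by India (972) and the United Kingdom (419). These counts only cover titles that list a single country: co-productions are stored as one comma-separated string (e.g. "Canada, Norway"), so the column must be split to count titles per country.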
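One way to get per-country totals is to split each string on commas and give every country its own row with `explode()`. A minimal sketch on country strings taken from the output above plus one missing value; the empty-string filter is a defensive assumption, not something observed in this output.

```python
import pandas as pd

# Country strings from the value_counts output above, plus a missing value.
country = pd.Series([
    "United States",
    "India",
    "Canada, Norway",
    "Argentina, United States, Mexico",
    None,
])

per_country = (
    country.str.split(",")   # "Canada, Norway" -> ["Canada", " Norway"]
    .explode()               # one row per country
    .str.strip()             # remove the space left after each comma
    .loc[lambda s: s != ""]  # drop empty pieces a trailing comma would leave
    .value_counts()          # missing countries (NaN) are excluded by default
)
print(per_country)
```

The same chain works for cast and director, which also pack several names into one cell.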

**Duration overview**

In [39]:
netflix['duration'].value_counts().head(10)


duration,count
1 Season,1793
2 Seasons,425
3 Seasons,199
90 min,152
97 min,146
94 min,146
93 min,146
91 min,144
95 min,137
96 min,130


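Observation:
- duration mixes two units in one text column: Movies are measured in minutes (e.g. "90 min") and TV Shows in seasons (e.g. "1 Season").
- "1 Season" is the most common value (1,793), so about two-thirds of the 2,676 TV Shows have only one season.
- The number must be extracted and separated by unit before any numeric analysis.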
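A regular expression with two named groups can separate the number from its unit. This sketch assumes every non-null value is "<number> <word>", as in the output above; the toy Series uses values from that output plus one missing entry.

```python
import pandas as pd

durations = pd.Series(["90 min", "2 Seasons", "1 Season", None], name="duration")

# Capture the leading number and the unit word; missing input gives NaN in both.
parts = durations.str.extract(r"(?P<value>\d+)\s*(?P<unit>[A-Za-z]+)")
parts["value"] = pd.to_numeric(parts["value"])
# Treat "Season" and "Seasons" as the same unit.
parts["unit"] = parts["unit"].replace({"Seasons": "Season"})
print(parts)

# Movie runtimes in minutes, ready for describe() or a histogram.
minutes = parts.loc[parts["unit"] == "min", "value"]
```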

**Genre overview**

In [42]:
netflix['listed_in'].value_counts()

listed_in,count
"Dramas, International Movies",362
Documentaries,359
Stand-Up Comedy,334
"Comedies, Dramas, International Movies",274
"Dramas, Independent Movies, International Movies",252
...,...
"Action & Adventure, Cult Movies",1
"Action & Adventure, Comedies, Music & Musicals",1
"Classic Movies, Horror Movies, Thrillers",1
"Children & Family Movies, Classic Movies, Dramas",1


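Observation: listed_in stores comma-separated genre lists, so value_counts counts genre combinations (e.g. "Dramas, International Movies", 362 titles) rather than individual genres.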
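To count individual genres, `str.get_dummies` can split each list into one 0/1 column per genre; summing the columns gives per-genre totals. A minimal sketch on three listed_in strings from the output above.

```python
import pandas as pd

# listed_in strings copied from the value_counts output above.
listed_in = pd.Series([
    "Dramas, International Movies",
    "Documentaries",
    "Comedies, Dramas, International Movies",
])

# Split on ", " and build one indicator column per genre.
genre_flags = listed_in.str.get_dummies(sep=", ")
genre_counts = genre_flags.sum().sort_values(ascending=False)
print(genre_counts)
```

Keeping the indicator matrix (rather than only the totals) also makes it easy to join genres back to other columns such as type.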

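# **Conclusion**
- The dataset contains both Movies and TV Shows; Movies make up about 70% of the catalog (6,131 of 8,807 titles).
- Several columns have missing values that need cleaning, most notably director (about 30% missing).
- Several columns pack multiple values into a single cell (director, cast, country, listed_in) and need splitting before counting.
- Netflix content is largely focused on recent releases (median release year 2017) and mature ratings (TV-MA and TV-14).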