# Netflix TV Shows and Movies

This notebook will focus on cleaning and preparing a Netflix dataset from Kaggle for analysis and visualization.


###  Objectives:
- Explore the structure and contents of the raw dataset
- Identify and handle missing, duplicate, or inconsistent data
- Standardize columns such as date added, duration, director, cast, and country
- Output a clean, structured dataset for use in Tableau

###  Dataset:
- Source: [Netflix Titles Dataset from Kaggle](https://www.kaggle.com/datasets/shivamb/netflix-shows)
- File: `netflix_titles.csv`

---

By the end of this notebook, we will generate clean CSV files (`movies_data.csv`, `tv_data.csv`, and the combined `netflix_data.csv`) ready for visual exploration in Tableau.



In [2]:
import pandas as pd

# Read the data

In [3]:
data=pd.read_csv('netflix_titles.csv')
data.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...


# Copy and initial exploration

In [4]:
df=data.copy()

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8807 entries, 0 to 8806
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       8807 non-null   object
 1   type          8807 non-null   object
 2   title         8807 non-null   object
 3   director      6173 non-null   object
 4   cast          7982 non-null   object
 5   country       7976 non-null   object
 6   date_added    8797 non-null   object
 7   release_year  8807 non-null   int64 
 8   rating        8803 non-null   object
 9   duration      8804 non-null   object
 10  listed_in     8807 non-null   object
 11  description   8807 non-null   object
dtypes: int64(1), object(11)
memory usage: 825.8+ KB


In [6]:
df.describe()

Unnamed: 0,release_year
count,8807.0
mean,2014.180198
std,8.819312
min,1925.0
25%,2013.0
50%,2017.0
75%,2019.0
max,2021.0


In [7]:
df.nunique()

show_id         8807
type               2
title           8807
director        4528
cast            7692
country          748
date_added      1767
release_year      74
rating            17
duration         220
listed_in        514
description     8775
dtype: int64

# Clean the data

In [8]:
df.isnull().sum()

show_id            0
type               0
title              0
director        2634
cast             825
country          831
date_added        10
release_year       0
rating             4
duration           3
listed_in          0
description        0
dtype: int64

In [9]:
# drop the ten rows where 'date_added' is missing

df = df.dropna(subset=['date_added'])
df.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...


In [10]:
# inspect the most common values in the 'duration' column

df['duration'].value_counts().head(10)


duration
1 Season     1793
2 Seasons     421
3 Seasons     198
90 min        152
94 min        146
97 min        146
93 min        146
91 min        144
95 min        137
96 min        130
Name: count, dtype: int64

In [11]:
# split the column into a numeric value and a unit (minutes or seasons)

df[['duration_value', 'duration_unit']] = df['duration'].str.extract(r'(\d+)\s+(\w+)')

In [12]:
df['duration_value'] = pd.to_numeric(df['duration_value'], errors='coerce')
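
The split above can be tried on a handful of values before running it on the full column. This is a minimal sketch on toy data that mirrors the formats seen in the value counts; the final `replace` step, which folds 'Seasons' into 'Season', is a hypothetical extra and is not part of this notebook.

```python
import pandas as pd

# Toy values mirroring the formats seen in the value counts, plus one missing entry.
sample = pd.Series(["90 min", "2 Seasons", "1 Season", None], name="duration")

# One capture group per output column: the digits, then the unit word.
parts = sample.str.extract(r"(\d+)\s+(\w+)")
parts.columns = ["duration_value", "duration_unit"]

# The extracted digits are strings; convert them and let missing rows become NaN.
parts["duration_value"] = pd.to_numeric(parts["duration_value"], errors="coerce")

# Hypothetical extra step (not in the notebook): use a single label for seasons.
parts["duration_unit"] = parts["duration_unit"].replace({"Seasons": "Season"})

print(parts)
```

A single unit label can make filtering in Tableau simpler, but whether it is wanted depends on how the charts are built.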

In [13]:
df.head(3)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,duration_value,duration_unit
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",90.0,min
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",2.0,Seasons
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,1.0,Season


In [14]:
df.columns

Index(['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added',
       'release_year', 'rating', 'duration', 'listed_in', 'description',
       'duration_value', 'duration_unit'],
      dtype='object')

In [15]:
# drop columns that would be irrelevant in visualizations or that are not needed

df.drop(['show_id','duration'], axis=1, inplace=True)
df.head(3)


Unnamed: 0,type,title,director,cast,country,date_added,release_year,rating,listed_in,description,duration_value,duration_unit
0,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,Documentaries,"As her father nears the end of his life, filmm...",90.0,min
1,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",2.0,Seasons
2,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,1.0,Season


In [16]:
# strip whitespace from column names

df.columns = df.columns.str.strip()
df.head()


Unnamed: 0,type,title,director,cast,country,date_added,release_year,rating,listed_in,description,duration_value,duration_unit
0,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,Documentaries,"As her father nears the end of his life, filmm...",90.0,min
1,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",2.0,Seasons
2,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,1.0,Season
3,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",1.0,Season
4,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,2.0,Seasons


In [17]:
# check whether any column names still have surrounding whitespace

[col for col in df.columns if col != col.strip()]

[]

In [18]:
df.isnull().sum()

type                 0
title                0
director          2624
cast               825
country            830
date_added           0
release_year         0
rating               4
listed_in            0
description          0
duration_value       3
duration_unit        3
dtype: int64

In [19]:
# explore where the missing 'director' values come from, by type and by country

df[df['director'].isnull()]['type'].value_counts()


type
TV Show    2436
Movie       188
Name: count, dtype: int64

In [20]:
df[df['director'].isnull()]['country'].value_counts()


country
United States                                    764
United Kingdom                                   206
Japan                                            159
South Korea                                      150
India                                             82
                                                ... 
Chile, Italy                                       1
Belarus                                            1
United Kingdom, Australia                          1
France, Australia, Germany                         1
United States, France, South Korea, Indonesia      1
Name: count, Length: 188, dtype: int64

Most of the missing 'director' values come from TV Shows: 2,436 of the 2,624 rows (about 93%).
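
Counts by type show where the gaps come from, but the share of missing values within each type says how incomplete each type is. A minimal sketch on a toy frame (the names `toy` and `missing_rate` are made up for illustration):

```python
import pandas as pd

# Toy frame standing in for df.
toy = pd.DataFrame({
    "type": ["TV Show", "TV Show", "TV Show", "Movie", "Movie"],
    "director": [None, None, "A", None, "B"],
})

# Share of missing directors within each type (mean of a boolean mask).
missing_rate = toy["director"].isna().groupby(toy["type"]).mean()
print(missing_rate)
```

On the real data the same expression would run on `df` instead of `toy`.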

In [21]:
# fill missing 'director' values with 'Unknown'

df['director'] = df['director'].fillna('Unknown')

In [22]:
# check 'director' for inconsistent formats

df['director'].value_counts().head(20)


director
Unknown                   2624
Rajiv Chilaka               19
Raúl Campos, Jan Suter      18
Suhas Kadav                 16
Marcus Raboy                16
Jay Karas                   14
Cathy Garcia-Molina         13
Jay Chapman                 12
Youssef Chahine             12
Martin Scorsese             12
Steven Spielberg            11
Don Michael Paul            10
David Dhawan                 9
Yılmaz Erdoğan               8
Quentin Tarantino            8
Lance Bangs                  8
Shannon Hartman              8
Troy Miller                  8
Kunle Afolayan               8
Johnnie To                   8
Name: count, dtype: int64

In [23]:
sorted(df['director'].unique())


['A. L. Vijay',
 'A. Raajdheep',
 'A. Salaam',
 'A.R. Murugadoss',
 'Aadish Keluskar',
 'Aamir Bashir',
 'Aamir Khan',
 'Aanand Rai',
 'Aaron Burns',
 'Aaron Hancox, Michael McNamara',
 'Aaron Hann, Mario Miscione',
 'Aaron Lieber',
 'Aaron Nee, Adam Nee',
 'Aaron Sorkin',
 'Aaron Woodley',
 'Aatmaram Dharne',
 'Abba T. Makama',
 'Abbas Alibhai Burmawalla, Mastan Alibhai Burmawalla',
 'Abbas Mustan',
 'Abbas Tyrewala',
 'Abby Epstein',
 'Abdellatif Kechiche',
 'Abdul Aziz Hashad',
 'Abdulaziz Alshlahei',
 'Abel Ferrara',
 'Abhay Chopra',
 'Abhijeet Deshpande',
 'Abhijit Kokate, Srivinay Salian',
 'Abhijit Panse',
 'Abhinay Deo',
 'Abhishek Chaubey',
 'Abhishek Kapoor',
 'Abhishek Saxena',
 'Abhishek Sharma',
 'Abhishek Varman',
 'Abir Sengupta',
 'Abu Bakr Shawky',
 'Achille Brice',
 'Adam Alleca',
 'Adam B. Stein, Zach Lipovsky',
 'Adam Bhala Lough',
 'Adam Bolt',
 'Adam Collins, Luke Radford',
 'Adam Davis, Jerry Kolber, Trey Nelson, Erich Sturm',
 'Adam Del Giudice',
 'Adam Deyoe',


In [24]:
# ensure consistent formatting
# note: str.title() also re-cases mixed-case names, e.g. 'McNamara' becomes 'Mcnamara'

df['director'] = df['director'].str.strip().str.title()
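
Title-casing lowercases every letter after the first one in each word, so it changes names that are already correctly cased. The sketch below shows the effect on a few strings and a hypothetical alternative that only normalizes whitespace; the input spellings are toy examples, and the alternative is not what this notebook does.

```python
import pandas as pd

# Toy inputs; the spellings are illustrative only.
names = pd.Series(["  Aaron Hancox, Michael McNamara ", "A.R. Murugadoss", "Jung-ah Im"])

# What the cell above does: strip, then title-case every word.
titled = names.str.strip().str.title()

# Hypothetical alternative: trim and collapse whitespace, keep the original casing.
normalized = names.str.strip().str.replace(r"\s+", " ", regex=True)

print(titled.tolist())
print(normalized.tolist())
```

If the original casing matters in the final charts, the whitespace-only variant may be the safer choice.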


In [25]:
df.isnull().sum()

type                0
title               0
director            0
cast              825
country           830
date_added          0
release_year        0
rating              4
listed_in           0
description         0
duration_value      3
duration_unit       3
dtype: int64

In [26]:
# fill missing 'cast' values with 'Unknown'

df['cast'] = df['cast'].fillna('Unknown')

In [27]:
# strip whitespace around each name in cells that list several names

df['cast'] = df['cast'].apply(lambda x: ','.join(name.strip() for name in x.split(',')) if x != 'Unknown' else x)
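
The same cleanup can also be written with vectorized string methods instead of a Python-level lambda. This is a sketch on toy strings, not a drop-in claim about performance; 'Unknown' needs no special case here because it contains no comma.

```python
import pandas as pd

# Toy values with irregular spacing around the commas.
cast = pd.Series(["Ama Qamata ,  Khosi Ngema,Gail Mabalane ", "Unknown", "Monty Don"])

# Collapse any whitespace around commas, then trim the ends of the string.
cleaned = cast.str.replace(r"\s*,\s*", ",", regex=True).str.strip()

print(cleaned.tolist())
```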

In [28]:
df['cast'].head(10)

0                                              Unknown
1    Ama Qamata,Khosi Ngema,Gail Mabalane,Thabang M...
2    Sami Bouajila,Tracy Gotoas,Samuel Jouy,Nabiha ...
3                                              Unknown
4    Mayur More,Jitendra Kumar,Ranjan Raj,Alam Khan...
5    Kate Siegel,Zach Gilford,Hamish Linklater,Henr...
6    Vanessa Hudgens,Kimiko Glenn,James Marsden,Sof...
7    Kofi Ghanaba,Oyafunmike Ogunlano,Alexandra Dua...
8    Mel Giedroyc,Sue Perkins,Mary Berry,Paul Holly...
9    Melissa McCarthy,Chris O'Dowd,Kevin Kline,Timo...
Name: cast, dtype: object

In [29]:
# find bad spacing

bad_cast_rows = df[df['cast'].apply(lambda x: any(name != name.strip() for name in x.split(',')) if x != 'Unknown' else False)]

print(len(bad_cast_rows))
bad_cast_rows.head()


0


Unnamed: 0,type,title,director,cast,country,date_added,release_year,rating,listed_in,description,duration_value,duration_unit


In [30]:
# check for lowercase entries

df[df['cast'].str.match(r'^[a-z ,]+$') & (df['cast'] != 'Unknown')].head()


Unnamed: 0,type,title,director,cast,country,date_added,release_year,rating,listed_in,description,duration_value,duration_unit


In [31]:
df.isnull().sum()

type                0
title               0
director            0
cast                0
country           830
date_added          0
release_year        0
rating              4
listed_in           0
description         0
duration_value      3
duration_unit       3
dtype: int64

In [32]:
# fill missing 'country' values with 'Unknown'

df['country'] = df['country'].fillna('Unknown')


In [33]:
# trim extra spaces

df['country'] = df['country'].str.strip()


In [34]:
# title-case each country name and rejoin multi-country cells with ', '

df['country'] = df['country'].apply(
    lambda x: ', '.join([c.strip().title() for c in x.split(',')]) if x != 'Unknown' else x
)


In [35]:
# check for inconsistencies

df['country'].value_counts().head(30)


country
United States                    2812
India                             972
Unknown                           830
United Kingdom                    418
Japan                             244
South Korea                       199
Canada                            181
Spain                             145
France                            124
Mexico                            110
Egypt                             106
Turkey                            105
Nigeria                            95
Australia                          86
Taiwan                             81
Indonesia                          79
Brazil                             77
Philippines                        75
United Kingdom, United States      75
United States, Canada              73
Germany                            67
China                              66
Thailand                           61
Argentina                          56
Hong Kong                          53
United States, United Kingdom      47
Ital

In [36]:
# create a column with the number of countries listed per row (0 for 'Unknown')

df['country_count'] = df['country'].apply(lambda x: 0 if x == 'Unknown' else len(x.split(',')))
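
A count tells how many countries a title lists, but per-country charts usually need one row per title and country. The sketch below applies the same counting rule to a toy frame and then builds a hypothetical long-format table with `explode`; the long table is an assumption about what might be useful in Tableau and is not produced by this notebook.

```python
import pandas as pd

toy = pd.DataFrame({
    "title": ["Zak Storm", "Kota Factory", "Ganglands"],
    "country": ["United States, France, South Korea, Indonesia", "India", "Unknown"],
})

# Same rule as the cell above: 0 for 'Unknown', otherwise the number of entries.
toy["country_count"] = toy["country"].apply(
    lambda x: 0 if x == "Unknown" else len(x.split(","))
)

# Hypothetical long format: one row per (title, country) pair.
long_df = toy.assign(country=toy["country"].str.split(", ")).explode(
    "country", ignore_index=True
)

print(long_df)
```

Splitting on `', '` relies on the country cells having been rejoined with that exact separator in the earlier cleanup step.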


In [37]:
df.head()

Unnamed: 0,type,title,director,cast,country,date_added,release_year,rating,listed_in,description,duration_value,duration_unit,country_count
0,Movie,Dick Johnson Is Dead,Kirsten Johnson,Unknown,United States,"September 25, 2021",2020,PG-13,Documentaries,"As her father nears the end of his life, filmm...",90.0,min,1
1,TV Show,Blood & Water,Unknown,"Ama Qamata,Khosi Ngema,Gail Mabalane,Thabang M...",South Africa,"September 24, 2021",2021,TV-MA,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",2.0,Seasons,1
2,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila,Tracy Gotoas,Samuel Jouy,Nabiha ...",Unknown,"September 24, 2021",2021,TV-MA,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,1.0,Season,0
3,TV Show,Jailbirds New Orleans,Unknown,Unknown,Unknown,"September 24, 2021",2021,TV-MA,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",1.0,Season,0
4,TV Show,Kota Factory,Unknown,"Mayur More,Jitendra Kumar,Ranjan Raj,Alam Khan...",India,"September 24, 2021",2021,TV-MA,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,2.0,Seasons,1


In [38]:
df.isnull().sum()

type              0
title             0
director          0
cast              0
country           0
date_added        0
release_year      0
rating            4
listed_in         0
description       0
duration_value    3
duration_unit     3
country_count     0
dtype: int64

In [39]:
# drop the remaining rows with null values

df.dropna(subset=['rating', 'duration_value', 'duration_unit'], inplace=True)


In [40]:
# convert to datetime format

df['date_added'] = pd.to_datetime(df['date_added'], errors='coerce')


In [41]:
# check for dates that failed to parse (NaT after conversion)

df[df['date_added'].isna()]


Unnamed: 0,type,title,director,cast,country,date_added,release_year,rating,listed_in,description,duration_value,duration_unit,country_count
6079,TV Show,Abnormal Summit,"Jung-Ah Im, Seung-Uk Jo","Hyun-moo Jun,Si-kyung Sung,Se-yoon Yoo",South Korea,NaT,2017,TV-PG,"International TV Shows, Korean TV Shows, Stand...","Led by a trio of Korean celebs, a multinationa...",2.0,Seasons,1
6177,TV Show,忍者ハットリくん,Unknown,Unknown,Japan,NaT,2012,TV-Y7,"Anime Series, Kids' TV","Hailing from the mountains of Iga, Kanzo Hatto...",2.0,Seasons,1
6213,TV Show,Bad Education,Unknown,"Jack Whitehall,Mathew Horne,Sarah Solemani,Mic...",United Kingdom,NaT,2014,TV-MA,"British TV Shows, TV Comedies","A history teacher at the posh Abbey Grove, Alf...",3.0,Seasons,1
6279,TV Show,Being Mary Jane: The Series,Unknown,"Gabrielle Union,Lisa Vidal,Margaret Avery,Omar...",United States,NaT,2016,TV-14,"Romantic TV Shows, TV Dramas",Ambitious single TV journalist Mary Jane attem...,4.0,Seasons,1
6304,TV Show,"Big Dreams, Small Spaces",Unknown,Monty Don,United Kingdom,NaT,2017,TV-G,"British TV Shows, International TV Shows, Real...",Writer and presenter Monty Don helps England's...,3.0,Seasons,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...
8539,TV Show,The Tudors,Unknown,"Jonathan Rhys Meyers,Henry Cavill,James Frain,...","Ireland, Canada, United States, United Kingdom",NaT,2010,TV-MA,TV Dramas,All the splendor and scandal of England's 16th...,4.0,Seasons,4
8557,TV Show,The West Wing,Unknown,"Martin Sheen,Rob Lowe,Allison Janney,John Spen...",United States,NaT,2005,TV-14,TV Dramas,This powerful political epic chronicles the tr...,7.0,Seasons,1
8684,TV Show,Vroomiz,Unknown,"Joon-seok Song,Jeong-hwa Yang,Sang-hyun Um,So-...",South Korea,NaT,2016,TV-Y,"Kids' TV, Korean TV Shows","For these half-car, half-animal friends, each ...",3.0,Seasons,1
8712,TV Show,Weird Wonders of the World,Unknown,Chris Packham,United Kingdom,NaT,2016,TV-PG,"British TV Shows, Docuseries, Science & Nature TV",From animal oddities and bizarre science to me...,2.0,Seasons,1


In [42]:
# list the rows whose 'date_added' is missing in the raw data, for comparison with the NaT rows

data[data['date_added'].isna()][['date_added']]


Unnamed: 0,date_added
6066,
6174,
6795,
6806,
6901,
7196,
7254,
7406,
7847,
8182,


In [43]:
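# the ten raw rows listed above were already dropped in In [9], and their index labels
# differ from the NaT rows in In [41]; 'date_added' had no nulls before the conversion,
# so the NaT values come from date strings that pd.to_datetime could not parse.
# those rows are dropped here (only rows with a parsed date are kept)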

df = df[~df['date_added'].isna()]
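
Since `date_added` had no nulls left before the conversion (see the null counts in In [38]), it may be worth looking at the raw strings behind the `NaT` rows before dropping them. The sketch below assumes, without confirmation from this notebook, that stray whitespace around the date string is one possible cause; it strips the strings, parses with an explicit format, and lists any non-missing value that still fails.

```python
import pandas as pd

# Toy strings; the leading space on the second value is an ASSUMED cause.
raw = pd.Series(["September 25, 2021", " August 4, 2017", None], name="date_added")

# Strip surrounding whitespace and parse with an explicit format; failures become NaT.
parsed = pd.to_datetime(raw.str.strip(), format="%B %d, %Y", errors="coerce")

# Raw strings behind any NaT that was not missing to begin with.
unparsed = raw[parsed.isna() & raw.notna()]

print(parsed)
print(unparsed)
```

If `unparsed` is empty on the real column, only the genuinely missing dates would need to be dropped.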


In [44]:
# confirm that date column is now clean

print(df['date_added'].isna().sum())  


0


In [45]:
# check for any duplicate rows

print("Original duplicates:", df.duplicated().sum())


Original duplicates: 0
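
`df.duplicated()` only flags rows that match in every column, so near-duplicates with small spelling differences pass unnoticed. A sketch of a looser check on a toy frame follows; the key columns and the lower-cased variant of the title are hypothetical choices, not something this notebook applies.

```python
import pandas as pd

# Toy frame; the second title is a made-up variant of the first.
toy = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show"],
    "title": ["Sankofa", "sankofa ", "Zak Storm"],
    "release_year": [1993, 1993, 2016],
})

# Full-row comparison: the two titles differ in spelling, so nothing is flagged.
exact = toy.duplicated().sum()

# Hypothetical looser key: type plus a trimmed, lower-cased title.
key = toy.assign(title=toy["title"].str.strip().str.lower())
near = key.duplicated(subset=["type", "title"], keep=False).sum()

print(exact, near)
```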


# Create separate dataframes for movies and TV shows


In [46]:
df_mov = df[df['type'] == 'Movie'].copy()
df_tv = df[df['type'] == 'TV Show'].copy()


In [47]:
# re-check for duplicates

print("Movie duplicates after split:", df_mov.duplicated().sum())
print("TV Show duplicates after split:", df_tv.duplicated().sum())


Movie duplicates after split: 0
TV Show duplicates after split: 0


# Final check for Movies and TV Shows cleanup

In [48]:
# check if all duration values are whole numbers (integers)

is_whole_numbers = df_mov['duration_value'].apply(lambda x: float(x).is_integer()).all()

print(is_whole_numbers)

True


In [49]:
is_whole_numbers = df_tv['duration_value'].apply(lambda x: float(x).is_integer()).all()

print(is_whole_numbers)

True


In [50]:
# change type to integer from float
df_mov['duration_value'] = df_mov['duration_value'].astype(int)

In [51]:

df_tv['duration_value'] = df_tv['duration_value'].astype(int)
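
The cast to `int` works here because the rows with a missing duration were dropped earlier. If those rows had been kept, pandas' nullable `Int64` dtype would be one way to get integers while preserving the gaps. A minimal sketch on toy values:

```python
import pandas as pd

# Toy values with one missing entry; astype(int) would raise on this Series.
values = pd.Series([90.0, 2.0, None])

# The nullable integer dtype keeps the missing entry as <NA>.
nullable = values.astype("Int64")

print(nullable)
```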

In [52]:
df_tv

Unnamed: 0,type,title,director,cast,country,date_added,release_year,rating,listed_in,description,duration_value,duration_unit,country_count
1,TV Show,Blood & Water,Unknown,"Ama Qamata,Khosi Ngema,Gail Mabalane,Thabang M...",South Africa,2021-09-24,2021,TV-MA,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",2,Seasons,1
2,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila,Tracy Gotoas,Samuel Jouy,Nabiha ...",Unknown,2021-09-24,2021,TV-MA,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,1,Season,0
3,TV Show,Jailbirds New Orleans,Unknown,Unknown,Unknown,2021-09-24,2021,TV-MA,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",1,Season,0
4,TV Show,Kota Factory,Unknown,"Mayur More,Jitendra Kumar,Ranjan Raj,Alam Khan...",India,2021-09-24,2021,TV-MA,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,2,Seasons,1
5,TV Show,Midnight Mass,Mike Flanagan,"Kate Siegel,Zach Gilford,Hamish Linklater,Henr...",Unknown,2021-09-24,2021,TV-MA,"TV Dramas, TV Horror, TV Mysteries",The arrival of a charismatic young priest brin...,1,Season,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
8795,TV Show,Yu-Gi-Oh! Arc-V,Unknown,"Mike Liscio,Emily Bauer,Billy Bob Thompson,Aly...","Japan, Canada",2018-05-01,2015,TV-Y7,"Anime Series, Kids' TV",Now that he's discovered the Pendulum Summonin...,2,Seasons,2
8796,TV Show,Yunus Emre,Unknown,"Gökhan Atalay,Payidar Tüfekçioglu,Baran Akbulu...",Turkey,2017-01-17,2016,TV-PG,"International TV Shows, TV Dramas","During the Mongol invasions, Yunus Emre leaves...",2,Seasons,1
8797,TV Show,Zak Storm,Unknown,"Michael Johnston,Jessica Gee-George,Christine ...","United States, France, South Korea, Indonesia",2018-09-13,2016,TV-Y7,Kids' TV,Teen surfer Zak Storm is mysteriously transpor...,3,Seasons,4
8800,TV Show,Zindagi Gulzar Hai,Unknown,"Sanam Saeed,Fawad Khan,Ayesha Omer,Mehreen Rah...",Pakistan,2016-12-15,2012,TV-PG,"International TV Shows, Romantic TV Shows, TV ...","Strong-willed, middle-class Kashaf and carefre...",1,Season,1


# Export to .csv files

In [53]:
df_mov.to_csv('movies_data.csv', sep=';', index=False)

In [54]:
df_tv.to_csv('tv_data.csv', sep=';', index=False)

In [58]:
# concatenate both frames into a third CSV file for the combined chart

full_df = pd.concat([df_mov, df_tv], ignore_index=True)


In [59]:
full_df

Unnamed: 0,type,title,director,cast,country,date_added,release_year,rating,listed_in,description,duration_value,duration_unit,country_count
0,Movie,Dick Johnson Is Dead,Kirsten Johnson,Unknown,United States,2021-09-25,2020,PG-13,Documentaries,"As her father nears the end of his life, filmm...",90,min,1
1,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens,Kimiko Glenn,James Marsden,Sof...",Unknown,2021-09-24,2021,PG,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...,91,min,0
2,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba,Oyafunmike Ogunlano,Alexandra Dua...","United States, Ghana, Burkina Faso, United Kin...",2021-09-24,1993,TV-MA,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s...",125,min,6
3,Movie,The Starling,Theodore Melfi,"Melissa McCarthy,Chris O'Dowd,Kevin Kline,Timo...",United States,2021-09-24,2021,PG-13,"Comedies, Dramas",A woman adjusting to life after a loss contend...,104,min,1
4,Movie,Je Suis Karl,Christian Schwochow,"Luna Wedler,Jannis Niewöhner,Milan Peschel,Edi...","Germany, Czech Republic",2021-09-23,2021,TV-MA,"Dramas, International Movies",After most of her family is murdered in a terr...,127,min,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...
8697,TV Show,Yu-Gi-Oh! Arc-V,Unknown,"Mike Liscio,Emily Bauer,Billy Bob Thompson,Aly...","Japan, Canada",2018-05-01,2015,TV-Y7,"Anime Series, Kids' TV",Now that he's discovered the Pendulum Summonin...,2,Seasons,2
8698,TV Show,Yunus Emre,Unknown,"Gökhan Atalay,Payidar Tüfekçioglu,Baran Akbulu...",Turkey,2017-01-17,2016,TV-PG,"International TV Shows, TV Dramas","During the Mongol invasions, Yunus Emre leaves...",2,Seasons,1
8699,TV Show,Zak Storm,Unknown,"Michael Johnston,Jessica Gee-George,Christine ...","United States, France, South Korea, Indonesia",2018-09-13,2016,TV-Y7,Kids' TV,Teen surfer Zak Storm is mysteriously transpor...,3,Seasons,4
8700,TV Show,Zindagi Gulzar Hai,Unknown,"Sanam Saeed,Fawad Khan,Ayesha Omer,Mehreen Rah...",Pakistan,2016-12-15,2012,TV-PG,"International TV Shows, Romantic TV Shows, TV ...","Strong-willed, middle-class Kashaf and carefre...",1,Season,1


In [60]:
full_df.to_csv('netflix_data.csv', sep=';', index=False)
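
Because the files use `;` as the separator, anything that reads them back needs the same setting, and the dates come back as plain text unless they are parsed again. The sketch below checks this round trip on a toy frame using an in-memory buffer instead of a file on disk.

```python
import io
import pandas as pd

toy = pd.DataFrame({
    "title": ["Big Dreams, Small Spaces"],
    "date_added": pd.to_datetime(["2021-09-24"]),
    "duration_value": [3],
})

# Write to an in-memory buffer with the same arguments as the export cells.
buffer = io.StringIO()
toy.to_csv(buffer, sep=";", index=False)
buffer.seek(0)

# Reading back needs the same separator; dates must be re-parsed explicitly.
restored = pd.read_csv(buffer, sep=";", parse_dates=["date_added"])

print(restored.dtypes)
```

The comma inside the title survives the round trip, which is one practical reason for choosing a separator other than `,` for this dataset.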