# Netflix Shows

In this workbook we'll be analyzing trends among Netflix shows. We have a data set from Kaggle with a plethora of information, such as the actors, the director, the overall rating, the genre, and so on. Our goal is to answer the questions below.

<u>Base Questions</u>
* Understanding what content is available in different countries
* Understanding how the number of TV shows and movies on Netflix has changed over time

<u>Deeper Questions</u>
* See if we can predict the rating of any particular movie given its details (genre, actors, etc.)

>[Import the Data](#Import) <br>
[Initial Investigation](#Inv) <br>
[Cleaning](#Cleaning)

<a id='Import'></a>
## Import the Data

In [27]:
import pandas as pd
import numpy as np
import requests, io
from zipfile import ZipFile


> To download the zip file into my working directory I used Kaggle's API from the command line. A quick search (`kaggle datasets list -s netflix`) showed that the dataset I wanted was stored under `shivamb/netflix-shows`, so running `kaggle datasets download shivamb/netflix-shows` in the target directory was all it took to download the zip file there.
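The same download can be scripted from Python instead of typed by hand. This is only a sketch: it wraps the CLI call quoted above with `subprocess`, and it assumes, as the text implies, that the Kaggle CLI is installed, that credentials are configured, and that the zip is written into the directory the command is run from. The helper names `kaggle_download_command` and `download_dataset` are hypothetical.

```python
import shutil
import subprocess


def kaggle_download_command(dataset):
    """Build the CLI call quoted in the text as an argument list."""
    return ["kaggle", "datasets", "download", dataset]


def download_dataset(dataset, dest="."):
    """Run the Kaggle CLI from inside `dest` (hypothetical helper).

    Assumes the CLI writes the zip into its working directory.
    """
    if shutil.which("kaggle") is None:
        raise RuntimeError("kaggle CLI not found on PATH")
    subprocess.run(kaggle_download_command(dataset), cwd=dest, check=True)


# Show the command without running it (no network access needed).
print(kaggle_download_command("shivamb/netflix-shows"))
```

Calling `download_dataset("shivamb/netflix-shows")` would then perform the actual download.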

In [37]:
# Name of the zip file was found by inspecting the elements in our folder
local_zipfile = "netflix-shows.zip"

# Saving it to a new folder location with the name "netflix-shows"
with ZipFile(local_zipfile, 'r') as zipObj:
    # Extract everything into a folder named after the archive (".zip" stripped)
    zipObj.extractall(local_zipfile[:-4])

> Now the contents of the zip file are extracted and saved in a folder called `netflix-shows`. The folder contains a single CSV file, `netflix_titles_nov_2019.csv`. Let's load it into a data frame so we can do some initial investigation of the data.
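Rather than inspecting the folder by hand, the CSV name could also be read straight from the archive. The sketch below uses `ZipFile.namelist()` for that; the helper `list_csv_members` is a hypothetical name, and the demo runs on a small in-memory archive (with a made-up `notes.txt` entry) so that it works without the real download.

```python
import io
from zipfile import ZipFile


def list_csv_members(zip_source):
    """Return the names of all .csv files inside a zip archive."""
    with ZipFile(zip_source, "r") as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith(".csv")]


# Build a stand-in archive in memory; its contents are illustrative only.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("netflix_titles_nov_2019.csv", "show_id,title\n")
    archive.writestr("notes.txt", "placeholder")
buffer.seek(0)

print(list_csv_members(buffer))
```

In the notebook, the same helper could be pointed at `local_zipfile` instead of the in-memory buffer.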

In [43]:
df = pd.read_csv('netflix-shows/netflix_titles_nov_2019.csv')
df.head()

| | show_id | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 81193313 | Chocolate | NaN | Ha Ji-won, Yoon Kye-sang, Jang Seung-jo, Kang ... | South Korea | November 30, 2019 | 2019 | TV-14 | 1 Season | International TV Shows, Korean TV Shows, Roman... | Brought together by meaningful meals in the pa... | TV Show |
| 1 | 81197050 | Guatemala: Heart of the Mayan World | Luis Ara, Ignacio Jaunsolo | Christian Morales | NaN | November 30, 2019 | 2019 | TV-G | 67 min | Documentaries, International Movies | From Sierra de las Minas to Esquipulas, explor... | Movie |
| 2 | 81213894 | The Zoya Factor | Abhishek Sharma | Sonam Kapoor, Dulquer Salmaan, Sanjay Kapoor, ... | India | November 30, 2019 | 2019 | TV-14 | 135 min | Comedies, Dramas, International Movies | A goofy copywriter unwittingly convinces the I... | Movie |
| 3 | 81082007 | Atlantics | Mati Diop | Mama Sane, Amadou Mbow, Ibrahima Traore, Nicol... | France, Senegal, Belgium | November 29, 2019 | 2019 | TV-14 | 106 min | Dramas, Independent Movies, International Movies | Arranged to marry a rich man, young Ada is cru... | Movie |
| 4 | 80213643 | Chip and Potato | NaN | Abigail Oliver, Andrea Libman, Briana Buckmast... | Canada, United Kingdom | NaN | 2019 | TV-Y | 2 Seasons | Kids' TV | Lovable pug Chip starts kindergarten, makes ne... | TV Show |


<a id ='Inv'></a>
## Initial Investigation

In [47]:
df.shape  # relatively small dataset

(5837, 12)

In [48]:
df.info()  # director is missing for about a third of rows; other gaps are smaller

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5837 entries, 0 to 5836
Data columns (total 12 columns):
show_id         5837 non-null int64
title           5837 non-null object
director        3936 non-null object
cast            5281 non-null object
country         5410 non-null object
date_added      5195 non-null object
release_year    5837 non-null int64
rating          5827 non-null object
duration        5837 non-null object
listed_in       5837 non-null object
description     5837 non-null object
type            5837 non-null object
dtypes: int64(2), object(10)
memory usage: 547.3+ KB
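To make the gaps easier to compare, the non-null counts reported above can be turned into missing counts and percentages. The sketch below does the arithmetic in plain Python, copying the numbers from the `df.info()` output; only the columns with gaps are listed, and the variable names are illustrative.

```python
# Non-null counts copied from the df.info() output above.
total_rows = 5837
non_null = {
    "director": 3936,
    "cast": 5281,
    "country": 5410,
    "date_added": 5195,
    "rating": 5827,
}

# Missing count and percentage per column.
missing = {col: total_rows - count for col, count in non_null.items()}
missing_pct = {col: 100 * n / total_rows for col, n in missing.items()}

# Print the columns from most to least incomplete.
for col in sorted(missing, key=missing.get, reverse=True):
    print(f"{col:<12}{missing[col]:>6}{missing_pct[col]:>7.1f}%")
```

This gives 1901 missing directors (about 32.6% of rows) down to 10 missing ratings (about 0.2%). Inside the notebook, `df.isna().sum()` should report the same counts directly from the data frame.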


<a id='Cleaning'></a>
## Cleaning