![netflix_header](netflix_header.png)


# Netflix Top 10: Analyzing Weekly Chart-Toppers

This dataset comprises Netflix's weekly top 10 lists for the most-watched TV shows and films worldwide. The data spans from June 28, 2021, to August 27, 2023.

This workspace is pre-loaded with two CSV files. 
- `netflix_top10.csv` contains columns such as `show_title`, `category`, `weekly_rank`, and view metrics such as `weekly_hours_viewed` and `weekly_views`.
- `netflix_top10_country.csv` records how each show or film performed in each country, including the columns `weekly_rank` and `cumulative_weeks_in_top_10`.

We've added some guiding questions for analyzing this exciting dataset! Feel free to make this workspace yours by adding and removing cells, or editing any of the existing cells. 

[Source: Netflix](https://www.netflix.com/tudum/top10/united-states?week=2023-08-27) 

## Explore this dataset

To get you started with your analysis...
1. Combine the different categories of top 10 lists into a single weekly top 10 list spanning all categories.
2. Are there consistent trends or patterns in the content format (TV, film) that makes it into the top 10 over different weeks or months?
3. Explore your country's top 10 trends. Are there unique preferences or regional factors that set your country's list apart from others?
4. Visualize popularity rankings over time with time series plots.
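
As a starting point for the first question, here is a minimal sketch of one way to merge the per-category lists into a single weekly ranking. It assumes `weekly_hours_viewed` is a fair basis for comparing titles across categories (films and TV seasons differ in length, so this is a simplification), and it uses a small made-up sample in place of `netflix_top10.csv`. The function name `combine_weekly_top10` and the column `overall_rank` are hypothetical.

```python
import pandas as pd


def combine_weekly_top10(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Rank all titles within each week by hours viewed, ignoring category."""
    out = df.copy()
    # Rank 1 = most hours viewed in that week; ties are broken by row order.
    # Assumes weekly_hours_viewed has no missing values.
    out["overall_rank"] = (
        out.groupby("week")["weekly_hours_viewed"]
        .rank(method="first", ascending=False)
        .astype(int)
    )
    # Keep the top n rows per week, ordered by week and then by rank.
    out = out[out["overall_rank"] <= n]
    return out.sort_values(["week", "overall_rank"]).reset_index(drop=True)


# Made-up rows that reuse the column names of the global file (assumption).
sample = pd.DataFrame(
    {
        "week": ["2023-08-27"] * 4 + ["2023-08-20"] * 2,
        "category": [
            "Films (English)",
            "TV (English)",
            "Films (Non-English)",
            "TV (Non-English)",
            "Films (English)",
            "TV (English)",
        ],
        "show_title": ["A", "B", "C", "D", "A", "B"],
        "weekly_hours_viewed": [100, 300, 50, 200, 120, 90],
    }
)

combined = combine_weekly_top10(sample, n=3)
print(combined)
```

If `week` has been loaded as the index rather than as a column, calling `reset_index()` on the DataFrame first should make it usable with this function.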

### 🔍 **Scenario: Understanding the Impact of Content Duration on Netflix's Top 10 Lists**

This scenario helps you develop an end-to-end project for your portfolio.

**Background**: As a data scientist at Netflix, you're tasked with exploring the dataset of weekly top 10 lists of the most-watched TV shows and films. In particular, you need to find out how content duration relates to ranking over time. Answering this question can help content creators and strategists optimize their offerings for the platform.

**Objective**: Determine if there's a correlation between content duration and its likelihood of making it to the top 10 lists.
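
One way to begin probing this objective is a rank correlation between `runtime` and `weekly_rank`. The sketch below is illustrative only: it uses made-up rows, drops rows with a missing runtime, and computes Spearman's coefficient as the Pearson correlation of the rank-transformed values. It only compares titles that already reached the top 10, so it says nothing about titles that never charted. `runtime_rank_correlation` is a hypothetical helper.

```python
import pandas as pd


def runtime_rank_correlation(df: pd.DataFrame) -> float:
    """Spearman correlation between runtime and weekly_rank, ignoring gaps."""
    clean = df.dropna(subset=["runtime", "weekly_rank"])
    # Spearman's rho equals the Pearson correlation of the ranked values.
    return clean["runtime"].rank().corr(clean["weekly_rank"].rank())


# Made-up rows; the column names follow the global file (assumption).
sample = pd.DataFrame(
    {
        "show_title": ["A", "B", "C", "D", "E"],
        "runtime": [1.5, 1.7, 2.0, None, 2.4],
        "weekly_rank": [1, 2, 3, 4, 5],
    }
)

rho = runtime_rank_correlation(sample)
print(f"Spearman rho: {rho:.2f}")
```

Because rank 1 is the best position, a positive coefficient here would mean that longer titles tend to sit lower in the list, and a negative one that they tend to sit higher.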

You can query the pre-loaded CSV files using SQL directly. Here’s a **sample query**:

In [3]:
SELECT *
FROM 'netflix_top10_country.csv'
WHERE country_name = 'Argentina'


Unnamed: 0,country_name,country_iso2,week,category,weekly_rank,show_title,season_title,cumulative_weeks_in_top_10
0,Argentina,AR,2023-08-27 00:00:00+00:00,Films,1,On the Line,,1
1,Argentina,AR,2023-08-27 00:00:00+00:00,Films,2,Half Brothers,,2
2,Argentina,AR,2023-08-27 00:00:00+00:00,Films,3,Street Kings,,3
3,Argentina,AR,2023-08-27 00:00:00+00:00,Films,4,You Are So Not Invited to My Bat Mitzvah,,1
4,Argentina,AR,2023-08-27 00:00:00+00:00,Films,5,Heart of Stone,,3
...,...,...,...,...,...,...,...,...
2255,Argentina,AR,2021-07-04 00:00:00+00:00,TV,6,Falsa identidad,Falsa identidad: Season 2,1
2256,Argentina,AR,2021-07-04 00:00:00+00:00,TV,7,"Yo soy Betty, la fea","Yo soy Betty, la fea: Season 1",1
2257,Argentina,AR,2021-07-04 00:00:00+00:00,TV,8,Pokémon Journeys: The Series,Pokémon Journeys: The Series: Season 1,1
2258,Argentina,AR,2021-07-04 00:00:00+00:00,TV,9,Señora Acero,Señora Acero: Season 2,1


In [4]:
import pandas as pd

# Load the global weekly lists, using the first CSV column as the row index.
global_top_10 = pd.read_csv("netflix_top10.csv", index_col=0)
global_top_10

week,category,weekly_rank,show_title,season_title,weekly_hours_viewed,runtime,weekly_views,cumulative_weeks_in_top_10,is_staggered_launch,episode_launch_details
2023-08-27,Films (English),1,The Monkey King,,23200000,1.6167,14400000.0,2,False,
2023-08-27,Films (English),2,Heart of Stone,,28500000,2.1000,13600000.0,3,False,
2023-08-27,Films (English),3,You Are So Not Invited to My Bat Mitzvah,,21300000,1.7333,12300000.0,1,False,
2023-08-27,Films (English),4,Street Kings,,10300000,1.8167,5700000.0,2,False,
2023-08-27,Films (English),5,The Boss Baby,,9000000,1.6333,5500000.0,10,False,
...,...,...,...,...,...,...,...,...,...,...
2021-07-04,TV (Non-English),6,Elite,Elite: Season 1,10530000,,,1,False,
2021-07-04,TV (Non-English),7,Elite,Elite: Season 3,10200000,,,1,False,
2021-07-04,TV (Non-English),8,Elite,Elite: Season 2,10140000,,,1,False,
2021-07-04,TV (Non-English),9,Katla,Katla: Season 1,9190000,,,1,False,
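
The `category` values in the output above combine a format and a language, for example `Films (English)` or `TV (Non-English)`. To look at format trends across weeks, one option is to strip the language part and total the hours per format. This is a rough sketch on made-up rows; `hours_by_format` and the derived `format` column are hypothetical names.

```python
import pandas as pd


def hours_by_format(df: pd.DataFrame) -> pd.DataFrame:
    """Total weekly hours viewed per format (Films or TV), one row per week."""
    out = df.copy()
    # Keep the text before the first space: "Films (English)" becomes "Films".
    out["format"] = out["category"].str.partition(" ")[0]
    return out.pivot_table(
        index="week", columns="format", values="weekly_hours_viewed", aggfunc="sum"
    ).sort_index()


# Made-up rows that reuse the category labels seen in the global file.
sample = pd.DataFrame(
    {
        "week": ["2023-08-27"] * 4 + ["2023-08-20"] * 2,
        "category": [
            "Films (English)",
            "Films (Non-English)",
            "TV (English)",
            "TV (Non-English)",
            "Films (English)",
            "TV (English)",
        ],
        "weekly_hours_viewed": [10, 5, 20, 7, 8, 12],
    }
)

totals = hours_by_format(sample)
print(totals)
```

Comparing the two columns week by week gives a first impression of whether one format consistently draws more viewing time than the other.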


In [None]:
# Load the per-country weekly lists, using the first CSV column as the row index.
countries_top_10 = pd.read_csv("netflix_top10_country.csv", index_col=0)
countries_top_10.head()

country_name,country_iso2,week,category,weekly_rank,show_title,season_title,cumulative_weeks_in_top_10
Argentina,AR,2023-08-27,Films,1,On the Line,,1
Argentina,AR,2023-08-27,Films,2,Half Brothers,,2
Argentina,AR,2023-08-27,Films,3,Street Kings,,3
Argentina,AR,2023-08-27,Films,4,You Are So Not Invited to My Bat Mitzvah,,1
Argentina,AR,2023-08-27,Films,5,Heart of Stone,,3
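
To follow how titles move through one country's list over time, the rows can be reshaped so that each title becomes a column of weekly ranks. The following is a minimal sketch on made-up rows. `rank_over_time` is a hypothetical helper, and it assumes `country_name` and `week` are ordinary columns (call `reset_index()` first if one of them is the index).

```python
import pandas as pd


def rank_over_time(df: pd.DataFrame, country: str, category: str) -> pd.DataFrame:
    """Return a week-by-title table of weekly ranks for one country and category."""
    mask = (df["country_name"] == country) & (df["category"] == category)
    subset = df[mask].copy()
    subset["week"] = pd.to_datetime(subset["week"])
    # One row per week, one column per title; NaN where a title did not chart.
    # aggfunc="min" keeps the best rank if a title has several seasons charting.
    return subset.pivot_table(
        index="week", columns="show_title", values="weekly_rank", aggfunc="min"
    ).sort_index()


# Made-up rows that reuse the column names of the country file (assumption).
sample = pd.DataFrame(
    {
        "country_name": ["Argentina"] * 4 + ["Brazil"],
        "week": ["2023-08-20", "2023-08-20", "2023-08-27", "2023-08-27", "2023-08-27"],
        "category": ["Films"] * 5,
        "show_title": ["X", "Y", "X", "Z", "X"],
        "weekly_rank": [1, 2, 3, 1, 5],
    }
)

ranks = rank_over_time(sample, "Argentina", "Films")
print(ranks)
```

With matplotlib installed, `ranks.plot()` should draw one line per title. Inverting the y-axis puts rank 1 at the top, which usually reads more naturally for chart positions.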


## Ready to share your work?

Click "Share" in the upper right corner, copy the link, and share it! You can also add this workspace to your DataCamp Portfolio.