`Investigating Netflix Movies with Python`

`September 2025`

This project analyses a dataset of Netflix titles to uncover insights about the durations and genres of movies released in the 1990s. The analysis filters the dataset, finds the most frequent movie duration, and counts the short action movies from that decade.

`Any questions, please reach out!`

Chiawei Wang PhD\
Product & Data Analyst\
<chiawei.w@outlook.com>

`*` Note that the table of contents and other links may not work directly on GitHub.

[Table of Contents](#table-of-contents)
1. [Executive Summary](#executive-summary)
   - [Background](#background)
   - [Data Overview](#data-overview)
   - [Research Questions](#research-questions)
   - [Approach](#approach)
   - [Results](#results)
2. [Exploratory Data Analysis](#exploratory-data-analysis)

# Executive Summary

## Background

Netflix, founded in 1997, has transformed from a DVD rental service into a leading global entertainment platform. With a vast library of movies and series, Netflix offers a unique opportunity to explore trends in the entertainment industry. This report analyses Netflix's movie offerings, particularly those from the 1990s, to provide insights for a production company specialising in nostalgic content.

## Data Overview

The dataset contains the following columns:

| Column        | Description                       |
|---------------|-----------------------------------|
| show_id       | The ID of the show                |
| type          | Type of show                      |
| title         | Title of the show                 |
| director      | Director of the show              |
| cast          | Cast of the show                  |
| country       | Country of origin                 |
| date_added    | Date added to Netflix             |
| release_year  | Year of original release          |
| rating        | Show rating                       |
| duration      | Duration in minutes or seasons    |
| listed_in     | Genre of the show                 |
| description   | Description of the show           |

## Research Questions

1. What was the most frequent movie duration in the 1990s?
2. How many action movies released in the 1990s were short (less than 90 minutes)?

## Approach

1. Filter the data for movies released in the 1990s
2. Find the most frequent movie duration
3. Count the number of short action movies from the 1990s
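
The three steps above can be sketched on a small, made-up DataFrame. This is only an illustration: `toy_df` and its rows are hypothetical and are not taken from the Netflix dataset, although the column names follow the data overview.

```python
import pandas as pd

# Hypothetical toy data that mimics the relevant Netflix columns
toy_df = pd.DataFrame({
    "type": ["Movie", "Movie", "Movie", "TV Show", "Movie", "Movie"],
    "release_year": [1993, 1996, 1998, 1995, 2004, 1991],
    "duration": ["94 min", "85 min", "94 min", "2 Seasons", "88 min", "80 min"],
    "listed_in": [
        "Dramas",
        "Action & Adventure",
        "Comedies",
        "TV Dramas",
        "Action & Adventure",
        "Action & Adventure, Comedies",
    ],
})

# Step 1: keep only movies released from 1990 to 1999 (between is inclusive)
is_movie = toy_df["type"] == "Movie"
in_1990s = toy_df["release_year"].between(1990, 1999)
toy_1990s = toy_df[is_movie & in_1990s].copy()

# Step 2: find the most frequent duration string
toy_mode = toy_1990s["duration"].mode()[0]

# Step 3: extract the minutes as integers, then count short action movies
toy_1990s["duration_minutes"] = (
    toy_1990s["duration"].str.extract(r"(\d+)", expand=False).astype(int)
)
is_action = toy_1990s["listed_in"].str.contains("Action", case=False)
toy_short_action = int((is_action & (toy_1990s["duration_minutes"] < 90)).sum())

print(toy_mode)          # 94 min
print(toy_short_action)  # 2
```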

## Results

- The most frequent movie duration in the 1990s was 94 minutes.
- There were 10 short action movies (less than 90 minutes) released in the 1990s.

# Exploratory Data Analysis

In [1]:
# Importing pandas and matplotlib
import pandas as pd
import matplotlib.pyplot as plt

# Read in the Netflix CSV as a DataFrame
netflix_df = pd.read_csv('netflix.csv')
print(netflix_df.shape)
netflix_df.head(5)

(8807, 12)


|   | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---------|------|-------|----------|------|---------|------------|--------------|--------|----------|-----------|-------------|
| 0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | | United States | September 25, 2021 | 2020 | PG-13 | 90 min | Documentaries | As her father nears the end of his life, filmm... |
| 1 | s2 | TV Show | Blood & Water | | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... | South Africa | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t... |
| 2 | s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | | September 24, 2021 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor... |
| 3 | s4 | TV Show | Jailbirds New Orleans | | | | September 24, 2021 | 2021 | TV-MA | 1 Season | Docuseries, Reality TV | Feuds, flirtations and toilet talk go down amo... |
| 4 | s5 | TV Show | Kota Factory | | Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... | India | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, Romantic TV Shows, TV ... | In a city of coaching centers known to train I... |


In [2]:
# Subset the DataFrame for type 'Movie'
netflix_subset = netflix_df[netflix_df['type'] == 'Movie']

# Filter the data to keep only movies released in the 1990s
# Start by keeping movies that were released in or after 1990
subset = netflix_subset[(netflix_subset['release_year'] >= 1990)]

# Then keep only the movies released before 2000
movies_1990s = subset[(subset['release_year'] < 2000)]
print(movies_1990s.shape)
movies_1990s.head(5)

# Another way to do this step is to use the & operator which allows you to do this type of filtering in one step
# movies_1990s = netflix_subset[(netflix_subset['release_year'] >= 1990) & (netflix_subset['release_year'] < 2000)]

(241, 12)


|     | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|-----|---------|------|-------|----------|------|---------|------------|--------------|--------|----------|-----------|-------------|
| 7 | s8 | Movie | Sankofa | Haile Gerima | Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D... | United States, Ghana, Burkina Faso, United Kin... | September 24, 2021 | 1993 | TV-MA | 125 min | Dramas, Independent Movies, International Movies | On a photo shoot in Ghana, an American model s... |
| 22 | s23 | Movie | Avvai Shanmughi | K.S. Ravikumar | Kamal Hassan, Meena, Gemini Ganesan, Heera Raj... | | September 21, 2021 | 1996 | TV-PG | 161 min | Comedies, International Movies | Newly divorced and denied visitation rights wi... |
| 24 | s25 | Movie | Jeans | S. Shankar | Prashanth, Aishwarya Rai Bachchan, Sri Lakshmi... | India | September 21, 2021 | 1998 | TV-14 | 166 min | Comedies, International Movies, Romantic Movies | When the father of the man she loves insists t... |
| 26 | s27 | Movie | Minsara Kanavu | Rajiv Menon | Arvind Swamy, Kajol, Prabhu Deva, Nassar, S.P.... | | September 21, 2021 | 1997 | TV-PG | 147 min | Comedies, International Movies, Music & Musicals | A tangled love triangle ensues when a man fall... |
| 114 | s115 | Movie | Anjaam | Rahul Rawail | Madhuri Dixit, Shah Rukh Khan, Tinnu Anand, Jo... | India | September 2, 2021 | 1994 | TV-14 | 143 min | Dramas, International Movies, Thrillers | A wealthy industrialist’s dangerous obsession ... |
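
As a quick sanity check, the two-step filter and the one-step `&` filter from the cell above should select the same rows. The sketch below compares them on a hypothetical four-row frame (`sample` and its titles are invented for illustration) and also shows `DataFrame.query` as a third way to write the same condition.

```python
import pandas as pd

# Hypothetical sample standing in for netflix_subset
sample = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "release_year": [1989, 1990, 1999, 2000],
})

# Two-step filtering: first the lower bound, then the upper bound
step_one = sample[sample["release_year"] >= 1990]
two_step = step_one[step_one["release_year"] < 2000]

# One-step filtering with a combined boolean mask
one_step = sample[(sample["release_year"] >= 1990) & (sample["release_year"] < 2000)]

# The same condition expressed as a query string
queried = sample.query("release_year >= 1990 and release_year < 2000")

# All three approaches keep the boundary year 1990 and drop 2000
print(two_step["title"].tolist())
print(two_step.equals(one_step), one_step.equals(queried))
```

The parentheses around each comparison in the `&` version matter, because `&` binds more tightly than `>=` and `<` in Python.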


In [3]:
# Filter the data to keep only movies released in the 1990s
movies_1990s = netflix_subset[(netflix_subset['release_year'] >= 1990) & (netflix_subset['release_year'] < 2000)].copy()

# Find the most frequent movie duration
most_frequent_duration = movies_1990s['duration'].mode()[0]
print(f"The most frequent movie duration in the 1990s was {most_frequent_duration}.")

# Extract the numeric part of the duration and convert it to an integer
# (the pattern is a raw string so that \d does not trigger an invalid escape warning)
movies_1990s['duration_minutes'] = movies_1990s['duration'].str.extract(r'(\d+)').astype(int)

# Filter for short action movies from the 1990s
short_action_movies = movies_1990s[
    (movies_1990s['listed_in'].str.contains('Action', case=False)) & 
    (movies_1990s['duration_minutes'] < 90)
]

# Count the number of short action movies
num_short_action_movies = len(short_action_movies)

# Print the result
print(f"There were {num_short_action_movies} short action movies (less than 90 minutes) released in the 1990s.")


The most frequent movie duration in the 1990s was 94 min.
There were 10 short action movies (less than 90 minutes) released in the 1990s.
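
Two details of the cell above are worth checking on other data: `mode()[0]` returns only the first of any tied values, and `.astype(int)` raises an error if a duration is missing. The sketch below illustrates both points on a hypothetical Series (`durations` is invented and does not come from the dataset); whether the real 1990s subset contains ties or missing durations is not established here.

```python
import pandas as pd

# Hypothetical durations with a missing value and a tie for the mode
durations = pd.Series(["94 min", "90 min", None, "94 min", "90 min", "85 min"])

# mode() returns every tied value in sorted order; [0] would pick only the first
all_modes = durations.mode().tolist()

# value_counts() shows the counts behind the mode (missing values are excluded)
counts = durations.value_counts()

# Extract the digits, then convert; missing entries become NaN instead of raising
minutes = pd.to_numeric(
    durations.str.extract(r"(\d+)", expand=False), errors="coerce"
)

# Comparisons against NaN are False, so missing durations are not counted as short
num_short = int((minutes < 90).sum())

print(all_modes)
print(num_short)
```

Because `duration` holds strings, tied modes are sorted lexically rather than numerically, so taking the mode of the numeric minutes column may be preferable when ties are possible.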
