# 📺 Amazon Prime Video Analysis

This project explores which shows and movies are available on Amazon Prime Video. We look at genres, ratings, and how many movies or shows were released each year.

### What are we trying to find out?
- Which genres are most common?
- Are there more movies or TV shows?
- What are the average ratings like?
- How has the content changed over time?
- What kind of data is available in the files?

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import ast
sns.set(style="whitegrid")

In [None]:
# Load the data files
titles = pd.read_csv('/mnt/data/titles.csv')
credits = pd.read_csv('/mnt/data/credits.csv')

In [None]:
# Show first few rows of the data
print(titles.head())
print(credits.head())

In [None]:
# Check the structure of the data
titles.info()
credits.info()

In [None]:
# See basic statistics of numerical columns
print(titles.describe())

In [None]:
# Check how many missing values are in each column
print(titles.isnull().sum())
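
Raw counts of missing values are easier to judge as a share of all rows. The following is a minimal sketch on a small made-up DataFrame; `toy` and its values are hypothetical stand-ins for `titles`, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `titles`; values are invented for illustration.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "imdb_score": [7.1, np.nan, 6.4, np.nan],
    "age_certification": ["PG", None, None, None],
})

# isnull().mean() is the fraction of missing values per column;
# multiplying by 100 turns it into a percentage.
missing_pct = (toy.isnull().mean() * 100).sort_values(ascending=False)
print(missing_pct)
```

The same expression should work on `titles` directly, assuming it has been loaded as in the cells above.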

In [None]:
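# Fill missing values so we can work better with the data.
# Assign the result back: calling fillna(inplace=True) on a selected column
# triggers chained-assignment warnings and may not update `titles` in recent pandas.
titles['age_certification'] = titles['age_certification'].fillna('Unknown')
titles['imdb_score'] = titles['imdb_score'].fillna(titles['imdb_score'].median())
titles['imdb_votes'] = titles['imdb_votes'].fillna(0)
titles['tmdb_score'] = titles['tmdb_score'].fillna(titles['tmdb_score'].median())
titles['tmdb_popularity'] = titles['tmdb_popularity'].fillna(titles['tmdb_popularity'].median())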
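
Once a score is filled with the median, imputed rows can no longer be told apart from real ones. One optional variation, sketched here on made-up data, is to record a flag before filling. The `toy` frame and the column name `imdb_score_was_missing` are hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `titles`; values are invented for illustration.
toy = pd.DataFrame({"imdb_score": [8.0, np.nan, 6.0, 7.0, np.nan]})

# Remember which rows were missing before any filling happens.
toy["imdb_score_was_missing"] = toy["imdb_score"].isnull()

# Fill with the median of the observed values and assign the result back.
toy["imdb_score"] = toy["imdb_score"].fillna(toy["imdb_score"].median())
print(toy)
```

The flag makes it possible to exclude imputed rows later, for example when comparing score distributions.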

In [None]:
# Convert genres and countries from string to list
titles['genres'] = titles['genres'].apply(ast.literal_eval)
titles['production_countries'] = titles['production_countries'].apply(ast.literal_eval)
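
`ast.literal_eval` raises an error if a cell is missing or is not valid Python literal text. If that turns out to be a problem, a more forgiving parser could look like the sketch below. `parse_list` is a hypothetical helper written for this example, and the sample values are invented.

```python
import ast
import pandas as pd

def parse_list(value):
    """Hypothetical helper: parse a list stored as text, returning [] on failure."""
    if isinstance(value, list):
        return value          # already parsed
    if not isinstance(value, str):
        return []             # NaN or None
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return []             # text that is not a valid Python literal
    return parsed if isinstance(parsed, list) else []

# Invented examples: a normal cell, an empty list, a missing cell, bad text.
toy = pd.Series(["['drama', 'comedy']", "[]", None, "drama"])
parsed = toy.apply(parse_list)
print(parsed.tolist())
```

Returning an empty list keeps later steps, such as counting genres, from failing on individual bad rows, at the cost of silently hiding them.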

In [None]:
# Let's see the most common genres
from collections import Counter
genre_counts = Counter([genre for sublist in titles['genres'] for genre in sublist])
genre_df = pd.DataFrame(genre_counts.items(), columns=['Genre', 'Count']).sort_values(by='Count', ascending=False)

plt.figure(figsize=(10,6))
sns.barplot(data=genre_df.head(10), x='Count', y='Genre')
plt.title("Top 10 Genres on Amazon Prime")
plt.show()
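
The same genre counts can also be produced with pandas alone. This is a minimal sketch on made-up data (the `toy` frame is hypothetical); it assumes the `genres` column already holds Python lists, as after the conversion step above.

```python
import pandas as pd

# Hypothetical stand-in for `titles` after the genres column has been parsed.
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "genres": [["drama", "comedy"], ["drama"], []],
})

# explode() produces one row per genre; an empty list becomes NaN,
# which value_counts() leaves out by default.
genre_counts = toy["genres"].explode().value_counts()
print(genre_counts)
```

The result is already sorted from most to least common, so `genre_counts.head(10)` would give the top ten.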

In [None]:
# Are there more movies or TV shows?
type_counts = titles['type'].value_counts()
plt.figure(figsize=(6,6))
plt.pie(type_counts, labels=type_counts.index, autopct='%1.1f%%')
plt.title("Movies vs TV Shows")
plt.show()
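
The percentages shown on the pie chart can also be computed as numbers. A minimal sketch on made-up data follows; the `toy` frame and the labels `MOVIE` and `SHOW` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the `type` column of `titles`.
toy = pd.DataFrame({"type": ["MOVIE", "MOVIE", "SHOW", "MOVIE"]})

# normalize=True returns fractions instead of counts.
type_share = toy["type"].value_counts(normalize=True) * 100
print(type_share)
```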

In [None]:
# Show the 20 release years with the most titles (bars are ordered by count, not by year)
plt.figure(figsize=(12,6))
sns.countplot(data=titles, x='release_year', order=titles['release_year'].value_counts().index[:20])
plt.title("Most Active Release Years")
plt.xticks(rotation=45)
plt.show()
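
Because the chart above orders years by how many titles they have, it does not read as a timeline. To see change over time, the counts can be sorted by year instead. A minimal sketch on made-up data (the `toy` frame is hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the `release_year` column of `titles`.
toy = pd.DataFrame({"release_year": [2019, 2021, 2020, 2021, 2019, 2021]})

# Count titles per year, then sort by the year itself rather than by the count.
per_year = toy["release_year"].value_counts().sort_index()
print(per_year)
```

Plotting `per_year` as a line or bar chart would then show the years in chronological order.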

In [None]:
# Compare IMDb scores of movies and shows
plt.figure(figsize=(8,5))
sns.boxplot(data=titles, x='type', y='imdb_score')
plt.title("IMDb Scores: Movies vs Shows")
plt.show()

In [None]:
# Let's look at how ratings and popularity relate
cols = ['imdb_score', 'imdb_votes', 'tmdb_popularity', 'tmdb_score', 'runtime']
plt.figure(figsize=(8,6))
sns.heatmap(titles[cols].corr(), annot=True, cmap='Blues')
plt.title("Relationship Between Scores and Popularity")
plt.show()
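
One caveat: the heatmap is computed after missing scores were filled with the median, and filled values can weaken or distort correlations. `DataFrame.corr` already ignores missing values pair by pair, so it can also be run on unfilled data for comparison. The sketch below uses invented numbers purely to illustrate the effect; the `toy` frame is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last title has no IMDb score.
toy = pd.DataFrame({
    "imdb_score": [6.0, 7.0, 8.0, np.nan],
    "tmdb_score": [6.0, 7.0, 8.0, 2.0],
})

# Correlation on the raw data: the row with the missing value is skipped.
raw_corr = toy["imdb_score"].corr(toy["tmdb_score"])

# Correlation after median filling: the filled row now counts as a real data point.
filled = toy.assign(imdb_score=toy["imdb_score"].fillna(toy["imdb_score"].median()))
filled_corr = filled["imdb_score"].corr(filled["tmdb_score"])

print(raw_corr, filled_corr)
```

In this toy case the filled correlation is lower than the raw one. How much the real heatmap is affected depends on how many values were missing.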

### Final Thoughts
- Most content is movies.
- Drama and Comedy are the top genres.
- Many titles were released in recent years.
- IMDb and TMDB scores show similar trends.

This project helps us understand what type of content is available on Amazon Prime Video and how it is rated.