## Netflix Data Analysis 📺

This project analyzes Netflix's catalog with pandas, Matplotlib, and seaborn, visualizing trends in content types, the years titles were added, countries, genres, ratings, and directors.

### Step 1: Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Style
sns.set_theme(style='darkgrid')  # set_theme is the current name; sns.set is an older alias
%matplotlib inline

### Step 2: Load Dataset

In [None]:
url = "https://gist.githubusercontent.com/MounikaKatipally/7c1eaa77b87ef2a23518a57c593680ff/raw/netflix_titles.csv"
df = pd.read_csv(url)
print("Shape of data:", df.shape)
print("\nColumn Names:", df.columns.tolist())
df.head()

### Step 3: Data Cleaning
- Remove duplicates
- Convert date format
- Handle missing values

In [None]:
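# Remove exact duplicate rows
df.drop_duplicates(inplace=True)

# Parse dates; strip stray whitespace first so valid dates are not coerced to NaT
df['date_added'] = pd.to_datetime(df['date_added'].str.strip(), errors='coerce')
df['year_added'] = df['date_added'].dt.year
df['month_added'] = df['date_added'].dt.month

# Split the comma-separated genre string into a list per title
df['genres'] = df['listed_in'].fillna('').apply(lambda x: x.split(', '))

# Replace missing categorical values with an explicit placeholder
df['country'] = df['country'].fillna('Unknown')
df['rating'] = df['rating'].fillna('Unknown')
df['director'] = df['director'].fillna('Unknown')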
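
Because `errors='coerce'` silently turns unparseable dates into `NaT`, it can help to count how many values were lost in the conversion. The sketch below illustrates the idea on a small hypothetical series (`raw` and its values are made up for illustration, not taken from the dataset):

```python
import pandas as pd

# Hypothetical sample: two valid dates, one invalid string, one missing value
raw = pd.Series(["September 25, 2021", "August 4, 2017", "not a date", None])

# Same conversion as in the cleaning cell
parsed = pd.to_datetime(raw, errors="coerce")

# Values that were present in the raw data but became NaT after parsing
lost = int(raw.notna().sum() - parsed.notna().sum())
print("Dates lost to coercion:", lost)
```

On the real data, the same comparison could be run on the raw `date_added` column before it is overwritten.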

### Step 4: Content Type Distribution

In [None]:
sns.countplot(data=df, x='type', palette='Set2')
plt.title('Netflix Content Type Distribution')
plt.tight_layout()
plt.show()
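
The count plot shows absolute numbers; the relative split is often easier to quote. A minimal sketch using `value_counts(normalize=True)`, shown on a hypothetical four-row series rather than the real `type` column:

```python
import pandas as pd

# Hypothetical stand-in for df['type']
types = pd.Series(["Movie", "Movie", "TV Show", "Movie"], name="type")

# Share of each content type as a percentage
share = types.value_counts(normalize=True).mul(100).round(1)
print(share)
```

With the notebook's data, `df['type']` would take the place of `types`.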

### Step 5: Titles Added Per Year

In [None]:
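# Drop titles without a parsed date and cast to int so the axis shows 2019 rather than 2019.0
df['year_added'].dropna().astype(int).value_counts().sort_index().plot(kind='bar', figsize=(12, 5), color='tomato')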
plt.title('Number of Titles Added Each Year')
plt.xlabel('Year')
plt.ylabel('Count')
plt.tight_layout()
plt.show()
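
One possible extension is to break the yearly counts down by content type. The sketch below uses `pd.crosstab` on a small hypothetical frame (`sample` and its rows are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the year_added and type columns
sample = pd.DataFrame({
    "year_added": [2019, 2019, 2020, 2020, 2020],
    "type": ["Movie", "TV Show", "Movie", "Movie", "TV Show"],
})

# Rows are years, columns are content types, cells are title counts
per_year = pd.crosstab(sample["year_added"], sample["type"])
print(per_year)
```

The resulting table can be drawn with `per_year.plot(kind='bar', stacked=True)` to compare movies and TV shows year by year.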

### Step 6: Top 10 Countries by Content

In [None]:
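# Exclude the 'Unknown' placeholder so it is not ranked as a country
df.loc[df['country'] != 'Unknown', 'country'].value_counts().head(10).sort_values().plot(kind='barh', color='seagreen')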
plt.title('Top 10 Countries by Content on Netflix')
plt.tight_layout()
plt.show()
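
In commonly distributed versions of this dataset, a single `country` cell can list several countries separated by commas, so counting the raw strings treats each combination as its own "country". Assuming that also holds here, a sketch of counting each country individually with `str.split` and `explode` (the sample values are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for df['country'] after missing values were filled
country = pd.Series(["United States, India", "India,", "Unknown", "United States"])

# One row per country: split on commas, flatten, trim whitespace
split = country.str.split(",").explode().str.strip()

# Drop empty fragments and the missing-value placeholder before counting
counts = split[~split.isin(["", "Unknown"])].value_counts()
print(counts)
```

This credits co-productions to every participating country, so the totals will exceed the number of titles.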

### Step 7: Top 10 Genres

In [None]:
# Flatten the per-title genre lists into one row per genre, then count
genre_freq = df['genres'].explode().value_counts()
genre_freq.head(10).plot(kind='bar', color='orchid')
plt.title('Top 10 Genres on Netflix')
plt.tight_layout()
plt.show()

### Step 8: Ratings Distribution

In [None]:
df['rating'].value_counts().plot(kind='bar', figsize=(10,5), color='skyblue')
plt.title('Distribution of Ratings')
plt.xlabel('Rating')
plt.ylabel('Count')
plt.tight_layout()
plt.show()

### Step 9: Top 10 Directors

In [None]:
# Exclude the 'Unknown' placeholder, which would otherwise rank first
df.loc[df['director'] != 'Unknown', 'director'].value_counts().head(10).plot(kind='bar', color='goldenrod')
plt.title('Top 10 Most Frequent Directors')
plt.tight_layout()
plt.show()

## Conclusion 📝
- Netflix has more movies than TV shows.
- Most content was added between 2017 and 2020.
- The United States contributes the most titles.
- Documentaries and Dramas are the most frequent genres.
- TV-MA and TV-14 are common ratings.
- Some directors appear repeatedly, which may reflect their popularity or an ongoing collaboration with Netflix.
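
The time-window observation above can be checked numerically rather than read off the chart. A minimal sketch, using a hypothetical `years` series in place of `df['year_added']`:

```python
import pandas as pd

# Hypothetical stand-in for df['year_added']; None marks a title with no parsed date
years = pd.Series([2015, 2017, 2018, 2019, 2020, 2021, None])

# Fraction of dated titles added from 2017 through 2020 (bounds inclusive)
in_window = years.between(2017, 2020)
share = in_window.sum() / years.notna().sum()
print(f"Share added 2017-2020: {share:.1%}")
```

Run against the real column, this gives a single percentage that can back up or refine the claim.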