# Netflix Content Trends Analysis

This notebook was generated programmatically. It contains exploratory data analysis (EDA), visualizations, and strategic recommendations based on the provided Netflix dataset.

**Sections:**
1. Data load & inspection
2. Data cleaning
3. Exploratory Data Analysis (visualizations)
4. Observations & recommendations


In [None]:
# Imports
import pandas as pd
import matplotlib.pyplot as plt

# Load dataset (update path if needed)
df = pd.read_csv(r"/mnt/data/Netflix Dataset.csv")
df_clean = pd.read_csv(r"/mnt/data/netflix_clean_sample.csv")  # cleaned sample saved alongside
print('Loaded shapes -> original:', df.shape, ', cleaned sample:', df_clean.shape)
df.head(5)

In [None]:
# Basic info and missing values
display(df.head())
print('\nColumns:', df.columns.tolist())
print('\nMissing values (top 20):')
print(df.isnull().sum().sort_values(ascending=False).head(20))

In [None]:
# Prepare year, type, genres columns (robust handling)
df['date_added_parsed'] = pd.to_datetime(df['date_added'], errors='coerce') if 'date_added' in df.columns else pd.NaT
df['added_year'] = df['date_added_parsed'].dt.year
df['release_year'] = pd.to_numeric(df['release_year'], errors='coerce') if 'release_year' in df.columns else pd.NA
df['year'] = df['added_year'].fillna(df['release_year']).astype('Int64')
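# Fill missing types first; astype(str) alone would turn NaN into the literal string 'nan'
df['type'] = df['type'].fillna('Unknown').astype(str).str.strip() if 'type' in df.columns else 'Unknown'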

# genres column normalization
genre_col = None
for c in ['listed_in', 'genres', 'genre', 'listed in']:
    if c in df.columns:
        genre_col = c
        break
if genre_col is not None:
    # Nullable string dtype keeps missing values as NA instead of the literal string 'nan'
    df['genres'] = df[genre_col].astype('string')
else:
    df['genres'] = ''

print('Prepared year and genres columns. Sample:')
df[['title','type','year','genres']].head()

In [None]:
# Visualizations (matplotlib - each figure separate)
import matplotlib.pyplot as plt

# 1) Movies vs TV Shows
plt.figure(figsize=(6,4))
df['type'].value_counts().plot(kind='bar')
plt.title('Count: Movies vs TV Shows (overall)')
plt.xlabel('Type')
plt.ylabel('Count')
plt.show()

# 2) Titles per Year (trend)
plt.figure(figsize=(10,4))
year_counts = df['year'].value_counts().sort_index()
year_counts.plot(kind='line', marker='o')
plt.title('Number of Titles by Year')
plt.xlabel('Year')
plt.ylabel('Number of Titles')
plt.show()

# 3) Top Genres
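genres_series = df['genres'].dropna().astype(str).str.split(',').explode().str.strip()
# Drop empty labels and any literal 'nan' left over from string-converted missing values
genres_series = genres_series[~genres_series.isin(['', 'nan'])]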
top_genres = genres_series.value_counts().head(12)
plt.figure(figsize=(10,5))
top_genres.plot(kind='bar')
plt.title('Top 12 Genres')
plt.xlabel('Genre')
plt.ylabel('Count')
plt.show()

# 4) Top Countries (if available)
country_col = None
for c in ['country','countries','Country']:
    if c in df.columns:
        country_col = c
        break
if country_col is not None:
    countries_series = df[country_col].dropna().astype(str).str.split(',').explode().str.strip()
    top_countries = countries_series.value_counts().head(12)
    plt.figure(figsize=(10,5))
    top_countries.plot(kind='bar')
    plt.title('Top 12 Countries')
    plt.xlabel('Country')
    plt.ylabel('Count')
    plt.show()
else:
    print('No country column found; skipping country chart')

## Observations & Strategic Recommendations

- **Content mix (Movies vs TV Shows):** The bar chart shows whether the catalog leans toward Movies or TV Shows across the dataset timeframe; the gap between the two bars indicates how strong that tilt is.
- **Yearly trends:** The titles-per-year plot highlights growth periods and possible shifts (e.g., a rise in originals after certain years). Note that `year` is the year a title was added, falling back to its release year when `date_added` is missing.
- **Genres:** The top-genres chart shows which kinds of content dominate. Use it to recommend where to invest (e.g., more originals in underrepresented but growing genres).
- **Country representation:** Country-level counts show global reach and gaps. Use them to recommend markets with low representation but growing viewer bases.
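
The observations above can be backed by numbers rather than read off the charts. The sketch below is one possible way to do that, assuming a frame with the same `type`, `year`, and `genres` columns prepared earlier. The `toy` rows are made up for illustration (they are not Netflix data), and the helper names `type_share_by_year` and `genre_counts_by_year` are hypothetical.

```python
import pandas as pd

# Toy stand-in for the prepared `df` (hypothetical rows, not real Netflix data)
toy = pd.DataFrame({
    "type":   ["Movie", "Movie", "TV Show", "Movie", "TV Show", "TV Show"],
    "year":   [2019, 2019, 2019, 2020, 2020, 2020],
    "genres": ["Dramas, Comedies", "Dramas", "Kids' TV",
               "Comedies", "Dramas, Kids' TV", "Kids' TV"],
})

def type_share_by_year(frame):
    """Fraction of titles of each type per year (each row sums to 1)."""
    counts = frame.groupby(["year", "type"]).size().unstack(fill_value=0)
    return counts.div(counts.sum(axis=1), axis=0)

def genre_counts_by_year(frame):
    """Year x genre table of title counts, splitting comma-separated genres."""
    exploded = frame.assign(genre=frame["genres"].str.split(",")).explode("genre")
    exploded["genre"] = exploded["genre"].str.strip()
    # Drop missing and empty genre labels before counting
    exploded = exploded.dropna(subset=["genre"])
    exploded = exploded[exploded["genre"] != ""]
    return exploded.groupby(["year", "genre"]).size().unstack(fill_value=0)

share = type_share_by_year(toy)
genre_table = genre_counts_by_year(toy)

# Change in title count per genre between the last two years in the table
genre_change = genre_table.diff().iloc[-1].sort_values(ascending=False)

print(share)
print(genre_change)
```

On the real data, calling `type_share_by_year(df.dropna(subset=['year']))` would show how the Movie/TV Show split moves over time, and genres with a small total count but a positive `genre_change` are candidates for the "underrepresented but growing" group.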

**Next steps if you want deeper analysis:**
- Sentiment / review analysis (requires ratings/review data)
- Topic modeling on 'description' or 'cast' to cluster similar titles
- Time-series forecasting for future content additions by genre
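
As a starting point for the forecasting step, a straight-line trend fitted to yearly counts gives a rough baseline before trying a proper time-series model. This is only a minimal sketch under stated assumptions: the function name `linear_trend_forecast` and the `toy_counts` values are hypothetical, and a linear fit ignores seasonality and saturation.

```python
import numpy as np
import pandas as pd

def linear_trend_forecast(yearly_counts, horizon=2):
    """Extrapolate a least-squares line through yearly counts (baseline only)."""
    years = yearly_counts.index.to_numpy(dtype=float)
    counts = yearly_counts.to_numpy(dtype=float)
    # Center the years so the fit is numerically well conditioned
    offset = years.min()
    slope, intercept = np.polyfit(years - offset, counts, deg=1)
    last_year = int(years.max())
    future_years = np.arange(last_year + 1, last_year + 1 + horizon)
    forecast = slope * (future_years - offset) + intercept
    # Title counts cannot be negative, so clip the extrapolation at zero
    return pd.Series(forecast.clip(min=0), index=future_years, name="forecast")

# Hypothetical yearly counts for one genre (illustrative values only)
toy_counts = pd.Series([10, 20, 30, 40], index=[2017, 2018, 2019, 2020])
forecast = linear_trend_forecast(toy_counts, horizon=2)
print(forecast.round(1))
```

To forecast by genre, the same function could be applied to each column of a year-by-genre count table, comparing the fitted slopes to see which genres are growing fastest.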
