
# StreamFlix – Audience Segmentation using K-Means Clustering 🍿📊

This notebook demonstrates **unsupervised learning (K-Means clustering)** through a realistic scenario at **StreamFlix**, using simulated viewer data.

**Objective:**  
Discover hidden viewer segments based purely on behavior (no labels).

Workflow:
1. Installation  
2. Imports  
3. Dataset Preparation  
4. Feature Scaling  
5. K-Means Clustering  
6. Cluster Evaluation  
7. Business Interpretation


## 1. Installation

In [1]:

# Uncomment if needed
# !pip install numpy pandas matplotlib scikit-learn


## 2. Imports

In [2]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

np.random.seed(42)
%matplotlib inline


## 3. Dataset Preparation

In [3]:

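# Simulate 20,000 viewers drawn from four hidden personas.
# The persona label only drives data generation; K-Means never sees it.
n = 20000
segments = ["binger", "comedy", "horror", "family"]
seg = np.random.choice(segments, size=n, p=[0.3, 0.25, 0.2, 0.25])

# Each row holds: avg_watch_minutes, weekend_ratio, late_night_ratio,
# mobile_ratio, and a 4-element genre mix (drama, comedy, horror, family)
# drawn from a Dirichlet distribution so the shares sum to 1.
rows = []
for s in seg: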
    if s=="binger":
        rows.append([
            np.random.normal(180,30),
            np.random.uniform(0.7,1.0),
            np.random.uniform(0.1,0.3),
            np.random.uniform(0.2,0.4),
            np.random.dirichlet([4,1,1,1])
        ])
    elif s=="comedy":
        rows.append([
            np.random.normal(60,10),
            np.random.uniform(0.2,0.4),
            np.random.uniform(0.05,0.2),
            np.random.uniform(0.6,0.9),
            np.random.dirichlet([1,4,1,1])
        ])
    elif s=="horror":
        rows.append([
            np.random.normal(100,20),
            np.random.uniform(0.3,0.6),
            np.random.uniform(0.7,0.9),
            np.random.uniform(0.3,0.6),
            np.random.dirichlet([1,1,4,1])
        ])
    else:
        rows.append([
            np.random.normal(120,20),
            np.random.uniform(0.6,0.9),
            np.random.uniform(0.1,0.3),
            np.random.uniform(0.2,0.4),
            np.random.dirichlet([1,1,1,4])
        ])

# Flatten each row: unpack the 4-element genre mix (r[4]) into four scalar columns.
data = []
for r in rows:
    g = r[4]
    data.append([
        r[0], r[1], r[2], r[3],
        g[0], g[1], g[2], g[3]
    ])

df = pd.DataFrame(data, columns=[
    "avg_watch_minutes","weekend_ratio","late_night_ratio","mobile_ratio",
    "drama_pct","comedy_pct","horror_pct","family_pct"
])

df.head()


|   | avg_watch_minutes | weekend_ratio | late_night_ratio | mobile_ratio | drama_pct | comedy_pct | horror_pct | family_pct |
|---|---|---|---|---|---|---|---|---|
| 0 | 51.961968 | 0.269328 | 0.149492 | 0.744627 | 0.176316 | 0.644908 | 0.016286 | 0.16249 |
| 1 | 110.873932 | 0.609859 | 0.127181 | 0.263955 | 0.074536 | 0.409331 | 0.241147 | 0.274987 |
| 2 | 71.266964 | 0.398297 | 0.763967 | 0.397948 | 0.162876 | 0.110277 | 0.70332 | 0.023527 |
| 3 | 104.057868 | 0.333749 | 0.844846 | 0.415698 | 0.014717 | 0.323509 | 0.609849 | 0.051924 |
| 4 | 199.26363 | 0.912813 | 0.157854 | 0.212405 | 0.644766 | 0.101303 | 0.058612 | 0.195319 |
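
The genre columns come from `np.random.dirichlet`, so each viewer's four shares sum to 1 and the expected share of each genre is its concentration value divided by the total (4/7 ≈ 0.571 for the favored genre, 1/7 ≈ 0.143 for the others). The sketch below checks this on a standalone sample; the generator `rng`, its seed, and the sample size are assumptions made for illustration and are not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)      # local generator; seed chosen arbitrarily
alpha = np.array([4, 1, 1, 1])      # same concentration as the "binger" persona
mix = rng.dirichlet(alpha, size=100_000)

row_sums = mix.sum(axis=1)              # every row should sum to 1
expected_share = alpha / alpha.sum()    # theoretical mean of each component
empirical_share = mix.mean(axis=0)      # sample mean of each component

print("expected :", expected_share.round(3))
print("empirical:", empirical_share.round(3))
print("rows sum to 1:", np.allclose(row_sums, 1.0))
```

This is why the genre values in the centroid table of section 6 sit close to 0.57 for each cluster's dominant genre and close to 0.14 for the rest.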


## 4. Feature Scaling

In [4]:

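# Standardize each feature to zero mean and unit variance. Without this,
# avg_watch_minutes (tens to hundreds) would dominate the 0-1 ratio features
# in the Euclidean distance that K-Means minimizes.
# Run this cell before the `cluster` column is added to df.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(df)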
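
As a minimal sketch of what `StandardScaler` does, the example below applies it to a tiny two-column array and compares the result with a manual z-score. The `toy` values are made up for illustration and only mimic the scale gap between watch minutes and a ratio feature.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (assumed values): column 0 is in minutes, column 1 is a 0-1 ratio.
toy = np.array([
    [180.0, 0.9],
    [60.0, 0.3],
    [100.0, 0.5],
    [120.0, 0.7],
])

toy_scaled = StandardScaler().fit_transform(toy)

# Manual z-score per column, using the population standard deviation (ddof=0),
# which is what StandardScaler uses.
manual = (toy - toy.mean(axis=0)) / toy.std(axis=0)

print(toy_scaled.round(3))
print("matches manual z-score:", np.allclose(toy_scaled, manual))
```

After scaling, both columns have mean 0 and standard deviation 1, so a difference of one standard deviation counts equally in either feature.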


## 5. K-Means Clustering

In [5]:

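# k=4 matches the four personas simulated above; n_init=10 runs K-Means from
# 10 different centroid initializations and keeps the run with the lowest inertia.
kmeans = KMeans(n_clusters=4, random_state=42, n_init=10)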
clusters = kmeans.fit_predict(X_scaled)

df["cluster"] = clusters
df["cluster"].value_counts()


| cluster | count |
|---|---|
| 0 | 5957 |
| 2 | 5080 |
| 1 | 5011 |
| 3 | 3952 |
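
Here `n_clusters=4` is known in advance because the data was simulated from four personas. With real data the number of segments is unknown, and one common heuristic is to fit K-Means for several values of `k` and compare silhouette scores. The sketch below illustrates this on a small synthetic dataset from `make_blobs`; the blob centers, spread, and the range of `k` are assumptions chosen for the example, not values from the notebook.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Four well-separated toy blobs (assumed layout, for illustration only).
X_toy, _ = make_blobs(
    n_samples=400,
    centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
    cluster_std=0.5,
    random_state=0,
)

# Fit K-Means for each candidate k and record the silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X_toy)
    scores[k] = silhouette_score(X_toy, labels)

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("best k:", best_k)
```

The same loop could be run on `X_scaled`; passing `sample_size` to `silhouette_score` would keep it fast on 20,000 rows, at the cost of an approximate score.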


## 6. Cluster Evaluation

In [6]:

sil = silhouette_score(X_scaled, clusters)
print("Silhouette Score:", round(sil,3))

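# Convert centroids from standardized units back to minutes and ratios.
# df.columns[:-1] drops the `cluster` column appended in the previous step.
centers = pd.DataFrame(
    scaler.inverse_transform(kmeans.cluster_centers_),
    columns=df.columns[:-1]
)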
centers


Silhouette Score: 0.498


|   | avg_watch_minutes | weekend_ratio | late_night_ratio | mobile_ratio | drama_pct | comedy_pct | horror_pct | family_pct |
|---|---|---|---|---|---|---|---|---|
| 0 | 178.642677 | 0.848032 | 0.199461 | 0.299321 | 0.580236 | 0.143581 | 0.143535 | 0.132649 |
| 1 | 121.673277 | 0.750415 | 0.201288 | 0.299373 | 0.139454 | 0.141629 | 0.140028 | 0.578888 |
| 2 | 60.300774 | 0.300724 | 0.126296 | 0.748792 | 0.142351 | 0.571287 | 0.143264 | 0.143098 |
| 3 | 100.212937 | 0.450232 | 0.797465 | 0.449943 | 0.142343 | 0.142977 | 0.571473 | 0.143208 |
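
The centroids are readable in minutes and ratios because `scaler.inverse_transform` undoes the standardization. Since scaling is an affine transform, mapping a centroid back gives the plain mean of the cluster's original rows. The sketch below checks this on a made-up array; `original` and the choice of the first two rows as a "cluster" are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (assumed values): watch minutes and a weekend ratio.
original = np.array([
    [180.0, 0.85],
    [60.0, 0.30],
    [100.0, 0.45],
    [120.0, 0.75],
])

sc = StandardScaler().fit(original)
scaled = sc.transform(original)

# Treat the first two rows as one cluster: its centroid in scaled space
# is the mean of the scaled points.
centroid_scaled = scaled[:2].mean(axis=0, keepdims=True)

# Map the centroid back to original units.
centroid_original = sc.inverse_transform(centroid_scaled)

print("centroid in original units:", centroid_original.ravel())
print("mean of original rows     :", original[:2].mean(axis=0))
```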



## 7. Business Interpretation

Each cluster corresponds to a behavioral persona (cluster numbers refer to the centroid table above):

- Cluster 0: high watch time (about 179 min) + weekends + drama → **Weekend Bingers**
- Cluster 2: short sessions (about 60 min) + mobile + comedy → **Lunchtime Comedy Fans**
- Cluster 3: late-night + horror → **Horror Buffs**
- Cluster 1: weekend + family content → **Family Movie Nights**

With these segments, Maya can now tailor campaigns, recommendations, and notifications to each persona.
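
K-Means cluster numbers are arbitrary and can change with a different `random_state`, so it is safer to name clusters from their centroids than to hard-code the numbers. The sketch below is one possible approach: it picks the dominant genre of each centroid and looks up a persona name. The genre values are copied (rounded) from the centroid table in section 6; the `PERSONA_BY_GENRE` dictionary is a hypothetical helper introduced here, not part of the notebook.

```python
import pandas as pd

# Genre columns of the centroid table from section 6, rounded to 3 decimals.
centers = pd.DataFrame(
    {
        "drama_pct": [0.580, 0.139, 0.142, 0.142],
        "comedy_pct": [0.144, 0.142, 0.571, 0.143],
        "horror_pct": [0.144, 0.140, 0.143, 0.571],
        "family_pct": [0.133, 0.579, 0.143, 0.143],
    }
)

# Hypothetical lookup: dominant genre column -> persona name from the list above.
PERSONA_BY_GENRE = {
    "drama_pct": "Weekend Bingers",
    "comedy_pct": "Lunchtime Comedy Fans",
    "horror_pct": "Horror Buffs",
    "family_pct": "Family Movie Nights",
}

# For each centroid (row), find the genre column with the largest share.
dominant_genre = centers.idxmax(axis=1)
persona_by_cluster = dominant_genre.map(PERSONA_BY_GENRE).to_dict()

for cluster_id, persona in persona_by_cluster.items():
    print(cluster_id, "->", persona)
```

In the notebook, a line such as `df["persona"] = df["cluster"].map(persona_by_cluster)` would then attach a readable label to every viewer.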
