Week 4 · Day 6 — Recommender Systems
Why this matters

Recommender systems power Netflix, Amazon, Spotify, and YouTube. They personalize content by predicting what users will like, based on patterns of past behavior.

Theory Essentials

Types of recommenders:

Content-based: recommend items similar to those a user liked.

Collaborative filtering: recommend items liked by similar users.
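To make the content-based idea concrete, here is a minimal sketch. The genre features and the single user's ratings are made up for illustration and are not part of the lesson's dataset. The user profile is a rating-weighted average of the features of rated items, and each unrated item is scored by its cosine similarity to that profile.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical item features: one row per movie, columns = [action, comedy, drama]
item_features = np.array([
    [1, 0, 0],   # item 0: action
    [1, 1, 0],   # item 1: action + comedy
    [0, 0, 1],   # item 2: drama
    [0, 1, 1],   # item 3: comedy + drama
], dtype=float)

# Hypothetical ratings from one user (0 = not rated)
user_ratings = np.array([5, 0, 1, 0], dtype=float)

# User profile: rating-weighted average of the features of the rated items
rated = user_ratings > 0
profile = user_ratings[rated] @ item_features[rated] / user_ratings[rated].sum()

# Score every item by cosine similarity between its features and the profile
scores = cosine_similarity(item_features, profile.reshape(1, -1)).ravel()
scores[rated] = -np.inf          # never recommend items the user already rated
best = int(np.argmax(scores))
print("Recommended item index:", best)
```

Because the user rated the action movie highly and the drama poorly, the action + comedy item ends up closer to the profile than the comedy + drama item.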

Collaborative filtering:

User–item interaction matrix (rows = users, cols = items).

Missing values = unrated items.

Predictions come from user similarity or item similarity.

Matrix factorization: approximates the user–item matrix as the product of two smaller matrices of low-dimensional latent features, one for users and one for items (e.g., “taste vectors”).
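As a rough illustration of matrix factorization, the sketch below uses a truncated SVD from NumPy on a small toy matrix. This is a simplification: plain SVD treats the zeros as real ratings, whereas factorization methods used in practice fit only the observed entries. The choice of k = 2 latent features is arbitrary.

```python
import numpy as np

# Toy rating matrix (rows = users, cols = items, 0 = not rated)
R = np.array([
    [5, 4, 0, 0, 1],
    [4, 0, 0, 2, 2],
    [0, 1, 5, 4, 0],
    [0, 0, 4, 5, 0],
    [1, 1, 0, 0, 5],
], dtype=float)

k = 2  # number of latent features (arbitrary choice for illustration)

# Truncated SVD: R is approximated by U_k * diag(s_k) * Vt_k
U, s, Vt = np.linalg.svd(R, full_matrices=False)
user_factors = U[:, :k] * np.sqrt(s[:k])       # shape (n_users, k)
item_factors = Vt[:k, :].T * np.sqrt(s[:k])    # shape (n_items, k)

# Rank-k reconstruction: every cell gets a score, including the unrated ones
R_hat = user_factors @ item_factors.T
print(np.round(R_hat, 2))
```

Each row of user_factors and item_factors is a k-dimensional vector, and a predicted score is just the dot product of a user vector with an item vector.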

Simple starting point in scikit-learn: compute cosine similarity between users (rows) or between items (columns).

In [1]:
# Setup
import numpy as np, pandas as pd, matplotlib.pyplot as plt
from sklearn.metrics.pairwise import cosine_similarity

np.random.seed(42)
plt.rcParams["figure.figsize"] = (6,4)
plt.rcParams["axes.grid"] = True

# Example user–item rating matrix (rows=users, cols=movies)
# 0 means not rated
ratings = pd.DataFrame([
    [5, 4, 0, 0, 1],
    [4, 0, 0, 2, 2],
    [0, 1, 5, 4, 0],
    [0, 0, 4, 5, 0],
    [1, 1, 0, 0, 5]
], columns=["MovieA","MovieB","MovieC","MovieD","MovieE"],
   index=["User1","User2","User3","User4","User5"])

print("User–item ratings:")
print(ratings)

# Compute item similarity (cosine)
item_sim = cosine_similarity(ratings.T)
item_sim_df = pd.DataFrame(item_sim, index=ratings.columns, columns=ratings.columns)

print("\nItem similarity matrix:")
print(item_sim_df.round(2))

# Simple recommendation: predict User1’s rating for MovieC
user = "User1"
target_item = "MovieC"

# Weighted sum of User1's ratings by item similarity
user_ratings = ratings.loc[user]
sim_scores = item_sim_df[target_item]
pred_score = (user_ratings * sim_scores).sum() / sim_scores[user_ratings>0].sum()

print(f"\nPredicted rating of {user} for {target_item}: {pred_score:.2f}")


User–item ratings:
       MovieA  MovieB  MovieC  MovieD  MovieE
User1       5       4       0       0       1
User2       4       0       0       2       2
User3       0       1       5       4       0
User4       0       0       4       5       0
User5       1       1       0       0       5

Item similarity matrix:
        MovieA  MovieB  MovieC  MovieD  MovieE
MovieA    1.00    0.76    0.00    0.18    0.51
MovieB    0.76    1.00    0.18    0.14    0.39
MovieC    0.00    0.18    1.00    0.93    0.00
MovieD    0.18    0.14    0.93    1.00    0.11
MovieE    0.51    0.39    0.00    0.11    1.00

Predicted rating of User1 for MovieC: 4.00


How this works:

Take all movies User1 already rated (user_ratings).

Look up how similar each of those movies is to MovieC (sim_scores).

Compute a weighted average:

User’s rating × similarity to target item.

Normalize by the sum of similarities over the movies User1 has rated (so the score stays on the same 1–5 scale as the ratings).

👉 If User1 liked movies similar to MovieC, the predicted score will be high.
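The weighted-average steps above can be wrapped in a small helper. This is a sketch under the same assumptions as the notebook (0 means unrated); the name predict_item_based and the NaN fallback for a zero denominator are additions for illustration, not part of the original cell. For User1 and MovieC, MovieB is the only rated movie with nonzero similarity to MovieC, so the prediction collapses to User1's rating of MovieB.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

ratings = pd.DataFrame(
    [[5, 4, 0, 0, 1],
     [4, 0, 0, 2, 2],
     [0, 1, 5, 4, 0],
     [0, 0, 4, 5, 0],
     [1, 1, 0, 0, 5]],
    columns=["MovieA", "MovieB", "MovieC", "MovieD", "MovieE"],
    index=["User1", "User2", "User3", "User4", "User5"],
)
item_sim_df = pd.DataFrame(
    cosine_similarity(ratings.T), index=ratings.columns, columns=ratings.columns
)

def predict_item_based(user, item):
    """Similarity-weighted average of the user's existing ratings (hypothetical helper)."""
    user_ratings = ratings.loc[user]
    rated = user_ratings > 0
    sims = item_sim_df.loc[item, rated]      # similarities to the rated items only
    denom = sims.sum()
    if denom == 0:                           # no rated item resembles the target
        return float("nan")
    return float((user_ratings[rated] * sims).sum() / denom)

print(round(predict_item_based("User1", "MovieC"), 2))   # 4.0
```

Returning NaN when the denominator is zero avoids a division-by-zero result that would otherwise end up in a ranked list.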

Exercises

1) Core (10–15 min)
Task: Compute predicted ratings for User2 on all unrated movies.

In [2]:
user = "User2"
user_ratings = ratings.loc[user]
for item in ratings.columns:
    if user_ratings[item] == 0:
        sim_scores = item_sim_df[item]
        pred = (user_ratings * sim_scores).sum() / sim_scores[user_ratings>0].sum()
        print(f"Predicted rating of {user} for {item}: {pred:.2f}")


Predicted rating of User2 for MovieB: 3.18
Predicted rating of User2 for MovieC: 2.00


In [3]:
# User–user cosine similarity: each row of `ratings` is one user's rating vector
user_sim = cosine_similarity(ratings)
user_sim_df = pd.DataFrame(user_sim, index=ratings.index, columns=ratings.index)
print("User similarity matrix:")
print(user_sim_df.round(2))


User similarity matrix:
       User1  User2  User3  User4  User5
User1   1.00   0.69   0.10   0.00   0.42
User2   0.69   1.00   0.25   0.32   0.55
User3   0.10   0.25   1.00   0.96   0.03
User4   0.00   0.32   0.96   1.00   0.00
User5   0.42   0.55   0.03   0.00   1.00
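The user similarity matrix above can also drive predictions, mirroring the item-based formula: average the ratings other users gave an item, weighted by how similar each of those users is to the target user. The sketch below assumes the same toy matrix; predict_user_based is a hypothetical helper name.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

ratings = pd.DataFrame(
    [[5, 4, 0, 0, 1],
     [4, 0, 0, 2, 2],
     [0, 1, 5, 4, 0],
     [0, 0, 4, 5, 0],
     [1, 1, 0, 0, 5]],
    columns=["MovieA", "MovieB", "MovieC", "MovieD", "MovieE"],
    index=["User1", "User2", "User3", "User4", "User5"],
)
user_sim_df = pd.DataFrame(
    cosine_similarity(ratings), index=ratings.index, columns=ratings.index
)

def predict_user_based(user, item):
    """Similarity-weighted average of other users' ratings of `item` (hypothetical helper)."""
    item_ratings = ratings[item].drop(user)          # other users' ratings of the item
    raters = item_ratings > 0                        # keep only users who rated it
    sims = user_sim_df.loc[user].drop(user)[raters]  # similarity of `user` to those raters
    denom = sims.sum()
    if denom == 0:                                   # no similar user has rated the item
        return float("nan")
    return float((item_ratings[raters] * sims).sum() / denom)

print(round(predict_user_based("User1", "MovieC"), 2))
print(round(predict_user_based("User2", "MovieB"), 2))
```

User-based and item-based predictions for the same cell can differ, since they weight different neighbors.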


3) Stretch (optional, 10–15 min)
Task: Recommend the top-2 movies for User3 using item-based collaborative filtering.

In [5]:
user = "User3"
user_ratings = ratings.loc[user]
preds = {}
for item in ratings.columns:
    if user_ratings[item] == 0:
        sim_scores = item_sim_df[item]
        preds[item] = (user_ratings * sim_scores).sum() / sim_scores[user_ratings>0].sum()

print("Recommendations for User3:")
for movie, score in sorted(preds.items(), key=lambda x: x[1], reverse=True)[:2]:
    print(f"{movie}: {float(score):.2f}")

Recommendations for User3:
MovieE: 1.66
MovieA: 1.58


Mini-Challenge (≤40 min)

Task: Build a simple recommender for the MovieLens 100K dataset (the u.data and u.item files).
Acceptance Criteria:

Load the dataset (via the surprise library, or directly from the u.data and u.item files if available).

Compute item similarity.

Pick a user, predict ratings for 2–3 unseen movies.

Print top recommendations.

In [6]:
import numpy as np, pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# 1) Load ratings (user_id, item_id, rating, timestamp)
ratings = pd.read_csv("u.data", sep="\t", names=["user","item","rating","ts"])
# Optional: titles
titles = pd.read_csv("u.item", sep="|", header=None, encoding="latin-1", usecols=[0,1], names=["item","title"])

# 2) Build user–item matrix
R = ratings.pivot_table(index="user", columns="item", values="rating").fillna(0)

# 3) Item–item cosine similarity
item_sim = pd.DataFrame(cosine_similarity(R.T), index=R.columns, columns=R.columns)

# 4) Recommend for a user (e.g., 196)
u = 196
user_r = R.loc[u]
preds = {}
for item in R.columns[user_r.eq(0)]:               # only unseen items
    sims = item_sim[item]
    preds[item] = (user_r * sims).sum() / sims[user_r>0].sum()

# 5) Top recommendations
top = sorted(preds.items(), key=lambda x: x[1], reverse=True)[:3]
print("Top recommendations for user", u)
title_by_id = titles.set_index("item")["title"]    # item id -> title lookup, built once
for iid, score in top:
    title = title_by_id.get(iid, iid)              # fall back to the raw id if no title
    print(f"{title}: {float(score):.2f}")


Top recommendations for user 196
Very Natural Thing, A (1974): 4.43
Walk in the Sun, A (1945): 4.43
New York Cop (1996): 4.22


Notes / Key Takeaways

Recommenders personalize experiences and drive engagement.

Collaborative filtering leverages patterns across users/items.

Cosine similarity is a simple baseline; real systems use matrix factorization, deep learning, or hybrid approaches.

Cold-start problem: new users/items with no history are hard to recommend for.

Evaluation: precision@k and recall@k for ranked recommendation lists; RMSE for predicted ratings on held-out data.
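For reference, the three evaluation metrics can be sketched in a few lines. The helper names and the example values below are made up for illustration; in practice the relevant set and the true ratings would come from a held-out split.

```python
import numpy as np

def precision_recall_at_k(recommended, relevant, k):
    """precision@k and recall@k for one user's ranked list (hypothetical helper)."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / k                                  # share of the top k that is relevant
    recall = hits / len(relevant) if relevant else 0.0    # share of relevant items retrieved
    return precision, recall

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up example values
recommended = ["MovieC", "MovieD", "MovieB"]   # ranked list shown to the user
relevant = {"MovieD", "MovieE"}                # held-out items the user actually liked
p, r = precision_recall_at_k(recommended, relevant, k=2)
print(p, r)                                    # 0.5 0.5
print(round(rmse([4, 2, 5], [3, 2, 4]), 3))    # 0.816
```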

Reflection

Why do recommenders need both accuracy and diversity in their results?

Accuracy ensures the system suggests items the user is most likely to enjoy, based on past behavior.

Diversity prevents the list from being too narrow or repetitive (e.g., only action movies).

Together they balance relevance with exploration, keeping recommendations useful and engaging over time.
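One common way to trade relevance against diversity is a greedy re-ranking in the style of maximal marginal relevance: pick items one at a time, penalizing candidates that are too similar to items already chosen. The sketch below is illustrative only; the scores, the similarity matrix, and the weight lam are made-up values.

```python
import numpy as np

def rerank_for_diversity(scores, sim, k, lam=0.7):
    """Greedy re-ranking: trade predicted score against similarity to items already picked."""
    candidates = list(range(len(scores)))
    picked = []
    while candidates and len(picked) < k:
        def value(i):
            # Penalty = highest similarity to anything already in the list
            penalty = max((sim[i][j] for j in picked), default=0.0)
            return lam * scores[i] - (1 - lam) * penalty
        best = max(candidates, key=value)
        picked.append(best)
        candidates.remove(best)
    return picked

scores = np.array([0.9, 0.85, 0.6])        # made-up relevance scores in [0, 1]
sim = np.array([[1.0, 0.95, 0.1],          # items 0 and 1 are near-duplicates
                [0.95, 1.0, 0.1],
                [0.1, 0.1, 1.0]])

print(rerank_for_diversity(scores, sim, k=2, lam=1.0))   # [0, 1]: relevance only
print(rerank_for_diversity(scores, sim, k=2, lam=0.5))   # [0, 2]: near-duplicate skipped
```

With lam = 1.0 the list is ranked purely by score; lowering lam pushes the near-duplicate item out in favor of a less similar one.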

How would you handle the cold-start problem for a brand-new user?

Ask for quick initial input (e.g., ratings on a few popular items or picking favorite genres).

Use popularity-based or content-based recommendations until enough user data is collected.

Gradually shift to collaborative filtering once the system learns the user’s preferences.
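The popularity-based fallback mentioned above can be sketched as follows, using the toy rating matrix from earlier in the lesson. The helper name popular_items and the min_ratings threshold are assumptions for illustration; the threshold keeps an item with a single enthusiastic rating from topping the list.

```python
import pandas as pd

ratings = pd.DataFrame(
    [[5, 4, 0, 0, 1],
     [4, 0, 0, 2, 2],
     [0, 1, 5, 4, 0],
     [0, 0, 4, 5, 0],
     [1, 1, 0, 0, 5]],
    columns=["MovieA", "MovieB", "MovieC", "MovieD", "MovieE"],
    index=["User1", "User2", "User3", "User4", "User5"],
)

def popular_items(ratings, n=2, min_ratings=2):
    """Rank items by mean observed rating, ignoring unrated (0) cells (hypothetical helper)."""
    observed = ratings.where(ratings > 0)      # 0 -> NaN so mean/count skip unrated cells
    counts = observed.count()                  # number of real ratings per item
    means = observed.mean()                    # mean over real ratings only
    eligible = means[counts >= min_ratings]
    return eligible.sort_values(ascending=False).head(n)

# A brand-new user has no row in `ratings`, so fall back to popular items
top = popular_items(ratings, n=2)
print(top.round(2))
```

Once the new user has rated a few items, the same interface could switch to the collaborative predictions shown earlier.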