## **IE7275 — Project‑Based Assessment 1: Build a Recommendation System**

---

**Time limit:** 90 minutes  
**Dataset:** `movielens_1m.csv`   
**Format:** Individual work in this notebook (submit `.ipynb`)

**Goal:** Build and evaluate a recommendation system using any technique(s) you prefer (collaborative filtering, content-based, hybrid, matrix factorization, neural methods, etc.).

**What to submit:** A runnable notebook with your code, metrics, and short analysis.

---

## Dataset overview
- **File**: `movielens_1m.csv` (~91.1 MB)
- **Columns**: UserID, MovieID, Rating, Timestamp, Gender, Age, Occupation, Zip-code, Title, Genres, Datetime
- **User column**: `UserID`
- **Item column**: `MovieID`
- **Rating column**: `Rating` (min=1.0, max=5.0, mean≈3.56)
- **Timestamp column**: `Timestamp` (range: 2000-11-21 19:59:26 → 2003-02-27 23:31:15)
- **Title column**: `Title`
- **Genres column**: `Genres` (pipe‑separated or delimited string)
- **Approx. unique users/items in sample**: 1,228 users / 3,463 items (from a sample of the file)

**Notes:**
- Ratings are explicit (e.g., 1–5 stars) if a rating column is provided; otherwise you may treat interactions as implicit feedback.
- If a timestamp is present, prefer a **time‑aware split** (train on earlier interactions, test on later).
- For content features, you may parse the **genres** and/or use **titles** (e.g., TF‑IDF of titles) for hybrid models.

---

## Rules & Deliverables

**You may use any libraries** (e.g., `pandas`, `numpy`, `scikit-learn`, `surprise`, `implicit`, `lightfm`, `tensorflow/pytorch`). If installing extra packages, include the install cell (and keep it lightweight).

### Required deliverables
1. **Preprocessing summary** (brief): how you cleaned/filtered the data.
2. **Two recommendation approaches** (any mix, e.g., Popularity baseline + Item‑CF; MF + Content‑based; Hybrid; etc.).
3. **Evaluation** with at least **two ranking metrics** (e.g., Precision@k, Recall@k, Hit Rate, MAP@k, NDCG@k). Use a **time‑aware split** if timestamps exist, otherwise use a user‑stratified split.
4. **Results & brief discussion**: a short comparison of the two approaches, trade‑offs, and observations (sparsity, cold‑start, bias, etc.).
5. **Top‑N recommendations** for **5 sample users** with brief justification.

### Suggested timeboxing
- Data understanding & prep: ~15 min  
- Modeling (2 approaches): ~40 min  
- Evaluation & analysis: ~25 min  
- Wrap‑up: ~10 min

---

In [None]:
# OPTIONAL: install extra libraries if needed (uncomment)
# %pip install scikit-surprise implicit lightfm


In [3]:
# 1) Imports & configuration
import os, math, random
import numpy as np
import pandas as pd

RNG_SEED = 42
random.seed(RNG_SEED)
np.random.seed(RNG_SEED)

DATA_PATH = "movielens_1m.csv"
assert os.path.exists(DATA_PATH), f"Dataset not found at {DATA_PATH}. Please upload or fix the path."


In [4]:
# 2) Load data
df = pd.read_csv(DATA_PATH)
print(df.shape)
df.head()

(1000209, 11)


Unnamed: 0,UserID,MovieID,Rating,Timestamp,Gender,Age,Occupation,Zip-code,Title,Genres,Datetime
0,1,1193,5,978300760,F,1,10,48067,One Flew Over the Cuckoo's Nest (1975),Drama,2000-12-31 22:12:40
1,1,661,3,978302109,F,1,10,48067,James and the Giant Peach (1996),Animation|Children's|Musical,2000-12-31 22:35:09
2,1,914,3,978301968,F,1,10,48067,My Fair Lady (1964),Musical|Romance,2000-12-31 22:32:48
3,1,3408,4,978300275,F,1,10,48067,Erin Brockovich (2000),Drama,2000-12-31 22:04:35
4,1,2355,5,978824291,F,1,10,48067,"Bug's Life, A (1998)",Animation|Children's|Comedy,2001-01-06 23:38:11


In [5]:
# 3) Quick sanity checks
print("Columns:", list(df.columns))
print("Null counts:\n", df.isna().sum())

# Peek at basic stats for ratings if present
rating_col_candidates = [c for c in df.columns if c.lower() in ["rating","rate","score","stars"]]
if rating_col_candidates:
    rc = rating_col_candidates[0]
    print(f"Rating column detected: {rc}")
    print(df[rc].describe())

# If timestamp column exists, convert a copy to datetime for inspection
ts_candidates = [c for c in df.columns if c.lower() in ["timestamp","time","datetime","date"]]
if ts_candidates:
    tc = ts_candidates[0]
    try:
        # Try epoch seconds first
        dt = pd.to_datetime(df[tc], unit="s", errors="coerce")
        if dt.notna().mean() < 0.5:
            # Fallback: parse direct
            dt = pd.to_datetime(df[tc], errors="coerce")
        print("Time coverage:", dt.min(), "->", dt.max())
    except Exception as e:
        print("Timestamp parse note:", e)


Columns: ['UserID', 'MovieID', 'Rating', 'Timestamp', 'Gender', 'Age', 'Occupation', 'Zip-code', 'Title', 'Genres', 'Datetime']
Null counts:
 UserID        0
MovieID       0
Rating        0
Timestamp     0
Gender        0
Age           0
Occupation    0
Zip-code      0
Title         0
Genres        0
Datetime      0
dtype: int64
Rating column detected: Rating
count    1.000209e+06
mean     3.581564e+00
std      1.117102e+00
min      1.000000e+00
25%      3.000000e+00
50%      4.000000e+00
75%      4.000000e+00
max      5.000000e+00
Name: Rating, dtype: float64
Time coverage: 2000-04-25 23:05:32 -> 2003-02-28 17:49:50


In [25]:
# 4) Preprocessing — TODOs
# - Define user_col, item_col, and (optionally) rating_col, ts_col
# - Handle duplicates/missing
# - Optional: filter very rare users/items (e.g., min 5 interactions)
# - Optional: parse genres (split by '|' or other delimiter), create content features
# - Decide: explicit (ratings) vs implicit (binary interactions)

# >>> START HERE <<<
# Example: infer standard column names (edit as needed)
user_col = next((c for c in df.columns if c.lower() in ["userid","user_id","user"]), None)
item_col = next((c for c in df.columns if c.lower() in ["movieid","movie_id","itemid","item_id","movie","item"]), None)
rating_col = next((c for c in df.columns if c.lower() in ["rating","rate","score","stars"]), None)
ts_col = next((c for c in df.columns if c.lower() in ["timestamp","time","datetime","date"]), None)

user_col = 'UserID'
item_col = 'MovieID'
rating_col = 'Rating'
ts_col = 'Timestamp'

print("Using columns ->", user_col, item_col, rating_col, ts_col)

user_col = 'UserID'
item_col = 'MovieID'
rating_col = 'Rating'
ts_col = 'Timestamp'

print(f"\n---Preprocessing Summary ---")

# **TODO: Preprocessing implementation**

df = df.dropna(subset=[user_col, item_col, rating_col])

df = df.drop_duplicates(subset=[user_col, item_col, ts_col])

min_user_inter = 5
min_item_inter = 5
vc_users = df[user_col].value_counts()
vc_items = df[item_col].value_counts()
df = df[df[user_col].isin(vc_users[vc_users >= min_user_inter].index)]
df = df[df[item_col].isin(vc_items[vc_items >= min_item_inter].index)]

print(f"Post-prep shape: {df.shape}")
print(f"Unique Users: {df[user_col].nunique()}, Unique Items: {df[item_col].nunique()}")



Using columns -> UserID MovieID Rating Timestamp

---Preprocessing Summary ---
Post-prep shape: (999611, 11)
Unique Users: 6040, Unique Items: 3416


In [8]:
# 5) Train/validation/test split helpers
from typing import Tuple, Dict, List

def time_aware_split(df, user_col, ts_col, train_frac=0.8):
    """Per user, sort by timestamp and split earliest -> train, latest -> test/val.
    Returns a dict with train and test DataFrames.
    """
    assert ts_col in df.columns, "Timestamp column required for time‑aware split."
    parts = []
    for u, grp in df.sort_values(ts_col).groupby(user_col):
        n = len(grp)
        cut = max(1, int(n * train_frac))
        parts.append((grp.iloc[:cut], grp.iloc[cut:]))
    train = pd.concat([p[0] for p in parts], ignore_index=True)
    test  = pd.concat([p[1] for p in parts], ignore_index=True)
    return { "train": train, "test": test }

def stratified_user_holdout(df, user_col, holdout=1):
    """If no timestamps available: keep last `holdout` interactions per user for test (by index order).
    """
    parts = []
    for u, grp in df.groupby(user_col):
        if len(grp) <= holdout:
            tr = grp.iloc[:-1] if len(grp) > 1 else grp.iloc[:0]
            te = grp.iloc[-1:]
        else:
            tr = grp.iloc[:-holdout]
            te = grp.iloc[-holdout:]
        parts.append((tr, te))
    train = pd.concat([p[0] for p in parts], ignore_index=True)
    test  = pd.concat([p[1] for p in parts], ignore_index=True)
    return { "train": train, "test": test }

# Choose split
if ts_col:
    splits = time_aware_split(df, user_col, ts_col, train_frac=0.8)
else:
    splits = stratified_user_holdout(df, user_col, holdout=1)

train_df, test_df = splits["train"], splits["test"]
train_df.shape, test_df.shape


((797275, 11), (202336, 11))

In [10]:
# 6) Ranking metrics (provided)
from collections import defaultdict

def precision_at_k(recommended: list, ground_truth: set, k=10):
    rec_k = recommended[:k]
    if not rec_k:
        return 0.0
    hit = sum(1 for x in rec_k if x in ground_truth)
    return hit / len(rec_k)

def recall_at_k(recommended: list, ground_truth: set, k=10):
    if not ground_truth:
        return 0.0
    rec_k = recommended[:k]
    hit = sum(1 for x in rec_k if x in ground_truth)
    return hit / len(ground_truth)

def apk(recommended: list, ground_truth: set, k=10):
    if not ground_truth:
        return 0.0
    score = 0.0
    hits = 0
    for i, p in enumerate(recommended[:k], start=1):
        if p in ground_truth:
            hits += 1
            score += hits / i
    return score / min(len(ground_truth), k) if ground_truth else 0.0

def ndcg_at_k(recommended: list, ground_truth: set, k=10):
    def dcg(rel):
        return sum((1.0/np.log2(i+2)) for i, r in enumerate(rel) if r)
    rel = [1 if x in ground_truth else 0 for x in recommended[:k]]
    idcg = dcg(sorted(rel, reverse=True))
    return dcg(rel) / idcg if idcg > 0 else 0.0


In [11]:
# 7) Baseline recommender: global popularity (works for implicit or explicit)
# For explicit ratings, popularity ~ average rating; for implicit, popularity ~ #interactions

if rating_col:
    item_pop = train_df.groupby(item_col)[rating_col].mean().sort_values(ascending=False)
else:
    item_pop = train_df.groupby(item_col)[user_col].count().sort_values(ascending=False)

popular_items = list(item_pop.index)

def recommend_popularity(user_id, k=10, seen_items=None):
    seen = set() if seen_items is None else set(seen_items)
    recs = [it for it in popular_items if it not in seen]
    return recs[:k]

# Example usage for one user:
u0 = train_df[user_col].iloc[0]
seen_u0 = train_df.loc[train_df[user_col]==u0, item_col].unique()
recommend_popularity(u0, k=10, seen_items=seen_u0)


[53, 167, 3905, 3245, 2503, 1164, 2930, 2444, 2905, 2019]

In [23]:
# 8) Your model(s) — TODO
# Build at least ONE personalized recommender in addition to the popularity baseline.
# Ideas (choose any):
# - Item‑based or user‑based collaborative filtering with cosine similarity
# - Matrix factorization (e.g., SVD) with explicit ratings
# - Implicit MF / LightFM (for implicit feedback)
# - Content‑based (TF‑IDF on titles; one‑hot/embedding for genres), or Hybrid

# >>> Implement your chosen model(s) below. <<<


# Model A: Popularity Baseline ---
# Using average rating for popularity
item_pop = train_df.groupby(item_col)[rating_col].mean().sort_values(ascending=False)
popular_items = list(item_pop.index)

def recommend_popularity(user_id, k=10, seen_items=None):
    seen = set() if seen_items is None else set(seen_items)
    recs = [it for it in popular_items if it not in seen]
    return recs[:k]

# Model B: Item-Item Collaborative Filtering (Item-CF) ---
print("\n--- 7) Training Item-CF Model ---")

R = train_df.pivot_table(
    index=user_col, 
    columns=item_col, 
    values=rating_col, 
    aggfunc='mean'
).fillna(0)
print(f"User-Item Matrix shape: {R.shape}")


item_sim = cosine_similarity(R.T)

item_list = R.columns.tolist()
item_index_map = {item: idx for idx, item in enumerate(item_list)}

def recommend_itemcf(user_id, k=10):
    # Handle cold-start user
    if user_id not in R.index:
        return recommend_popularity(user_id, k)

    user_vector = R.loc[user_id].values
    scores = item_sim.dot(user_vector)
    seen_mask = (user_vector > 0)
    seen_items = set(R.columns[seen_mask])
    ranked_indices = np.argsort(-scores)
    ranked_items = [item_list[idx] for idx in ranked_indices]
    recs = [it for it in ranked_items if it not in seen_items]
    return list(recs)[:k]

# Model C: Content-Based Filtering (Genres) ---
print("\n--- 7.5) Training Content-Based (Genre) Model ---")
content_df = df[[item_col, 'Genres']].drop_duplicates().set_index(item_col)

genre_matrix = content_df['Genres'].str.get_dummies(sep='|')

train_items = R.columns
genre_matrix = genre_matrix.loc[genre_matrix.index.intersection(train_items)] 
print(f"Item-Genre Matrix shape: {genre_matrix.shape}")

content_item_list = genre_matrix.index.tolist()
content_item_index_map = {item: idx for idx, item in enumerate(content_item_list)}


def recommend_content_based(user_id, k=10):
    if user_id not in R.index:
        return recommend_popularity(user_id, k)

    user_ratings = R.loc[user_id]
    user_seen_items = user_ratings[user_ratings > 0].index
    
    user_profile = np.zeros(genre_matrix.shape[1])
    total_rating_sum = 0
    
    for item_id in user_seen_items:
        if item_id in content_item_list:
            rating = user_ratings.loc[item_id]
            user_profile += genre_matrix.loc[item_id].values * rating
            total_rating_sum += rating

    if total_rating_sum > 0:
        user_profile /= total_rating_sum
    else:
        return recommend_popularity(user_id, k)

    candidate_items = genre_matrix.index
    scores = genre_matrix.dot(user_profile)
    
    ranked_indices = np.argsort(-scores.values)
    ranked_items = [candidate_items[idx] for idx in ranked_indices]

    seen = set(user_seen_items)
    recs = [it for it in ranked_items if it not in seen]
    return list(recs)[:k]






--- 7) Training Item-CF Model ---
User-Item Matrix shape: (6040, 3415)

--- 7.5) Training Content-Based (Genre) Model ---
Item-Genre Matrix shape: (3415, 18)


In [24]:
# 9) Evaluation — TODO
# Evaluate popularity baseline and your model(s) using at least two metrics (Precision@k, Recall@k, MAP@k, NDCG@k).
# Use k=10 (and optionally 20). Consider time‑aware split correctness.

def user_ground_truth(df_user,threshold=3.5):
    # For explicit: treat items in test as relevant if rating >= threshold (choose threshold)
    # For implicit: all test items are relevant
    #return set(df_user[item_col].unique())
    return set(df_user.loc[df_user[rating_col] >= threshold, item_col].unique())

K = 10
users = test_df[user_col].unique()[:1000]  # cap for speed

pop_metrics = {"precision": [], "recall": [], "map": [], "ndcg": []}

for u in users:
    seen = train_df.loc[train_df[user_col]==u, item_col].unique()
    gt = user_ground_truth(test_df.loc[test_df[user_col]==u])
    if not gt: 
        continue
    recs = recommend_popularity(u, k=K, seen_items=seen)
    pop_metrics["precision"].append(precision_at_k(recs, gt, k=K))
    pop_metrics["recall"].append(recall_at_k(recs, gt, k=K))
    pop_metrics["map"].append(apk(recs, gt, k=K))
    pop_metrics["ndcg"].append(ndcg_at_k(recs, gt, k=K))

print("Popularity@10 — precision: %.3f  recall: %.3f  MAP: %.3f  NDCG: %.3f" %
      (np.mean(pop_metrics["precision"]), np.mean(pop_metrics["recall"]),
       np.mean(pop_metrics["map"]), np.mean(pop_metrics["ndcg"])))

# TODO: repeat for your model(s) and compare

print("\n------ Evaluation for K=10 ---")

K = 10

users_to_evaluate = test_df[user_col].unique()[:min(1000, test_df[user_col].nunique())] 

models = {
    "Model_A": recommend_popularity,    
    "Model_B": recommend_itemcf,       
    "Model_C": recommend_content_based  
}
metric_names = ["precision@10", "recall@10", "map@10", "ndcg@10"] 
results = {}

num_users_evaluated = len(users_to_evaluate)

for model_name, recommender_func in models.items():
    metrics = defaultdict(list)
    
    for u in users_to_evaluate:
        seen = train_df.loc[train_df[user_col] == u, item_col].unique()
        
        gt = user_ground_truth(test_df.loc[test_df[user_col] == u], threshold=3.5)
        
        if not gt: 
            continue
            
        recs = recommender_func(u, k=K)

        metrics["precision@10"].append(precision_at_k(recs, gt, k=K))
        metrics["recall@10"].append(recall_at_k(recs, gt, k=K))
        metrics["map@10"].append(apk(recs, gt, k=K))
        metrics["ndcg@10"].append(ndcg_at_k(recs, gt, k=K))

    mean_metrics = {name: np.mean(values) for name, values in metrics.items()}
    results[model_name] = mean_metrics

print("\n=== Model Performance Summary ===")

for model_name, metrics in results.items():
    print(f"\n{model_name}")
    for metric_name in metric_names:
        print(f"{metric_name}: {metrics.get(metric_name, 0.0):.4f}")
    print(f"n_users_eval: {num_users_evaluated}")

print("-" * 50)


Popularity@10 — precision: 0.002  recall: 0.001  MAP: 0.000  NDCG: 0.007

--- 8) Evaluation for K=10 ---

=== Model Performance Summary ===

Model_A
precision@10: 0.0021
recall@10: 0.0010
map@10: 0.0003
ndcg@10: 0.0062
n_users_eval: 1000

Model_B
precision@10: 0.1105
recall@10: 0.0889
map@10: 0.0673
ndcg@10: 0.3145
n_users_eval: 1000

Model_C
precision@10: 0.0185
recall@10: 0.0168
map@10: 0.0080
ndcg@10: 0.0737
n_users_eval: 1000
--------------------------------------------------


In [21]:
# 10) Report top‑N for 5 users — TODO
sample_users = list(users)[:5]
for u in sample_users:
    seen = train_df.loc[train_df[user_col]==u, item_col].unique()
    pop_recs = recommend_popularity(u, k=10, seen_items=seen)
    print(f"User {u} — Popularity@10:", pop_recs)
    # TODO: also show your model's top‑10 for comparison
    
    print("\n--- 9) Top-10 Recommendations for 5 Sample Users of different three models A,B,C ---")

movie_titles = df[[item_col, 'Title']].drop_duplicates().set_index(item_col)['Title'].to_dict()

sample_users = list(users_to_evaluate)[:5]

for u in sample_users:

    seen_ids = train_df.loc[train_df[user_col] == u, item_col].unique()
    
    pop_recs_ids = recommend_popularity(u, k=10)
    itemcf_recs_ids = recommend_itemcf(u, k=10)
    cbf_recs_ids = recommend_content_based(u, k=10)

    pop_recs_titles = [movie_titles.get(mid, f"ID {mid} (Title Missing)") for mid in pop_recs_ids]
    itemcf_recs_titles = [movie_titles.get(mid, f"ID {mid} (Title Missing)") for mid in itemcf_recs_ids]
    cbf_recs_titles = [movie_titles.get(mid, f"ID {mid} (Title Missing)") for mid in cbf_recs_ids]
    
    print(f"\nUser: {u}")
    
    print(f"  > Model A (Popularity Recs):")
    for i, title in enumerate(pop_recs_titles):
         print(f"    {i+1}. {title}")
         
    print(f"  > Model B (Item-CF Recs):")
    for i, title in enumerate(itemcf_recs_titles):
        print(f"    {i+1}. {title}")
        
    print(f"  > Model C (Content-Based Recs):")
    for i, title in enumerate(cbf_recs_titles):
        print(f"    {i+1}. {title}")
    
print("-" * 50)

User 1 — Popularity@10: [53, 167, 3905, 3245, 2503, 1164, 2930, 2444, 2905, 2019]

--- 9) Top-10 Recommendations for 5 Sample Users from all the 3 different models ---
User 2 — Popularity@10: [53, 167, 3905, 3245, 2503, 1164, 2930, 2444, 2905, 2019]

--- 9) Top-10 Recommendations for 5 Sample Users from all the 3 different models ---
User 3 — Popularity@10: [53, 167, 3905, 3245, 2503, 1164, 2930, 2444, 2905, 2019]

--- 9) Top-10 Recommendations for 5 Sample Users from all the 3 different models ---
User 4 — Popularity@10: [53, 167, 3905, 3245, 2503, 1164, 2930, 2444, 2905, 2019]

--- 9) Top-10 Recommendations for 5 Sample Users from all the 3 different models ---
User 5 — Popularity@10: [53, 167, 3905, 3245, 2503, 1164, 2930, 2444, 2905, 2019]

--- 9) Top-10 Recommendations for 5 Sample Users from all the 3 different models ---

User: 1
  > Model A (Popularity Recs):
    1. Lamerica (1994)
    2. Feast of July (1995)
    3. Specials, The (2000)
    4. I Am Cuba (Soy Cuba/Ya Kuba) (1964

---

## Brief Discussion (write here)
 ### What worked well? What didn’t?
  The points for worked well are:
  - Model B (Item-CF) and Model C (Content-Based) achieved significantly higher Precision@10 scores than the Popularity baseline (Model A). This means when these models made a list of 10 recommendations, a greater percentage of those items were actually relevant
  - Model B (Item- CF )performed the best among across the ranking metrics (Recall@10, MAP@10, and NDCG@10),Here it tells us that the collaborative filtering is highly effective at looking into the user preferences and providing the relevant novel recommendations.
  - Model C was  executed well the content based filter model where it created the personalized user profiles where it demonstrates that it handles the item features and provide the result that matches the user preference.
  
  The points that did not work well is:
  - Model A (Popularity model or baseline) where it provided the lowest score across all the boards for the metrics evaluated.It fails the critical test of recommendation system of recommending the relevant items.
  - All models, particularly Item-CF, has high matrix sparsity, where the similarity calculations and collaborative signals are based on very limited overlap between users/items. The Content-Based model also relies on the relatively coarse-grained feature of movie genres.

### How do the two approaches compare (strengths/weaknesses)?

 - Model A (Popularity): 
 Strengths:  Extremely fast to compute, serves   as a robust cold-start user fallback.
 Weakness : Zero personalization; fails to capture individuality.
 
 - Model B (Item -CF) : 
 Strengths: High personalization, excellent at recommending niche items based on community recommends.
 Weakness : Susceptible to the cold-start item problem,computation scales poorly with the number of items.
 
 - Model C(Content Based) : 
 Strengths : Solves the cold-start item problem (can recommend a new movie if its genres are known); interpretable.
 Weakness: Overspecialization (the "filter bubble"); poor at suggesting items outside a user's past genre history.



### Any evidence of popularity bias or cold‑start issues?

Popularity Bias Issues: Popular items probably appear frequently across different users recommendation lists,
Model A demonstrates clear popularity bias, as every user receives the exact same set of top-rated, most-frequently-seen items. Model B and C reduce this significantly by prioritizing relevance to the user's history.

Cold Start Isues.By enforcing a minimum interaction threshold of 5 during preprocessing, the issue of recommending to users with no history was largely avoided for Models B and C. Model B (Item-CF) is incapable of recommending brand-new movies because they lack the necessary rating data to compute item-to-item similarity. Model C effectively solves this, as it only requires the new movie's genre (content features) to match the user's profile.

### Possible next steps if you had more time (blending, hyperparameters, side features).

- The most immediate improvement would be to create a hybrid system, perhaps by linearly blending the scores of Model B (Item-CF) and Model C (CBF). This blend would leverage Item-CF's strong personalization while using CBF's ability to recommend new or niche items based on content.

-  Enhance Model C by incorporating more detailed content features beyond simple genres, such as TF-IDF on movie titles or plot summaries, to create richer user and item representations.

- Using Matrix Factorization where it typically handles sparsity better and yields superior performance metrics compared to neighborhood-based methods.

## Grading Rubric (100 pts)
- Data prep & clarity (10)
- Correct split + rationale (15)
- Baseline + **one additional** approach (30)
- Metrics & evaluation (25)
- Analysis & discussion + top‑N examples (20)

---

## Sample Result (for reference only)

When you complete your evaluation, you should produce a summary table like this:



![Sample Results](Sample_Results.png)



### Guidelines
- Do not worry if your numbers differ slightly; they depend on preprocessing, parameters, and randomness.  
- The important part is that **Model_B typically performs better than Model_A**, and Model_C is somewhere in between.  
- In your write-up, focus on *why* these differences occur (e.g., handling of user preferences, cold start, popularity bias).

---