# Two-Tower Model for Recommendation Retrieval

This notebook implements a two-tower embedding model for recommendation retrieval using FAISS for efficient similarity search.

In [6]:
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

# Load interaction data (updated path)
df = pd.read_csv("dataset/amazon-beauty/amazon-beauty-train.inter", sep="\t", dtype=str)

# Keep positive interactions
df["label"] = pd.to_numeric(df["label"], errors="coerce").fillna(0).astype(int)
df = df[df["label"] == 1]

# Map user/item IDs to indices
user2idx = {u: idx for idx, u in enumerate(df["user_id"].unique())}
item2idx = {i: idx for idx, i in enumerate(df["item_id"].unique())}
idx2item = {idx: item_id for item_id, idx in item2idx.items()}  # Reverse mapping for recommendations

df["user_idx"] = df["user_id"].map(user2idx)
df["item_idx"] = df["item_id"].map(item2idx)

num_users = len(user2idx)
num_items = len(item2idx)
print(f"#users: {num_users}, #items: {num_items}")


Using device: cuda
#users: 1210271, #items: 212506
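
As a small illustration of the ID-to-index mapping used above, here is a sketch on made-up IDs. The `toy_` names and ID values are hypothetical and not taken from the dataset.

```python
# Toy raw IDs (hypothetical values, for illustration only)
toy_raw_item_ids = ["B001", "B007", "B001", "B042", "B007"]

# dict.fromkeys drops duplicates while keeping first-seen order,
# which mirrors what df["item_id"].unique() does for a pandas column
toy_unique_ids = list(dict.fromkeys(toy_raw_item_ids))

# Forward mapping (raw ID -> contiguous index) and its reverse
toy_item2idx = {item_id: idx for idx, item_id in enumerate(toy_unique_ids)}
toy_idx2item = {idx: item_id for item_id, idx in toy_item2idx.items()}

toy_item_idx = [toy_item2idx[i] for i in toy_raw_item_ids]
print(toy_item2idx)   # {'B001': 0, 'B007': 1, 'B042': 2}
print(toy_item_idx)   # [0, 1, 0, 2, 1]
```

Contiguous indices are what allow each ID to address one row of an embedding table.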


User tower: self.user_emb maps each user_idx to a vector u

Item tower: self.item_emb maps each item_idx to a vector i

Score: Dot product <u,i> is the relevance score for (user, item)

Purpose of the Two‑Tower Embedding Model:

Retrieve relevant items for a user from a very large catalog

At training time:

Show the model positive interactions (user, item) with label 1.

It learns user and item embeddings such that user vectors are close to their positive items.

At inference time:

For a given user, compute their embedding once.

Then retrieve the closest item embeddings (by dot product / cosine similarity).

Why use this architecture for retrieval:

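Decoupled towers: Each tower computes its embedding independently of the other; the two sides only meet in a simple dot product. (The towers are still trained jointly through that dot product.)

Precomputation: Because item embeddings do not depend on the user, they can all be computed and stored once.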

Fast scoring: A user’s vector vs. all items is just many dot products – perfect for FAISS / ANN.
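
The "many dot products" idea can be sketched with NumPy on toy data. The sizes and random vectors below are assumptions for illustration; they stand in for trained embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_items, top_k = 4, 6, 3

# Toy embeddings (random stand-ins for trained vectors)
toy_item_vecs = rng.normal(size=(n_items, dim)).astype("float32")
toy_user_vec = rng.normal(size=dim).astype("float32")

# One matrix-vector product scores the user against every item
toy_scores = toy_item_vecs @ toy_user_vec        # shape: (n_items,)

# Indices of the top_k highest scores, best first
toy_top_items = np.argsort(-toy_scores)[:top_k]
print(toy_top_items, toy_scores[toy_top_items])
```

This brute-force version touches every item. An ANN library such as FAISS aims to return (approximately) the same top-K without scanning the whole catalog.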





In [14]:
class InterDataset(Dataset):
    def __init__(self, df):
        self.users = torch.tensor(df["user_idx"].values, dtype=torch.long)
        self.items = torch.tensor(df["item_idx"].values, dtype=torch.long)

    def __len__(self):
        return len(self.users)

    def __getitem__(self, idx):
        return self.users[idx], self.items[idx]

dataset = InterDataset(df)
dataloader = DataLoader(dataset, batch_size=256, shuffle=True)


Why batch_size=256?

Embedding dimension: 64

With batch_size=256:

User embeddings: 256 × 64 = 16,384 floats

Item embeddings: 256 × 64 = 16,384 floats

Total per batch: ~32K floats ≈ 128 KB as float32 (4 bytes each), so memory does not constrain the batch size here.

shuffle=True → randomly reorders the samples each epoch.

If shuffle=False, samples stay in their original file order.

If that order groups interactions by user, the model sees all of user1's items together and may overfit to user1's pattern before seeing other users.
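
The batch-memory arithmetic above can be checked in a few lines. The comparison with the full embedding tables is an added illustration; it uses the user and item counts printed earlier and assumes float32 storage.

```python
batch_size = 256
embedding_dim = 64
bytes_per_float32 = 4

# Two lookups per sample: one user vector and one item vector
floats_per_batch = 2 * batch_size * embedding_dim
batch_kib = floats_per_batch * bytes_per_float32 / 1024
print(floats_per_batch, batch_kib)        # 32768 128.0

# For comparison: both full embedding tables (counts from the output above)
n_users, n_items = 1_210_271, 212_506
table_mib = (n_users + n_items) * embedding_dim * bytes_per_float32 / 1024**2
print(f"Embedding tables: {table_mib:.1f} MiB")
```

The per-batch lookups are tiny next to the tables themselves, so in this setup the choice of 256 is more about optimization behaviour and throughput than about memory.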

In [4]:
import torch.nn as nn
import torch.nn.functional as F

embedding_dim = 64

# TwoTowerModel inherits from nn.Module, PyTorch's base class for neural networks
class TwoTowerModel(nn.Module):
    def __init__(self, num_users, num_items, embedding_dim):
        # Call nn.Module.__init__() so PyTorch can register the embedding tables as parameters
        super().__init__()
        self.user_emb = nn.Embedding(num_users, embedding_dim)
        self.item_emb = nn.Embedding(num_items, embedding_dim)

    def forward(self, user_idx, item_idx):
        u = self.user_emb(user_idx)
        i = self.item_emb(item_idx)
        # Dot product for retrieval score
        return (u * i).sum(dim=1)

    def get_user_embedding(self, user_idx):
        return self.user_emb(user_idx)

    def get_item_embedding(self, item_idx):
        return self.item_emb(item_idx)


Goal: learn two sets of vectors:

One vector for each user

One vector for each item

For a pair (user, item), it:

Looks up the user embedding u

Looks up the item embedding i

Computes a score = dot product u⋅i

→ higher score = model thinks this user will like this item more.
Later, use the user vector to find top‑K closest item vectors with FAISS → these are the recommended items.
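
A NumPy sketch of the same forward pass, on toy tables, shows what the lookup and the row-wise dot product do. Table sizes and index values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
toy_user_table = rng.normal(size=(5, 3))   # 5 users, embedding dim 3
toy_item_table = rng.normal(size=(7, 3))   # 7 items, embedding dim 3

# A batch of three (user, item) index pairs
toy_user_idx = np.array([0, 2, 4])
toy_item_idx = np.array([6, 1, 1])

u = toy_user_table[toy_user_idx]           # row lookup, shape (3, 3)
i = toy_item_table[toy_item_idx]           # row lookup, shape (3, 3)

# NumPy equivalent of (u * i).sum(dim=1) in the model: one score per pair
toy_pair_scores = (u * i).sum(axis=1)
print(toy_pair_scores.shape)               # (3,)
```

Each score pairs row k of `u` with row k of `i`; it is not an all-users-by-all-items matrix.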

In [7]:
# Use TwoTowerModel defined in previous cell
# Initialize model and move to device
model = TwoTowerModel(num_users, num_items, embedding_dim).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()

Optimizer: updates the model parameters (weights) to reduce the loss during training. Here the parameters are user_emb.weight and item_emb.weight.

Loss function (criterion): measures how far the predictions are from the targets. The optimizer minimizes this loss.

BCEWithLogitsLoss: BCEWithLogitsLoss(logits, targets) = BCE(Sigmoid(logits), targets)
Binary cross-entropy loss with logits combines:
1. Sigmoid activation (logits, i.e. raw scores → probabilities)
2. Binary cross-entropy loss
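
To make the identity concrete, the sketch below compares the textbook "sigmoid then BCE" computation with a commonly used numerically stable rewrite. It is written in NumPy for illustration and is not a description of PyTorch's internal implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce_naive(logits, targets):
    # Textbook form: apply sigmoid, then binary cross-entropy
    p = sigmoid(logits)
    return -(targets * np.log(p) + (1 - targets) * np.log(1 - p)).mean()

def bce_with_logits(logits, targets):
    # Algebraically identical, but avoids log(0) for large |logits|
    return (np.maximum(logits, 0) - logits * targets
            + np.log1p(np.exp(-np.abs(logits)))).mean()

toy_logits = np.array([2.0, -1.0, 0.5, -3.0])
toy_targets = np.array([1.0, 0.0, 1.0, 0.0])
print(bce_naive(toy_logits, toy_targets), bce_with_logits(toy_logits, toy_targets))

# A logit of 0 means "50/50", which costs ln(2) per example
print(bce_with_logits(np.zeros(4), toy_targets), np.log(2))
```

The ln(2) ≈ 0.693 value is a useful reference point: an average loss above it means the model is, on balance, doing worse than predicting 0.5 for every pair.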



In [5]:
# Training loop
print("Starting training")
epochs = 10
for epoch in range(epochs):
    total_loss = 0
    num_batches = 0
    for batch_users, batch_items in dataloader:
        # Move to device
        batch_users = batch_users.to(device) 
        batch_items = batch_items.to(device)
        
        # Positive scores
        pos_scores = model(batch_users, batch_items)

        # Negative sampling: for each user in the batch, draw a random item ID in [0, num_items).
        # These (user, neg_item) pairs are treated as non-relevant; the draw is not checked
        # against the user's known positives, so an occasional false negative is possible.
        neg_items = torch.randint(0, num_items, batch_items.shape, device=device)
        neg_scores = model(batch_users, neg_items)

        
        # Concatenate the raw dot-product scores (logits): shape [2 * batch_size], e.g. [512]
        scores = torch.cat([pos_scores, neg_scores], dim=0)
        # Build labels: 1 for positives, 0 for negatives
        labels = torch.cat([
            torch.ones_like(pos_scores),
            torch.zeros_like(neg_scores),
        ], dim=0).float()

        optimizer.zero_grad()  # Clear gradients from the previous step

        loss = criterion(scores, labels)  # Compute BCE-with-logits loss
        loss.backward()  # Backpropagate to compute gradients
        optimizer.step()  # Update the embedding weights (Adam)
        
        total_loss += loss.item()
        num_batches += 1
    
    avg_loss = total_loss / num_batches if num_batches > 0 else 0
    print(f"Epoch {epoch+1} done - Average Loss: {avg_loss:.4f}")


Starting training
Epoch 1 done - Average Loss: 3.2218
Epoch 2 done - Average Loss: 2.7125
Epoch 3 done - Average Loss: 2.3158
Epoch 4 done - Average Loss: 2.0303
Epoch 5 done - Average Loss: 1.8398
Epoch 6 done - Average Loss: 1.7123
Epoch 7 done - Average Loss: 1.6306
Epoch 8 done - Average Loss: 1.5773
Epoch 9 done - Average Loss: 1.5416
Epoch 10 done - Average Loss: 1.5159


Purpose of negative sampling:

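The data only contains positive interactions (a user clicked, rated, or bought an item).
Without negatives, the model only sees “this pair is good” and never “this pair is bad”.

Negative sampling creates artificial negative examples by pairing each user with a random item. The training loop above does not check that the user has not interacted with the sampled item; with 212,506 items such collisions are rare, so the random item is assumed to be non-relevant.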

The loss then pushes:

positive pairs: scores ↑

negative pairs: scores ↓

This makes the embedding space discriminative: user vectors end up close to the items they interacted with and far from random items.
It approximates a ranking objective (relevant > non-relevant).
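
If accidental positives among the sampled negatives are a concern, one option is to resample any draw that hits a known positive. The sketch below is a hypothetical NumPy variant and is not what the training loop above does. `toy_positives` and `sample_negatives` are illustrative names, and the loop assumes no user has interacted with every item.

```python
import numpy as np

rng = np.random.default_rng(0)
toy_num_items = 10

# Hypothetical interaction history: user index -> set of item indices
toy_positives = {0: {1, 2, 3}, 1: {4}, 2: {0, 9}}

def sample_negatives(batch_users, positives, num_items, rng):
    """Draw one random item per user, resampling draws that hit a known positive."""
    neg = rng.integers(0, num_items, size=len(batch_users))
    for pos, user in enumerate(batch_users):
        # Rejection step: redraw until the item is not in the user's history
        while int(neg[pos]) in positives.get(int(user), set()):
            neg[pos] = rng.integers(0, num_items)
    return neg

toy_batch_users = np.array([0, 0, 1, 2])
toy_neg_items = sample_negatives(toy_batch_users, toy_positives, toy_num_items, rng)
print(toy_neg_items)
```

The Python loop is slow for large batches, so this is only meant to show the idea of rejecting known positives.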

## Build FAISS Index for Fast Retrieval

The next cell will:
1. Extract user and item embeddings from the trained model
2. Build a FAISS index over the item embeddings for efficient similarity search (fast nearest‑neighbor search over large sets of vectors)
3. Enable fast retrieval of top-K recommendations

In [8]:
import faiss
import numpy as np


# Extract and Save Embeddings
model.eval()
with torch.no_grad():
    user_embeddings_raw = model.user_emb.weight.detach().cpu().numpy().astype('float32')
    item_embeddings_raw = model.item_emb.weight.detach().cpu().numpy().astype('float32')

print(f"User embeddings shape: {user_embeddings_raw.shape}")
print(f"Item embeddings shape: {item_embeddings_raw.shape}")

''' Example of user embedding
User 0 → [0.12, -0.45, 0.78, 0.33, -0.21, ..., 0.56]  ← 64 numbers
Training Data:
  User 0 bought: lipstick, foundation, mascara
  User 1 bought: shampoo, conditioner, hair gel

Model learns:
  User 0 embedding → close to makeup item embeddings
  User 1 embedding → close to haircare item embeddings
'''

# Save raw embeddings
np.save("user_embeddings.npy", user_embeddings_raw)
np.save("item_embeddings.npy", item_embeddings_raw)
print("Saved user_embeddings.npy and item_embeddings.npy")

# Create Normalized Copies for FAISS
user_embeddings = user_embeddings_raw.copy()
item_embeddings = item_embeddings_raw.copy()

# Normalize for cosine similarity (modifies in place)
faiss.normalize_L2(item_embeddings)
faiss.normalize_L2(user_embeddings)

# save normalized embeddings for later use
np.save("user_embeddings_normalized.npy", user_embeddings)
np.save("item_embeddings_normalized.npy", item_embeddings)
print("Saved normalized embeddings")

# Build FAISS Index (CPU - IVFFlat)
embedding_dim = item_embeddings.shape[1] # shape[0]: Number of items(212,506), shape[1]: Embedding dimension (64)

# Why normalize?
# Inner-product search on L2-normalized vectors is cosine-similarity search:
#   cosine_similarity(A, B) = (A · B) / (||A|| × ||B||)
# If both vectors are normalized, ||A|| = 1 and ||B|| = 1, so
#   cosine_similarity(A, B) = (A · B) / (1 × 1) = A · B
# An inner-product index therefore ranks items by cosine similarity.

nlist = 1500  # Number of clusters

# finds nearest cluster center
quantizer = faiss.IndexFlatIP(embedding_dim)
# Create IVF index using the quantizer
index = faiss.IndexIVFFlat(quantizer, embedding_dim, nlist, faiss.METRIC_INNER_PRODUCT)

# Train the index (learns 1500 cluster centers)
print("Training FAISS index")
index.train(item_embeddings)

# Add items to index (assigns each item to a cluster)
print("Adding items to index")
index.add(item_embeddings)

# Set search parameter: how many clusters to search
index.nprobe = 100  # Higher = more accurate, slower

print(f"FAISS index ready: {index.ntotal:,} items")

# Save the index
faiss.write_index(index, "faiss_item_index.bin")
print("Saved faiss_item_index.bin")



User embeddings shape: (1210271, 64)
Item embeddings shape: (212506, 64)
Saved user_embeddings.npy and item_embeddings.npy
Saved normalized embeddings
Training FAISS index
Adding items to index
FAISS index ready: 212,506 items
Saved faiss_item_index.bin
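
The normalization argument in the cell above (inner product of unit vectors equals cosine similarity) can be verified numerically. This NumPy sketch uses random vectors and plain division in place of `faiss.normalize_L2`.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=8)
b = rng.normal(size=8)

# Cosine similarity from its definition
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# L2-normalize each vector, then take a plain inner product
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
inner_product = a_unit @ b_unit

print(cosine, inner_product)
```

The two printed values agree up to floating-point rounding.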


## Evaluation: Hit Rate @K for Two-Tower Retrieval

This cell evaluates how well the trained two-tower + FAISS retrieval model recovers the true clicked items on the test set using HR@K.


In [10]:
# Evaluate two-tower retrieval using HR@K on test set
import numpy as np

# Load test interactions
test_path = "dataset/amazon-beauty/amazon-beauty-test.inter"
print(f"Loading test interactions from: {test_path}")

test_df = pd.read_csv(test_path, sep="\t", dtype=str)

# Keep positive interactions
test_df["label"] = pd.to_numeric(test_df["label"], errors="coerce").fillna(0).astype(int)
test_df = test_df[test_df["label"] == 1].copy()

print(f"Total positive test interactions: {len(test_df):,}")
print(f"Unique test users: {test_df['user_id'].nunique():,}")
print(f"Unique test items: {test_df['item_id'].nunique():,}")


def evaluate_hr_at_k(test_df, ks=(10, 20, 50), max_users=5000):
    """Compute HR@K for two-tower + FAISS retrieval.

    Args:
        test_df: DataFrame with columns [user_id, item_id, label].
        ks: tuple of K values to evaluate.
        max_users: optional cap on number of users to speed up evaluation.

    Returns:
        dict mapping each K to the fraction of evaluated users with at least
        one true item among their top-K retrieved items.
    """
    hr = {k: 0 for k in ks}
    total = 0
    max_k = max(ks)
    
    test_users = test_df['user_id'].unique()
    if len(test_users) > max_users:
        test_users = np.random.choice(test_users, max_users, replace=False)
        print(f"Subsampled users: {len(test_users):,} (from {test_df['user_id'].nunique():,})")
    
    for user_id in test_users:
        # Users must appear in the training data. This is the cold-start limitation of
        # an ID-embedding two-tower model: it cannot recommend for unseen users.
        if user_id not in user2idx:
            continue
        
        user_idx = user2idx[user_id]
        
        true_items = test_df[test_df['user_id'] == user_id]['item_id'].values
        true_item_indices = [item2idx[item] for item in true_items if item in item2idx]
        
        if not true_item_indices:
            continue
        
        # user_embeddings is already normalized
        query_emb = user_embeddings[user_idx:user_idx+1]
        
        distances, indices = index.search(query_emb, k=max_k)
        retrieved = indices[0]
        
        total += 1
        
        for k in ks:
            if any(item_idx in retrieved[:k] for item_idx in true_item_indices):
                hr[k] += 1
    
    for k in ks:
        hr[k] = hr[k] / total if total > 0 else 0
    
    return hr



# Run evaluation
hr = evaluate_hr_at_k(test_df, ks=(10, 20, 50), max_users=5000)
print("\nTwo-tower retrieval performance (Hit Rate):")
for k in sorted(hr.keys()):
    print(f"HR@{k}: {hr[k]:.4f}")

'''INTERPRETATION
    The two-tower model performs barely better than random chance and worse than the simple baselines.
    The model is not learning to place users close to the items they interacted with.
    One possible reason: BCEWithLogitsLoss with a single random negative per positive gives too weak a training signal.
'''



Loading test interactions from: dataset/amazon-beauty/amazon-beauty-test.inter
Total positive test interactions: 328,812
Unique test users: 322,870
Unique test items: 108,510
Subsampled users: 5,000 (from 322,870)

Two-tower retrieval performance (Hit Rate):
HR@10: 0.0002
HR@20: 0.0002
HR@50: 0.0004
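
To put these numbers next to "random chance", a rough baseline is K divided by the catalog size. This assumes one held-out item per user, which is close to the test set above (328,812 interactions over 322,870 users), and uniformly random retrieval.

```python
catalog_size = 212_506                               # item count from the training output
measured_hr = {10: 0.0002, 20: 0.0002, 50: 0.0004}   # HR@K values reported above

random_hr = {}
for k, hr_k in measured_hr.items():
    # Chance that k distinct uniformly random items contain one specific item
    random_hr[k] = k / catalog_size
    print(f"HR@{k}: random ~ {random_hr[k]:.6f}, measured = {hr_k:.4f}, "
          f"ratio ~ {hr_k / random_hr[k]:.1f}x")
```

With 5,000 sampled users, an HR of 0.0002 corresponds to roughly one hit, so these ratios carry a lot of sampling noise.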


In [11]:
import numpy as np

# Check 1: Are embeddings normalized?
print("=== Embedding Norms ===")
user_norms = np.linalg.norm(user_embeddings[:5], axis=1)
item_norms = np.linalg.norm(item_embeddings[:5], axis=1)
print(f"User embedding norms (first 5): {user_norms}")
print(f"Item embedding norms (first 5): {item_norms}")
print(f"Should be ~1.0 if normalized")

# Check 2: What do similarity scores look like?
print("\n=== Sample Search ===")
test_user_idx = 0
query = user_embeddings[test_user_idx:test_user_idx+1]
distances, indices = index.search(query, k=10)
print(f"Top 10 similarities: {distances[0]}")
print(f"Top 10 item indices: {indices[0]}")
# Note: cosine similarity can range from -1 to 1; only the top matches are expected to be positive
print(f"Similarities should be between 0 and 1, with variation")

# Check 3: Does user's actual item appear anywhere?
print("\n=== Ground Truth Check ===")
test_user_id = list(user2idx.keys())[0]
test_user_items = test_df[test_df['user_id'] == test_user_id]['item_id'].values
print(f"User {test_user_id} test items: {test_user_items[:5]}")
if len(test_user_items) > 0 and test_user_items[0] in item2idx:
    true_item_idx = item2idx[test_user_items[0]]
    print(f"True item index: {true_item_idx}")
    print(f"Is true item in top 10? {true_item_idx in indices[0]}")
    
    # Check similarity to true item
    true_item_emb = item_embeddings[true_item_idx:true_item_idx+1]
    similarity = np.dot(query, true_item_emb.T)[0][0]
    print(f"Similarity to true item: {similarity:.4f}")

=== Embedding Norms ===
User embedding norms (first 5): [0.99999994 1.         0.99999994 1.         1.        ]
Item embedding norms (first 5): [0.99999994 1.         0.99999994 1.         0.99999994]
Should be ~1.0 if normalized

=== Sample Search ===
Top 10 similarities: [0.5199655  0.5043913  0.50290835 0.49944636 0.4846565  0.48275822
 0.47616628 0.47403243 0.46810657 0.46567205]
Top 10 item indices: [172838   6284  61912  92566   5786 200979 162254 148239  17949 148637]
Similarities should be between 0 and 1, with variation

=== Ground Truth Check ===
User 2238 test items: []


In [12]:
print("="*60)
print("DATA OVERLAP ANALYSIS")
print("="*60)

# 1. Check test users that exist in training
test_users = test_df['user_id'].unique()
train_users = set(user2idx.keys())
test_users_in_train = [u for u in test_users if u in train_users]
print(f"\nTest users: {len(test_users):,}")
print(f"Test users found in training: {len(test_users_in_train):,}")
print(f"Overlap: {len(test_users_in_train)/len(test_users)*100:.1f}%")

# 2. Check test items that exist in training
test_items = test_df['item_id'].unique()
train_items = set(item2idx.keys())
test_items_in_train = [i for i in test_items if i in train_items]
print(f"\nTest items: {len(test_items):,}")
print(f"Test items found in training: {len(test_items_in_train):,}")
print(f"Overlap: {len(test_items_in_train)/len(test_items)*100:.1f}%")

# 3. Check test interactions where BOTH user AND item are in training
valid_test = test_df[
    (test_df['user_id'].isin(train_users)) & 
    (test_df['item_id'].isin(train_items))
]
print(f"\nTotal test interactions: {len(test_df):,}")
print(f"Valid test interactions (user AND item in train): {len(valid_test):,}")
print(f"Valid percentage: {len(valid_test)/len(test_df)*100:.1f}%")

# 4. Find a valid test user for checking
print("\n" + "-"*40)
print("FINDING VALID TEST CASES")
print("-"*40)

valid_count = 0
for user_id in test_users_in_train[:100]:  # Check first 100 valid users
    user_items = test_df[test_df['user_id'] == user_id]['item_id'].values
    valid_items = [i for i in user_items if i in train_items]
    if valid_items:
        valid_count += 1
        if valid_count <= 3:  # Show first 3 examples
            user_idx = user2idx[user_id]
            item_idx = item2idx[valid_items[0]]
            
            # Check similarity
            query = user_embeddings[user_idx:user_idx+1]
            true_item_emb = item_embeddings[item_idx:item_idx+1]
            similarity = np.dot(query, true_item_emb.T)[0][0]
            
            # Check rank
            distances, indices = index.search(query, k=1000)
            rank_pos = np.where(indices[0] == item_idx)[0]
            rank = rank_pos[0] + 1 if len(rank_pos) > 0 else ">1000"
            
            print(f"\nUser: {user_id} (idx: {user_idx})")
            print(f"  True item: {valid_items[0]} (idx: {item_idx})")
            print(f"  Similarity to true item: {similarity:.4f}")
            print(f"  Rank of true item: {rank} / {index.ntotal:,}")

print(f"\nValid test cases in first 100 users: {valid_count}")

DATA OVERLAP ANALYSIS

Test users: 322,870
Test users found in training: 322,870
Overlap: 100.0%

Test items: 108,510
Test items found in training: 81,079
Overlap: 74.7%

Total test interactions: 328,812
Valid test interactions (user AND item in train): 298,228
Valid percentage: 90.7%

----------------------------------------
FINDING VALID TEST CASES
----------------------------------------

User: 1 (idx: 609763)
  True item: 154093 (idx: 14734)
  Similarity to true item: 0.0317
  Rank of true item: >1000 / 212,506

User: 2 (idx: 399475)
  True item: 135112 (idx: 77806)
  Similarity to true item: -0.1700
  Rank of true item: >1000 / 212,506

User: 5 (idx: 74264)
  True item: 45115 (idx: 28930)
  Similarity to true item: -0.0566
  Rank of true item: >1000 / 212,506

Valid test cases in first 100 users: 86
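
The analysis above only reports ">1000" when the true item falls outside the retrieved list. An exact rank can be computed by brute force: count how many items score strictly higher than the true item. The helper below is a hypothetical sketch, shown on hand-made vectors and not on the trained embeddings.

```python
import numpy as np

def exact_rank(user_vec, item_matrix, true_item_idx):
    """1-based rank of the true item when all items are sorted by inner product."""
    scores = item_matrix @ user_vec
    # Items scoring strictly higher than the true item are ranked ahead of it
    return int((scores > scores[true_item_idx]).sum()) + 1

# Toy check: scores are 1.0, 0.2, 0.84, -1.0 for items 0..3
toy_items = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]])
toy_user = np.array([1.0, 0.2])
print(exact_rank(toy_user, toy_items, true_item_idx=2))   # 2
```

On the notebook's arrays the call would look like `exact_rank(user_embeddings[user_idx], item_embeddings, item_idx)`, at the cost of one full pass over the item matrix per user.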


In [15]:

# IMPROVED TRAINING WITH BPR LOSS

import torch
import torch.nn as nn
import torch.nn.functional as F
import time

class TwoTowerModel(nn.Module):
    def __init__(self, num_users, num_items, embedding_dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, embedding_dim)
        self.item_emb = nn.Embedding(num_items, embedding_dim)
        
        # Better initialization
        nn.init.xavier_uniform_(self.user_emb.weight)
        nn.init.xavier_uniform_(self.item_emb.weight)
    
    def forward(self, user_idx, item_idx):
        u = self.user_emb(user_idx)
        i = self.item_emb(item_idx)
        return (u * i).sum(dim=1)
    
    def get_embeddings(self, user_idx, item_idx):
        u = self.user_emb(user_idx)
        i = self.item_emb(item_idx)
        return u, i

# Reinitialize model
embedding_dim = 64
model = TwoTowerModel(num_users, num_items, embedding_dim).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

def bpr_loss(pos_scores, neg_scores):
    """BPR loss: -log(sigmoid(pos_score - neg_score)).

    Directly optimizes for positive items scoring higher than sampled negatives.
    """
    return -F.logsigmoid(pos_scores - neg_scores).mean()

print("Training with BPR Loss...")
epochs = 5
num_negatives = 4

for epoch in range(epochs):
    model.train()
    total_loss = 0
    num_batches = 0
    t0 = time.time()
    
    for batch_users, batch_items in dataloader:
        batch_users = batch_users.to(device)
        batch_items = batch_items.to(device)
        batch_size = batch_users.size(0)
        
        # Positive scores
        pos_scores = model(batch_users, batch_items)
        
        # Sample several negatives per positive and accumulate the loss
        loss = 0
        for _ in range(num_negatives):
            neg_items = torch.randint(0, num_items, (batch_size,), device=device)
            neg_scores = model(batch_users, neg_items)
            loss += bpr_loss(pos_scores, neg_scores)
        
        loss = loss / num_negatives  # Average over negatives
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()
        num_batches += 1
    
    avg_loss = total_loss / num_batches
    elapsed = time.time() - t0
    print(f"Epoch {epoch+1}/{epochs} - Loss: {avg_loss:.4f} - Time: {elapsed:.1f}s")

print("Training complete!")

Training with BPR Loss...
Epoch 1/5 - Loss: 0.6875 - Time: 339.8s
Epoch 2/5 - Loss: 0.5462 - Time: 339.5s
Epoch 3/5 - Loss: 0.2789 - Time: 339.2s
Epoch 4/5 - Loss: 0.1331 - Time: 339.3s
Epoch 5/5 - Loss: 0.0682 - Time: 339.6s
Training complete!
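
The BPR loss values above are easier to read with two reference points, worked out here in NumPy as an illustration (the training itself uses `F.logsigmoid`). When positives and negatives score the same, the loss is ln(2) ≈ 0.693, close to the epoch 1 value; it approaches 0 as positives pull ahead.

```python
import numpy as np

def log_sigmoid(x):
    # Stable log(sigmoid(x)) = min(x, 0) - log(1 + exp(-|x|))
    return np.minimum(x, 0) - np.log1p(np.exp(-np.abs(x)))

def bpr_loss_np(pos_scores, neg_scores):
    return -log_sigmoid(pos_scores - neg_scores).mean()

# Positives and negatives score the same -> loss = ln(2)
loss_equal = bpr_loss_np(np.zeros(4), np.zeros(4))
print(loss_equal, np.log(2))

# Positives clearly ahead of negatives -> loss approaches 0
loss_separated = bpr_loss_np(np.full(4, 5.0), np.full(4, -5.0))
print(loss_separated)
```

A training loss that falls well below ln(2) shows that training pairs are being separated. It does not by itself show that held-out items rank well, which is why the HR@K evaluation is still needed.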


In [16]:

# STEP 1: Extract and Normalize Embeddings

model.eval()
with torch.no_grad():
    user_emb_raw = model.user_emb.weight.detach().cpu().numpy().astype('float32')
    item_emb_raw = model.item_emb.weight.detach().cpu().numpy().astype('float32')

# Create normalized copies
user_embeddings = user_emb_raw.copy()
item_embeddings = item_emb_raw.copy()
faiss.normalize_L2(user_embeddings)
faiss.normalize_L2(item_embeddings)

print(f"Embeddings extracted and normalized")
print(f"  User embeddings: {user_embeddings.shape}")
print(f"  Item embeddings: {item_embeddings.shape}")


# STEP 2: Rebuild FAISS Index (exact inner-product search on GPU instead of the earlier IVF index)

res = faiss.StandardGpuResources()
cpu_index = faiss.IndexFlatIP(embedding_dim)
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)
gpu_index.add(item_embeddings)
index = gpu_index  # Update the index variable

print(f"FAISS index rebuilt: {index.ntotal:,} items")


# STEP 3: Full Evaluation (HR@K)

print("\n" + "="*60)
print("FULL EVALUATION")
print("="*60)

def evaluate_hr_at_k(test_df, ks=(10, 20, 50, 100), max_users=5000):
    """Evaluate Hit Rate @ K."""
    hr = {k: 0 for k in ks}
    total = 0
    max_k = max(ks)
    
    test_users = test_df['user_id'].unique()
    if len(test_users) > max_users:
        np.random.seed(42)
        test_users = np.random.choice(test_users, max_users, replace=False)
    
    for user_id in test_users:
        if user_id not in user2idx:
            continue
        
        user_idx = user2idx[user_id]
        true_items = test_df[test_df['user_id'] == user_id]['item_id'].values
        true_item_indices = [item2idx[item] for item in true_items if item in item2idx]
        
        if not true_item_indices:
            continue
        
        query_emb = user_embeddings[user_idx:user_idx+1]
        distances, indices = index.search(query_emb, k=max_k)
        retrieved = indices[0]
        
        total += 1
        
        for k in ks:
            if any(item_idx in retrieved[:k] for item_idx in true_item_indices):
                hr[k] += 1
    
    for k in ks:
        hr[k] = hr[k] / total if total > 0 else 0
    
    return hr, total

hr, total_evaluated = evaluate_hr_at_k(test_df, ks=(10, 20, 50, 100), max_users=5000)

print(f"\nEvaluated {total_evaluated:,} users")
print("\nTwo-Tower Retrieval (BPR Loss):")
for k in sorted(hr.keys()):
    print(f"  HR@{k}: {hr[k]:.4f} ({hr[k]*100:.2f}%)")


# STEP 5: Compare with Baselines

print("\n" + "-"*40)
print("COMPARISON")
print("-"*40)
print("Most Popular: HR@10=0.0077, HR@20=0.0132, HR@50=0.0241")
print("Item-KNN:     HR@10=0.0123, HR@20=0.0180, HR@50=0.0294")
print(f"Two-Tower:    HR@10={hr[10]:.4f}, HR@20={hr[20]:.4f}, HR@50={hr[50]:.4f}")

if hr[10] > 0.0123:
    print("\n SUCCESS! Two-Tower beats Item-KNN baseline!")
elif hr[10] > 0.0077:
    print("\n⚠ Two-Tower beats Most Popular but not Item-KNN")
else:
    print("\n❌ Still underperforming - may need more epochs or tuning")

Embeddings extracted and normalized
  User embeddings: (1210271, 64)
  Item embeddings: (212506, 64)
FAISS index rebuilt: 212,506 items

--- Quick Sanity Check ---
User 1: Similarity=0.2778, Rank=>1000
User 2: Similarity=-0.0600, Rank=>1000
User 5: Similarity=0.0190, Rank=>1000

FULL EVALUATION

Evaluated 4,578 users

Two-Tower Retrieval (BPR Loss):
  HR@10: 0.0072 (0.72%)
  HR@20: 0.0111 (1.11%)
  HR@50: 0.0201 (2.01%)
  HR@100: 0.0262 (2.62%)

----------------------------------------
COMPARISON
----------------------------------------
Most Popular: HR@10=0.0077, HR@20=0.0132, HR@50=0.0241
Item-KNN:     HR@10=0.0123, HR@20=0.0180, HR@50=0.0294
Two-Tower:    HR@10=0.0072, HR@20=0.0111, HR@50=0.0201

❌ Still underperforming - may need more epochs or tuning
