# Slot Game Recommendation System with PyTorch & BPR Loss

This notebook presents an end-to-end implementation of a recommendation system for our online casino. We build a Neural Collaborative Filtering (NCF) model in PyTorch and train it with Bayesian Personalized Ranking (BPR) loss. BPR optimizes the relative order of items directly, which suits ranking tasks (i.e., generating top-N recommendations) better than a pointwise binary cross-entropy loss.

**Key Steps:**
1. Load and explore the data from `data.csv`
2. Preprocess the data and map `player_id` and `game_id` to indices
3. Create a custom PyTorch Dataset that performs pairwise (positive/negative) sampling for BPR loss
4. Build the NCF model in PyTorch with embedding layers for players and games
5. Train the model using BPR loss
6. Extract learned game embeddings and build a FAISS index for fast similarity search
7. Demonstrate a real-time recommendation query using the FAISS index


In [29]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/slotscsv/data.csv


In [30]:
!pip install faiss-cpu



In [31]:
# Cell 1: Import Libraries and Load Data

import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import random
import faiss
import matplotlib.pyplot as plt

# Set random seeds for reproducibility
SEED = 42
np.random.seed(SEED)
torch.manual_seed(SEED)
random.seed(SEED)

# Load the data from data.csv
data = pd.read_csv('/kaggle/input/slotscsv/data.csv')
print("Data shape:", data.shape)
data.head()


Data shape: (7082, 4)


Unnamed: 0,player_id,game_id,score,normalized_score
0,0,0,0.298322,0.050966
1,0,1,0.465837,0.079584
2,0,2,0.506277,0.086493
3,0,3,0.341687,0.058375
4,0,4,0.667351,0.114012


## Data Exploration and Preprocessing

We inspect the dataset and encode `player_id` and `game_id` as consecutive integer indices.
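
As a small illustration of the encoding step, the sketch below builds a forward and an inverse lookup table on made-up IDs. The helper name `build_index_maps` and the sample IDs are assumptions for illustration and are not part of the notebook. The inverse map is what would let served indices be translated back to the original `game_id` values.

```python
def build_index_maps(raw_ids):
    """Map arbitrary IDs to consecutive indices in first-seen order, plus the inverse."""
    id2idx = {}
    for raw in raw_ids:
        if raw not in id2idx:          # keep the first index assigned to each ID
            id2idx[raw] = len(id2idx)
    idx2id = {idx: raw for raw, idx in id2idx.items()}
    return id2idx, idx2id


# Hypothetical, non-consecutive game IDs with repeats
game2idx_demo, idx2game_demo = build_index_maps([1007, 42, 1007, 9, 42])
print(game2idx_demo)   # {1007: 0, 42: 1, 9: 2}
print(idx2game_demo)   # {0: 1007, 1: 42, 2: 9}
```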


In [32]:
# Cell 2: Data Exploration and Preprocessing

# Explore dataset information
print(data.info())
print(data.describe())

# Create mapping dictionaries for players and games
player_ids = data['player_id'].unique().tolist()
game_ids = data['game_id'].unique().tolist()

player2idx = {player: idx for idx, player in enumerate(player_ids)}
game2idx = {game: idx for idx, game in enumerate(game_ids)}

# Map original ids to indices
data['player_idx'] = data['player_id'].map(player2idx)
data['game_idx'] = data['game_id'].map(game2idx)

data.head()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7082 entries, 0 to 7081
Data columns (total 4 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   player_id         7082 non-null   int64  
 1   game_id           7082 non-null   int64  
 2   score             7082 non-null   float64
 3   normalized_score  7082 non-null   float64
dtypes: float64(2), int64(2)
memory usage: 221.4 KB
None
         player_id      game_id        score  normalized_score
count  7082.000000  7082.000000  7082.000000       7082.000000
mean    253.180175    23.273934     0.364917          0.070602
std     143.032060    13.422157     0.169729          0.067376
min       0.000000     0.000000     0.005913          0.001235
25%     129.000000    12.000000     0.239219          0.039379
50%     254.000000    23.000000     0.347944          0.055885
75%     377.000000    34.000000     0.471786          0.078073
max     499.000000    49.000000     0.976543       

Unnamed: 0,player_id,game_id,score,normalized_score,player_idx,game_idx
0,0,0,0.298322,0.050966,0,0
1,0,1,0.465837,0.079584,0,1
2,0,2,0.506277,0.086493,0,2
3,0,3,0.341687,0.058375,0,3
4,0,4,0.667351,0.114012,0,4


## Creating a PyTorch Dataset with Pairwise Sampling

For a ranking-based model, we need to sample negative examples. For each positive interaction (user, game) we randomly sample a negative game that the user has not interacted with. This dataset will return a triplet: (user, positive game, negative game).
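
The sampling rule can be sketched in isolation as follows. This is a minimal sketch, not the notebook's dataset class: `sample_negative` is a hypothetical helper, and it assumes every played game index lies in `range(num_games)`. It adds one guard that plain rejection sampling lacks, returning `None` when a user has already played every game.

```python
import random


def sample_negative(positive_set, num_games, rng=random):
    """Return a game index not in positive_set, or None if every game was played."""
    # Assumes positive_set only contains indices from range(num_games)
    if len(positive_set) >= num_games:
        return None                    # nothing left to sample; caller should skip this user
    while True:
        candidate = rng.randrange(num_games)
        if candidate not in positive_set:
            return candidate


rng = random.Random(0)                 # local generator so the demo is repeatable
history = {0, 1, 2}
draws = [sample_negative(history, num_games=5, rng=rng) for _ in range(20)]
print(set(draws) <= {3, 4})                        # True: only unplayed games are drawn
print(sample_negative({0, 1}, num_games=2))        # None: no negatives exist
```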


In [33]:
# Cell 3: PyTorch Dataset for Pairwise Ranking (BPR)

class SlotPairwiseDataset(Dataset):
    def __init__(self, dataframe, num_games):
        self.data = dataframe
        self.num_games = num_games
        # Create a mapping: user -> set of games the user has interacted with
        self.user_positive = self.data.groupby('player_idx')['game_idx'].apply(set).to_dict()
        
    def __len__(self):
        return len(self.data)
    
    def __getitem__(self, idx):
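        row = self.data.iloc[idx]
        # iloc on a mixed-dtype frame yields floats, so cast the indices back to int
        user = int(row['player_idx'])
        pos_game = int(row['game_idx'])
        # Rejection-sample a negative game that the user hasn't interacted with
        # (this would loop forever for a user who has interacted with every game)
        positive_set = self.user_positive[user]
        neg_game = random.randint(0, self.num_games - 1)
        while neg_game in positive_set:
            neg_game = random.randint(0, self.num_games - 1)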
        return torch.tensor(user, dtype=torch.long), \
               torch.tensor(pos_game, dtype=torch.long), \
               torch.tensor(neg_game, dtype=torch.long)

num_games = len(game2idx)
dataset = SlotPairwiseDataset(data, num_games)
dataloader = DataLoader(dataset, batch_size=256, shuffle=True, num_workers=2)

# Display a sample from the dataset
sample = dataset[0]
print("Sample (user, positive game, negative game):", sample)


Sample (user, positive game, negative game): (tensor(0), tensor(0), tensor(40))


## Defining the NCF Model in PyTorch

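The model uses embedding layers for both users and games, concatenates the two embeddings, and passes the result through three fully connected ReLU layers (128, 64, and 32 units, with dropout after the first) and a final linear layer that outputs a single unbounded score.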


In [34]:
# Cell 4: Define the NCF Model

class NCFModel(nn.Module):
    def __init__(self, num_players, num_games, embedding_dim):
        super(NCFModel, self).__init__()
        self.player_embedding = nn.Embedding(num_players, embedding_dim)
        self.game_embedding = nn.Embedding(num_games, embedding_dim)
        
        self.fc1 = nn.Linear(embedding_dim * 2, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 32)
        self.output = nn.Linear(32, 1)
        self.dropout = nn.Dropout(0.3)
        
    def forward(self, user, game):
        # Inputs have shape (batch,) during training and (batch, 1) when called from predict_user_scores.
        # squeeze(1) drops the extra axis in the second case and is a no-op in the first (embedding_dim > 1).
        user_emb = self.player_embedding(user).squeeze(1)  # shape: (batch, embedding_dim)
        game_emb = self.game_embedding(game).squeeze(1)    # shape: (batch, embedding_dim)
        x = torch.cat([user_emb, game_emb], dim=1)         # shape: (batch, embedding_dim * 2)
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)
        x = torch.relu(self.fc2(x))
        x = torch.relu(self.fc3(x))
        score = self.output(x)
        return score


num_players = len(player2idx)
embedding_dim = 32

model = NCFModel(num_players, num_games, embedding_dim)
print(model)


NCFModel(
  (player_embedding): Embedding(500, 32)
  (game_embedding): Embedding(50, 32)
  (fc1): Linear(in_features=64, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=64, bias=True)
  (fc3): Linear(in_features=64, out_features=32, bias=True)
  (output): Linear(in_features=32, out_features=1, bias=True)
  (dropout): Dropout(p=0.3, inplace=False)
)


## Defining the BPR Loss Function

We implement the Bayesian Personalized Ranking (BPR) loss. For each triplet (user, positive, negative), the loss is -log(sigmoid(pos - neg)), which shrinks as the positive item's score rises above the negative item's score.
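
One numerical detail is worth sketching. Taking the log of a sigmoid directly can underflow for very negative score gaps, which is why an epsilon is often added inside the log. The variant below uses `torch.nn.functional.logsigmoid` instead. The function name `bpr_loss_stable` and the sample scores are assumptions for illustration; this is an alternative formulation, not the loss the notebook trains with.

```python
import torch
import torch.nn.functional as F


def bpr_loss_stable(pos_scores, neg_scores):
    """BPR loss, -log(sigmoid(pos - neg)), computed via logsigmoid."""
    return -F.logsigmoid(pos_scores - neg_scores).mean()


# Made-up scores; the third pair has an extreme gap of -200
pos = torch.tensor([2.0, 0.5, -200.0])
neg = torch.tensor([1.0, 1.5, 0.0])
naive = -torch.log(torch.sigmoid(pos - neg) + 1e-8)   # epsilon form, per element
stable = -F.logsigmoid(pos - neg)                      # same quantity without the epsilon
print(naive)
print(stable)
print(bpr_loss_stable(pos, neg))
```

For ordinary score gaps the two forms agree. For the extreme gap, the epsilon form saturates near -log(1e-8), about 18.4, whereas `logsigmoid` returns roughly 200, so the gradient signal for badly misranked pairs is preserved.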


In [35]:
# Cell 5: BPR Loss Function

def bpr_loss(pos_scores, neg_scores):
    # Loss = -log(sigmoid(pos - neg))
    loss = -torch.log(torch.sigmoid(pos_scores - neg_scores) + 1e-8).mean()
    return loss


## Training with Validation and Early Stopping

To monitor the model's performance and limit overfitting, we split the interactions into training and validation sets (80% training, 20% validation). We track the validation loss at the end of each epoch and stop early if it has not improved for a specified number of epochs (patience). Two caveats apply: negatives are re-sampled at random on every pass, so the validation loss is itself noisy, and the training loop keeps the weights from the last epoch rather than the best one, because the checkpoint line is commented out.
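
The patience rule can be sketched as a small standalone helper. The class name `EarlyStopper` and the validation losses are made up for illustration; the notebook itself inlines the same logic in its training loop.

```python
class EarlyStopper:
    """Track the best validation loss and signal when patience runs out."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True if training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0        # a checkpoint would be saved here
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses: improvement stalls after the third epoch
stopper = EarlyStopper(patience=2)
for epoch, loss in enumerate([0.70, 0.66, 0.64, 0.65, 0.66, 0.63], start=1):
    if stopper.step(epoch, loss):
        print(f"stop at epoch {epoch}, best was epoch {stopper.best_epoch}")
        break
# prints: stop at epoch 5, best was epoch 3
```

Recording `best_epoch` makes it explicit which checkpoint should be reloaded once training stops.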

In [36]:
from sklearn.model_selection import train_test_split
from torch.utils.data import Subset

# Split the indices for training (80%) and validation (20%)
all_indices = np.arange(len(dataset))
train_indices, val_indices = train_test_split(all_indices, test_size=0.2, random_state=SEED)

# Create subset datasets for training and validation
train_subset = Subset(dataset, train_indices)
val_subset = Subset(dataset, val_indices)

# Create DataLoaders for training and validation
train_loader = DataLoader(train_subset, batch_size=256, shuffle=True, num_workers=2)
val_loader = DataLoader(val_subset, batch_size=256, shuffle=False, num_workers=2)

print("Training samples:", len(train_subset))
print("Validation samples:", len(val_subset))


Training samples: 5665
Validation samples: 1417


In [37]:
# Early stopping parameters
best_val_loss = float('inf')
patience = 5  # Number of epochs to wait for improvement before stopping
epochs_without_improvement = 0
optimizer = optim.Adam(model.parameters(), lr=0.001)
num_epochs = 40

for epoch in range(num_epochs):
    model.train()
    train_loss = 0.0
    for user, pos_game, neg_game in train_loader:
        optimizer.zero_grad()
        pos_score = model(user, pos_game).squeeze()
        neg_score = model(user, neg_game).squeeze()
        loss = bpr_loss(pos_score, neg_score)
        loss.backward()
        optimizer.step()
        train_loss += loss.item() * user.size(0)
    train_loss /= len(train_subset)
    
    # Evaluate on the validation set
    model.eval()
    val_loss = 0.0
    with torch.no_grad():
        for user, pos_game, neg_game in val_loader:
            pos_score = model(user, pos_game).squeeze()
            neg_score = model(user, neg_game).squeeze()
            loss = bpr_loss(pos_score, neg_score)
            val_loss += loss.item() * user.size(0)
    val_loss /= len(val_subset)
    
    print(f"Epoch {epoch+1}/{num_epochs}, Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}")
    
    # Early stopping check
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_without_improvement = 0
        # Optionally, save the model state here:
        # torch.save(model.state_dict(), 'best_model.pth')
    else:
        epochs_without_improvement += 1
    
    if epochs_without_improvement >= patience:
        print("Early stopping triggered!")
        break


Epoch 1/40, Train Loss: 0.6899, Val Loss: 0.6834
Epoch 2/40, Train Loss: 0.6705, Val Loss: 0.6630
Epoch 3/40, Train Loss: 0.6535, Val Loss: 0.6486
Epoch 4/40, Train Loss: 0.6486, Val Loss: 0.6316
Epoch 5/40, Train Loss: 0.6375, Val Loss: 0.6348
Epoch 6/40, Train Loss: 0.6376, Val Loss: 0.6377
Epoch 7/40, Train Loss: 0.6292, Val Loss: 0.6315
Epoch 8/40, Train Loss: 0.6265, Val Loss: 0.6312
Epoch 9/40, Train Loss: 0.6197, Val Loss: 0.6270
Epoch 10/40, Train Loss: 0.6177, Val Loss: 0.6284
Epoch 11/40, Train Loss: 0.6105, Val Loss: 0.6195
Epoch 12/40, Train Loss: 0.6147, Val Loss: 0.6260
Epoch 13/40, Train Loss: 0.6071, Val Loss: 0.6123
Epoch 14/40, Train Loss: 0.5938, Val Loss: 0.6218
Epoch 15/40, Train Loss: 0.5990, Val Loss: 0.6287
Epoch 16/40, Train Loss: 0.5925, Val Loss: 0.6065
Epoch 17/40, Train Loss: 0.5871, Val Loss: 0.6201
Epoch 18/40, Train Loss: 0.5753, Val Loss: 0.6121
Epoch 19/40, Train Loss: 0.5824, Val Loss: 0.6027
Epoch 20/40, Train Loss: 0.5709, Val Loss: 0.6082
Epoch 21/

## Extracting Game Embeddings

Once the model is trained, we extract the learned game (slot) embeddings. These embeddings will be used to build the FAISS index for fast similarity queries.


In [38]:
# Cell 7: Extract Game Embeddings

model.eval()
with torch.no_grad():
    # Get the game embeddings from the model
    game_embeddings = model.game_embedding.weight.detach().cpu().numpy()
print("Game Embeddings Shape:", game_embeddings.shape)


Game Embeddings Shape: (50, 32)


## Building the FAISS Index

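We use FAISS to index the game embeddings. The `IndexFlatL2` index used here performs exact (brute-force) nearest-neighbor search, which is fast enough for 50 games; approximate nearest neighbor (ANN) index types only become relevant for much larger catalogs.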
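
To make the search semantics concrete, here is a NumPy sketch of what a flat L2 index computes. To our understanding of the FAISS documentation, `IndexFlatL2` compares the query against every stored vector and reports squared Euclidean distances; the function `flat_l2_search` and the tiny embedding table below are assumptions for illustration, not FAISS code.

```python
import numpy as np


def flat_l2_search(vectors, query, top_k):
    """Exhaustive nearest-neighbour search returning squared L2 distances."""
    diffs = vectors - query                      # broadcast the query over all rows
    sq_dists = np.einsum("ij,ij->i", diffs, diffs)
    order = np.argsort(sq_dists)[:top_k]         # closest first
    return sq_dists[order], order


# Tiny made-up embedding table: 4 items in 2 dimensions
items = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0], [0.0, 2.0]], dtype="float32")
dists, ids = flat_l2_search(items, items[0], top_k=3)
print(ids.tolist())     # [0, 2, 3]
print(dists.tolist())   # [0.0, 1.0, 4.0]
```

Note that the query item is its own nearest neighbour at distance 0, which is why a serving function has to skip the recently played slot.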


In [39]:
# Cell 8: Build FAISS Index

# Ensure embeddings are float32
game_embeddings = game_embeddings.astype('float32')

# Create a FAISS index using L2 distance
faiss_index = faiss.IndexFlatL2(embedding_dim)
faiss_index.add(game_embeddings)

print("FAISS index built with {} vectors.".format(faiss_index.ntotal))


FAISS index built with 50 vectors.


## Advanced Recommendation Functions for Production

In this section, we provide several utility functions for generating recommendations: offline (batch) recommendations from the model, and real‑time recommendations using our FAISS index.

Before using these functions, ensure the following global variables are defined:
- **model**: The trained NCF model (set to evaluation mode when using the functions).
- **game_embeddings**: A NumPy array of game (slot) embeddings extracted from the model.
- **faiss_index**: A FAISS index built on `game_embeddings` (using L2 distance).
- **num_games**: Total number of games in the system.
- **user_positive**: A dictionary mapping each user (player_idx) to a set of game indices that the user has already interacted with.  
  For example, you can create it with:
  ```python
  user_positive = data.groupby('player_idx')['game_idx'].apply(set).to_dict()
  ```


In [40]:
# Cell: Offline Recommendation Utility Functions

def predict_user_scores(user_id, candidate_games):
    """
    Predict scores for a given user over a set of candidate games using the NCF model.
    
    Args:
        user_id (int): Index (ID) of the user.
        candidate_games (list or np.array): List of candidate game indices.
    
    Returns:
        np.array: Predicted scores for each candidate game.
    """
    model.eval()
    with torch.no_grad():
        user_tensor = torch.tensor([user_id] * len(candidate_games), dtype=torch.long).unsqueeze(1)
        game_tensor = torch.tensor(candidate_games, dtype=torch.long).unsqueeze(1)
        # squeeze(-1) keeps the result 1-D even when there is only one candidate game
        predictions = model(user_tensor, game_tensor).squeeze(-1).cpu().numpy()
    return predictions


def recommend_slots_for_user(user_id, k=5):
    """
    Recommend the top-k slots for a user based solely on model-predicted scores.
    
    The function excludes games the user has already interacted with and then
    ranks the remaining candidate games based on predicted preference.
    
    Args:
        user_id (int): Index (ID) of the user.
        k (int): Number of recommendations to return.
    
    Returns:
        recommended_games (np.array): Indices of recommended games.
        predicted_scores (np.array): Predicted scores for these games.
    """
    interacted_games = user_positive.get(user_id, set())
    candidate_games = [game for game in range(num_games) if game not in interacted_games]
    
    if not candidate_games:
        print(f"User {user_id} has interacted with all games. No recommendations available.")
        return np.array([]), np.array([])
    
    scores = predict_user_scores(user_id, candidate_games)
    top_indices = np.argsort(scores)[-k:][::-1]
    recommended_games = np.array(candidate_games)[top_indices]
    predicted_scores = scores[top_indices]
    return recommended_games, predicted_scores


def cosine_similarity_matrix(query_embedding, candidate_embeddings):
    """
    Compute cosine similarity between a query embedding and a set of candidate embeddings.
    
    Args:
        query_embedding (np.array): Query vector of shape (embedding_dim,).
        candidate_embeddings (np.array): Candidate embeddings of shape (num_candidates, embedding_dim).
    
    Returns:
        np.array: Cosine similarity scores scaled to the range [0, 1].
    """
    query_norm = query_embedding / np.linalg.norm(query_embedding)
    candidate_norms = candidate_embeddings / np.linalg.norm(candidate_embeddings, axis=1, keepdims=True)
    cos_sim = np.dot(candidate_norms, query_norm)
    cos_sim = (cos_sim + 1) / 2  # Scale from [-1, 1] to [0, 1]
    return cos_sim


def recommend_slots_for_user_based_on_recent_slot(user_id, recent_slot_id, k=5, alpha=0.5):
    """
    Recommend top-k slots for a user by combining the NCF model's predicted scores and the cosine
    similarity of candidate slots to a recently played slot.
    
    The final combined score is computed as:
        combined_score = alpha * (predicted score) + (1 - alpha) * (similarity score)
    
    Args:
        user_id (int): Index (ID) of the user.
        recent_slot_id (int): Index (ID) of the slot recently played by the user.
        k (int): Number of recommendations to return.
        alpha (float): Weight balancing the predicted score and similarity score.
    
    Returns:
        recommended_games (np.array): Indices of recommended games.
        combined_scores (np.array): Final combined scores used for ranking.
    """
    interacted_games = user_positive.get(user_id, set())
    candidate_games = [game for game in range(num_games) if game not in interacted_games]
    
    if not candidate_games:
        print(f"User {user_id} has interacted with all games. No recommendations available.")
        return np.array([]), np.array([])
    
    predicted_scores = predict_user_scores(user_id, candidate_games)
    recent_embedding = game_embeddings[recent_slot_id]
    candidate_embeddings = game_embeddings[candidate_games]
    similarity_scores = cosine_similarity_matrix(recent_embedding, candidate_embeddings)
    combined_scores = alpha * predicted_scores + (1 - alpha) * similarity_scores
    
    top_indices = np.argsort(combined_scores)[-k:][::-1]
    recommended_games = np.array(candidate_games)[top_indices]
    top_combined_scores = combined_scores[top_indices]
    return recommended_games, top_combined_scores


## Real-Time Recommendation Functions

For real‑time recommendations, speed is critical. Using our pre-built FAISS index, we can quickly retrieve the nearest neighbor slots for a given slot (e.g., one recently played by the user). We then filter the results based on the user's history.

Below is a real‑time recommendation function that:
1. Uses the FAISS index to quickly find similar slots to a given (recently played) slot.
2. Filters out slots that the user has already interacted with.
3. Returns the top-k most similar slots for immediate serving.


In [41]:
def realtime_recommendation_for_recent_slot(user_id, recent_slot_id, k=5, search_k=10):
    """
    Generate real-time recommendations for a user based on a recently played slot,
    using the FAISS index.
    
    Args:
        user_id (int): Index (ID) of the user.
        recent_slot_id (int): Index (ID) of the slot recently played.
        k (int): Number of recommendations to return.
        search_k (int): Number of nearest neighbors to retrieve from FAISS before filtering.
    
    Returns:
        recommended_games (np.array): Indices of recommended games.
        distances (np.array): Distances returned by the FAISS search for the recommended games
            (IndexFlatL2 reports squared L2 distances).
    """
    # Retrieve the query embedding from the recent slot.
    query_vector = np.expand_dims(game_embeddings[recent_slot_id], axis=0)
    
    # Use FAISS to search for the nearest neighbors.
    distances, indices = faiss_index.search(query_vector, search_k)
    candidate_games = indices[0]
    candidate_distances = distances[0]
    
    # Filter out games already interacted with by the user.
    interacted_games = user_positive.get(user_id, set())
    filtered = []
    for game, dist in zip(candidate_games, candidate_distances):
        if game not in interacted_games and game != recent_slot_id:
            filtered.append((game, dist))
        if len(filtered) == k:
            break
    
    if not filtered:
        print(f"No new recommendations available for user {user_id} in real-time.")
        return np.array([]), np.array([])
    
    # Return filtered recommendations.
    recommended_games = np.array([item[0] for item in filtered])
    distances_out = np.array([item[1] for item in filtered])
    return recommended_games, distances_out
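
One practical detail of the filtering step is how many neighbours to request. If `search_k` is small relative to the user's history, filtering can leave fewer than `k` results. The helper below is a hypothetical sketch of a worst-case bound (every played game happens to be closer than any unplayed one); the name `pick_search_k` and its parameters are assumptions, not part of the notebook.

```python
def pick_search_k(k, history_size, num_indexed, include_query=True):
    """Neighbours to request so that k results can survive history filtering."""
    # Worst case: the query slot and every played game rank ahead of all new games
    needed = k + history_size + (1 if include_query else 0)
    return min(needed, num_indexed)    # cannot ask for more vectors than the index holds


print(pick_search_k(k=5, history_size=15, num_indexed=50))   # 21
print(pick_search_k(k=5, history_size=48, num_indexed=50))   # 50
```

With a 50-game catalog, requesting this many neighbours is cheap; for a large catalog a smaller over-fetch factor with a retry would be a reasonable trade-off.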


## Demonstration of the Functions

Let's see how these functions work for a sample user. We demonstrate:
1. **Offline Recommendation (Model-Only):** Top‑5 recommendations based on predicted scores.
2. **Offline Hybrid Recommendation:** Combining model predictions with similarity to a recently played slot.
3. **Real-Time Recommendation:** Using the FAISS index for quick recommendations based on the recent slot.


In [42]:
# Create the user_positive mapping from the dataset
user_positive = data.groupby('player_idx')['game_idx'].apply(set).to_dict()

# Verify the mapping by printing a sample entry
print("Sample user_positive mapping:")
print({k: user_positive[k] for k in list(user_positive.keys())[:3]})


Sample user_positive mapping:
{0: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}, 1: {1, 2, 6, 7, 8, 12, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28}, 2: {4, 5, 9, 14, 16, 19, 23, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39}}


In [43]:
# Sample parameters for demonstration
sample_user_id = 0
sample_recent_slot_id = 0  # Assume user 0 recently played slot with index 0

# 1. Offline recommendation using predicted scores.
offline_recs, offline_scores = recommend_slots_for_user(sample_user_id, k=5)
print("Offline Recommendations (Model-Only):")
print("Slot Indices:", offline_recs)
print("Predicted Scores:", offline_scores)

# 2. Offline hybrid recommendation (model predictions + recent slot similarity).
hybrid_recs, hybrid_scores = recommend_slots_for_user_based_on_recent_slot(sample_user_id, sample_recent_slot_id, k=5, alpha=0.5)
print("\nOffline Hybrid Recommendations (User based on recent slot):")
print("Slot Indices:", hybrid_recs)
print("Combined Scores:", hybrid_scores)

# 3. Real-Time recommendation using the FAISS index.
realtime_recs, realtime_dists = realtime_recommendation_for_recent_slot(sample_user_id, sample_recent_slot_id, k=5, search_k=10)
print("\nReal-Time Recommendations (using FAISS):")
print("Slot Indices:", realtime_recs)
print("L2 Distances:", realtime_dists)


Offline Recommendations (Model-Only):
Slot Indices: [24 35 16 29 48]
Predicted Scores: [1.2998497  1.1528544  1.0116501  0.78076214 0.5764941 ]

Offline Hybrid Recommendations (User based on recent slot):
Slot Indices: [24 35 16 29 48]
Combined Scores: [0.980188   0.79492486 0.7249991  0.65784085 0.58863205]

Real-Time Recommendations (using FAISS):
Slot Indices: [22 19 43 48 17]
L2 Distances: [37.157726 38.085705 46.559784 46.572144 47.28547 ]


## Explanation of the Recommendation Results

The outputs above come from our three recommendation functions for sample user 0. Let's break down what these numbers mean in the context of our online casino recommendation system.

### Offline Recommendations (Model-Only)
- **Slot Indices:** `[24, 35, 16, 29, 48]`
- **Predicted Scores:** `[1.2998497, 1.1528544, 1.0116501, 0.78076214, 0.5764941]`

**Interpretation:**
- These recommendations are generated solely by the model using learned user–slot interactions.
- The predicted scores are unbounded ranking scores from the model's output layer, not probabilities, so only their relative order is meaningful. Slot 24 has the highest predicted score (~1.30), so the model ranks it first for this user.
- The ordering reflects relative preference; slots with higher scores (e.g., 24 and 35) are predicted to be more appealing compared to those with lower scores.

---

### Offline Hybrid Recommendations (User Based on Recent Slot)
- **Slot Indices:** `[24, 35, 16, 29, 48]`
- **Combined Scores:** `[0.980188, 0.79492486, 0.7249991, 0.65784085, 0.58863205]`

**Interpretation:**
- The hybrid method combines the model’s predicted scores with a cosine similarity measure between candidate slots and a recently played slot.
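- In this case, the ordering remains the same as the model-only recommendations. The combined scores differ from the predicted scores because each is an equal-weight average (alpha = 0.5) of the model score and a cosine similarity rescaled to [0, 1]. For four of the five slots the combined score is lower; for slot 48 it is slightly higher (0.589 vs. 0.576). Because the model scores are unbounded while the similarity is not, alpha does not by itself guarantee an equal contribution from the two signals.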
- This method provides a more nuanced recommendation that takes into account both the overall user preference and the immediate context of recent activity, reinforcing slots like 24 as top recommendations.
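
One way to keep the two signals comparable is to rescale the model scores before blending. The sketch below uses min-max scaling over the candidate set; this is a swapped-in normalization for illustration, not what the notebook's hybrid function does, and `min_max_scale`, `blend_scores`, and the sample numbers are assumptions.

```python
import numpy as np


def min_max_scale(values):
    """Rescale values linearly to [0, 1]; a constant input maps to all zeros."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span


def blend_scores(model_scores, similarity_scores, alpha=0.5):
    """Weighted blend after putting model scores on the same [0, 1] scale as similarity."""
    return alpha * min_max_scale(model_scores) + (1 - alpha) * np.asarray(similarity_scores)


# Made-up numbers: raw model scores are unbounded, similarities already lie in [0, 1]
model_scores = [4.0, 2.0, 0.0]
similarity = [0.2, 0.9, 0.5]
blended = blend_scores(model_scores, similarity, alpha=0.5)
print(np.round(blended, 3).tolist())   # [0.6, 0.7, 0.25]
```

In this toy case the second item overtakes the first once both signals share a scale, which would not happen if the raw score of 4.0 were blended directly.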

---

### Real-Time Recommendations (Using FAISS)
- **Slot Indices:** `[22, 19, 43, 48, 17]`
- **L2 Distances:** `[37.157726, 38.085705, 46.559784, 46.572144, 47.28547]`

**Interpretation:**
- Real-time recommendations are derived using the FAISS index to quickly locate slots that are closest in the embedding space to a recently played slot.
- Here, the reported values are the distances returned by `IndexFlatL2` between learned slot embeddings (FAISS reports these as squared Euclidean distances): lower values imply higher similarity.
- Slot 22, with the smallest distance (~37.16), is the closest match to the recent slot, even though it wasn’t selected in the offline (model-only or hybrid) recommendations. This indicates that while the model-based methods prioritize the user's overall long-term preferences, the FAISS-based real-time approach emphasizes immediate similarity.
- The difference in recommended indices between the offline and real-time methods shows that different strategies capture different aspects of user behavior (global preferences vs. immediate context).

---

### Overall Summary:
- **Consistency Across Methods:**  
  Notice that slot 48 appears in all three lists, while the other four offline picks are shared only by the model-only and hybrid approaches. The real-time approach introduces new candidates (e.g., slot 22) based on embedding similarity.
- **Score vs. Distance:**  
  The offline methods return predicted or combined scores that quantify expected user interest, while the real-time method provides L2 distances indicating embedding similarity.
- **Practical Impact:**  
  These results demonstrate the complementary nature of different recommendation strategies. The model-only and hybrid methods help capture overall user preferences, whereas the FAISS-based real-time recommendations offer fast, context-aware suggestions.

In summary, the results illustrate how the different parts of the recommendation pipeline work together to deliver personalized slot suggestions.


## Model Evaluation

To check that our recommendation system captures user preferences and ranks relevant slots appropriately, we evaluate it with offline metrics computed against held-out interactions. The code in this section runs on a small placeholder test set to show the mechanics. The key metrics are:

- **Hit Rate (HR@K):**  
  This metric measures the proportion of users for whom at least one relevant (held-out) item appears in the top‑K recommendations. A higher hit rate indicates that the model frequently includes a relevant slot in its recommendations.

- **Normalized Discounted Cumulative Gain (NDCG@K):**  
  NDCG@K assesses the ranking quality by giving higher scores when relevant items appear near the top of the recommendation list. This metric considers the position of the relevant items, providing a more nuanced evaluation of the model's performance.
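
A short worked example may help with the arithmetic. The ranked list below is made up: it assumes a user with two held-out items that land at ranks 2 and 4 of a top-5 list. The helper `dcg` is a hypothetical name used only in this sketch.

```python
import math


def dcg(relevance_flags):
    """Discounted cumulative gain for a ranked list of 0/1 relevance flags."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevance_flags, start=1))


# Made-up top-5 list where the items at ranks 2 and 4 are the held-out ones
flags = [0, 1, 0, 1, 0]
hit = int(any(flags))                      # HR@5 for this user
ideal = sorted(flags, reverse=True)        # best case: both relevant items ranked first
ndcg = dcg(flags) / dcg(ideal)
print(hit)                # 1
print(round(ndcg, 3))     # 0.651
```

The hit rate only records that something relevant appeared, while NDCG drops to about 0.65 because neither relevant item was ranked first.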

By creating a test set from held-out user interactions (e.g., the most recent interactions for each user), we simulate a real-world scenario where the model's predictions are compared against actual user behavior. This evaluation process is crucial for:
- Validating the effectiveness of our training approach and model architecture.
- Guiding further improvements and hyperparameter tuning.
- Establishing a performance baseline for future A/B testing before deployment.

Let's now implement these metrics and run them on a small placeholder test set.


In [44]:
def hit_rate_at_k(recommended_items, ground_truth):
    """
    Compute Hit Rate at K.
    
    Args:
        recommended_items (list or np.array): Recommended item indices.
        ground_truth (set): Set of ground-truth item indices.
    
    Returns:
        int: 1 if at least one ground-truth item is in recommended_items, else 0.
    """
    return int(len(set(recommended_items) & ground_truth) > 0)

def ndcg_at_k(recommended_items, ground_truth, k):
    """
    Compute NDCG at K for a single user.
    
    Args:
        recommended_items (list or np.array): Recommended item indices (ordered by rank).
        ground_truth (set): Set of ground-truth item indices.
        k (int): Rank cutoff.
    
    Returns:
        float: NDCG score.
    """
    dcg = 0.0
    for i, item in enumerate(recommended_items[:k]):
        if item in ground_truth:
            dcg += 1 / np.log2(i + 2)  # i+2 because log2(1) = 0, so rank 1 is log2(2)
    
    # Ideal DCG: all relevant items ranked at the top
    ideal_rels = min(len(ground_truth), k)
    idcg = sum([1 / np.log2(i + 2) for i in range(ideal_rels)])
    
    return dcg / idcg if idcg > 0 else 0.0

def evaluate_model(test_interactions, k=5):
    """
    Evaluate the recommendation model using Hit Rate and NDCG metrics.
    
    Args:
        test_interactions (dict): Mapping from user_id to a set of held-out (ground-truth) game indices.
        k (int): Top-K recommendations to consider.
    
    Returns:
        metrics (dict): Dictionary with average Hit Rate and NDCG.
    """
    hit_rates = []
    ndcgs = []
    # Evaluate over each user in the test set
    for user_id, ground_truth in test_interactions.items():
        # Generate recommendations for the user using the offline recommendation function.
        recommended_games, _ = recommend_slots_for_user(user_id, k=k)
        hr = hit_rate_at_k(recommended_games, ground_truth)
        ndcg = ndcg_at_k(recommended_games, ground_truth, k)
        hit_rates.append(hr)
        ndcgs.append(ndcg)
    
    avg_hr = np.mean(hit_rates)
    avg_ndcg = np.mean(ndcgs)
    return {'Hit Rate@{}'.format(k): avg_hr, 'NDCG@{}'.format(k): avg_ndcg}

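# Example: test_interactions would normally be built from a hold-out set.
# For demonstration, we use a dummy test set with arbitrary items.
# Note: recommend_slots_for_user filters out every game in user_positive, and the dummy
# items for users 0 and 1 are already in their histories, so they can never be recommended.
# The scores printed below therefore say nothing about model quality.
# In practice, hold out interactions per user and exclude them from training and from user_positive.
test_interactions = {
    0: {2, 5},   # For user 0, the held-out items are game 2 and game 5.
    1: {7},      # For user 1, the held-out item is game 7.
    2: {3, 8}    # And so on...
}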

# Evaluate the model for top-5 recommendations.
evaluation_metrics = evaluate_model(test_interactions, k=5)
print("Evaluation Metrics:")
print(evaluation_metrics)


Evaluation Metrics:
{'Hit Rate@5': 0.0, 'NDCG@5': 0.0}
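
A usable test set has to hide the held-out items from the history that the recommender filters on. The sketch below shows one way to do that with a leave-one-out split. It is a minimal sketch under stated assumptions: `leave_one_out` is a hypothetical helper, the held-out item is chosen at random because the dataset has no timestamp column, and the model would also need to be retrained without the held-out interactions for the metrics to be meaningful.

```python
import random


def leave_one_out(user_history, seed=42):
    """Hold out one interaction per user; return (train_history, test_interactions)."""
    rng = random.Random(seed)
    train_history, test_interactions = {}, {}
    for user, games in user_history.items():
        if len(games) < 2:
            train_history[user] = set(games)   # too little data to hold anything out
            continue
        held_out = rng.choice(sorted(games))   # sort first so the choice is reproducible
        train_history[user] = set(games) - {held_out}
        test_interactions[user] = {held_out}
    return train_history, test_interactions


# Made-up histories in the same shape as user_positive
history = {0: {0, 1, 2}, 1: {5, 7}, 2: {3}}
train_history, test_demo = leave_one_out(history)
for user, held in test_demo.items():
    assert held.isdisjoint(train_history[user])   # held-out item is hidden from the history
print(sorted(test_demo))   # [0, 1]
```

Passing `train_history` as the filter and `test_demo` as the ground truth keeps the held-out items eligible for recommendation.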


## Production Pipeline Summary

To ensure that our recommendation system delivers the best results in production, the company should follow this pipeline:

1. **Data Collection & Preprocessing:**  
   - Continuously collect user interactions, including play durations, frequency, and other engagement metrics.
   - Preprocess the data by mapping user and game IDs to indices and compute interaction scores.  
   - Optionally, integrate multiple signals (e.g., recency, monetary value) to refine the implicit feedback.

2. **Model Training & Embedding Extraction:**  
   - Retrain the Neural Collaborative Filtering (NCF) model periodically (e.g., daily) using the latest data.
   - Use a ranking-based loss (e.g., BPR) to directly optimize for the top‑n recommendation task.
   - Extract and store fresh player and game embeddings after each training cycle.

3. **Indexing & Real-Time Serving:**  
   - Build a FAISS index from the game embeddings to allow for lightning‑fast nearest neighbor searches.
   - Update the index regularly in sync with model retraining.
   - Use the FAISS‑based real‑time functions to quickly retrieve similar games based on a user’s recent activity.

4. **Offline & Hybrid Recommendation Generation:**  
   - Generate batch recommendations using offline functions that filter out already interacted games.
   - Combine predicted user preferences with similarity signals (e.g., recent slot similarity) to enhance personalization.

5. **Monitoring, Evaluation & Feedback Loop:**  
   - Continuously monitor key performance indicators (e.g., engagement rate, click‑through rate).
   - Perform A/B tests to validate new recommendation strategies.
   - Use feedback to adjust model parameters, sampling strategies, and the overall recommendation logic.

6. **Deployment & Maintenance:**  
   - Refactor the notebook code into modular, production‑ready services (e.g., microservices).
   - Implement automated retraining and index update pipelines.
   - Ensure robust logging, error handling, and scalability to handle high user traffic.

Following this pipeline keeps the recommendation system personalized and efficient as data and traffic grow. Its actual effect on engagement should be confirmed through the monitoring and A/B testing described in step 5.