# Chess Opening Recommendation System

## Production ML System Demonstration

This notebook demonstrates a production-ready recommendation system with concepts directly applicable to marketplace and e-commerce recommendations.

### Key ML Concepts Covered:
1. **Feature Engineering**: Extracting meaningful features from raw game data
2. **Similarity Computation**: Finding similar items (openings) using cosine similarity
3. **Recommendation Strategies**: Content-based, collaborative filtering, hybrid approaches
4. **Evaluation Metrics**: Precision@K, success rate, coverage
5. **Confidence Scoring**: Bayesian approach to recommendation confidence

### Analogies to E-commerce Recommendations:
- **Chess Openings** ↔ **Products**
- **Player Performance** ↔ **User Purchase History**
- **Opening Features** ↔ **Product Features** (category, price, brand)
- **Win Rate** ↔ **Conversion Rate**
- **Similar Openings** ↔ **"Customers Also Bought"**

In [None]:
import sys
sys.path.append('..')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta

from src.data_fetcher import ChessComDataFetcher
from src.game_parser import GameParser
from src.analyzers.opening_recommender import OpeningRecommender

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
sns.set_style('whitegrid')

## 1. Data Collection & Preprocessing

Similar to: Collecting user interaction data from an e-commerce platform

In [None]:
# Configuration
USERNAME = "your_username"  # Replace with your Chess.com username
MONTHS_BACK = 6

# Fetch game data
fetcher = ChessComDataFetcher(USERNAME)
end_date = datetime.now()
# Approximate each month as 30 days, so the window is roughly MONTHS_BACK months
start_date = end_date - timedelta(days=30 * MONTHS_BACK)

print(f"Fetching games from {start_date.date()} to {end_date.date()}...")
raw_games = fetcher.get_all_games(start_date, end_date)
print(f"Fetched {len(raw_games)} games")

In [None]:
# Parse games (feature extraction)
parser = GameParser()
parsed_games = parser.parse_games_batch(raw_games)
print(f"Successfully parsed {len(parsed_games)} games")

# Preview parsed game structure
if parsed_games:
    print("\nSample game metadata:")
    print(parsed_games[0]['game_metadata'])

## 2. Feature Engineering

Extract meaningful features from games - analogous to extracting product features in e-commerce.
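
As a rough illustration of what per-opening feature extraction can look like, the sketch below collapses a toy list of games into one feature record per opening. The record keys (`opening`, `result`, `accuracy`, `blunders`) and the helper `aggregate_opening_features` are hypothetical and chosen for this example; they are not the actual schema or API of `OpeningRecommender.extract_opening_features`.

```python
from collections import defaultdict


def aggregate_opening_features(games):
    """Collapse per-game records into per-opening features (hypothetical helper)."""
    grouped = defaultdict(list)
    for game in games:
        grouped[game["opening"]].append(game)

    features = {}
    for opening, rows in grouped.items():
        n = len(rows)
        features[opening] = {
            "win_rate": sum(r["result"] == "win" for r in rows) / n,
            "avg_accuracy": sum(r["accuracy"] for r in rows) / n,
            "blunder_rate": sum(r["blunders"] for r in rows) / n,  # blunders per game
            "sample_size": n,
        }
    return features


# Toy data with an assumed schema, for illustration only
toy_games = [
    {"opening": "Sicilian Defense", "result": "win", "accuracy": 80.0, "blunders": 1},
    {"opening": "Sicilian Defense", "result": "loss", "accuracy": 70.0, "blunders": 3},
    {"opening": "Italian Game", "result": "win", "accuracy": 90.0, "blunders": 0},
    {"opening": "Italian Game", "result": "win", "accuracy": 85.0, "blunders": 1},
]
toy_features = aggregate_opening_features(toy_games)
for opening, feats in toy_features.items():
    print(opening, feats)
```

The same group-then-aggregate shape applies to product features in e-commerce, where the grouping key would be an item ID instead of an opening name.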

In [None]:
# Initialize recommender
recommender = OpeningRecommender()

# Extract opening features
opening_features = recommender.extract_opening_features(parsed_games)

print(f"Extracted features for {len(opening_features)} openings\n")
print("Opening Features:")
opening_features.head(10)

In [None]:
# Visualize feature distributions
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Win rate distribution
axes[0, 0].hist(opening_features['win_rate'], bins=20, edgecolor='black')
axes[0, 0].set_title('Win Rate Distribution')
axes[0, 0].set_xlabel('Win Rate')
axes[0, 0].set_ylabel('Count')

# Accuracy vs Win Rate
axes[0, 1].scatter(opening_features['avg_accuracy'], opening_features['win_rate'], 
                   alpha=0.6, s=opening_features['sample_size']*10)
axes[0, 1].set_title('Accuracy vs Win Rate (size = sample size)')
axes[0, 1].set_xlabel('Average Accuracy')
axes[0, 1].set_ylabel('Win Rate')

# Blunder rate
axes[1, 0].hist(opening_features['blunder_rate'], bins=20, edgecolor='black', color='coral')
axes[1, 0].set_title('Blunder Rate Distribution')
axes[1, 0].set_xlabel('Blunders per Game')
axes[1, 0].set_ylabel('Count')

# Sample size
axes[1, 1].hist(opening_features['sample_size'], bins=20, edgecolor='black', color='lightgreen')
axes[1, 1].set_title('Sample Size Distribution')
axes[1, 1].set_xlabel('Games Played')
axes[1, 1].set_ylabel('Count')

plt.tight_layout()
plt.show()

print("\nFeature Statistics:")
print(opening_features.describe())

## 3. Similarity-Based Recommendations

Compute opening similarities - analogous to "Customers also bought" or "Similar items" in e-commerce.
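
For reference, cosine similarity between two feature vectors is their dot product divided by the product of their norms. The sketch below computes it in plain Python on made-up vectors; the feature values and the helper names are assumptions for illustration, and `compute_opening_similarity` may scale or select features differently.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def pairwise_similarity(vectors):
    """Full similarity matrix as a list of lists."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Made-up feature vectors: (win_rate, accuracy / 100, blunders per game)
toy_vectors = {
    "Opening A": [0.60, 0.80, 0.10],
    "Opening B": [0.30, 0.40, 0.05],  # same direction as A, half the magnitude
    "Opening C": [0.10, 0.20, 0.90],
}
names = list(toy_vectors)
matrix = pairwise_similarity([toy_vectors[name] for name in names])
for name, row in zip(names, matrix):
    print(name, [round(value, 3) for value in row])
```

Because cosine similarity ignores vector magnitude, Opening A and Opening B come out as nearly identical here. It also means features on very different scales should generally be standardized before comparison, otherwise the largest-valued feature dominates the angle.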

In [None]:
# Compute similarity matrix
similarity_matrix = recommender.compute_opening_similarity()

print(f"Similarity matrix shape: {similarity_matrix.shape}\n")

# Visualize similarity heatmap (for subset of openings)
n_display = min(10, len(opening_features))
plt.figure(figsize=(12, 10))
sns.heatmap(similarity_matrix[:n_display, :n_display], 
            annot=True, 
            fmt='.2f',
            xticklabels=opening_features['opening'].head(n_display),
            yticklabels=opening_features['opening'].head(n_display),
            cmap='coolwarm')
plt.title(f'Opening Similarity Matrix (First {n_display} Openings)')
plt.xticks(rotation=45, ha='right')
plt.yticks(rotation=0)
plt.tight_layout()
plt.show()

In [None]:
# Find similar openings for your best opening
if len(opening_features) > 0:
    # Select the row with the highest win rate once and reuse it
    best_row = opening_features.nlargest(1, 'win_rate').iloc[0]
    best_opening = best_row['opening']
    
    print(f"Your most successful opening: {best_opening}")
    print(f"Win rate: {best_row['win_rate']:.1%}\n")
    
    similar_openings = recommender.get_similar_openings(best_opening, top_k=5)
    
    print("Similar openings you might want to try:\n")
    for opening, similarity in similar_openings:
        print(f"  {opening} (similarity: {similarity:.3f})")

## 4. Recommendation Strategies

Compare different recommendation approaches - production systems typically use hybrid methods.

In [None]:
# Strategy 1: Performance-based (content-based filtering)
print("=" * 60)
print("STRATEGY 1: Performance-Based Recommendations")
print("=" * 60)
perf_recs = recommender.recommend_openings(parsed_games, strategy='performance', top_k=5)

for i, rec in enumerate(perf_recs, 1):
    print(f"\n{i}. {rec['opening']}")
    print(f"   Score: {rec['score']:.3f}")
    print(f"   Reason: {rec['reason']}")
    print(f"   Confidence: {rec['confidence']:.2f}")

In [None]:
# Strategy 2: Exploration (encourage trying new things)
print("\n" + "=" * 60)
print("STRATEGY 2: Exploration Recommendations")
print("=" * 60)
explore_recs = recommender.recommend_openings(parsed_games, strategy='exploration', top_k=5)

for i, rec in enumerate(explore_recs, 1):
    print(f"\n{i}. {rec['opening']}")
    print(f"   Score: {rec['score']:.3f}")
    print(f"   Reason: {rec['reason']}")
    print(f"   Confidence: {rec['confidence']:.2f}")

In [None]:
# Strategy 3: Similarity-based (item-to-item similarity on opening features)
print("\n" + "=" * 60)
print("STRATEGY 3: Similarity-Based Recommendations")
print("=" * 60)
similar_recs = recommender.recommend_openings(parsed_games, strategy='similar', top_k=5)

for i, rec in enumerate(similar_recs, 1):
    print(f"\n{i}. {rec['opening']}")
    print(f"   Score: {rec['score']:.3f}")
    print(f"   Reason: {rec['reason']}")
    print(f"   Confidence: {rec['confidence']:.2f}")

In [None]:
# Strategy 4: Hybrid (production-grade approach)
print("\n" + "=" * 60)
print("STRATEGY 4: Hybrid Recommendations (PRODUCTION)")
print("=" * 60)
print("Combines multiple signals: performance + accuracy + experience - blunders\n")

hybrid_recs = recommender.recommend_openings(parsed_games, strategy='hybrid', top_k=5)

for i, rec in enumerate(hybrid_recs, 1):
    print(f"\n{i}. {rec['opening']}")
    print(f"   Score: {rec['score']:.3f}")
    print(f"   Reason: {rec['reason']}")
    print(f"   Confidence: {rec['confidence']:.2f}")
    if 'metadata' in rec:
        meta = rec['metadata']
        print(f"   Win Rate: {meta['win_rate']:.1%} | Accuracy: {meta['avg_accuracy']:.1f}% | Games: {meta['games_played']}")

## 5. Recommendation Quality Evaluation

Essential for A/B testing and monitoring recommendation system performance in production.

In [None]:
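# Simulate evaluation: hold out the last 20% of games as a test set
# (assumes parsed_games is in chronological order, so the test set is the most recent games)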
split_idx = int(len(parsed_games) * 0.8)
train_games = parsed_games[:split_idx]
test_games = parsed_games[split_idx:]

print(f"Train set: {len(train_games)} games")
print(f"Test set: {len(test_games)} games\n")

# Generate recommendations from training data
recommender_eval = OpeningRecommender()
recommendations = recommender_eval.recommend_openings(train_games, strategy='hybrid', top_k=5)

# Evaluate on test data
metrics = recommender_eval.evaluate_recommendations(recommendations, test_games)

print("=" * 60)
print("RECOMMENDATION SYSTEM EVALUATION METRICS")
print("=" * 60)
print(f"\nPrecision@5: {metrics['precision_at_k']:.3f}")
print(f"  → {metrics['precision_at_k']*100:.1f}% of recommended openings were actually played")

print(f"\nSuccess Rate: {metrics['success_rate']:.3f}")
print("  → Win rate in test games that followed a recommendation")

print(f"\nCoverage: {metrics['coverage']:.3f}")
print("  → Diversity of recommendations")

print(f"\nRecommendations Followed: {metrics['recommendations_followed']}")
print("  → Number of recommended openings actually used")

## 6. Production Considerations

### Key ML Engineering Concepts:

**1. Feature Engineering Pipeline:**
- Raw data → Structured features → Model input
- Similar to: User behavior → Features → Recommendation model

**2. Hybrid Recommendation Approach:**
- Content-based: Item features (opening statistics)
- Collaborative: User similarities (similar player patterns)
- Hybrid: Weighted combination of multiple signals
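
A weighted combination of signals can be sketched as follows. The signals mirror the ones named in the hybrid strategy cell (performance, accuracy, experience, blunders), but the weights, the normalization caps (`max_games`, `max_blunders`), and the function `hybrid_score` are assumptions for illustration, not the values or code used by `OpeningRecommender`.

```python
def hybrid_score(win_rate, avg_accuracy, blunder_rate, sample_size,
                 weights=(0.4, 0.3, 0.2, 0.1), max_games=50, max_blunders=5.0):
    """Weighted sum of normalized signals; weights and caps are illustrative."""
    w_win, w_acc, w_exp, w_blunder = weights
    accuracy = avg_accuracy / 100.0                   # percent -> 0..1
    experience = min(sample_size / max_games, 1.0)    # saturates at max_games
    blunders = min(blunder_rate / max_blunders, 1.0)  # saturates at max_blunders
    return (w_win * win_rate
            + w_acc * accuracy
            + w_exp * experience
            - w_blunder * blunders)


# Made-up candidates for illustration
candidates = {
    "Italian Game": dict(win_rate=0.6, avg_accuracy=80.0, blunder_rate=1.0, sample_size=25),
    "King's Gambit": dict(win_rate=0.7, avg_accuracy=70.0, blunder_rate=3.0, sample_size=5),
}
scores = {name: hybrid_score(**feats) for name, feats in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

In this toy case the opening with the lower win rate ranks first because it has more games behind it and fewer blunders, which is the kind of trade-off a hybrid score is meant to capture. In practice the weights would be tuned against an evaluation metric instead of being set by hand.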

**3. Confidence Scoring:**
- Bayesian approach to handle uncertainty
- Important for cold-start problem (new items/users)
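
One common Bayesian treatment, shown here as a sketch and not necessarily what `OpeningRecommender` does, models the win rate with a Beta prior and updates it with observed wins and losses. The prior parameters `prior_wins=2` and `prior_losses=2` and the function name are assumptions. With few games the estimate stays close to the prior and its uncertainty is large; with many games the data dominates.

```python
import math


def beta_posterior(wins, games, prior_wins=2.0, prior_losses=2.0):
    """Posterior mean and standard deviation of a win rate under a Beta prior."""
    alpha = prior_wins + wins
    beta = prior_losses + (games - wins)
    total = alpha + beta
    mean = alpha / total
    variance = (alpha * beta) / (total ** 2 * (total + 1))
    return mean, math.sqrt(variance)


# A 3/3 record versus a 30/50 record
small_mean, small_std = beta_posterior(wins=3, games=3)
large_mean, large_std = beta_posterior(wins=30, games=50)
print(f"3/3 games:   mean={small_mean:.3f}, std={small_std:.3f}")
print(f"30/50 games: mean={large_mean:.3f}, std={large_std:.3f}")
```

The 3/3 record has a raw win rate of 100%, but the posterior mean is pulled well below that and carries a much larger standard deviation than the 30/50 record. A confidence score can then be derived from the posterior spread, which is how a cold-start item avoids outranking a well-tested one on the strength of a few lucky results.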

**4. Evaluation Framework:**
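- Offline metrics: Precision@K, Coverage
- Online metrics: Success rate (conversion in e-commerce); in this notebook it is approximated offline on held-out games
- A/B testing ready: the same metrics can be used to compare strategies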
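
The two offline metrics can be sketched in a few lines. Here an opening counts as relevant if it was actually played in the held-out games, matching how Precision@5 is described in the evaluation cell. Coverage is taken to be the fraction of the catalog that appears in the recommendations, which is one common definition and may differ from what `evaluate_recommendations` computes. The function names and toy data are hypothetical.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def catalog_coverage(recommended, catalog):
    """Fraction of catalog items that appear in the recommendations."""
    if not catalog:
        return 0.0
    return len(set(recommended) & set(catalog)) / len(set(catalog))


# Made-up data for illustration
recommended = ["Italian Game", "Sicilian Defense", "French Defense",
               "Caro-Kann Defense", "London System"]
played_in_test = {"Italian Game", "London System", "Queen's Gambit"}
catalog = recommended + ["Queen's Gambit", "Ruy Lopez", "English Opening",
                         "King's Indian Defense", "Scandinavian Defense"]

p_at_5 = precision_at_k(recommended, played_in_test, k=5)
coverage = catalog_coverage(recommended, catalog)
print(f"Precision@5: {p_at_5:.2f}, Coverage: {coverage:.2f}")
```

Note that dividing by `k` penalizes a recommender that returns fewer than `k` items; some implementations divide by the number of items actually returned instead.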

**5. Scalability Considerations:**
- Pre-compute similarity matrices
- Cache recommendations
- Batch processing for feature extraction
- Incremental model updates

### Applications to E-commerce Recommendations:

| Chess Recommender | E-commerce Recommender |
|---|---|
| Opening features (win rate, accuracy) | Item features (price, category, condition) |
| Player history | User browse/purchase history |
| Similar openings | Similar items |
| Hybrid scoring | Multi-signal ranking |
| Confidence from sample size | Confidence from item popularity |
| Precision@K evaluation | Click-through rate, conversion rate |

In [None]:
# Visualize recommendation scores
if len(hybrid_recs) > 0:
    rec_df = pd.DataFrame(hybrid_recs)
    
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Recommendation scores
    axes[0].barh(rec_df['opening'], rec_df['score'], color='skyblue', edgecolor='black')
    axes[0].set_xlabel('Recommendation Score')
    axes[0].set_title('Top 5 Recommended Openings')
    axes[0].invert_yaxis()
    
    # Confidence levels
    axes[1].barh(rec_df['opening'], rec_df['confidence'], color='lightcoral', edgecolor='black')
    axes[1].set_xlabel('Confidence')
    axes[1].set_title('Recommendation Confidence')
    axes[1].invert_yaxis()
    
    plt.tight_layout()
    plt.show()

## Next Steps for Production

1. **Model Persistence**: Save trained models (similarity matrices, feature encoders)
2. **API Layer**: Expose recommendations via REST API (FastAPI)
3. **Real-time Features**: Update features as new games are played
4. **Monitoring**: Track recommendation quality metrics over time
5. **A/B Testing**: Compare different strategies in production
6. **Personalization**: Incorporate user-specific context (time of day, rating changes)
7. **Explainability**: Provide clear reasons for each recommendation (already implemented via the `reason` field on each recommendation)
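
As a starting point for the model persistence step, a pre-computed similarity matrix can be written to disk together with its opening labels and reloaded later without recomputation. The sketch below uses JSON for readability. The file name, payload layout, and helper functions are assumptions, and a larger matrix would likely be better served by a binary format.

```python
import json
import tempfile
from pathlib import Path


def save_similarity(path, openings, matrix):
    """Store opening labels alongside the matrix so rows stay identifiable."""
    payload = {"openings": openings, "matrix": matrix}
    Path(path).write_text(json.dumps(payload))


def load_similarity(path):
    """Read back the labels and matrix written by save_similarity."""
    payload = json.loads(Path(path).read_text())
    return payload["openings"], payload["matrix"]


# Round-trip a toy matrix through a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "opening_similarity.json"
    save_similarity(artifact, ["Opening A", "Opening B"], [[1.0, 0.8], [0.8, 1.0]])
    loaded_openings, loaded_matrix = load_similarity(artifact)

print(loaded_openings, loaded_matrix)
```

Saving the labels with the matrix matters because row order is otherwise implicit, and a mismatch between a stale matrix and a refreshed feature table would silently return the wrong neighbors.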