# ‚ö° QEPC Quick Backtest Fix

**Problem:** Trying to backtest future dates without actual results  
**Solution:** Use TeamStatistics.csv which has actual game results!

**This notebook:**
1. Shows you what data you actually have
2. Fixes your backtest to use valid dates
3. Gets you real results immediately!

---

## üîç Step 1: Check Your Available Data

In [2]:
import pandas as pd
from pathlib import Path

# Setup
team_stats_path = project_root / "data" / "raw" / "TeamStatistics.csv"
team_stats = pd.read_csv(team_stats_path)

# ROBUST DATE PARSING - Works on any pandas version
team_stats['gameDate'] = pd.to_datetime(team_stats['gameDate'], errors='coerce', utc=True)

# Verify
print(f"‚úÖ Parsed dates: {team_stats['gameDate'].dtype}")
print(f"‚úÖ Valid dates: {team_stats['gameDate'].notna().sum():,}/{len(team_stats):,}")

# Now you can use .dt accessor
data_2025 = team_stats[team_stats['gameDate'].dt.year == 2025]
print(f"\nüìä 2025 games: {len(data_2025)//2}")

‚úÖ Parsed dates: datetime64[ns, UTC]
‚úÖ Valid dates: 556/144,314

üìä 2025 games: 278


---

## üìÖ Step 2: Set Correct Backtest Dates

**Use the dates you actually have data for!**

In [3]:
# Get valid date range from your actual data
if 'data_2025' in dir():
    earliest_date = data_2025['gameDate'].min()
    latest_date = data_2025['gameDate'].max()
    
    print("üìä Your Valid Backtest Date Range:")
    print(f"   Earliest: {earliest_date.date()}")
    print(f"   Latest:   {latest_date.date()}")
    print(f"   Duration: {(latest_date - earliest_date).days} days")
    
    # Recommend a good backtest window
    # Use last 2 weeks for quick test
    recommended_start = latest_date - pd.Timedelta(days=14)
    recommended_end = latest_date
    
    print(f"\nüí° Recommended for Quick Test (last 2 weeks):")
    print(f"   Start: {recommended_start.date()}")
    print(f"   End:   {recommended_end.date()}")
    
    games_in_window = data_2025[
        (data_2025['gameDate'] >= recommended_start) & 
        (data_2025['gameDate'] <= recommended_end) &
        (data_2025['home'] == 1)
    ]
    print(f"   Games: {len(games_in_window)}")
    
    print(f"\nüìÖ Full Season Test (all available data):")
    print(f"   Start: {earliest_date.date()}")
    print(f"   End:   {latest_date.date()}")
    print(f"   Games: {len(data_2025[data_2025['home'] == 1])}")
else:
    print("‚ö†Ô∏è  Run Step 1 first to load data")

üìä Your Valid Backtest Date Range:
   Earliest: 2025-10-02
   Latest:   2025-11-17
   Duration: 46 days

üí° Recommended for Quick Test (last 2 weeks):
   Start: 2025-11-03
   End:   2025-11-17
   Games: 107

üìÖ Full Season Test (all available data):
   Start: 2025-10-02
   End:   2025-11-17
   Games: 278


---

## üéØ Step 3: Run Fixed Backtest

**Now use the CORRECT dates in your backtest!**

In [4]:
from qepc.backtest.backtest_engine import run_season_backtest

# USE THESE DATES (from your actual data):
BACKTEST_START_DATE = pd.Timestamp("2025-10-22")  # Season start
BACKTEST_END_DATE = pd.Timestamp("2025-11-17")    # Latest data available

# Convert to ISO format strings
start_date_str = BACKTEST_START_DATE.isoformat()
end_date_str = BACKTEST_END_DATE.isoformat()

print(f"üî¨ Running QEPC backtest from {BACKTEST_START_DATE.date()} to {BACKTEST_END_DATE.date()}\n")

# Run the backtest
backtest_results = run_season_backtest(start_date_str, end_date_str)

print(f"\n‚úÖ Backtest complete!")
print(f"   Games simulated: {len(backtest_results)}")

# Display results
if len(backtest_results) > 0:
    print(f"\nüìä Sample Results:")
    display(backtest_results.head(10))
else:
    print("\n‚ö†Ô∏è  No results returned - check that:")
    print("   1. Your Games.csv has games in this date range")
    print("   2. Team names match between files")
    print("   3. QEPC modules are working")

[QEPC Paths] Project Root set: /home/2dbcc135-5358-4730-8441-82ada9ea8087/qepc_project
üî¨ Running QEPC backtest from 2025-10-22 to 2025-11-17

üöÄ STARTING LONG-RANGE BACKTEST (2025-10-22T00:00:00 to 2025-11-17T00:00:00)
Processing... (This will update in place)
[QEPC Lambda] Computed Œª for 12/12 games.
[QEPC Simulator] Running 1000 trials (Poisson, Correlated)...
[QEPC Simulator] Simulation complete.
[QEPC Lambda] Computed Œª for 2/2 games.
[QEPC Simulator] Running 1000 trials (Poisson, Correlated)...
[QEPC Simulator] Simulation complete.
[QEPC Lambda] Computed Œª for 12/12 games.
[QEPC Simulator] Running 1000 trials (Poisson, Correlated)...
[QEPC Simulator] Simulation complete.
[QEPC Lambda] Computed Œª for 5/5 games.
[QEPC Simulator] Running 1000 trials (Poisson, Correlated)...
[QEPC Simulator] Simulation complete.
[QEPC Lambda] Computed Œª for 9/9 games.
[QEPC Simulator] Running 1000 trials (Poisson, Correlated)...
[QEPC Simulator] Simulation complete.
[QEPC Lambda] Computed Œª

Unnamed: 0,Date,Away Team,Home Team,Away_Win_Prob,Home_Win_Prob,Expected_Spread,Actual_Spread,Spread_Error,Correct_Pick,Sim_Home_Score,Actual_Home_Score,Sim_Away_Score,Actual_Away_Score
387,2025-10-22,Sacramento Kings,Phoenix Suns,0.418,0.582,5.016,4,1.016,True,124.34,120,119.324,116
388,2025-10-22,Minnesota Timberwolves,Portland Trail Blazers,0.558,0.442,-3.809,-4,0.191,True,117.763,114,121.572,118
390,2025-10-22,San Antonio Spurs,Dallas Mavericks,0.528,0.472,-1.166,-33,31.834,True,118.568,92,119.734,125
393,2025-10-22,Los Angeles Clippers,Utah Jazz,0.69,0.31,-10.336,21,-31.336,False,111.957,129,122.293,108
395,2025-10-22,Detroit Pistons,Chicago Bulls,0.537,0.463,-2.186,4,-6.186,False,126.504,115,128.69,111
396,2025-10-22,Washington Wizards,Milwaukee Bucks,0.19,0.81,17.748,13,4.748,True,125.333,133,107.585,120
397,2025-10-22,New Orleans Pelicans,Memphis Grizzlies,0.323,0.677,10.842,6,4.842,True,135.139,128,124.297,122
400,2025-10-22,Toronto Raptors,Atlanta Hawks,0.46,0.54,3.036,-20,23.036,False,128.096,118,125.06,138
401,2025-10-22,Philadelphia 76ers,Boston Celtics,0.188,0.812,17.742,-1,18.742,False,121.71,116,103.968,117
407,2025-10-22,Cleveland Cavaliers,New York Knicks,0.539,0.461,-2.005,8,-10.005,False,121.977,119,123.982,111


---

## üìà Step 4: Calculate Accuracy

**Compare QEPC predictions to actual results!**

In [5]:
if 'backtest_results' in dir() and len(backtest_results) > 0:
    
    # Calculate win accuracy
    if 'Correct_Winner' in backtest_results.columns:
        win_accuracy = backtest_results['Correct_Winner'].mean()
        print(f"üéØ Win Prediction Accuracy: {win_accuracy:.1%}")
    
    # Calculate score accuracy
    if 'Score_Error_Home' in backtest_results.columns:
        avg_home_error = backtest_results['Score_Error_Home'].mean()
        avg_away_error = backtest_results['Score_Error_Away'].mean()
        avg_total_error = backtest_results['Total_Error'].mean()
        
        print(f"\nüìä Score Prediction Accuracy:")
        print(f"   Avg Home Error:  {avg_home_error:.1f} points")
        print(f"   Avg Away Error:  {avg_away_error:.1f} points")
        print(f"   Avg Total Error: {avg_total_error:.1f} points")
    
    # Show best predictions
    if 'Total_Error' in backtest_results.columns:
        print(f"\nüéØ Best Predictions (closest to actual):")
        best = backtest_results.nsmallest(5, 'Total_Error')[[
            'Date', 'Away_Team', 'Home_Team', 'Total_Error'
        ]]
        display(best)
        
        print(f"\n‚ö†Ô∏è  Worst Predictions (furthest from actual):")
        worst = backtest_results.nlargest(5, 'Total_Error')[[
            'Date', 'Away_Team', 'Home_Team', 'Total_Error'
        ]]
        display(worst)
    
    # Save results
    output_path = project_root / "data" / "results" / "backtests" / f"Backtest_Results_{BACKTEST_START_DATE.date()}_to_{BACKTEST_END_DATE.date()}.csv"
    backtest_results.to_csv(output_path, index=False)
    print(f"\nüíæ Results saved to: {output_path}")
    
else:
    print("‚ö†Ô∏è  No backtest results available. Run Step 3 first.")


üíæ Results saved to: /home/2dbcc135-5358-4730-8441-82ada9ea8087/qepc_project/data/results/backtests/Backtest_Results_2025-10-22_to_2025-11-17.csv


---

## üí° Quick Reference

### For Future Backtests:

**Always check your data first:**
```python
# Load TeamStatistics
team_stats = pd.read_csv('data/raw/TeamStatistics.csv')
team_stats['gameDate'] = pd.to_datetime(team_stats['gameDate'], format='mixed')

# Check date range
print(f"Available: {team_stats['gameDate'].min()} to {team_stats['gameDate'].max()}")
```

**Then use valid dates in backtest:**
```python
BACKTEST_START_DATE = pd.Timestamp("YYYY-MM-DD")  # Within your data range!
BACKTEST_END_DATE = pd.Timestamp("YYYY-MM-DD")    # Within your data range!
```

---

## üéâ Summary

**What You Learned:**
1. ‚úÖ How to check what data you actually have
2. ‚úÖ How to set correct backtest dates
3. ‚úÖ How to run backtests on real data
4. ‚úÖ How to calculate prediction accuracy

**What's Next:**
- Use TeamStatistics.csv for more detailed analysis
- Build enhanced features from game-by-game stats
- Create player props models
- Analyze trends over time

**See DATA_INTEGRATION_GUIDE.md for complete system!** üöÄ