# Notebook 21B: Complete System Validation (Sprint + Conventional)

**Complete F1 prediction system covering ALL 24 race weekends per season.**

## Coverage:
- **Conventional Weekends (18):** Qualifying → GP Race prediction
- **Sprint Weekends (6):** Two predictions per weekend
  - Sprint Qualifying → Sprint Race
  - Friday Qualifying + Sprint Race → GP Race
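
The split between conventional and sprint weekends comes from the schedule's event format string. A minimal sketch of that classification, assuming the format values listed below (they are illustrative, and `classify_weekend` is a hypothetical helper rather than a function defined in this notebook):

```python
def classify_weekend(event_format: str) -> str:
    """Map a schedule format string to 'testing', 'sprint', or 'conventional'."""
    fmt = event_format.lower()
    if 'test' in fmt:
        return 'testing'
    # Any format mentioning 'sprint' is treated as a sprint weekend
    return 'sprint' if 'sprint' in fmt else 'conventional'


# Illustrative format strings (assumed, not read from a real schedule)
for fmt in ['conventional', 'sprint_qualifying', 'sprint_shootout', 'testing']:
    print(f"{fmt:<18} -> {classify_weekend(fmt)}")
```

Checking for testing first matters: a testing event would otherwise fall through to `'conventional'`.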

## Improvements over Notebook 21:
- ✅ 100% race weekend coverage (was 75%)
- ✅ Sprint race predictions (new capability)
- ✅ Separate metrics for conventional vs sprint vs combined

## Test Season: 2024 → 2025
**Runtime:** ~45 minutes (24 races with sprint processing)

In [1]:
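# Imports, FastF1 cache setup, and warning/log suppression for cleaner output
# (note: sys.tracebacklimit = 0 below also hides tracebacks while debugging)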
import warnings
import logging
import sys

import json
import numpy as np
import pandas as pd
import fastf1 as ff1
from pathlib import Path
import copy
import time

ff1.Cache.enable_cache('../data/raw/.fastf1_cache')

warnings.filterwarnings('ignore')
logging.getLogger('fastf1').setLevel(logging.ERROR)
logging.getLogger('requests').setLevel(logging.ERROR)
logging.getLogger('urllib3').setLevel(logging.ERROR)
logging.getLogger('requests_cache').setLevel(logging.ERROR)
sys.tracebacklimit = 0

In [2]:
# Configuration
TUNED_PARAMS = {
    'initial_uncertainty': 0.20,
    'min_uncertainty': 0.06,
    'measurement_noise': 0.04,
    'driver_specific_mins': {
        'VER': 0.05, 'RUS': 0.05, 'LEC': 0.06, 'HAM': 0.06,
        'NOR': 0.06, 'ALO': 0.06, 'SAI': 0.07, 'PER': 0.10,
        'default': 0.07
    }
}

# Sprint-specific adjustments
SPRINT_PARAMS = {
    'uncertainty_multiplier': 1.3,  # 30% higher uncertainty (less practice data)
    'racecraft_factor': 0.5,        # 50% racecraft effect (shorter race)
    'dnf_factor': 0.5               # 50% DNF risk (shorter race)
}

TEST_SEASON = {'from': 2024, 'to': 2025}
TRACKED_DRIVERS = ['VER', 'HAM', 'LEC', 'NOR', 'SAI', 'PER', 'RUS', 'ALO']

## Data Extraction Functions

In [3]:
def extract_conventional_weekend(year, race_name):
    """
    Extract qualifying and race results for conventional weekend.
    
    Returns: {'weekend_type': 'conventional', 'quali': {...}, 'race': {...}}
    """
    time.sleep(0.5)
    
    try:
        quali = ff1.get_session(year, race_name, 'Q')
        quali.load(laps=False, telemetry=False, weather=False)
        
        race = ff1.get_session(year, race_name, 'R')
        race.load(laps=False, telemetry=False, weather=False)
        
        results = {'weekend_type': 'conventional', 'quali': {}, 'race': {}}
        
        # Extract quali positions
        for _, row in quali.results.iterrows():
            driver = row['Abbreviation']
            quali_pos = row['Position']
            if pd.notna(driver) and pd.notna(quali_pos) and quali_pos != '':
                try:
                    results['quali'][driver] = int(quali_pos)
                except (ValueError, TypeError):
                    pass  # Skip if can't convert
        
        # Extract race results
        for _, row in race.results.iterrows():
            driver = row['Abbreviation']
            race_pos = row['Position']
            
            dnf = row.dnf if hasattr(row, 'dnf') else False
            status = str(row['Status']) if 'Status' in row else ''
            if not dnf and status:
                dnf = 'Finished' not in status and '+' not in status
            
            if pd.notna(driver) and driver in results['quali']:
                try:
                    rpos = int(race_pos) if pd.notna(race_pos) and race_pos != '' else 20
                except (ValueError, TypeError):
                    rpos = 20
                results['race'][driver] = {
                    'race_pos': rpos,
                    'dnf': dnf
                }
        
        return results
        
    except Exception as e:
        print(f"    ❌ Conv Error ({race_name}): {type(e).__name__}: {str(e)}")
        return None
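
The DNF flag above falls back to a string heuristic on the `Status` column: a driver counts as classified when the status contains `Finished` or a `+` (as in `+1 Lap`). A small sketch of that rule in isolation, with `is_dnf` as a hypothetical helper and the sample statuses chosen for illustration only:

```python
def is_dnf(status: str) -> bool:
    """Return True when a status string looks like a retirement.

    Mirrors the heuristic used in the extraction cell: an empty status is
    treated as 'not a DNF', and anything without 'Finished' or '+' is a DNF.
    """
    return bool(status) and 'Finished' not in status and '+' not in status


# Illustrative status values (assumed, not pulled from a real session)
for status in ['Finished', '+1 Lap', 'Collision', 'Engine', '']:
    print(f"{status!r:<12} -> dnf={is_dnf(status)}")
```

Any status string outside these two patterns is counted as a DNF, so it is worth checking the values the data source actually returns before trusting the DNF rate.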

In [4]:
def extract_sprint_weekend(year, race_name):
    """
    Extract all sessions for sprint weekend.
    
    Returns: {
        'weekend_type': 'sprint',
        'quali': {...},          # Friday Quali (sets GP grid)
        'sprint_quali': {...},   # Sprint Qualifying
        'sprint_race': {...},    # Sprint Race
        'race': {...}            # GP Race
    }
    """
    time.sleep(0.5)
    
    try:
        # Friday Qualifying (sets GP grid)
        quali = ff1.get_session(year, race_name, 'Q')
        quali.load(laps=False, telemetry=False, weather=False)
        
        # Sprint Qualifying
        sprint_quali = ff1.get_session(year, race_name, 'SQ')
        sprint_quali.load(laps=False, telemetry=False, weather=False)
        
        # Sprint Race
        sprint_race = ff1.get_session(year, race_name, 'S')
        sprint_race.load(laps=False, telemetry=False, weather=False)
        
        # GP Race
        gp_race = ff1.get_session(year, race_name, 'R')
        gp_race.load(laps=False, telemetry=False, weather=False)
        
        results = {
            'weekend_type': 'sprint',
            'quali': {},
            'sprint_quali': {},
            'sprint_race': {},
            'race': {}
        }
        
        # Friday Quali
        for _, row in quali.results.iterrows():
            driver = row['Abbreviation']
            pos = row['Position']
            if pd.notna(driver) and pd.notna(pos) and pos != '':
                try:
                    results['quali'][driver] = int(pos)
                except (ValueError, TypeError):
                    pass  # Skip if can't convert
        
        # Sprint Quali
        for order, (_, row) in enumerate(sprint_quali.results.iterrows(), start=1):
            driver = row['Abbreviation']
            # Try multiple position sources
            if pd.notna(row['Position']):
                pos = row['Position']
            elif 'GridPosition' in row.index and pd.notna(row['GridPosition']):
                pos = row['GridPosition']
            elif 'ClassifiedPosition' in row.index and pd.notna(row['ClassifiedPosition']):
                pos = row['ClassifiedPosition']
            else:
                # Last resort: row order (the DataFrame index is not guaranteed
                # to be a 0-based integer rank, so idx + 1 is unsafe)
                pos = order
            
            if pd.notna(driver) and pd.notna(pos) and pos != '':
                try:
                    results['sprint_quali'][driver] = int(pos)
                except (ValueError, TypeError):
                    pass  # Skip if can't convert to int
        
        # Sprint Race (use GridPosition as starting position)
        for _, row in sprint_race.results.iterrows():
            driver = row['Abbreviation']
            race_pos_raw = row['Position']
            grid_pos_raw = row['GridPosition']
            
            dnf = row.dnf if hasattr(row, 'dnf') else False
            status = str(row['Status']) if 'Status' in row else ''
            if not dnf and status:
                dnf = 'Finished' not in status and '+' not in status
            
            if pd.notna(driver):
                # Extract race position
                try:
                    race_pos = int(race_pos_raw) if pd.notna(race_pos_raw) and race_pos_raw != '' else 20
                except (ValueError, TypeError):
                    race_pos = 20
                
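                # Extract grid position (Sprint Qualifying result). If it is missing,
                # fall back to the Sprint Qualifying position rather than the finishing
                # position, so the race outcome never leaks into the prediction input.
                fallback_grid = results['sprint_quali'].get(driver, race_pos)
                try:
                    grid_pos = int(grid_pos_raw) if pd.notna(grid_pos_raw) and grid_pos_raw != '' else fallback_grid
                except (ValueError, TypeError):
                    grid_pos = fallback_grid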
                
                # Store both grid and race position
                results['sprint_race'][driver] = {
                    'grid_pos': grid_pos,      # Where they started (Sprint Quali result)
                    'race_pos': race_pos,      # Where they finished
                    'dnf': dnf
                }
                
                # Overwrite sprint_quali with the actual sprint grid (GridPosition)
                results['sprint_quali'][driver] = grid_pos
        
        # GP Race
        for _, row in gp_race.results.iterrows():
            driver = row['Abbreviation']
            pos = row['Position']
            
            dnf = row.dnf if hasattr(row, 'dnf') else False
            status = str(row['Status']) if 'Status' in row else ''
            if not dnf and status:
                dnf = 'Finished' not in status and '+' not in status
            
            if pd.notna(driver) and driver in results['quali']:
                try:
                    race_pos = int(pos) if pd.notna(pos) and pos != '' else 20
                except (ValueError, TypeError):
                    race_pos = 20
                results['race'][driver] = {
                    'race_pos': race_pos,
                    'dnf': dnf
                }
        
        return results
        
    except Exception as e:
        print(f"    ❌ Sprint Error ({race_name}): {type(e).__name__}: {str(e)}")
        return None
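
Both extraction functions repeat the same pattern for turning a raw position cell into an integer, with P20 as the fallback. A hedged sketch of how that pattern could be factored out; `to_position` is a hypothetical helper, and it uses only the standard library so it can be tried without pandas:

```python
import math


def to_position(raw, fallback=20):
    """Convert a raw results cell to an int position, or return the fallback.

    Handles None, empty strings, NaN floats, and non-numeric text such as 'NC'.
    """
    if raw is None or raw == '':
        return fallback
    if isinstance(raw, float) and math.isnan(raw):
        return fallback
    try:
        return int(raw)
    except (ValueError, TypeError):
        return fallback


# Illustrative raw values of the kinds a results table might contain
for raw in [3.0, '7', float('nan'), '', 'NC', None]:
    print(f"{raw!r:<6} -> P{to_position(raw)}")
```

Note that the P20 fallback assumes a 20-car grid, the same assumption the notebook makes elsewhere.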

In [5]:
def extract_season_characteristics(year):
    """
    Extract prior year characteristics including racecraft.
    Handles both conventional and sprint weekends.
    """
    schedule = ff1.get_event_schedule(year)
    driver_stats = {}
    
    print(f"  Extracting {year} season characteristics...")
    
    for _, event in schedule.iterrows():
        race_name = event['EventName']
        
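        # Skip pre-season testing, then normalize the weekend type
        # (handles 'sprint', 'sprint_qualifying', etc.)
        event_format = str(event['EventFormat']).lower()
        if 'test' in event_format or 'testing' in race_name.lower():
            continue
        weekend_type = 'sprint' if 'sprint' in event_format else 'conventional'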
        
        try:
            if weekend_type == 'sprint':
                results = extract_sprint_weekend(year, race_name)
            else:
                results = extract_conventional_weekend(year, race_name)
            
            if not results:
                continue
            
            # Process results
            quali_results = results['quali']
            race_results = results['race']
            
            for driver in quali_results:
                if driver not in driver_stats:
                    driver_stats[driver] = {
                        'quali_positions': [],
                        'race_positions': [],
                        'positions_gained': [],
                        'dnfs': 0,
                        'races': 0
                    }
                
                quali_pos = quali_results[driver]
                driver_stats[driver]['quali_positions'].append(quali_pos)
                driver_stats[driver]['races'] += 1
                
                if driver in race_results:
                    race_data = race_results[driver]
                    
                    if race_data['dnf']:
                        driver_stats[driver]['dnfs'] += 1
                    else:
                        race_pos = race_data['race_pos']
                        driver_stats[driver]['race_positions'].append(race_pos)
                        
                        # Racecraft: positions gained from quali to race
                        positions_gained = quali_pos - race_pos
                        driver_stats[driver]['positions_gained'].append(positions_gained)
        
        except Exception as e:
            print(f"    ❌ Error in {race_name}: {type(e).__name__}: {str(e)}")
            continue
    
    # Calculate characteristics
    characteristics = {}
    for driver, stats in driver_stats.items():
        if stats['races'] == 0:
            continue
        
        avg_quali_pos = np.mean(stats['quali_positions'])
        avg_pace = 1.0 - (avg_quali_pos - 1) / 19
        dnf_rate = stats['dnfs'] / stats['races']
        
        # Racecraft score from positions gained
        if stats['positions_gained']:
            avg_gain = np.mean(stats['positions_gained'])
            # Scale to 0-1: +3 = 1.0, 0 = 0.5, -3 = 0.0
            racecraft_score = 0.5 + (avg_gain / 6.0)
            racecraft_score = np.clip(racecraft_score, 0.0, 1.0)
        else:
            racecraft_score = 0.5
        
        characteristics[driver] = {
            'avg_quali_pace': float(avg_pace),
            'dnf_rate': float(dnf_rate),
            'racecraft_score': float(racecraft_score),
            'races_completed': stats['races']
        }
    
    print(f"  ✅ Extracted characteristics for {len(characteristics)} drivers")
    return characteristics
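
The racecraft score maps the average number of positions gained onto a 0 to 1 scale, where +3 positions gives 1.0, no change gives 0.5, and -3 gives 0.0. A standalone sketch of that scaling, with `racecraft_score` as a hypothetical helper and made-up inputs:

```python
def racecraft_score(positions_gained):
    """Scale average positions gained (quali minus race position) to [0, 1]."""
    if not positions_gained:
        return 0.5  # Neutral score when no finishes are available
    avg_gain = sum(positions_gained) / len(positions_gained)
    return min(1.0, max(0.0, 0.5 + avg_gain / 6.0))


# Made-up season summaries: consistent gainer, neutral, consistent loser
for gains in [[1, 2], [0, 0, 0], [-3, -3], [5, 6]]:
    print(f"{gains} -> {racecraft_score(gains):.2f}")
```

Because the result is clipped, any driver averaging more than three positions gained (or lost) per race saturates at 1.0 (or 0.0).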

## Prediction Functions

In [6]:
def predict_conventional_race(quali_results, driver_priors):
    """
    Predict GP race for conventional weekend.
    Same logic as Notebook 21.
    """
    predictions = {}
    
    for driver, quali_pos in quali_results.items():
        if driver not in driver_priors['drivers']:
            continue
        
        driver_data = driver_priors['drivers'][driver]
        
        racecraft_score = driver_data.get('racecraft', {}).get('skill_score', 0.5)
        dnf_prob = driver_data.get('dnf_risk', {}).get('rate', 0.1)
        quali_uncertainty = driver_data['pace']['uncertainty']
        
        # Full racecraft effect
        racecraft_delta = (racecraft_score - 0.5) * 6
        expected_pos = quali_pos - racecraft_delta
        expected_pos = np.clip(expected_pos, 1, 20)
        
        # Race uncertainty
        race_uncertainty = np.sqrt((quali_uncertainty * 19)**2 + 3**2)
        
        # Position probabilities
        positions = np.arange(1, 21)
        position_probs = np.exp(-0.5 * ((positions - expected_pos) / race_uncertainty) ** 2)
        position_probs = position_probs / position_probs.sum()
        
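        # Adjust for DNF: scale finishing probabilities by (1 - dnf_prob)
        # and assign the DNF probability mass to the P20 slot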
        position_probs_finish = position_probs * (1 - dnf_prob)
        position_probs_finish[19] += dnf_prob
        
        predictions[driver] = {
            'quali_pos': quali_pos,
            'expected_race_pos': float(expected_pos),
            'race_uncertainty': float(race_uncertainty),
            'podium_probability': float(position_probs_finish[:3].sum()),
            'points_probability': float(position_probs_finish[:10].sum()),
            'dnf_probability': float(dnf_prob)
        }
    
    return predictions
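
The position probabilities above are a Gaussian kernel evaluated at P1 to P20, normalized to sum to one, then rescaled so the DNF probability lands on P20. A dependency-free sketch of that construction; `finish_distribution` is a hypothetical helper and the input numbers are illustrative:

```python
import math


def finish_distribution(expected_pos, race_uncertainty, dnf_prob, grid_size=20):
    """Return a list of probabilities for finishing P1..P{grid_size}."""
    weights = [
        math.exp(-0.5 * ((pos - expected_pos) / race_uncertainty) ** 2)
        for pos in range(1, grid_size + 1)
    ]
    total = sum(weights)
    # Normalize, then reserve dnf_prob of the mass for the last slot
    probs = [w / total * (1 - dnf_prob) for w in weights]
    probs[-1] += dnf_prob
    return probs


# Illustrative inputs: expected P3, spread of 4.8 positions, 10% DNF risk
dist = finish_distribution(3.0, 4.8, 0.10)
print(f"podium: {sum(dist[:3]):.3f}")
print(f"points: {sum(dist[:10]):.3f}")
```

One consequence of this design is that a high DNF rate caps the podium and points probabilities at `1 - dnf_prob`, regardless of how strong the qualifying position is.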

In [7]:
def predict_sprint_race(sprint_quali_results, driver_priors):
    """
    Predict Sprint Race based on Sprint Qualifying.
    
    Key differences vs GP:
    - Shorter race → less racecraft effect
    - Higher uncertainty (less practice data)
    - Lower DNF risk (shorter race)
    - Points to top 8 (not top 10)
    """
    predictions = {}
    
    for driver, sq_pos in sprint_quali_results.items():
        if driver not in driver_priors['drivers']:
            continue
        
        driver_data = driver_priors['drivers'][driver]
        
        # Sprint racecraft: 50% of full race effect
        racecraft_score = driver_data.get('racecraft', {}).get('skill_score', 0.5)
        racecraft_delta = (racecraft_score - 0.5) * 6 * SPRINT_PARAMS['racecraft_factor']
        
        # Sprint DNF risk: 50% of GP risk
        sprint_dnf_prob = driver_data.get('dnf_risk', {}).get('rate', 0.1) * SPRINT_PARAMS['dnf_factor']
        
        # Sprint uncertainty: 30% higher than conventional
        base_uncertainty = driver_data['pace']['uncertainty']
        sprint_uncertainty = base_uncertainty * SPRINT_PARAMS['uncertainty_multiplier']
        
        expected_pos = sq_pos - racecraft_delta
        expected_pos = np.clip(expected_pos, 1, 20)
        
        # Sprint race uncertainty (less chaos than full GP)
        race_uncertainty = np.sqrt((sprint_uncertainty * 19)**2 + 2**2)
        
        # Position probabilities
        positions = np.arange(1, 21)
        position_probs = np.exp(-0.5 * ((positions - expected_pos) / race_uncertainty) ** 2)
        position_probs = position_probs / position_probs.sum()
        
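        # Adjust for DNF: scale finishing probabilities by (1 - sprint_dnf_prob)
        # and assign the DNF probability mass to the P20 slot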
        position_probs_finish = position_probs * (1 - sprint_dnf_prob)
        position_probs_finish[19] += sprint_dnf_prob
        
        predictions[driver] = {
            'sprint_quali_pos': sq_pos,
            'expected_sprint_pos': float(expected_pos),
            'sprint_uncertainty': float(race_uncertainty),
            'podium_probability': float(position_probs_finish[:3].sum()),
            'top8_probability': float(position_probs_finish[:8].sum()),  # Sprint points!
            'dnf_probability': float(sprint_dnf_prob)
        }
    
    return predictions
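
The sprint spread combines two opposing adjustments: the pace uncertainty is multiplied by 1.3, while the fixed "race chaos" term drops from 3 to 2 positions. A quick sketch comparing the two formulas, using a hypothetical `race_uncertainty` helper and the notebook's initial and minimum uncertainty values as inputs:

```python
import math


def race_uncertainty(pace_uncertainty, chaos_sd, multiplier=1.0, grid_size=20):
    """Combine scaled pace uncertainty and a fixed chaos term in quadrature."""
    pace_term = pace_uncertainty * multiplier * (grid_size - 1)
    return math.sqrt(pace_term ** 2 + chaos_sd ** 2)


for pace_sd in (0.20, 0.06):
    gp = race_uncertainty(pace_sd, chaos_sd=3.0)
    sprint = race_uncertainty(pace_sd, chaos_sd=2.0, multiplier=1.3)
    print(f"pace sd {pace_sd:.2f}: GP ±{gp:.2f}, sprint ±{sprint:.2f}")
```

Under these formulas the sprint spread is wider than the GP spread at the initial uncertainty of 0.20, and only becomes narrower once the pace uncertainty has shrunk toward its floor.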

In [8]:
def predict_sprint_weekend_gp(friday_quali, sprint_race_result, driver_priors):
    """
    Predict GP Race on sprint weekend.
    
    Uses BOTH:
    - Friday Qualifying (sets official GP grid)
    - Sprint Race result (shows current form, potential damage)
    """
    predictions = {}
    
    for driver in friday_quali.keys():
        if driver not in driver_priors['drivers']:
            continue
        
        driver_data = driver_priors['drivers'][driver]
        
        quali_pos = friday_quali[driver]
        
        # Get sprint result if available
        if driver in sprint_race_result:
            sprint_data = sprint_race_result[driver]
            sprint_pos = sprint_data['race_pos']
            sprint_dnf = sprint_data['dnf']
            
            # Weighted average: 60% Quali, 40% Sprint
            # (Sprint shows form but quali sets grid)
            combined_starting_pos = 0.6 * quali_pos + 0.4 * sprint_pos
            
            # If DNF in sprint, higher uncertainty
            uncertainty_mult = 1.5 if sprint_dnf else 1.2
        else:
            combined_starting_pos = quali_pos
            uncertainty_mult = 1.2  # Sprint weekend = less practice
        
        # Full racecraft effect for GP
        racecraft_score = driver_data.get('racecraft', {}).get('skill_score', 0.5)
        racecraft_delta = (racecraft_score - 0.5) * 6
        
        # GP DNF risk (full race)
        dnf_prob = driver_data.get('dnf_risk', {}).get('rate', 0.1)
        
        expected_pos = combined_starting_pos - racecraft_delta
        expected_pos = np.clip(expected_pos, 1, 20)
        
        # GP uncertainty on sprint weekend
        base_uncertainty = driver_data['pace']['uncertainty']
        race_uncertainty = np.sqrt((base_uncertainty * uncertainty_mult * 19)**2 + 3**2)
        
        # Position probabilities
        positions = np.arange(1, 21)
        position_probs = np.exp(-0.5 * ((positions - expected_pos) / race_uncertainty) ** 2)
        position_probs = position_probs / position_probs.sum()
        
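        # Adjust for DNF: scale finishing probabilities by (1 - dnf_prob)
        # and assign the DNF probability mass to the P20 slot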
        position_probs_finish = position_probs * (1 - dnf_prob)
        position_probs_finish[19] += dnf_prob
        
        predictions[driver] = {
            'quali_pos': quali_pos,
            'combined_starting_pos': float(combined_starting_pos),
            'expected_race_pos': float(expected_pos),
            'race_uncertainty': float(race_uncertainty),
            'podium_probability': float(position_probs_finish[:3].sum()),
            'points_probability': float(position_probs_finish[:10].sum()),
            'dnf_probability': float(dnf_prob)
        }
    
    return predictions
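
On sprint weekends the GP prediction starts from a 60/40 blend of the Friday qualifying position and the sprint finishing position. A short worked sketch of that blend, with `blended_start` as a hypothetical helper and invented positions:

```python
def blended_start(quali_pos, sprint_pos=None, sprint_dnf=False):
    """Return (combined starting position, uncertainty multiplier)."""
    if sprint_pos is None:
        # No sprint result: use the grid slot with the sprint-weekend multiplier
        return float(quali_pos), 1.2
    combined = 0.6 * quali_pos + 0.4 * sprint_pos
    return combined, (1.5 if sprint_dnf else 1.2)


# Invented examples: clean sprint, missing sprint result, sprint DNF recorded as P20
print(blended_start(3, 8))
print(blended_start(3))
print(blended_start(2, 20, sprint_dnf=True))
```

The last case shows a side effect worth keeping in mind: a sprint DNF recorded as P20 moves the expected GP position back by several places even though the official grid slot is unchanged.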

## Metrics Calculation

In [9]:
def calculate_race_metrics(predictions, actuals, points_threshold=10):
    """
    Calculate race prediction metrics.
    
    points_threshold: 10 for GP, 8 for Sprint
    """
    position_errors = []
    podium_correct = []
    points_correct = []
    dnf_brier = []
    
    for driver, pred in predictions.items():
        if driver not in actuals:
            continue
        
        actual = actuals[driver]
        actual_dnf = actual['dnf']
        
        # Position error (finishers only)
        if not actual_dnf:
            # GP predictions store 'expected_race_pos'; sprint predictions store 'expected_sprint_pos'
            expected = pred.get('expected_race_pos', pred.get('expected_sprint_pos'))
            error = abs(expected - actual['race_pos'])
            position_errors.append(error)
        
        # Podium prediction
        pred_podium = pred['podium_probability'] > 0.5
        actual_podium = (actual['race_pos'] <= 3) and not actual_dnf
        podium_correct.append(pred_podium == actual_podium)
        
        # Points prediction
        points_prob_key = 'top8_probability' if points_threshold == 8 else 'points_probability'
        pred_points = pred.get(points_prob_key, pred.get('points_probability', 0)) > 0.5
        actual_points = (actual['race_pos'] <= points_threshold) and not actual_dnf
        points_correct.append(pred_points == actual_points)
        
        # DNF prediction (Brier score)
        actual_dnf_binary = 1.0 if actual_dnf else 0.0
        brier = (pred['dnf_probability'] - actual_dnf_binary) ** 2
        dnf_brier.append(brier)
    
    return {
        'position_mae': np.mean(position_errors) if position_errors else None,
        'podium_accuracy': np.mean(podium_correct) * 100 if podium_correct else None,
        'points_accuracy': np.mean(points_correct) * 100 if points_correct else None,
        'dnf_brier_score': np.mean(dnf_brier) if dnf_brier else None,
        'n': len(podium_correct)
    }
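
The DNF metric is a Brier score: the mean squared difference between the predicted DNF probability and the 0/1 outcome, where lower is better. A minimal sketch with invented probabilities and outcomes (`brier_score` is a hypothetical helper):

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between predicted probabilities and binary outcomes."""
    squared_errors = [(p - float(o)) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Invented race: three drivers, the second one retires
predicted = [0.10, 0.10, 0.30]
retired = [False, True, False]
print(f"Brier score: {brier_score(predicted, retired):.4f}")
```

A useful reference point is the score of a constant forecast: always predicting 0.5 yields exactly 0.25, so a DNF model should stay below that to be informative.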

## Bayesian Update (Tuned)

In [10]:
def tuned_bayesian_update(priors, race_results, season_progress):
    """Bayesian update with tuning (same as 20B)."""
    posteriors = copy.deepcopy(priors)
    posteriors['week'] = priors.get('week', 0) + 1
    posteriors['races_seen'] = priors.get('races_seen', 0) + 1
    
    grid_size = 20
    
    for driver, result in race_results.items():
        if driver not in posteriors['drivers']:
            continue
        
        driver_data = posteriors['drivers'][driver]
        races_seen = driver_data.get('races_seen', 0)
        
        # Adaptive learning
        base_alpha = max(0.05, 1.0 / (races_seen + 2))
        if 0.4 < season_progress < 0.8:
            base_alpha *= 0.7
        alpha = base_alpha
        
        # Update pace
        observed_pace = 1.0 - (result.get('quali_pos', result.get('race_pos', 10)) - 1) / (grid_size - 1)
        prior_pace = driver_data['pace']['quali_pace']
        new_pace = (1 - alpha) * prior_pace + alpha * observed_pace
        
        driver_data['pace']['quali_pace'] = float(new_pace)
        
        # Store history
        if 'pace_history' not in driver_data:
            driver_data['pace_history'] = []
        driver_data['pace_history'].append(float(observed_pace))
        
        # Update uncertainty
        error = abs(prior_pace - observed_pace)
        prior_uncertainty = driver_data['pace']['uncertainty']
        
        dnf = result.get('dnf', False)
        
        if dnf:
            obs_var = 0.10
            n_obs = 0.5
        else:
            pace_var = np.var(driver_data['pace_history'][-5:]) if len(driver_data['pace_history']) >= 2 else 0.08
            obs_var = pace_var + TUNED_PARAMS['measurement_noise']
            n_obs = 1.0
        
        # Bayesian update
        prior_var = prior_uncertainty ** 2
        posterior_var = 1.0 / (1.0/prior_var + n_obs/obs_var)
        new_uncertainty = np.sqrt(posterior_var)
        
        # Outlier detection
        if error > 0.5:
            new_uncertainty *= 1.2
        
        # Driver-specific minimum
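        # Fall back to the configured 'default' floor for drivers without a specific minimum
        driver_min = TUNED_PARAMS['driver_specific_mins'].get(
            driver, TUNED_PARAMS['driver_specific_mins']['default'])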
        new_uncertainty = max(new_uncertainty, driver_min)
        
        driver_data['pace']['uncertainty'] = float(new_uncertainty)
        driver_data['races_seen'] = races_seen + 1
    
    return posteriors
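
The uncertainty step is a precision-weighted variance update: prior precision plus observation precision gives posterior precision, and the result is floored at a per-driver minimum. A sketch of just that step, omitting the outlier inflation; `update_uncertainty` is a hypothetical helper and the inputs reuse the notebook's default values:

```python
import math


def update_uncertainty(prior_sd, obs_var, n_obs=1.0, floor=0.07):
    """Return the posterior standard deviation after one observation."""
    posterior_var = 1.0 / (1.0 / prior_sd ** 2 + n_obs / obs_var)
    return max(math.sqrt(posterior_var), floor)


# First finish: default pace variance 0.08 plus measurement noise 0.04
print(f"after a finish: {update_uncertainty(0.20, obs_var=0.12):.4f}")
# A DNF counts as half an observation with variance 0.10
print(f"after a DNF:    {update_uncertainty(0.20, obs_var=0.10, n_obs=0.5):.4f}")
```

Because the observation variance is large relative to the prior variance, a single race shrinks the uncertainty only modestly, and a DNF shrinks it even less.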

In [11]:
def display_sprint_weekend_predictions(race_name, 
                                       sprint_quali_results, sprint_predictions, sprint_race_results,
                                       quali_results, gp_predictions, gp_race_results):
    """
    Display both sprint race and GP predictions for sprint weekends.
    Shows predicted vs actual positions, errors, and podium comparisons.
    """
    
    print(f"\n{'='*70}")
    print(f"⚡ {race_name.upper()} - SPRINT WEEKEND PREDICTIONS")
    print(f"{'='*70}")
    
    # === SPRINT RACE ===
    print(f"\n🏁 SPRINT RACE (Sprint Quali → Sprint Race):")
    print(f"{'Driver':<8} {'SQ':<6} {'Pred':<8} {'Actual':<8} {'Error':<8} {'Status'}")
    print(f"{'-'*55}")
    
    sprint_errors = []
    # Use 'expected_sprint_pos' for sprint race predictions
    for driver in sorted(sprint_predictions.keys(), 
                        key=lambda d: sprint_predictions[d]['expected_sprint_pos']):
        if driver in sprint_race_results:
            sq_pos = sprint_quali_results.get(driver, '?')
            pred_pos = int(round(sprint_predictions[driver]['expected_sprint_pos']))
            actual_pos = sprint_race_results[driver]['race_pos']
            error = pred_pos - actual_pos
            sprint_errors.append(abs(error))
            
            error_str = f"+{error}" if error > 0 else str(error)
            status = "✅" if abs(error) <= 1 else "❌"
            dnf = "DNF" if sprint_race_results[driver].get('dnf', False) else ""
            
            print(f"{driver:<8} P{sq_pos:<5} P{pred_pos:<7} P{actual_pos:<7} {error_str:>4}     {status} {dnf}")
    
    # Sprint podium comparison
    sprint_pred_podium = sorted(sprint_predictions.keys(), 
                               key=lambda d: sprint_predictions[d]['expected_sprint_pos'])[:3]
    sprint_actual_podium = sorted([d for d in sprint_race_results.keys() 
                                  if sprint_race_results[d]['race_pos'] <= 3],
                                 key=lambda d: sprint_race_results[d]['race_pos'])
    
    sprint_podium_match = len(set(sprint_pred_podium) & set(sprint_actual_podium))
    
    print(f"\n  Predicted Podium: {', '.join(sprint_pred_podium)}")
    print(f"  Actual Podium:    {', '.join(sprint_actual_podium)}")
    print(f"  Match: {sprint_podium_match}/3 ({sprint_podium_match/3*100:.0f}%)")
    if sprint_errors:
        print(f"  MAE: ±{sum(sprint_errors)/len(sprint_errors):.1f} positions")
    
    # === GP RACE ===
    print(f"\n🏆 GP RACE (Friday Quali + Sprint Result → GP Race):")
    print(f"{'Driver':<8} {'Quali':<6} {'Sprint':<8} {'Pred':<8} {'Actual':<8} {'Error':<8} {'Status'}")
    print(f"{'-'*65}")
    
    gp_errors = []
    # Use 'expected_race_pos' for GP predictions (NOT 'expected_pos')
    for driver in sorted(gp_predictions.keys(), 
                        key=lambda d: gp_predictions[d]['expected_race_pos']):
        if driver in gp_race_results:
            q_pos = quali_results.get(driver, '?')
            s_pos = sprint_race_results.get(driver, {}).get('race_pos', '?')
            pred_pos = int(round(gp_predictions[driver]['expected_race_pos']))
            actual_pos = gp_race_results[driver]['race_pos']
            error = pred_pos - actual_pos
            gp_errors.append(abs(error))
            
            error_str = f"+{error}" if error > 0 else str(error)
            status = "✅" if abs(error) <= 1 else "❌"
            dnf = "DNF" if gp_race_results[driver].get('dnf', False) else ""
            
            print(f"{driver:<8} P{q_pos:<5} P{s_pos:<7} P{pred_pos:<7} P{actual_pos:<7} {error_str:>4}     {status} {dnf}")
    
    # GP podium comparison
    gp_pred_podium = sorted(gp_predictions.keys(), 
                           key=lambda d: gp_predictions[d]['expected_race_pos'])[:3]
    gp_actual_podium = sorted([d for d in gp_race_results.keys() 
                              if gp_race_results[d]['race_pos'] <= 3],
                             key=lambda d: gp_race_results[d]['race_pos'])
    
    gp_podium_match = len(set(gp_pred_podium) & set(gp_actual_podium))
    
    print(f"\n  Predicted Podium: {', '.join(gp_pred_podium)}")
    print(f"  Actual Podium:    {', '.join(gp_actual_podium)}")
    print(f"  Match: {gp_podium_match}/3 ({gp_podium_match/3*100:.0f}%)")
    if gp_errors:
        print(f"  MAE: ±{sum(gp_errors)/len(gp_errors):.1f} positions")
    
    print(f"{'='*70}\n")

In [12]:
def display_conventional_weekend(race_name, quali_results, predictions, race_results):
    """
    Display GP race predictions for conventional weekends.
    Shows predicted vs actual positions, errors, and podium comparisons.
    """
    
    print(f"\n{'='*70}")
    print(f"🏁 {race_name.upper()} - CONVENTIONAL WEEKEND")
    print(f"{'='*70}")
    
    print(f"\n🏆 GP RACE (Quali → Race):")
    print(f"{'Driver':<8} {'Quali':<8} {'Pred':<8} {'Actual':<8} {'Error':<8} {'Status'}")
    print(f"{'-'*55}")
    
    errors = []
    for driver in sorted(predictions.keys(), 
                        key=lambda d: predictions[d]['expected_race_pos']):
        if driver in race_results:
            q_pos = quali_results.get(driver, '?')
            pred_pos = int(round(predictions[driver]['expected_race_pos']))
            actual_pos = race_results[driver]['race_pos']
            error = pred_pos - actual_pos
            errors.append(abs(error))
            
            error_str = f"+{error}" if error > 0 else str(error)
            status = "✅" if abs(error) <= 1 else "❌"
            dnf = "DNF" if race_results[driver].get('dnf', False) else ""
            
            print(f"{driver:<8} P{q_pos:<7} P{pred_pos:<7} P{actual_pos:<7} {error_str:>4}     {status} {dnf}")
    
    pred_podium = sorted(predictions.keys(), 
                        key=lambda d: predictions[d]['expected_race_pos'])[:3]
    actual_podium = sorted([d for d in race_results.keys() 
                           if race_results[d]['race_pos'] <= 3],
                          key=lambda d: race_results[d]['race_pos'])
    
    podium_match = len(set(pred_podium) & set(actual_podium))
    
    print(f"\n  Predicted Podium: {', '.join(pred_podium)}")
    print(f"  Actual Podium:    {', '.join(actual_podium)}")
    print(f"  Match: {podium_match}/3 ({podium_match/3*100:.0f}%)")
    if errors:
        print(f"  MAE: ±{sum(errors)/len(errors):.1f} positions")
    
    print(f"{'='*70}\n")
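
Both display functions score the podium by set overlap: the three drivers with the lowest expected position are compared with the drivers who actually finished in the top three, ignoring order. A compact sketch of that comparison, with `podium_overlap` as a hypothetical helper and invented positions:

```python
def podium_overlap(predicted_pos, actual_pos):
    """Count drivers that appear in both the predicted and actual top three."""
    predicted = sorted(predicted_pos, key=predicted_pos.get)[:3]
    actual = [d for d in sorted(actual_pos, key=actual_pos.get) if actual_pos[d] <= 3]
    return len(set(predicted) & set(actual))


# Invented expected positions and finishing order
predicted_pos = {'VER': 1.4, 'NOR': 2.2, 'LEC': 3.1, 'HAM': 4.8}
actual_pos = {'NOR': 1, 'VER': 2, 'HAM': 3, 'LEC': 4}
match = podium_overlap(predicted_pos, actual_pos)
print(f"Match: {match}/3")
```

Since order is ignored, predicting the right three drivers in the wrong order still scores 3/3. This is a looser criterion than the per-driver `podium_probability > 0.5` check used in `calculate_race_metrics`.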

## Load Prior Season and Initialize

In [13]:
# Extract characteristics from prior season
print(f"Loading prior season: {TEST_SEASON['from']}")
characteristics = extract_season_characteristics(TEST_SEASON['from'])

# Initialize priors
priors = {
    'season': TEST_SEASON['from'],
    'week': 0,
    'races_seen': 0,
    'drivers': {}
}

for driver, chars in characteristics.items():
    priors['drivers'][driver] = {
        'pace': {
            'quali_pace': chars['avg_quali_pace'],
            'uncertainty': TUNED_PARAMS['initial_uncertainty']
        },
        'racecraft': {
            'skill_score': chars['racecraft_score']
        },
        'dnf_risk': {
            'rate': chars['dnf_rate']
        },
        'races_seen': 0,
        'pace_history': []
    }

print(f"\n✅ Initialized priors for {len(priors['drivers'])} drivers")

Loading prior season: 2024
  Extracting 2024 season characteristics...
  ✅ Extracted characteristics for 24 drivers

✅ Initialized priors for 24 drivers


## Main Validation Loop

In [14]:
# Get test season schedule
schedule = ff1.get_event_schedule(TEST_SEASON['to'])
current_priors = copy.deepcopy(priors)

# Results storage
results = {
    'conventional': {'gp_races': []},
    'sprint': {'sprint_races': [], 'gp_races': []},
    'drivers': {driver: {
        'conv_gp_errors': [],
        'sprint_race_errors': [],
        'sprint_gp_errors': []
    } for driver in TRACKED_DRIVERS}
}

# Count only race weekends (exclude testing)
total_races = 0
for _, e in schedule.iterrows():
    event_format = e['EventFormat'].lower()
    # Skip testing events
    if 'testing' in e['EventName'].lower() or 'test' in event_format:
        continue
    # Count conventional and sprint weekends
    if event_format == 'conventional' or 'sprint' in event_format:
        total_races += 1

week = 0

print(f"\n{'='*70}")
print(f"VALIDATING {TEST_SEASON['to']} SEASON")
print("="*70)
print(f"Total races: {total_races}")
print(f"Tracked drivers: {', '.join(TRACKED_DRIVERS)}")
print()


VALIDATING 2025 SEASON
Total races: 24
Tracked drivers: VER, HAM, LEC, NOR, SAI, PER, RUS, ALO



In [15]:
for _, event in schedule.iterrows():
    race_name = event['EventName']
    event_format = event['EventFormat'].lower()
    
    # Skip testing events; every remaining event is a race weekend
    if 'testing' in race_name.lower() or 'test' in event_format:
        continue
    
    # Normalize weekend type (handles 'sprint', 'sprint_qualifying', etc.)
    weekend_type = 'sprint' if 'sprint' in event_format else 'conventional'
    
    week += 1
    season_progress = week / total_races
    
    weekend_marker = "üèÅ" if weekend_type == 'conventional' else "‚ö°"
    print(f"\n{weekend_marker} Week {week}/{total_races}: {race_name} ({weekend_type.upper()})")
    
    # =================================================================
    # CONVENTIONAL WEEKEND
    # =================================================================
    
    if weekend_type == 'conventional':
        weekend_data = extract_conventional_weekend(TEST_SEASON['to'], race_name)
        
        if not weekend_data:
            print("    ❌ Failed to extract data")
            continue
        
        quali_results = weekend_data['quali']
        race_results = weekend_data['race']
        
        # Predict GP race
        gp_predictions = predict_conventional_race(quali_results, current_priors)
        
        # Calculate metrics
        gp_metrics = calculate_race_metrics(gp_predictions, race_results, points_threshold=10)
        # Display conventional weekend predictions
        display_conventional_weekend(race_name, quali_results, gp_predictions, race_results)

        results['conventional']['gp_races'].append(gp_metrics)
        
        if gp_metrics['podium_accuracy'] is not None:
            print(f"    GP: {gp_metrics['podium_accuracy']:.0f}% podium, ±{gp_metrics['position_mae']:.1f} pos")
        else:
            print(f"    GP: No valid predictions")
        
        # Store driver errors
        for driver in TRACKED_DRIVERS:
            if driver in gp_predictions and driver in race_results:
                pred = gp_predictions[driver]
                actual = race_results[driver]
                if not actual['dnf']:
                    error = abs(pred['expected_race_pos'] - actual['race_pos'])
                    results['drivers'][driver]['conv_gp_errors'].append(error)
        
        # Update priors
        update_data = {driver: {'quali_pos': quali_results[driver], 'dnf': race_results.get(driver, {}).get('dnf', False)} 
                       for driver in quali_results if driver in race_results}
        current_priors = tuned_bayesian_update(current_priors, update_data, season_progress)
    
    # =================================================================
    # SPRINT WEEKEND
    # =================================================================
    
    else:  # sprint weekend
        weekend_data = extract_sprint_weekend(TEST_SEASON['to'], race_name)
        
        if not weekend_data:
            print("    ❌ Failed to extract data")
            continue
        
        quali_results = weekend_data['quali']  # Friday Quali (sets GP grid)
        sprint_quali_results = weekend_data['sprint_quali']
        sprint_race_results = weekend_data['sprint_race']
        gp_race_results = weekend_data['race']
        
        # Debug output
        print(f"    → Sprint Quali: {len(sprint_quali_results)} drivers")
        print(f"    → Sprint Race: {len(sprint_race_results)} drivers")
        print(f"    → GP Race: {len(gp_race_results)} drivers")
        
        # 1. Predict Sprint Race
        sprint_predictions = predict_sprint_race(sprint_quali_results, current_priors)
        sprint_metrics = calculate_race_metrics(sprint_predictions, sprint_race_results, points_threshold=8)
        results['sprint']['sprint_races'].append(sprint_metrics)
        
        if sprint_metrics['podium_accuracy'] is not None:
            print(f"    Sprint: {sprint_metrics['podium_accuracy']:.0f}% podium, ±{sprint_metrics['position_mae']:.1f} pos")
        else:
            print(f"    Sprint: No valid predictions")
        
        # Store sprint errors
        for driver in TRACKED_DRIVERS:
            if driver in sprint_predictions and driver in sprint_race_results:
                pred = sprint_predictions[driver]
                actual = sprint_race_results[driver]
                if not actual['dnf']:
                    error = abs(pred['expected_sprint_pos'] - actual['race_pos'])
                    results['drivers'][driver]['sprint_race_errors'].append(error)
        
        # 2. Predict GP Race (using Friday Quali + Sprint result)
        gp_predictions = predict_sprint_weekend_gp(quali_results, sprint_race_results, current_priors)
        gp_metrics = calculate_race_metrics(gp_predictions, gp_race_results, points_threshold=10)

        # Display detailed sprint weekend predictions
        display_sprint_weekend_predictions(
            race_name,
            sprint_quali_results, sprint_predictions, sprint_race_results,
            quali_results, gp_predictions, gp_race_results
        )

        results['sprint']['gp_races'].append(gp_metrics)
        
        if gp_metrics['podium_accuracy'] is not None:
            print(f"    GP: {gp_metrics['podium_accuracy']:.0f}% podium, ±{gp_metrics['position_mae']:.1f} pos")
        else:
            print(f"    GP: No valid predictions")
        
        # Store sprint GP errors
        for driver in TRACKED_DRIVERS:
            if driver in gp_predictions and driver in gp_race_results:
                pred = gp_predictions[driver]
                actual = gp_race_results[driver]
                if not actual['dnf']:
                    error = abs(pred['expected_race_pos'] - actual['race_pos'])
                    results['drivers'][driver]['sprint_gp_errors'].append(error)
        
        # Update priors (use Friday Quali for update)
        update_data = {driver: {'quali_pos': quali_results[driver], 'dnf': gp_race_results.get(driver, {}).get('dnf', False)} 
                       for driver in quali_results if driver in gp_race_results}
        current_priors = tuned_bayesian_update(current_priors, update_data, season_progress)

print(f"\n{'='*70}")
print("VALIDATION COMPLETE")
print("="*70)


🏁 Week 1/24: Australian Grand Prix (CONVENTIONAL)

🏁 AUSTRALIAN GRAND PRIX - CONVENTIONAL WEEKEND

🏆 GP RACE (Quali → Race):
Driver   Quali    Pred     Actual   Error    Status
-------------------------------------------------------
NOR      P1       P1       P1          0     ✅ 
PIA      P2       P2       P9         -7     ❌ 
VER      P3       P3       P2         +1     ✅ 
RUS      P4       P4       P3         +1     ✅ 
TSU      P5       P5       P12        -7     ❌ 
LEC      P7       P6       P8         -2     ❌ 
ALB      P6       P6       P5         +1     ✅ 
HAM      P8       P6       P10        -4     ❌ 
GAS      P9       P7       P11        -4     ❌ 
SAI      P10      P9       P18        -9     ❌ DNF
ALO      P12      P12      P17        -5     ❌ DNF
STR      P13      P12      P6         +6     ❌ 
DOO      P14      P14      P19        -5     ❌ DNF
HUL      P17      P15      P7         +8     ❌ 
LAW      P18      P17      P15        +2     

## Results Summary

In [16]:
# Calculate summary statistics
print(f"\n{'='*70}")
print("FINAL RESULTS")
print("="*70)

# Conventional weekends
conv_gp_races = [r for r in results['conventional']['gp_races'] if r['position_mae'] is not None]

print(f"\n🏁 CONVENTIONAL WEEKENDS ({len(conv_gp_races)} races):")
if conv_gp_races:
    print(f"   GP Position MAE: ±{np.mean([r['position_mae'] for r in conv_gp_races]):.1f} positions")
    print(f"   GP Podium Accuracy: {np.mean([r['podium_accuracy'] for r in conv_gp_races]):.1f}%")
    print(f"   GP Points Accuracy: {np.mean([r['points_accuracy'] for r in conv_gp_races]):.1f}%")
    print(f"   GP DNF Brier Score: {np.mean([r['dnf_brier_score'] for r in conv_gp_races]):.3f}")

# Sprint weekends
sprint_races = [r for r in results['sprint']['sprint_races'] if r['position_mae'] is not None]
sprint_gp_races = [r for r in results['sprint']['gp_races'] if r['position_mae'] is not None]

print(f"\n⚡ SPRINT WEEKENDS ({len(sprint_races)} races):")
if sprint_races:
    print(f"\n   Sprint Race Predictions:")
    print(f"     Position MAE: ±{np.mean([r['position_mae'] for r in sprint_races]):.1f} positions")
    print(f"     Podium Accuracy: {np.mean([r['podium_accuracy'] for r in sprint_races]):.1f}%")
    print(f"     Top 8 Accuracy: {np.mean([r['points_accuracy'] for r in sprint_races]):.1f}%")
    print(f"     DNF Brier Score: {np.mean([r['dnf_brier_score'] for r in sprint_races]):.3f}")

if sprint_gp_races:
    print(f"\n   GP Race Predictions (Sprint Weekend):")
    print(f"     Position MAE: ±{np.mean([r['position_mae'] for r in sprint_gp_races]):.1f} positions")
    print(f"     Podium Accuracy: {np.mean([r['podium_accuracy'] for r in sprint_gp_races]):.1f}%")
    print(f"     Points Accuracy: {np.mean([r['points_accuracy'] for r in sprint_gp_races]):.1f}%")
    print(f"     DNF Brier Score: {np.mean([r['dnf_brier_score'] for r in sprint_gp_races]):.3f}")

# Combined (all GP races)
all_gp_races = conv_gp_races + sprint_gp_races

print(f"\n{'='*70}")
print("COMBINED GP RACE PREDICTIONS")
print("="*70)
if all_gp_races:
    print(f"   Total GP Races: {len(all_gp_races)}")
    print(f"   Coverage: {len(all_gp_races)}/{total_races} ({len(all_gp_races)/total_races*100:.0f}%)")
    print(f"\n   Overall GP Metrics:")
    print(f"     Position MAE: ±{np.mean([r['position_mae'] for r in all_gp_races]):.1f} positions")
    print(f"     Podium Accuracy: {np.mean([r['podium_accuracy'] for r in all_gp_races]):.1f}%")
    print(f"     Points Accuracy: {np.mean([r['points_accuracy'] for r in all_gp_races]):.1f}%")
    print(f"     DNF Brier Score: {np.mean([r['dnf_brier_score'] for r in all_gp_races]):.3f}")


FINAL RESULTS

🏁 CONVENTIONAL WEEKENDS (18 races):
   GP Position MAE: ±2.5 positions
   GP Podium Accuracy: 90.2%
   GP Points Accuracy: 79.0%
   GP DNF Brier Score: 0.208

⚡ SPRINT WEEKENDS (6 races):

   Sprint Race Predictions:
     Position MAE: ±2.6 positions
     Podium Accuracy: 87.3%
     Top 8 Accuracy: 81.4%
     DNF Brier Score: 0.121

   GP Race Predictions (Sprint Weekend):
     Position MAE: ±2.6 positions
     Podium Accuracy: 88.2%
     Points Accuracy: 70.6%
     DNF Brier Score: 0.230

COMBINED GP RACE PREDICTIONS
   Total GP Races: 24
   Coverage: 24/24 (100%)

   Overall GP Metrics:
     Position MAE: ±2.5 positions
     Podium Accuracy: 89.7%
     Points Accuracy: 76.9%
     DNF Brier Score: 0.214
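
The DNF Brier score reported above is the mean squared difference between a predicted DNF probability and the 0/1 outcome. Lower is better, and a constant 50% forecast scores 0.25. The following is a minimal sketch, assuming each prediction carries one DNF probability per driver. `brier_score` and the sample values are hypothetical, and `calculate_race_metrics` may compute the score differently.

```python
import numpy as np


def brier_score(predicted_probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    probs = np.asarray(predicted_probs, dtype=float)
    actual = np.asarray(outcomes, dtype=float)
    return float(np.mean((probs - actual) ** 2))


# Made-up example: four drivers, one of whom retires
dnf_probs = [0.05, 0.10, 0.20, 0.10]   # predicted DNF probability per driver
dnf_actual = [0, 0, 1, 0]              # 1 = did not finish

score = brier_score(dnf_probs, dnf_actual)
print(f"DNF Brier score: {score:.3f}")
```

Because every retirement the model rated unlikely contributes close to 1.0 to the sum, a handful of surprise DNFs in a race can dominate the score.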


## Per-Driver Breakdown

In [17]:
print(f"\n{'='*70}")
print("PER-DRIVER BREAKDOWN")
print("="*70)

for driver in TRACKED_DRIVERS:
    driver_data = results['drivers'][driver]
    
    conv_errors = driver_data['conv_gp_errors']
    sprint_race_errors = driver_data['sprint_race_errors']
    sprint_gp_errors = driver_data['sprint_gp_errors']
    
    if not conv_errors and not sprint_race_errors and not sprint_gp_errors:
        continue
    
    print(f"\n{driver}:")
    
    if conv_errors:
        print(f"   Conventional GP: ±{np.mean(conv_errors):.2f} pos (n={len(conv_errors)})")
    
    if sprint_race_errors:
        print(f"   Sprint Race: ±{np.mean(sprint_race_errors):.2f} pos (n={len(sprint_race_errors)})")
    
    if sprint_gp_errors:
        print(f"   Sprint GP: ±{np.mean(sprint_gp_errors):.2f} pos (n={len(sprint_gp_errors)})")
    
    # Combined GP accuracy
    all_gp_errors = conv_errors + sprint_gp_errors
    if all_gp_errors:
        print(f"   Combined GP: ±{np.mean(all_gp_errors):.2f} pos (n={len(all_gp_errors)})")


PER-DRIVER BREAKDOWN

VER:
   Conventional GP: ±1.23 pos (n=17)
   Sprint Race: ±3.16 pos (n=6)
   Sprint GP: ±2.59 pos (n=6)
   Combined GP: ±1.58 pos (n=23)

HAM:
   Conventional GP: ±3.62 pos (n=16)
   Sprint Race: ±1.91 pos (n=6)
   Sprint GP: ±3.30 pos (n=4)
   Combined GP: ±3.56 pos (n=20)

LEC:
   Conventional GP: ±2.20 pos (n=17)
   Sprint Race: ±2.75 pos (n=5)
   Sprint GP: ±1.95 pos (n=4)
   Combined GP: ±2.15 pos (n=21)

NOR:
   Conventional GP: ±1.20 pos (n=15)
   Sprint Race: ±0.80 pos (n=5)
   Sprint GP: ±1.93 pos (n=6)
   Combined GP: ±1.41 pos (n=21)

SAI:
   Conventional GP: ±2.67 pos (n=10)
   Sprint Race: ±2.90 pos (n=5)
   Sprint GP: ±3.55 pos (n=5)
   Combined GP: ±2.96 pos (n=15)

RUS:
   Conventional GP: ±1.49 pos (n=17)
   Sprint Race: ±1.12 pos (n=6)
   Sprint GP: ±1.93 pos (n=6)
   Combined GP: ±1.61 pos (n=23)

ALO:
   Conventional GP: ±2.09 pos (n=13)
   Sprint Race: ±1.27 pos (n=4)
   Sprint GP: ±2.40 pos (n=4)
   Combined GP: ±

## Key Insights

In [18]:
print(f"\n{'='*70}")
print("KEY INSIGHTS")
print("="*70)

if conv_gp_races and sprint_gp_races:
    conv_podium = np.mean([r['podium_accuracy'] for r in conv_gp_races])
    sprint_podium = np.mean([r['podium_accuracy'] for r in sprint_gp_races])
    
    print(f"\n1. Conventional vs Sprint Weekend GP:")
    print(f"   Conventional: {conv_podium:.1f}% podium accuracy")
    print(f"   Sprint: {sprint_podium:.1f}% podium accuracy")
    diff = conv_podium - sprint_podium
    if abs(diff) < 2:
        print(f"   → Similar accuracy despite less practice data")
    elif diff > 0:
        print(f"   → {diff:.1f}% drop on sprint weekends (less practice data)")
    else:
        print(f"   → {abs(diff):.1f}% better on sprint weekends (sprint shows form)")

if sprint_races:
    sprint_podium_sprint = np.mean([r['podium_accuracy'] for r in sprint_races])
    sprint_top8 = np.mean([r['points_accuracy'] for r in sprint_races])
    
    print(f"\n2. Sprint Race Predictions:")
    print(f"   Podium: {sprint_podium_sprint:.1f}% accuracy")
    print(f"   Top 8: {sprint_top8:.1f}% accuracy")
    print(f"   → Short race makes grid position more important")

if all_gp_races:
    overall_coverage = len(all_gp_races) / total_races * 100
    print(f"\n3. System Coverage:")
    print(f"   {len(all_gp_races)}/{total_races} races ({overall_coverage:.0f}%)")
    if overall_coverage >= 99:
        print(f"   ✅ Complete season coverage achieved!")
    else:
        print(f"   ⚠️  Missing {total_races - len(all_gp_races)} races")


KEY INSIGHTS

1. Conventional vs Sprint Weekend GP:
   Conventional: 90.2% podium accuracy
   Sprint: 88.2% podium accuracy
   → Similar accuracy despite less practice data

2. Sprint Race Predictions:
   Podium: 87.3% accuracy
   Top 8: 81.4% accuracy
   → Short race makes grid position more important

3. System Coverage:
   24/24 races (100%)
   ✅ Complete season coverage achieved!


## Save Results

In [19]:
output_path = Path('../data/processed/testing_files/validation')
output_path.mkdir(parents=True, exist_ok=True)

# Prepare results for saving
save_results = {
    'test_season': TEST_SEASON,
    'total_races': total_races,
    'conventional_weekends': {
        'count': len(conv_gp_races),
        'gp_metrics': {
            'position_mae': float(np.mean([r['position_mae'] for r in conv_gp_races])) if conv_gp_races else None,
            'podium_accuracy': float(np.mean([r['podium_accuracy'] for r in conv_gp_races])) if conv_gp_races else None,
            'points_accuracy': float(np.mean([r['points_accuracy'] for r in conv_gp_races])) if conv_gp_races else None,
            'dnf_brier': float(np.mean([r['dnf_brier_score'] for r in conv_gp_races])) if conv_gp_races else None
        },
        'per_race': results['conventional']['gp_races']
    },
    'sprint_weekends': {
        'count': len(sprint_races),
        'sprint_race_metrics': {
            'position_mae': float(np.mean([r['position_mae'] for r in sprint_races])) if sprint_races else None,
            'podium_accuracy': float(np.mean([r['podium_accuracy'] for r in sprint_races])) if sprint_races else None,
            'top8_accuracy': float(np.mean([r['points_accuracy'] for r in sprint_races])) if sprint_races else None,
            'dnf_brier': float(np.mean([r['dnf_brier_score'] for r in sprint_races])) if sprint_races else None
        },
        'gp_race_metrics': {
            'position_mae': float(np.mean([r['position_mae'] for r in sprint_gp_races])) if sprint_gp_races else None,
            'podium_accuracy': float(np.mean([r['podium_accuracy'] for r in sprint_gp_races])) if sprint_gp_races else None,
            'points_accuracy': float(np.mean([r['points_accuracy'] for r in sprint_gp_races])) if sprint_gp_races else None,
            'dnf_brier': float(np.mean([r['dnf_brier_score'] for r in sprint_gp_races])) if sprint_gp_races else None
        },
        'per_sprint_race': results['sprint']['sprint_races'],
        'per_gp_race': results['sprint']['gp_races']
    },
    'combined_gp': {
        'total_races': len(all_gp_races),
        'coverage_percent': float(len(all_gp_races) / total_races * 100) if total_races > 0 else 0,
        'position_mae': float(np.mean([r['position_mae'] for r in all_gp_races])) if all_gp_races else None,
        'podium_accuracy': float(np.mean([r['podium_accuracy'] for r in all_gp_races])) if all_gp_races else None,
        'points_accuracy': float(np.mean([r['points_accuracy'] for r in all_gp_races])) if all_gp_races else None,
        'dnf_brier': float(np.mean([r['dnf_brier_score'] for r in all_gp_races])) if all_gp_races else None
    },
    'per_driver': {}
}

# Add per-driver stats
for driver in TRACKED_DRIVERS:
    driver_data = results['drivers'][driver]
    all_gp_errors = driver_data['conv_gp_errors'] + driver_data['sprint_gp_errors']
    
    if all_gp_errors:
        save_results['per_driver'][driver] = {
            'conv_gp_mae': float(np.mean(driver_data['conv_gp_errors'])) if driver_data['conv_gp_errors'] else None,
            'sprint_race_mae': float(np.mean(driver_data['sprint_race_errors'])) if driver_data['sprint_race_errors'] else None,
            'sprint_gp_mae': float(np.mean(driver_data['sprint_gp_errors'])) if driver_data['sprint_gp_errors'] else None,
            'combined_gp_mae': float(np.mean(all_gp_errors))
        }

# Save to JSON
output_file = output_path / 'complete_system_sprint_and_conventional.json'
with open(output_file, 'w') as f:
    json.dump(save_results, f, indent=2)

print(f"\n✅ Results saved to: {output_file}")


✅ Results saved to: ../data/processed/testing_files/validation/complete_system_sprint_and_conventional.json
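
To reuse these results in another notebook, the saved file can be read back with `json.load`. The following is a minimal sketch, assuming the key layout of `save_results` above. It writes a small stand-in file to a temporary directory so that it runs on its own, and `load_headline_metrics` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path


def load_headline_metrics(path):
    """Read a saved validation file and return the combined GP headline numbers."""
    with open(path) as f:
        saved = json.load(f)
    combined = saved['combined_gp']
    return {
        'races': combined['total_races'],
        'position_mae': combined['position_mae'],
        'podium_accuracy': combined['podium_accuracy'],
    }


# Stand-in file using the same keys; values are copied from the printed output
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / 'demo_results.json'
    demo_file.write_text(json.dumps({
        'combined_gp': {
            'total_races': 24,
            'position_mae': 2.5,
            'podium_accuracy': 89.7,
        }
    }))
    headline = load_headline_metrics(demo_file)

print(headline)
```

Metrics that were saved as `None` come back as `None` after loading, so downstream code should check for that before formatting them.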


In [20]:
# Save 2024 priors for reuse in other notebooks (json is already imported in the first cell)
with open('../data/processed/testing_files/2024_season_characteristics.json', 'w') as f:
    json.dump(priors, f, indent=2)

print("✅ Saved 2024 priors to: ../data/processed/testing_files/2024_season_characteristics.json")
print(f"   Drivers: {len(priors['drivers'])}")

✅ Saved 2024 priors to: ../data/processed/testing_files/2024_season_characteristics.json
   Drivers: 24


## Summary

**Notebook 21B is complete!**

### What We Built:
- ✅ Complete F1 prediction system (conventional + sprint weekends)
- ✅ Sprint Race predictions (Sprint Quali → Sprint Race)
- ✅ GP predictions on sprint weekends (Quali + Sprint → GP)
- ✅ Conventional weekend predictions (Quali → GP)
- ✅ 100% season coverage

### Key Results:
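- **Conventional weekends (18 GPs):** High accuracy (baseline from Notebook 21): 90.2% podium accuracy, ±2.5 positions MAE, 79.0% points accuracy
- **Sprint races (6):** 87.3% podium accuracy, ±2.6 positions MAE, 81.4% top-8 accuracy; slightly below conventional GPs on podium accuracy, as expected with less practice data
- **Sprint weekend GPs (6):** 88.2% podium accuracy and ±2.6 positions MAE, with the sprint result used as a form indicator; points accuracy is lower at 70.6%
- **Combined (24 GPs):** 89.7% podium accuracy, ±2.5 positions MAE, 76.9% points accuracy (mean over all GP races, which weights each weekend type by its race count)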
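
The combined figures are the plain mean over all 24 GP races, which is the same as weighting each weekend type's mean by its race count. The following is a quick check using the rounded values from the Final Results output, so agreement holds only to the printed precision. `weighted_mean` is a hypothetical helper.

```python
def weighted_mean(values, weights):
    """Average of values, each weighted by the matching entry in weights."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


# Race counts: 18 conventional GPs, 6 sprint-weekend GPs
race_counts = [18, 6]

# Per-type means as printed in the Final Results output
podium = weighted_mean([90.2, 88.2], race_counts)
points = weighted_mean([79.0, 70.6], race_counts)

print(f"Combined podium accuracy: {podium:.1f}%")
print(f"Combined points accuracy: {points:.1f}%")
```

Both values reproduce the combined 89.7% podium and 76.9% points accuracy reported above. With 18 of the 24 races being conventional, the combined metrics sit much closer to the conventional numbers.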

### Next Steps:
1. Compare to Notebook 21 (conventional only)
2. Analyze conventional vs sprint performance differences
3. Document system for portfolio
4. Prepare for 2026 season (new regulations + Cadillac)