# 15 Blended Predictions - Model + Actual FP Data

**Normal Weekend:**
- Friday: FP1, FP2
- Saturday AM: FP3
- **Lineup lock before Qualifying** ← Use FP3 data!

**Sprint Weekend:**
- Friday: FP1, Sprint Quali
- **Lineup lock before Sprint** ← Use Sprint Quali data!
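
The prediction cells below encode this schedule with `session = 'post_fp3'` or `'post_sprint_quali'`. Here is a minimal sketch of the same mapping as a lookup table. It assumes only the two weekend types the notebook uses, and `session_for_weekend` is a hypothetical helper, not part of `src`:

```python
# Hypothetical helper: maps weekend type to the model session key used in this notebook.
SESSION_FOR_WEEKEND = {
    'normal': 'post_fp3',           # lineup locks before Qualifying, so FP3 is the last session
    'sprint': 'post_sprint_quali',  # lineup locks before the Sprint, so Sprint Quali is the last session
}


def session_for_weekend(weekend_type: str) -> str:
    """Return the session key for a weekend type, rejecting unknown types."""
    try:
        return SESSION_FOR_WEEKEND[weekend_type]
    except KeyError:
        raise ValueError(f"unknown weekend type: {weekend_type!r}") from None


print(session_for_weekend('sprint'))  # post_sprint_quali
```

The notebook's `if/else` treats anything other than `'sprint'` as a normal weekend. This version raises on unexpected values instead.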

**Blend strategy:**
- 70% actual FP/Sprint Quali times (reality)
- 30% model prediction (car-track fit)
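
One way to read the 70/30 rule is to put both signals on a 0 to 1 scale, take a weighted average, and re-rank the teams. The sketch below makes two assumptions. First, FP performance is already normalized so the fastest team scores 1.0, which is consistent with McLaren's 1.000 in the Bahrain output. Second, model ranks are rescaled linearly. `blend_team_ranks` is hypothetical, and `predict_with_blending` may work differently:

```python
def blend_team_ranks(model_ranks, fp_performance, weight_fp=0.7):
    """Hypothetical blend: weighted average of normalized FP pace and model rank."""
    n = len(model_ranks)
    blended = {}
    for team, rank in model_ranks.items():
        # Rescale rank 1..n to a score 1.0..0.0 (rank 1 = best)
        model_score = 1.0 if n == 1 else 1.0 - (rank - 1) / (n - 1)
        # Assumption: a team with no FP data falls back to its model score
        fp_score = fp_performance.get(team, model_score)
        blended[team] = weight_fp * fp_score + (1 - weight_fp) * model_score
    # Highest blended score gets rank 1; ties keep the model's order
    order = sorted(blended, key=lambda t: (-blended[t], model_ranks[t]))
    return {team: rank for rank, team in enumerate(order, 1)}


model = {'Team A': 1, 'Team B': 2, 'Team C': 3}
fp = {'Team A': 0.0, 'Team B': 1.0, 'Team C': 0.5}
print(blend_team_ranks(model, fp))  # {'Team B': 1, 'Team C': 2, 'Team A': 3}
```

With `weight_fp=0.7`, Team B's FP pace outweighs Team A's model advantage. With `weight_fp=0.0`, the function returns the model ranks unchanged.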

## Setup

In [1]:
import json
import sys
from pathlib import Path

import numpy as np

sys.path.append('../')

import warnings

from src.predictors.blended_predictor import (
    format_comparison,
    predict_with_blending,
)
from src.predictors.driver_predictor import DriverRanker
from src.predictors.team_predictor import rank_teams_for_track
from src.utils.lineup_manager import get_lineups

warnings.filterwarnings('ignore')

## Configuration

In [2]:
# Which season and race?
SEASON = 2025
DEMO_RACE = 'Bahrain Grand Prix'

# Blending weight (70% FP data is recommended)
FP_WEIGHT = 0.7

print(f"Season: {SEASON}")
print(f"Demo race: {DEMO_RACE}")
print(f"Blend: {FP_WEIGHT*100:.0f}% FP data, {(1-FP_WEIGHT)*100:.0f}% model")

Season: 2025
Demo race: Bahrain Grand Prix
Blend: 70% FP data, 30% model


## Load Data

In [3]:
# Track what loaded
loaded = []
errors = []

# Load track characteristics
try:
    track_path = Path(f'../data/processed/testing_files/track_characteristics/{SEASON}_track_characteristics.json')
    with open(track_path) as f:
        track_data = json.load(f)
    all_tracks = track_data.get('tracks', {})
    loaded.append(f"tracks ({len(all_tracks)})")
except FileNotFoundError:
    errors.append("track characteristics")
    all_tracks = {}

# Load car characteristics
try:
    car_path = Path(f'../data/processed/testing_files/car_characteristics/{SEASON}_car_characteristics.json')
    with open(car_path) as f:
        car_data = json.load(f)
    all_cars = car_data.get('teams', {})
    loaded.append(f"teams ({len(all_cars)})")
except FileNotFoundError:
    errors.append("car characteristics")
    all_cars = {}

# Load driver ranker
try:
    driver_ranker = DriverRanker(
        '../data/processed/testing_files/driver_characteristics/driver_characteristics.json'
    )
    loaded.append("driver ranker")
except FileNotFoundError:
    errors.append("driver characteristics")
    driver_ranker = None

# Load actual results for validation (optional)
try:
    results_path = Path(f'../data/processed/testing_files/validation/{SEASON}_qualifying_results.json')
    with open(results_path) as f:
        actual_results = json.load(f)
    loaded.append(f"results ({actual_results.get('total_races', 0)} races)")
except FileNotFoundError:
    actual_results = None

# Print summary
if loaded:
    print(f"🟢 Loaded: {', '.join(loaded)}")
if errors:
    print(f"🔴 Missing: {', '.join(errors)}")

Loaded characteristics for 27 drivers
🟢 Loaded: tracks (24), teams (10), driver ranker, results (24 races)
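
The three JSON loads in this cell share one pattern: open the file, parse it, and fall back on `FileNotFoundError`. Here is a sketch of a hypothetical `load_json_or_none` helper that factors the pattern out. It is not part of `src`, covers only the JSON files (not `DriverRanker`), and leaves the "missing" bookkeeping to the caller:

```python
import json
from pathlib import Path


def load_json_or_none(path):
    """Return parsed JSON from `path`, or None if the file does not exist."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None


# Usage mirroring the track-characteristics block above
track_data = load_json_or_none(
    Path('../data/processed/testing_files/track_characteristics/2025_track_characteristics.json')
)
all_tracks = (track_data or {}).get('tracks', {})
```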


## Demo: Single Race Blended Prediction

Compare:
1. Model-only prediction (what notebook 14 does)
2. Blended prediction (model + FP data)
3. Actual results

**Expected:** The blended prediction should beat model-only, because it uses real pace from the same weekend rather than car-track fit alone.
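
For the team-level comparison to be fair, actual results must be expressed as team ranks (1 to N), not driver positions (1 to 20). Otherwise the MAE is inflated. The minimal sketch below ranks each team by its best-placed driver and reads the `{'driver', 'team', 'position'}` records from the qualifying results JSON. Both helpers are hypothetical:

```python
def actual_team_ranks_from_positions(positions):
    """Rank teams 1..N by their best-placed driver in `positions`."""
    best = {}
    for entry in positions:
        team, pos = entry['team'], entry['position']
        best[team] = min(best.get(team, pos), pos)
    ordered = sorted(best, key=best.get)  # lowest (best) position first
    return {team: rank for rank, team in enumerate(ordered, 1)}


def team_mae(predicted_ranks, actual_ranks):
    """Mean absolute rank error over teams present in both dicts (None if no overlap)."""
    common = [team for team in predicted_ranks if team in actual_ranks]
    if not common:
        return None
    return sum(abs(predicted_ranks[t] - actual_ranks[t]) for t in common) / len(common)


# Toy grid (illustrative values, not real results)
positions = [
    {'driver': 'A1', 'team': 'Team A', 'position': 3},
    {'driver': 'B1', 'team': 'Team B', 'position': 1},
    {'driver': 'A2', 'team': 'Team A', 'position': 2},
    {'driver': 'B2', 'team': 'Team B', 'position': 4},
]
actual = actual_team_ranks_from_positions(positions)
print(actual)                                        # {'Team B': 1, 'Team A': 2}
print(team_mae({'Team A': 1, 'Team B': 2}, actual))  # 1.0
```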

In [4]:
if DEMO_RACE in all_tracks:
    track = all_tracks[DEMO_RACE]
    
    # Determine weekend type
    if actual_results and DEMO_RACE in actual_results['races']:
        weekend_type = actual_results['races'][DEMO_RACE]['weekend_type']
    else:
        weekend_type = 'normal'  # Default
    
    print(f"{'='*70}")
    print(f"BLENDED PREDICTION: {DEMO_RACE} ({weekend_type} weekend)")
    print(f"{'='*70}")
    
    # Step 1: Model prediction
    print("\nSTEP 1: Model Prediction")
    print("-"*70)
    
    if weekend_type == 'sprint':
        session = 'post_sprint_quali'
    else:
        session = 'post_fp3'
    
    team_rankings = rank_teams_for_track(all_cars, track, session, weekend_type)
    model_ranks = {team: rank for rank, (team, _, _, _) in enumerate(team_rankings, 1)}
    
    print(f"\n{'Rank':<6} {'Team':<25} {'Score':<10}")
    for rank, (team, score, _, _) in enumerate(team_rankings, 1):
        print(f"{rank:<6} {team:<25} {score:.3f}")
    
    # Step 2: Get actual FP data
    print("\nSTEP 2: Extract FP Data")
    print("-"*70)
    
    blend_result = predict_with_blending(
        SEASON,
        DEMO_RACE,
        model_ranks,
        weekend_type=weekend_type,
        weight_fp=FP_WEIGHT
    )
    
    print(f"Session used: {blend_result['session_used']}")
    print(f"Blend type: {blend_result['blend_type']}")
    
    if blend_result['blend_type'] == 'fp_blended':
        print("\nFP Performance:")
        sorted_fp = sorted(
            blend_result['fp_performance'].items(),
            key=lambda x: x[1],
            reverse=True
        )
        for team, perf in sorted_fp[:5]:
            print(f"  {team:<25} {perf:.3f}")
    
    # Step 3: Compare predictions
    print("\nSTEP 3: Comparison")
    print("-"*70)
    
    if actual_results and DEMO_RACE in actual_results['races']:
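        # Get actual team ranks: order teams by their best-placed driver,
        # then number them 1..N so they are comparable to model_ranks
        actual = actual_results['races'][DEMO_RACE]
        best_position = {}
        
        for pos_data in actual['positions']:
            team = pos_data['team']
            pos = pos_data['position']
            best_position[team] = min(best_position.get(team, pos), pos)
        
        actual_team_ranks = {
            team: rank
            for rank, team in enumerate(sorted(best_position, key=best_position.get), 1)
        }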
        
        format_comparison(
            model_ranks,
            blend_result['blended_ranks'],
            actual_team_ranks
        )
        
        # Calculate MAE
        model_errors = []
        blend_errors = []
        
        for team in model_ranks:
            if team in actual_team_ranks:
                model_errors.append(abs(model_ranks[team] - actual_team_ranks[team]))
                blend_errors.append(abs(blend_result['blended_ranks'][team] - actual_team_ranks[team]))
        
        if model_errors:
            print("\nACCURACY (Team Level):")
            print("-"*70)
            print(f"  Model-only MAE:  {np.mean(model_errors):.2f} positions")
            print(f"  Blended MAE:     {np.mean(blend_errors):.2f} positions")
            print(f"  Improvement:     {np.mean(model_errors) - np.mean(blend_errors):+.2f} positions")
    else:
        format_comparison(
            model_ranks,
            blend_result['blended_ranks']
        )
    
else:
    print(f"Race '{DEMO_RACE}' not found in track data")

BLENDED PREDICTION: Bahrain Grand Prix (normal weekend)

STEP 1: Model Prediction
----------------------------------------------------------------------

Rank   Team                      Score     
1      Mercedes                  0.399
2      Red Bull Racing           0.383
3      McLaren                   0.341
4      Ferrari                   0.317
5      Racing Bulls              0.292
6      Williams                  0.290
7      Aston Martin              0.210
8      Alpine                    0.209
9      Kick Sauber               0.193
10     Haas F1 Team              0.062

STEP 2: Extract FP Data
----------------------------------------------------------------------
Session used: FP3
Blend type: fp_blended

FP Performance:
  McLaren                   1.000
  Ferrari                   0.686
  Mercedes                  0.657
  Alpine                    0.545
  Racing Bulls              0.532

STEP 3: Comparison
--------------------------------------------------------------------

## Full Grid with Blended Predictions

Now convert blended team ranks to driver positions.
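
`DriverRanker.predict_positions` performs this conversion, but its internals are not shown here. As a rough mental model only, a hypothetical `expand_to_grid` could fill the grid team by team in blended-rank order, placing each pair by a per-driver rating. The sketch assumes lineups are a `{team: [driver, ...]}` dict and that a higher rating means faster; the source confirms neither:

```python
def expand_to_grid(team_ranks, lineups, driver_rating):
    """Hypothetical: list (position, driver, team) with teams in rank order."""
    grid = []
    for team in sorted(team_ranks, key=team_ranks.get):
        # Within a team, the higher-rated driver is placed first (assumption)
        drivers = sorted(
            lineups.get(team, []),
            key=lambda d: driver_rating.get(d, 0.0),
            reverse=True,
        )
        for driver in drivers:
            grid.append((len(grid) + 1, driver, team))
    return grid


team_ranks = {'Team X': 2, 'Team Y': 1}
lineups = {'Team X': ['X1', 'X2'], 'Team Y': ['Y1', 'Y2']}
rating = {'X1': 0.9, 'X2': 0.5, 'Y1': 0.4, 'Y2': 0.8}
for row in expand_to_grid(team_ranks, lineups, rating):
    print(row)
```

This strict team-by-team fill is a simplification: it never lets a driver in a lower-ranked car out-qualify one in a higher-ranked car.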

In [5]:
if blend_result['blend_type'] == 'fp_blended':
    print("\nFULL GRID PREDICTION (Blended)")
    print("="*70)
    
    # Get lineups
    lineups = get_lineups(SEASON, DEMO_RACE)
    
    # Predict driver positions using blended team ranks
    driver_results = driver_ranker.predict_positions(
        team_predictions=blend_result['blended_ranks'],
        team_lineups=lineups,
        session_type='qualifying'
    )
    
    print(f"\n{driver_ranker.format_predictions(driver_results, top_n=20)}")
    
    # Validate if actual results available
    if actual_results and DEMO_RACE in actual_results['races']:
        actual = actual_results['races'][DEMO_RACE]
        
        print("\nVALIDATION")
        print("-"*70)
        
        # Use a distinct name so the `errors` list from the load cell is not overwritten
        driver_errors = []
        for pred in driver_results['predictions']:
            actual_pos = next(
                (p['position'] for p in actual['positions'] if p['driver'] == pred.driver),
                None
            )
            if actual_pos is not None:
                driver_errors.append(abs(pred.position - actual_pos))
        
        if driver_errors:
            mae = np.mean(driver_errors)
            within_1 = sum(1 for e in driver_errors if e <= 1) / len(driver_errors)
            within_2 = sum(1 for e in driver_errors if e <= 2) / len(driver_errors)
            within_3 = sum(1 for e in driver_errors if e <= 3) / len(driver_errors)
            
            print(f"  MAE:          {mae:.2f} positions")
            print(f"  ±1 position:  {within_1*100:.1f}%")
            print(f"  ±2 positions: {within_2*100:.1f}%")
            print(f"  ±3 positions: {within_3*100:.1f}%")


FULL GRID PREDICTION (Blended)

=== DRIVER POSITION PREDICTIONS (QUALIFYING) ===

Total drivers: 20

Pos  Driver  Team                    Confidence         Tier         Pace
--------------------------------------------------------------------------------
 1.  PIA     McLaren                1.0 [-3.0-5.0]    established   0.4908
 2.  NOR     McLaren                2.0 [-2.0-6.0]    established   0.5090
 3.  ANT     Mercedes               3.0 [-1.5-7.5]    developing    0.4641
 4.  RUS     Mercedes               4.0 [0.5-7.5]     veteran       0.5565
 5.  HAM     Ferrari                5.0 [0.5-9.5]     developing    0.4407
 6.  LEC     Ferrari                6.0 [2.0-10.0]    established   0.5180
 7.  LAW     Racing Bulls           7.0 [3.0-11.0]    established   0.4762
 8.  HAD     Racing Bulls           8.0 [4.0-12.0]    established   0.5179
 9.  DOO     Alpine                 9.0 [4.5-13.5]    developing    0.4675
10.  GAS     Alpine                10.0 [6.0-14.0]    established   
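
The validation step reports MAE plus the share of drivers placed within ±1, ±2, and ±3 positions. Here are the same metrics as a small reusable function. `summarize_errors` is hypothetical and uses only the standard library:

```python
from statistics import mean


def summarize_errors(errors, thresholds=(1, 2, 3)):
    """Return MAE and the fraction of absolute errors within each threshold."""
    if not errors:
        return None
    summary = {'mae': mean(errors)}
    for k in thresholds:
        summary[f'within_{k}'] = sum(1 for e in errors if e <= k) / len(errors)
    return summary


print(summarize_errors([0, 1, 2, 3, 5]))
# {'mae': 2.2, 'within_1': 0.4, 'within_2': 0.6, 'within_3': 0.8}
```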

## Full Season Validation

Compare model-only vs blended across all races.

In [6]:
if actual_results:
    print("\nFULL SEASON VALIDATION: Model vs Blended")
    print("="*70)
    
    model_only_results = []
    blended_results = []
    
    for race_name, race_data in actual_results['races'].items():
        if race_name not in all_tracks:
            continue
        
        track = all_tracks[race_name]
        weekend_type = race_data['weekend_type']
        
        # Model prediction
        if weekend_type == 'sprint':
            session = 'post_sprint_quali'
        else:
            session = 'post_fp3'
        
        team_rankings = rank_teams_for_track(all_cars, track, session, weekend_type)
        if not team_rankings:
            continue
        
        model_ranks = {team: rank for rank, (team, _, _, _) in enumerate(team_rankings, 1)}
        
        # Blended prediction
        blend_result = predict_with_blending(
            SEASON,
            race_name,
            model_ranks,
            weekend_type=weekend_type,
            weight_fp=FP_WEIGHT
        )
        
        # Get lineups and predict drivers
        lineups = get_lineups(SEASON, race_name)
        
        # Model-only driver predictions
        model_drivers = driver_ranker.predict_positions(
            team_predictions=model_ranks,
            team_lineups=lineups,
            session_type='qualifying'
        )
        
        # Blended driver predictions
        blend_drivers = driver_ranker.predict_positions(
            team_predictions=blend_result['blended_ranks'],
            team_lineups=lineups,
            session_type='qualifying'
        )
        
        # Calculate errors
        model_errors = []
        blend_errors = []
        
        for pred in model_drivers['predictions']:
            actual_pos = next(
                (p['position'] for p in race_data['positions'] if p['driver'] == pred.driver),
                None
            )
            if actual_pos:
                model_errors.append(abs(pred.position - actual_pos))
        
        for pred in blend_drivers['predictions']:
            actual_pos = next(
                (p['position'] for p in race_data['positions'] if p['driver'] == pred.driver),
                None
            )
            if actual_pos:
                blend_errors.append(abs(pred.position - actual_pos))
        
        if model_errors and blend_errors:
            model_mae = np.mean(model_errors)
            blend_mae = np.mean(blend_errors)
            
            improvement = model_mae - blend_mae
            
            model_only_results.append({
                'race': race_name,
                'mae': model_mae,
                'weekend_type': weekend_type
            })
            
            blended_results.append({
                'race': race_name,
                'mae': blend_mae,
                'improvement': improvement,
                'blend_type': blend_result['blend_type'],
                'weekend_type': weekend_type
            })
            
            status = "🟢" if improvement > 0 else "🔴"
            print(f"{status} {race_name:<30} Model: {model_mae:.2f}  Blended: {blend_mae:.2f}  Δ: {improvement:+.2f}")
    
    # Overall summary
    if model_only_results and blended_results:
        print(f"\n{'='*70}")
        print("OVERALL COMPARISON")
        print(f"{'='*70}")
        
        overall_model_mae = np.mean([r['mae'] for r in model_only_results])
        overall_blend_mae = np.mean([r['mae'] for r in blended_results])
        overall_improvement = overall_model_mae - overall_blend_mae
        
        print(f"\nAcross {len(blended_results)} races:")
        print(f"  Model-only MAE:  {overall_model_mae:.2f} positions")
        print(f"  Blended MAE:     {overall_blend_mae:.2f} positions")
        print(f"  Improvement:     {overall_improvement:+.2f} positions ({overall_improvement/overall_model_mae*100:+.1f}%)")
        
        # Check how many races improved
        improved = sum(1 for r in blended_results if r['improvement'] > 0)
        print(f"\nRaces improved: {improved}/{len(blended_results)} ({improved/len(blended_results)*100:.1f}%)")
        
        # By weekend type
        normal_blend = [r for r in blended_results if r['weekend_type'] == 'normal']
        sprint_blend = [r for r in blended_results if r['weekend_type'] == 'sprint']
        
        if normal_blend:
            print(f"\nNormal weekends ({len(normal_blend)} races):")
            print(f"  Average improvement: {np.mean([r['improvement'] for r in normal_blend]):+.2f} positions")
        
        if sprint_blend:
            print(f"\nSprint weekends ({len(sprint_blend)} races):")
            print(f"  Average improvement: {np.mean([r['improvement'] for r in sprint_blend]):+.2f} positions")
else:
    print("No actual results available for validation")


FULL SEASON VALIDATION: Model vs Blended
🟢 Australian Grand Prix          Model: 3.58  Blended: 3.20  Δ: +0.38
🟢 Chinese Grand Prix             Model: 3.02  Blended: 2.72  Δ: +0.30
🟢 Japanese Grand Prix            Model: 3.99  Blended: 3.41  Δ: +0.58
🟢 Bahrain Grand Prix             Model: 3.58  Blended: 3.48  Δ: +0.10
🟢 Saudi Arabian Grand Prix       Model: 3.14  Blended: 2.81  Δ: +0.34
🟢 Miami Grand Prix               Model: 4.02  Blended: 3.02  Δ: +1.00
🟢 Emilia Romagna Grand Prix      Model: 4.50  Blended: 4.00  Δ: +0.50
🟢 Monaco Grand Prix              Model: 4.59  Blended: 3.39  Δ: +1.20
🟢 Spanish Grand Prix             Model: 4.20  Blended: 3.05  Δ: +1.15
🔴 Canadian Grand Prix            Model: 3.65  Blended: 4.30  Δ: -0.65
🟢 Austrian Grand Prix            Model: 4.59  Blended: 4.19  Δ: +0.40
🟢 British Grand Prix             Model: 3.99  Blended: 3.16  Δ: +0.83
🟢 Belgian Grand Prix             Model: 5.75  Blended: 3.95  Δ: +1.80
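
The overall figures in this cell are macro averages: each race's MAE counts equally, regardless of how many drivers were matched. Because the mean is linear, the overall improvement equals the mean of the per-race improvements. The check below uses the first three rows printed above. Those are rounded display values, so this illustrates the identity rather than re-deriving the season totals:

```python
from statistics import mean

# (model MAE, blended MAE) for Australia, China, Japan, copied from the printed rows
per_race = [(3.58, 3.20), (3.02, 2.72), (3.99, 3.41)]

overall_model = mean(m for m, _ in per_race)
overall_blend = mean(b for _, b in per_race)
mean_improvement = mean(m - b for m, b in per_race)

print(f"{overall_model - overall_blend:+.2f}")  # +0.42
print(f"{mean_improvement:+.2f}")               # +0.42
```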

## Analysis: When Does Blending Help Most?

Check which tracks benefit most from FP data.

In [7]:
if blended_results:
    print("\nTRACKS WHERE BLENDING HELPS MOST")
    print("="*70)
    
    # Sort by improvement
    sorted_results = sorted(blended_results, key=lambda x: x['improvement'], reverse=True)
    
    print(f"\n{'Track':<30} {'Improvement':<15} {'Blend Type':<15}")
    print("-"*70)
    
    for result in sorted_results[:10]:
        print(f"{result['race']:<30} {result['improvement']:+.2f} positions   {result['blend_type']:<15}")


TRACKS WHERE BLENDING HELPS MOST

Track                          Improvement     Blend Type     
----------------------------------------------------------------------
Italian Grand Prix             +2.20 positions   fp_blended     
São Paulo Grand Prix           +2.13 positions   fp_blended     
Belgian Grand Prix             +1.80 positions   fp_blended     
Qatar Grand Prix               +1.60 positions   fp_blended     
Monaco Grand Prix              +1.20 positions   fp_blended     
Spanish Grand Prix             +1.15 positions   fp_blended     
Hungarian Grand Prix           +1.15 positions   fp_blended     
Mexico City Grand Prix         +1.11 positions   fp_blended     
Miami Grand Prix               +1.00 positions   fp_blended     
British Grand Prix             +0.83 positions   fp_blended     
