# Validation Metrics - MAE, RMSE, Top-N Accuracy

Comprehensive validation of prediction accuracy using multiple metrics.

**STATUS: Template ready - awaiting 2026 race data for validation**

Once Races 3-5 are complete, populate `validation_data` below with:
- Predicted positions (from dashboard before race)
- Actual results (from FastF1 after race)
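The collection step can be sketched as follows. It assumes predictions are kept as a `{driver_abbreviation: predicted_position}` dict per race. `fetch_actual_positions` uses FastF1's `get_session`, `Session.load` and the `Session.results` table (its `Abbreviation` and `Position` columns). `build_validation_rows` is a hypothetical helper that joins both sides into the record format used by `validation_data`. This is a sketch, not the dashboard's own export path.

```python
def fetch_actual_positions(year, event):
    """Return {driver_abbreviation: finishing_position} for a race via FastF1."""
    import fastf1  # imported lazily so the merge helper below works without FastF1

    session = fastf1.get_session(year, event, 'R')  # 'R' = race session
    # Only the results table is needed, so skip laps/telemetry/weather/messages
    session.load(laps=False, telemetry=False, weather=False, messages=False)
    results = session.results.dropna(subset=['Position'])
    return dict(zip(results['Abbreviation'], results['Position'].astype(int)))


def build_validation_rows(race, predicted, actual):
    """Join predicted and actual positions into validation_data-style records.

    Hypothetical helper: drivers missing from either side (not predicted, or
    without a position in the results) are skipped rather than guessed.
    """
    rows = []
    for driver, pred_pos in predicted.items():
        if driver in actual:
            rows.append({'race': race, 'driver': driver,
                         'predicted': int(pred_pos), 'actual': int(actual[driver])})
    return rows


# Hypothetical usage after a race (event name assumed):
# actual = fetch_actual_positions(2026, 'Bahrain Grand Prix')
# validation_data += build_validation_rows('Bahrain Grand Prix', predicted, actual)
```

Keeping the FastF1 call separate from the join means the join can be checked offline, and only drivers present on both sides enter the metrics.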

## Metrics Tracked:
1. **Mean Absolute Error (MAE)** - Average position error (target: < 2.5)
2. **Root Mean Square Error (RMSE)** - Penalizes large errors (target: < 3.5)
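3. **Top 3 Accuracy** - % of predicted podium finishers who actually finished on the podium (target: >= 60%); Top 1 (>= 40%) and Top 5 (>= 65%) are reported alongside it
4. **Top 10 Accuracy** - % of predicted points finishers who actually finished in the points (target: >= 70%)
5. **Position Bucket Accuracy** - % of drivers who finish in their predicted bucket: P1-5, P6-10, P11-15, P16-20 (target: > 60%)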

In [None]:
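import os

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_error, mean_squared_error
import fastf1

# Enable FastF1 cache; enable_cache raises an error if the folder does not exist
CACHE_DIR = 'data/raw/.fastf1_cache'
os.makedirs(CACHE_DIR, exist_ok=True)
fastf1.Cache.enable_cache(CACHE_DIR)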

## Load Predictions vs Actual Results

**TODO:** Replace placeholder data with actual predictions from 2026 races
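Rather than editing the list by hand, records could be appended to a CSV file after each race and reloaded here. This sketch assumes a file with `race`, `driver`, `predicted` and `actual` columns; the helper names and the path in the usage comment are hypothetical.

```python
import os

import pandas as pd

COLUMNS = ['race', 'driver', 'predicted', 'actual']


def append_validation_rows(rows, path):
    """Append validation records to a CSV file, writing the header on first use."""
    new = pd.DataFrame(rows, columns=COLUMNS)
    write_header = not os.path.exists(path)
    new.to_csv(path, mode='a', header=write_header, index=False)


def load_validation_data(path):
    """Load records as a list of dicts (the validation_data format); [] if no file yet."""
    if not os.path.exists(path):
        return []
    df = pd.read_csv(path, dtype={'predicted': int, 'actual': int})
    return df[COLUMNS].to_dict('records')


# Hypothetical usage:
# validation_data = load_validation_data('data/validation/predictions_2026.csv')
```

Returning `[]` when the file is missing keeps the `if not validation_data:` guard below working unchanged.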

In [None]:
# PLACEHOLDER: To be filled with actual 2026 validation data
# Format: {'race': 'Race Name', 'driver': 'DRV', 'predicted': int, 'actual': int}
validation_data = [
    # Example structure - replace with real data:
    # {'race': 'Bahrain Grand Prix', 'driver': 'VER', 'predicted': 1, 'actual': 1},
    # {'race': 'Bahrain Grand Prix', 'driver': 'NOR', 'predicted': 2, 'actual': 3},
    # ...
]

if not validation_data:
    print("⚠️ No validation data loaded. Run predictions for 2026 races first.")
    print("\nTo collect validation data:")
    print("1. Generate predictions before each race via dashboard")
    print("2. Save predicted positions")
    print("3. After race, fetch actual results via FastF1")
    print("4. Populate validation_data list above")
else:
    df = pd.DataFrame(validation_data)
    print(f"✅ Loaded {len(df)} predictions across {df['race'].nunique()} races")
    print(f"\nRaces included: {', '.join(df['race'].unique())}")
    # A bare df.head() inside an if/else block is not auto-displayed, so print it
    print(df.head())

## Metric 1: Mean Absolute Error (MAE)

Average absolute difference between predicted and actual positions.
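Here is a small worked example with made-up positions for four drivers. Predicted P1-P4 against actual finishes of P1, P3, P2 and P6 give absolute errors 0, 1, 1 and 2, so MAE = 4/4 = 1.0 position. The same calculation in NumPy:

```python
import numpy as np

# Illustrative positions only (not real race data)
predicted = np.array([1, 2, 3, 4])
actual = np.array([1, 3, 2, 6])

# MAE = mean of |predicted - actual|
mae = np.mean(np.abs(predicted - actual))
print(mae)  # 1.0
```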

In [None]:
if validation_data:
    mae = mean_absolute_error(df['actual'], df['predicted'])
    print(f"Mean Absolute Error: {mae:.2f} positions")
    print(f"Target: < 2.5 positions")
    print(f"Status: {'✅ PASS' if mae < 2.5 else '❌ FAIL'}")

    # MAE by race: mean absolute error within each race's rows
    mae_by_race = (df['actual'] - df['predicted']).abs().groupby(df['race']).mean()
    print("\nMAE by Race:")
    print(mae_by_race.sort_values())

    # Plot
    plt.figure(figsize=(12, 4))
    mae_by_race.plot(kind='bar', color='steelblue')
    plt.axhline(mae, color='r', linestyle='--', label=f'Overall MAE: {mae:.2f}')
    plt.axhline(2.5, color='g', linestyle='--', alpha=0.5, label='Target: 2.5')
    plt.title('Mean Absolute Error by Race')
    plt.ylabel('MAE (positions)')
    plt.xlabel('Race')
    plt.xticks(rotation=45, ha='right')
    plt.legend()
    plt.grid(axis='y', alpha=0.3)
    plt.tight_layout()
    plt.show()
else:
    print("⏸️ Awaiting validation data")

## Metric 2: Root Mean Square Error (RMSE)

Penalizes large prediction errors more heavily than MAE, because each error is squared before averaging.
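To see why RMSE reacts more strongly to large misses, compare two hypothetical sets of position errors with the same MAE. The first has four misses of 2 places. The second has three exact predictions and one miss of 8 places. Both have MAE = 2, but RMSE is 2 for the first and sqrt(64/4) = 4 for the second, giving RMSE/MAE ratios of 1.0 and 2.0.

```python
import numpy as np


def mae_rmse(errors):
    """Return (MAE, RMSE) for an array of position errors."""
    errors = np.asarray(errors, dtype=float)
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(np.mean(errors ** 2))
    return mae, rmse


# Hypothetical error sets with identical MAE
even = [2, -2, 2, -2]   # consistently off by two places
spiky = [0, 0, 0, 8]    # mostly exact, one large miss

for name, errs in [('even', even), ('spiky', spiky)]:
    mae, rmse = mae_rmse(errs)
    print(f"{name}: MAE={mae:.1f}, RMSE={rmse:.1f}, ratio={rmse / mae:.1f}")
# even: MAE=2.0, RMSE=2.0, ratio=1.0
# spiky: MAE=2.0, RMSE=4.0, ratio=2.0
```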

In [None]:
if validation_data:
    rmse = np.sqrt(mean_squared_error(df['actual'], df['predicted']))
    print(f"Root Mean Square Error: {rmse:.2f} positions")
    print(f"Target: < 3.5 positions")
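    print(f"Status: {'✅ PASS' if rmse < 3.5 else '❌ FAIL'}")
    print(f"\nRMSE/MAE ratio: {rmse/mae:.2f}x")
    print("(Always >= 1; the further above 1, the more the error comes from a few large misses)")

    # Error distribution; positive error = driver finished ahead of the predicted position
    df['error'] = df['predicted'] - df['actual']
    # Integer-centred bins wide enough to cover every observed error
    max_err = max(1, int(df['error'].abs().max()))
    plt.figure(figsize=(10, 5))
    plt.hist(df['error'], bins=np.arange(-max_err - 0.5, max_err + 1.5), edgecolor='black', alpha=0.7, color='coral')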
    plt.axvline(0, color='r', linestyle='--', linewidth=2, label='Perfect prediction')
    plt.title('Prediction Error Distribution')
    plt.xlabel('Error (predicted - actual positions)')
    plt.ylabel('Frequency')
    plt.legend()
    plt.grid(axis='y', alpha=0.3)
    plt.tight_layout()
    plt.show()

    print(f"\nError statistics:")
    print(f"  Mean error (bias): {df['error'].mean():.2f}")
    print(f"  Std dev: {df['error'].std():.2f}")
    print(f"  Median error: {df['error'].median():.1f}")
else:
    print("⏸️ Awaiting validation data")

## Metric 3: Top-N Accuracy

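For each race, the percentage of the N drivers predicted to finish in the top N who actually finished there, averaged across races. Order within the top N does not matter.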
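For a single race, the calculation reduces to a set intersection. With illustrative podiums, predicted {VER, NOR, LEC} and actual {VER, LEC, PIA}, two of the three predicted podium finishers were correct, so Top 3 accuracy for that race is 2/3 ≈ 66.7%. A standalone sketch using ordered driver lists (the helper name is hypothetical):

```python
def top_n_overlap(predicted_order, actual_order, n):
    """% of the first n predicted drivers who are also in the actual top n.

    Both arguments are lists of driver codes ordered from P1 downwards.
    """
    hits = set(predicted_order[:n]) & set(actual_order[:n])
    return len(hits) / n * 100


# Illustrative finishing orders (not real results)
predicted = ['VER', 'NOR', 'LEC', 'PIA', 'HAM']
actual = ['VER', 'LEC', 'PIA', 'NOR', 'RUS']

print(round(top_n_overlap(predicted, actual, 3), 1))  # 66.7
print(top_n_overlap(predicted, actual, 1))            # 100.0
```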

In [None]:
if validation_data:
    def top_n_accuracy(df, n):
        """Calculate % of drivers predicted in top N who actually finished in top N."""
        predicted_top_n = df[df['predicted'] <= n]['driver'].values
        actual_top_n = df[df['actual'] <= n]['driver'].values
        correct = len(set(predicted_top_n) & set(actual_top_n))
        return correct / n * 100

    # Calculate for different N
    top_n_results = {}
    targets = {'Top 1': 40, 'Top 3': 60, 'Top 5': 65, 'Top 10': 70}
    
    for n in [1, 3, 5, 10]:
        # Average the per-race accuracy across all races
        per_race = [top_n_accuracy(race_df, n) for _, race_df in df.groupby('race')]
        accuracy = float(np.mean(per_race))
        top_n_results[f'Top {n}'] = accuracy
        target = targets[f'Top {n}']
        status = '✅' if accuracy >= target else '❌'
        print(f"Top {n} Accuracy: {accuracy:.1f}% (target: >={target}%) {status}")

    # Plot
    plt.figure(figsize=(8, 5))
    plt.bar(list(top_n_results.keys()), list(top_n_results.values()), color='mediumseagreen')
    plt.title('Top-N Prediction Accuracy')
    plt.ylabel('Accuracy (%)')
    plt.xlabel('Prediction Category')
    plt.ylim(0, 105)  # headroom for the value labels above each bar
    
    for i, (k, v) in enumerate(top_n_results.items()):
        plt.text(i, v + 2, f'{v:.1f}%', ha='center', fontweight='bold')
    
    plt.grid(axis='y', alpha=0.3)
    plt.tight_layout()
    plt.show()
else:
    print("⏸️ Awaiting validation data")

## Metric 4: Position Bucket Accuracy

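How often a driver finishes in the position bucket (P1-5, P6-10, P11-15, P16-20) predicted for them. Per-bucket accuracy is grouped by the bucket the driver actually finished in.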
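The bucketing can also be done in one vectorized step with `pd.cut`. Because `pd.cut` returns an ordered categorical, grouped results come out in grid order (P1-5, P6-10, ...) rather than alphabetically. This is an alternative sketch, assuming integer positions and treating everything beyond P15 as the last bucket; the function name is hypothetical.

```python
import numpy as np
import pandas as pd

BUCKET_EDGES = [0, 5, 10, 15, np.inf]  # right-inclusive: (0, 5], (5, 10], ...
BUCKET_LABELS = ['P1-5', 'P6-10', 'P11-15', 'P16-20']


def bucket_accuracy(df):
    """Per-bucket accuracy (%) grouped by actual bucket, in grid order."""
    pred_b = pd.cut(df['predicted'], bins=BUCKET_EDGES, labels=BUCKET_LABELS)
    act_b = pd.cut(df['actual'], bins=BUCKET_EDGES, labels=BUCKET_LABELS)
    correct = pred_b == act_b
    # observed=False keeps empty buckets (as NaN) so the index is always complete
    return correct.groupby(act_b, observed=False).mean() * 100
```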

In [None]:
if validation_data:
    def get_bucket(position):
        """Assign position to bucket."""
        if position <= 5:
            return 'P1-5'
        elif position <= 10:
            return 'P6-10'
        elif position <= 15:
            return 'P11-15'
        else:
            return 'P16-20'

    df['predicted_bucket'] = df['predicted'].apply(get_bucket)
    df['actual_bucket'] = df['actual'].apply(get_bucket)
    df['bucket_correct'] = df['predicted_bucket'] == df['actual_bucket']

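    # Group by actual bucket; reindex so buckets appear in grid order rather than
    # alphabetical order (which would put 'P11-15' before 'P6-10')
    bucket_order = ['P1-5', 'P6-10', 'P11-15', 'P16-20']
    bucket_accuracy = (df.groupby('actual_bucket')['bucket_correct'].mean() * 100).reindex(bucket_order)
    print("Position Bucket Accuracy:")
    print(bucket_accuracy)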

    plt.figure(figsize=(8, 5))
    bucket_accuracy.plot(kind='bar', color='slateblue')
    plt.title('Position Bucket Prediction Accuracy')
    plt.ylabel('Accuracy (%)')
    plt.xlabel('Position Bucket')
    plt.ylim(0, 100)
    plt.axhline(60, color='g', linestyle='--', alpha=0.5, label='Target: 60%')
    plt.xticks(rotation=0)
    plt.legend()
    plt.grid(axis='y', alpha=0.3)
    plt.tight_layout()
    plt.show()

    overall_bucket_acc = df['bucket_correct'].mean() * 100
    print(f"\nOverall bucket accuracy: {overall_bucket_acc:.1f}%")
    print(f"Target: > 60%")
    print(f"Status: {'✅ PASS' if overall_bucket_acc > 60 else '❌ FAIL'}")
else:
    print("⏸️ Awaiting validation data")

## Summary Report

Consolidated view of all validation metrics.
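One way to keep each target in a single place, so the printed target strings and the pass/fail checks cannot drift apart, is a table of (metric, comparison, threshold) entries. This sketch uses the standard `operator` module; the `TARGETS` table mirrors the thresholds stated at the top of this notebook, and `evaluate` is a hypothetical helper.

```python
import operator

# (metric name, comparison symbol, threshold); values mirror the notebook's targets
TARGETS = [
    ('MAE', '<', 2.5),
    ('RMSE', '<', 3.5),
    ('Top 1 Accuracy', '>=', 40),
    ('Top 3 Accuracy', '>=', 60),
    ('Top 10 Accuracy', '>=', 70),
    ('Bucket Accuracy', '>', 60),
]
OPS = {'<': operator.lt, '>': operator.gt, '>=': operator.ge}


def evaluate(values):
    """Return (metric, value, target string, passed) rows for a dict of metric values."""
    rows = []
    for name, symbol, threshold in TARGETS:
        value = values[name]
        rows.append((name, value, f"{symbol} {threshold}", OPS[symbol](value, threshold)))
    return rows
```

The resulting rows can be passed straight to `pd.DataFrame(rows, columns=['Metric', 'Value', 'Target', 'Passed'])`.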

In [None]:
if validation_data:
    def get_status(value, target, comparison='<'):
        """Return PASS/FAIL for `value` compared with `target` using '<', '>' or '>='."""
        if comparison == '<':
            passed = value < target
        elif comparison == '>':
            passed = value > target
        else:
            passed = value >= target
        return '✅ PASS' if passed else '❌ FAIL'
    
    summary = {
        'Metric': ['MAE', 'RMSE', 'Top 1 Accuracy', 'Top 3 Accuracy', 'Top 10 Accuracy', 'Bucket Accuracy'],
        'Value': [
            f"{mae:.2f} pos",
            f"{rmse:.2f} pos",
            f"{top_n_results['Top 1']:.1f}%",
            f"{top_n_results['Top 3']:.1f}%",
            f"{top_n_results['Top 10']:.1f}%",
            f"{overall_bucket_acc:.1f}%"
        ],
        'Target': ['< 2.5', '< 3.5', '>= 40%', '>= 60%', '>= 70%', '> 60%'],
        'Status': [
            get_status(mae, 2.5, '<'),
            get_status(rmse, 3.5, '<'),
            get_status(top_n_results['Top 1'], 40, '>='),
            get_status(top_n_results['Top 3'], 60, '>='),
            get_status(top_n_results['Top 10'], 70, '>='),
            get_status(overall_bucket_acc, 60, '>')  # strict, matching the '> 60%' target
        ]
    }

    df_summary = pd.DataFrame(summary)
    print("\n" + "="*70)
    print("VALIDATION METRICS SUMMARY")
    print("="*70)
    print(df_summary.to_string(index=False))
    print("="*70)
    
    pass_count = sum('PASS' in s for s in summary['Status'])
    total_count = len(summary['Status'])
    print(f"\nOverall: {pass_count}/{total_count} metrics passing")
    
    if pass_count == total_count:
        print("\n🎉 ALL METRICS PASSING - System validated!")
    elif pass_count >= total_count * 0.75:
        print("\n✅ Most metrics passing - System performing well")
    else:
        print("\n⚠️ Multiple metrics failing - Review model parameters")
else:
    print("⏸️ Awaiting validation data from 2026 races")
    print("\nNext steps:")
    print("1. Run predictions for Races 1-3")
    print("2. Collect actual results")
    print("3. Populate validation_data above")
    print("4. Re-run this notebook")