# Predicting F1 Qualification and Race Outcomes Using Practice Session Data

# Introduction
Formula 1 is a sport where milliseconds matter, and predictive insights can be the difference between victory and defeat. This project leverages data from Free Practice sessions (FP1, FP2, and FP3) to predict the outcome of race day—specifically, the eventual race winner. From a motorsport strategy perspective, we analyze key performance indicators such as lap times, tire compound choices, stint durations, and tire degradation rates to assess how teams and drivers prepare for qualifying and race simulations.

On the data science front, we build a structured pipeline using the FastF1 library to extract, process, and engineer features from historical Grand Prix data. Machine learning models—including classification and regression techniques—are employed to correlate practice performance with final race outcomes. By combining domain-specific race knowledge with predictive analytics, this project aims to anticipate on-track results before the lights go out.

### Importing Libraries

In [21]:
import fastf1
from fastf1.ergast import Ergast
from fastf1 import plotting
import pandas as pd
import numpy as np
from collections import defaultdict
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, LinearRegression
from xgboost import XGBClassifier
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.decomposition import PCA
import shap
import streamlit as st
import warnings

warnings.filterwarnings('ignore')
fastf1.Cache.enable_cache('./fastf1_cache')

# Load Data using legacy Ergast fallback to avoid SSL verification issues
# import urllib3
# urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
# import fastf1.ergast.interface as ergast_interface

## Load and Extracxt Data

#### Load Session Data

In [7]:
# Load Session Data
def load_session_data(season, gp, session):
    ses = fastf1.get_session(season, gp, session)
    ses.load()
    return ses

#### Extracting Practice Data

In [40]:
# Extracting Practice Session Stats
def extract_practice_data(seasons):
    sessions=['FP1', 'FP2', 'FP3']
    data = []
    for season in seasons:
        schedule = fastf1.get_event_schedule(season)
        for _, row in schedule.iterrows():
            race = row['EventName']
            driver_data = defaultdict(lambda: {
                'LapTimes': [],
                'Driver': None
            })
            for i, session in enumerate(sessions):
                try:
                    ses = fastf1.get_session(season, race, session)
                    ses.load()

                    for driver in ses.drivers:
                        driver_laps = ses.laps.pick_driver(driver).copy()
                        driver_laps = driver_laps[driver_laps['LapTime'].notna()]
                        
                        if driver_laps.empty:
                            continue
                        lap_times = driver_laps['LapTime'].dt.total_seconds().tolist()
    
                        driver_data[driver]['LapTimes'].extend(lap_times)
                        driver_data[driver]['Driver'] = ses.get_driver(driver)['Abbreviation'] 
                except Exception as e:
                    print(f"Skipped {season} {race} {session} due to: {e}")
                    continue
                    



            for driver, stats in driver_data.items():
                if not stats['LapTimes']:
                    continue
                fastest_lap = min(stats['LapTimes'])
                avg_lap = sum(stats['LapTimes'])/len(stats['LapTimes'])
                data.append({
                    'Season': season,
                    'Race': race,
                    'Driver': stats['Driver'],
                    f'FastestPracticeLap': fastest_lap,
                    f'AvgPracticeLap': avg_lap,
                    # **tire_compound_counts # e.g. 'Soft': 6, 'Medium': 3
                })

    return pd.DataFrame(data)

In [41]:
# def get_practice_data(season, gp_name):
#     sessions = ['FP1', 'FP2', 'FP3']
#     df = pd.DataFrame()
#     for i, sess in enumerate(sessions):
#         if df.empty:
#             df = extract_practice_features(season, gp_name, sess)
#             continue
#         drivers = extract_practice_features(season, gp_name, sess)
#         df = df.merge(drivers, on='Driver', how='left')

#     return df

In [42]:
df =  extract_practice_data([2021])

core           INFO 	Loading data for United States Grand Prix - Practice 1 [v3.5.3]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...
req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['3', '4', '5', '6', '7', '9', '10', '11', '14', '16', '18', '22', '31', '33', '44', '47', '55', '63', '77', '99']
core           INFO 	Loading data for United States Grand Prix - Practice 2 [v3.5.3]
req            INFO 

Skipped 2021 British Grand Prix FP3 due to: Session type 'FP3' does not exist for this event


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['3', '4', '5', '6', '9', '10', '11', '14', '16', '18', '22', '31', '33', '44', '47', '55', '63', '77', '88', '99']
core           INFO 	Loading data for Hungarian Grand Prix - Practice 2 [v3.5.3]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...
req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data

Skipped 2021 Italian Grand Prix FP3 due to: Session type 'FP3' does not exist for this event


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['3', '4', '5', '6', '7', '9', '10', '11', '14', '16', '18', '22', '31', '33', '44', '47', '55', '63', '77', '99']
core           INFO 	Loading data for Russian Grand Prix - Practice 2 [v3.5.3]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...
req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
re

DataNotLoadedError: The data you are trying to access has not been loaded yet. See `Session.load`

#### Extract tire degradation features

In [7]:
# Extract tire degradation features
def get_tire_degradation_features(session):
    laps = session.laps
    laps = laps.pick_accurate()
    drivers = session.drivers
    deg_data = []

    for driver in drivers:
        driver_laps = laps.pick_driver(driver).pick_wo_box()
        if driver_laps.empty:
            continue

        for compound in driver_laps['Compound'].unique():
            comp_laps = driver_laps[driver_laps['Compound'] == compound]

            for stint in comp_laps['Stint'].unique():
                stint_laps = comp_laps[comp_laps['Stint'] ==  stint]
                stint_laps = stint_laps[stint_laps['LapTime'].notna()]
                if len(stint_laps) < 4:
                    continue

                trimmed = stint_laps.iloc[1:-1]
                if len(trimmed) < 3:
                    continue
                
                lap_times = trimmed['LapTime'].dt.total_seconds()
                lap_nums = trimmed['LapNumber']
                
                # Regression for degradation rate
                X = lap_nums.values.reshape(-1, 1)
                y = lap_times.values
                slope = LinearRegression().fit(X, y).coef_[0]

                # Degradation Rate
                deg_rate = max(slope, 0)
    
                deg_data.append({
                    'Driver': session.get_driver(driver)['Abbreviation'],
                    'Compound': compound,
                    'DegRate': deg_rate
                })
            
    return pd.DataFrame(deg_data)

In [8]:
tire_deg = get_tire_degradation_features(session)
tire_deg

Unnamed: 0,Driver,Compound,DegRate
0,VER,HARD,0.77575
1,VER,HARD,0.0
2,SAR,MEDIUM,0.371631
3,SAR,SOFT,0.482
4,RIC,HARD,2.54752
5,RIC,MEDIUM,1.225607
6,RIC,MEDIUM,0.0
7,NOR,MEDIUM,2.490657
8,NOR,MEDIUM,0.50355
9,NOR,SOFT,1.8862


In [9]:
# --- Summarize per compound ---
def summarize_degradation(deg_df):
    if deg_df.empty:
        return pd.DataFrame()

    # Pivot into columns for Soft/Medium/Hard
    pivot = deg_df.pivot_table(index='Driver', columns='Compound', values='DegRate', aggfunc='mean')
    pivot = pivot.rename(columns=lambda x: f'Deg_{x}').reset_index()

    for comp in ['Soft', 'Medium', 'Hard']:
        col = f'deg_{comp}'
        
    return pivot

In [10]:
summ_deg = summarize_degradation(tire_deg)
summ_deg

Compound,Driver,Deg_HARD,Deg_MEDIUM,Deg_SOFT
0,ALB,,0.513319,1.8312
1,ALO,,0.710963,2.953
2,BOT,,0.062168,
3,GAS,,0.712368,0.0
4,HAM,,1.082667,0.0
5,HUL,,3.732668,0.61975
6,LEC,,1.048262,0.0
7,MAG,,4.214171,0.096057
8,NOR,,1.497104,1.8862
9,OCO,,0.200661,0.362612


#### Extracting Qualifying Data

In [79]:
seasons = [2022]
def get_qualifying_data(seasons):
    sessions = ['Q']
    for season in seasons:
        schedule = fastf1.get_event_schedule(season)
        for _, row in schedule.iterrows():
            race = row['EventName']
            for session in sessions:
                try:
                    quali = fastf1.get_session(season, race, session)
                    quali.load()
                
                    results = quali.results
                    # results['Q1Time'] = results['Q1'].dt.total_seconds()
                    # results['Q2Time'] = results['Q2'].dt.total_seconds()
                    # results['Q3Time'] = results['Q3'].dt.total_seconds()
                    # results['Driver'] = results['Abbreviation']
                    # results['GridPos'] = results['Position'].astype('int')
                except Exception as e:
                    print(f"Skipped {season} {race} {session} due to: {e}")
    
        
    # return results[['Driver', 'GridPosition', 'Q1Time', 'Q2Time', 'Q3Time']]
    return results

In [80]:
quali_df = get_qualifying_data(seasons)

core           INFO 	Loading data for British Grand Prix - Qualifying [v3.5.3]
req            INFO 	No cached data found for session_info. Loading data...
_api           INFO 	Fetching session info data...
req            INFO 	Data has been written to cache!
req            INFO 	No cached data found for driver_info. Loading data...
_api           INFO 	Fetching driver list...
req            INFO 	Data has been written to cache!
req            INFO 	No cached data found for session_status_data. Loading data...
_api           INFO 	Fetching session status data...
req            INFO 	Data has been written to cache!
req            INFO 	No cached data found for track_status_data. Loading data...
_api           INFO 	Fetching track status data...
req            INFO 	Data has been written to cache!
req            INFO 	No cached data found for _extended_timing_data. Loading data...
_api           INFO 	Fetching timing data...
_api           INFO 	Parsing timing data...
req            INFO 

In [72]:
quali_df

Unnamed: 0,DriverNumber,BroadcastName,Abbreviation,DriverId,TeamName,TeamColor,TeamId,FirstName,LastName,FullName,...,CountryCode,Position,ClassifiedPosition,GridPosition,Q1,Q2,Q3,Time,Status,Points
44,44,L HAMILTON,HAM,hamilton,Mercedes,00D2BE,mercedes,Lewis,Hamilton,Lewis Hamilton,...,,1.0,,,0 days 00:01:14.823000,0 days 00:01:14.817000,0 days 00:01:14.411000,NaT,,
11,11,S PEREZ,PER,perez,Red Bull Racing,0600EF,red_bull,Sergio,Perez,Sergio Perez,...,,2.0,,,0 days 00:01:15.395000,0 days 00:01:14.716000,0 days 00:01:14.446000,NaT,,
33,33,M VERSTAPPEN,VER,max_verstappen,Red Bull Racing,0600EF,red_bull,Max,Verstappen,Max Verstappen,...,,3.0,,,0 days 00:01:15.109000,0 days 00:01:14.884000,0 days 00:01:14.498000,NaT,,
16,16,C LECLERC,LEC,leclerc,Ferrari,DC0004,ferrari,Charles,Leclerc,Charles Leclerc,...,,4.0,,,0 days 00:01:15.413000,0 days 00:01:14.808000,0 days 00:01:14.740000,NaT,,
10,10,P GASLY,GAS,gasly,AlphaTauri,2B4562,alphatauri,Pierre,Gasly,Pierre Gasly,...,,5.0,,,0 days 00:01:15.548000,0 days 00:01:14.927000,0 days 00:01:14.790000,NaT,,
3,3,D RICCIARDO,RIC,ricciardo,McLaren,FF9800,mclaren,Daniel,Ricciardo,Daniel Ricciardo,...,,6.0,,,0 days 00:01:15.669000,0 days 00:01:15.033000,0 days 00:01:14.826000,NaT,,
4,4,L NORRIS,NOR,norris,McLaren,FF9800,mclaren,Lando,Norris,Lando Norris,...,,7.0,,,0 days 00:01:15.009000,0 days 00:01:14.718000,0 days 00:01:14.875000,NaT,,
77,77,V BOTTAS,BOT,bottas,Mercedes,00D2BE,mercedes,Valtteri,Bottas,Valtteri Bottas,...,,8.0,,,0 days 00:01:14.672000,0 days 00:01:14.905000,0 days 00:01:14.898000,NaT,,
31,31,E OCON,OCO,ocon,Alpine,0090FF,alpine,Esteban,Ocon,Esteban Ocon,...,,9.0,,,0 days 00:01:15.385000,0 days 00:01:15.117000,0 days 00:01:15.210000,NaT,,
18,18,L STROLL,STR,stroll,Aston Martin,006F62,aston_martin,Lance,Stroll,Lance Stroll,...,,10.0,,,0 days 00:01:15.522000,0 days 00:01:15.138000,NaT,NaT,,


#### Extracting Race Result Label Data

In [56]:
def get_race_labels(seasons):
    for season in seasons:
        schedule = fastf1.get_event_schedule(season)
        for _, row in schedule.iterrows():
            race = row['EventName']
            try:
                race = fastf1.get_session(season, race, 'R')
                race.load()
            
                results = race.results[['Abbreviation', 'Position']].rename(
                    columns={'Abbreviation': 'Driver', 'Position': 'RaceResult'}
                )
            except Exception as e:
                    print(f"Skipped {season} {race} {session} due to: {e}")
        
    return results[['Driver', 'RaceResult']]

get_race_labels(seasons)

#### Merge All Features + Race Result

In [57]:
def build_features(season, gp_name):
    # session = fastf1.get_session(season, gp_name, 'FP2')
    # session.load()

    pace_data = get_practice_data(season, gp_name)
    quali_data =  get_qualifying_data(season, gp_name)
    race_labels = get_race_labels(season, gp_name)

    # Merging the data
    df = pace_data.merge(quali_data, on='Driver', how='left')
    df = df.merge(race_labels, on='Driver', how='left')

    return df

features_df = build_features(2024, 'Bahrain')

core           INFO 	Loading data for Bahrain Grand Prix - Practice 1 [v3.5.3]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...
req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '2', '3', '4', '10', '11', '14', '16', '18', '20', '22', '23', '24', '27', '31', '44', '55', '63', '77', '81']
core           INFO 	Loading data for Bahrain Grand Prix - Practice 2 [v3.5.3]
req            INFO 	Using cac

In [59]:
features_df

Unnamed: 0,Driver,FP1_FastestLap,FP1_AvgLapTime,FP1_NumLaps,FP2_FastestLap,FP2_AvgLapTime,FP2_NumLaps,FP3_FastestLap,FP3_AvgLapTime,FP3_NumLaps,GridPosition,Q1Time,Q2Time,Q3Time,RaceResult
0,VER,93.238,96.409777,9,90.851,94.751111,9,91.062,92.14225,4,,90.031,89.374,89.179,1.0
1,SAR,94.213,95.906111,9,91.715,92.889833,6,92.125,92.732,4,,90.77,,,20.0
2,RIC,92.869,96.718181,11,91.516,92.31775,4,91.449,92.1105,2,,90.562,90.278,,13.0
3,NOR,92.901,96.52309,11,92.608,92.754,2,91.118,92.163333,3,,90.143,89.941,89.614,6.0
4,GAS,95.144,95.739166,6,91.951,92.30425,4,92.382,93.0265,4,,90.948,,,18.0
5,PER,93.413,96.35709,11,91.115,93.7515,6,91.248,92.328,4,,90.221,89.932,89.537,2.0
6,ALO,93.193,93.8975,6,90.66,91.209666,3,90.965,92.4336,5,,90.179,89.801,89.542,9.0
7,LEC,93.268,96.3019,10,91.113,93.4254,5,91.094,92.518666,6,,90.243,89.165,89.407,4.0
8,STR,93.868,94.804571,7,90.891,91.6885,4,91.396,92.1826,5,,89.965,90.2,,10.0
9,MAG,97.477,98.346437,16,91.764,92.10925,4,91.671,94.61,8,,90.646,90.529,,12.0
