# 01 - Ingest FastF1 Data

## Overview
Fetch Formula 1 race session data through the FastF1 API and save it to the raw data directory (`data/raw/`).

## Inputs
- List of target races, given as `(season, round_number)` tuples

## Outputs
- data/raw/sessions.csv
- data/raw/{session_key}_laps.parquet
- data/raw/{session_key}_pitstops.csv
- data/raw/{session_key}_weather.csv
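
The `{session_key}` placeholder follows the pattern that appears in the fetch logs of this notebook, for example `2023_1_R` for season 2023, round 1, race session. Here is a minimal sketch of that naming scheme, assuming the key is always `{season}_{round}_{session_code}`. The helper names `make_session_key` and `raw_output_paths` are hypothetical and not part of `f1ts`.

```python
from pathlib import Path


def make_session_key(season: int, round_num: int, session_code: str = "R") -> str:
    """Build a key such as '2023_1_R' (format assumed from the log output)."""
    return f"{season}_{round_num}_{session_code}"


def raw_output_paths(raw_dir: Path, session_key: str) -> dict[str, Path]:
    """Map each per-session output to its expected path under the raw directory."""
    return {
        "laps": raw_dir / f"{session_key}_laps.parquet",
        "pitstops": raw_dir / f"{session_key}_pitstops.csv",
        "weather": raw_dir / f"{session_key}_weather.csv",
    }


key = make_session_key(2023, 1)
for name, path in raw_output_paths(Path("data/raw"), key).items():
    print(f"{name}: {path}")
```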

In [5]:
import sys
from pathlib import Path
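# Make the project's src/ directory importable (assumes the notebook runs one level below the project root)
sys.path.insert(0, str(Path.cwd().parent / 'src'))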

import pandas as pd
import numpy as np
from f1ts import config, io_flat, ingest, validation

config.ensure_dirs()
print(f"Working with project root: {config.PROJECT_ROOT}")

Working with project root: /workspaces/f1-track-strategy


## Load: Define Target Races

In [6]:
# Define races to ingest
# Format: (season, round_number)
TARGET_RACES = [
    (2023, 1),  # Bahrain
    (2023, 2),  # Saudi Arabia
    (2023, 3),  # Australia
]

print(f"Target races: {len(TARGET_RACES)}")
for season, round_num in TARGET_RACES:
    print(f"  - {season} Round {round_num}")

Target races: 3
  - 2023 Round 1
  - 2023 Round 2
  - 2023 Round 3


## Transform: Fetch and Save Races

In [7]:
# Fetch and save each race
ingest.fetch_and_save_races(TARGET_RACES, session_code='R')

print("\n✓ All races fetched and saved")

Fetching races:   0%|          | 0/3 [00:00<?, ?it/s]

Fetching 2023_1_R...


core           INFO 	Loading data for Bahrain Grand Prix - Race [v3.2.0]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for lap_count
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...
req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '11', '16', '55', '14', '63', '44', '18', '31', '27', '4', '77', '24', '22', '23', '2', '20', '81', '21', '10']
Fetching races:  33%|███▎      | 1/3 [00:03<00:06,  3.16s/it]

✓ Saved session info to sessions.csv
✓ Saved 1,035 laps to 2023_1_R_laps.parquet
✓ Saved 161 weather records to 2023_1_R_weather.csv
✓ Completed 2023_1_R

Fetching 2023_2_R...


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['11', '14', '63', '55', '18', '31', '44', '81', '10', '27', '24', '16', '20', '77', '1', '22', '23', '21', '4', '2']
Fetching races:  67%|██████▋   | 2/3 [00:05<00:02,  2.86s/it]
core           INFO 	Loading data for Australian Grand Prix - Race [v3.2.0]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for lap_count
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...

✓ Saved session info to sessions.csv
✓ Saved 904 laps to 2023_2_R_laps.parquet
✓ Saved 148 weather records to 2023_2_R_weather.csv
✓ Completed 2023_2_R

Fetching 2023_3_R...


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '63', '44', '14', '55', '18', '16', '23', '10', '27', '31', '22', '4', '20', '21', '81', '24', '2', '77', '11']
Fetching races: 100%|██████████| 3/3 [00:09<00:00,  3.13s/it]

✓ Saved session info to sessions.csv
✓ Saved 882 laps to 2023_3_R_laps.parquet
✓ Saved 14 pit stops to 2023_3_R_pitstops.csv
✓ Saved 222 weather records to 2023_3_R_weather.csv
✓ Completed 2023_3_R


✓ All races fetched and saved
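
The `core` and `req` log lines above come from FastF1 loading each session. The internals of `ingest.fetch_and_save_races` are not shown in this notebook, so the following is only a sketch of how a single race could be fetched with FastF1's public API (`fastf1.get_session`, `Session.load`, `Session.laps`, `Session.weather_data`). The function name `fetch_race` and its `get_session` parameter are hypothetical; the parameter exists only so the loader can be swapped out in tests.

```python
from typing import Any, Callable, Optional


def fetch_race(
    season: int,
    round_num: int,
    session_code: str = "R",
    get_session: Optional[Callable[[int, int, str], Any]] = None,
):
    """Load one session and return (laps, weather_data).

    Hypothetical helper for illustration, not the f1ts implementation.
    """
    if get_session is None:
        # Imported lazily so this module still imports without FastF1 installed.
        import fastf1

        get_session = fastf1.get_session

    session = get_session(season, round_num, session_code)
    session.load()  # downloads the data, or reads it from the FastF1 cache
    return session.laps, session.weather_data
```

Calling `fetch_race(2023, 1)` would request the 2023 round 1 race (Bahrain). Writing the returned frames to disk is left out here because the column mapping used by `f1ts` is not shown in this notebook.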





## Validate: Check Data Quality

In [8]:
# Load sessions file
sessions_file = config.paths()['data_raw'] / 'sessions.csv'
sessions = pd.read_csv(sessions_file)

print(f"\nSessions loaded: {len(sessions)}")
print(sessions[['session_key', 'circuit_name', 'date']])

# Validate each race
for _, session in sessions.iterrows():
    session_key = session['session_key']
    
    # Check laps file
    laps_file = config.paths()['data_raw'] / f"{session_key}_laps.parquet"
    if laps_file.exists():
        laps = pd.read_parquet(laps_file)
        
        # Validate minimum lap count
        n_laps = len(laps)
        expected_min_laps = config.MIN_LAPS_PER_RACE
        
        status = "✓" if n_laps >= expected_min_laps else "✗"
        print(f"\n{status} {session_key}: {n_laps} laps (min: {expected_min_laps})")
        
        # Check uniqueness
        validation.validate_uniqueness(laps, ['session_key', 'driver', 'lap'], session_key)
        
        # Check for required columns
        required_cols = ['session_key', 'driver', 'lap', 'lap_time_ms', 'compound']
        validation.validate_schema(laps, required_cols, name=session_key)
    else:
        print(f"✗ {session_key}: Laps file not found")

print("\n✓ Validation complete")


Sessions loaded: 3
  session_key              circuit_name                 date
0    2023_1_R        Bahrain Grand Prix  2023-03-05 00:00:00
1    2023_2_R  Saudi Arabian Grand Prix  2023-03-19 00:00:00
2    2023_3_R     Australian Grand Prix  2023-04-02 00:00:00

✓ 2023_1_R: 1035 laps (min: 50)
✓ Uniqueness validation passed for 2023_1_R on ['session_key', 'driver', 'lap']
✓ Schema validation passed for 2023_1_R

✓ 2023_2_R: 904 laps (min: 50)
✓ Uniqueness validation passed for 2023_2_R on ['session_key', 'driver', 'lap']
✓ Schema validation passed for 2023_2_R

✓ 2023_3_R: 882 laps (min: 50)
✓ Uniqueness validation passed for 2023_3_R on ['session_key', 'driver', 'lap']
✓ Schema validation passed for 2023_3_R

✓ Validation complete
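
The two validators are called above without showing their bodies. As a rough illustration of the checks they appear to perform (an assumption based on their names and log messages, not the `f1ts.validation` source), here are stdlib-only versions that work on rows given as dictionaries. `find_duplicate_keys` and `missing_columns` are hypothetical names.

```python
from collections import Counter


def find_duplicate_keys(rows, key_cols):
    """Return the key tuples that occur more than once across rows."""
    counts = Counter(tuple(row[col] for col in key_cols) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def missing_columns(rows, required_cols):
    """Return the required columns that are absent from at least one row."""
    return [col for col in required_cols if any(col not in row for row in rows)]


# Made-up rows for illustration only; the second and third share a key
# and none of them has a 'compound' column.
laps = [
    {"session_key": "2023_1_R", "driver": "AAA", "lap": 1, "lap_time_ms": 90000},
    {"session_key": "2023_1_R", "driver": "AAA", "lap": 2, "lap_time_ms": 91000},
    {"session_key": "2023_1_R", "driver": "AAA", "lap": 2, "lap_time_ms": 91500},
]

print(find_duplicate_keys(laps, ["session_key", "driver", "lap"]))
print(missing_columns(laps, ["session_key", "driver", "lap", "lap_time_ms", "compound"]))
```

With a pandas DataFrame, `laps.duplicated(subset=key_cols)` and a set difference against `laps.columns` would cover the same ground.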


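## Save: Already Saved During Ingestion

No separate save step is needed: `ingest.fetch_and_save_races` wrote every file to `data/raw/` in the Transform step.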

## Repro Notes

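- Fetched 2023 rounds 1 to 3 (race session, `session_code='R'`) with FastF1 v3.2.0; the requests shown in the log were served from the FastF1 cache
- Data saved to data/raw/: sessions.csv plus per-session laps (Parquet) and weather (CSV) files; the log reports a pit stop file only for 2023_3_R (14 pit stops)
- All three races exceed the minimum lap count threshold of 50 (1,035, 904, and 882 lap rows)
- Uniqueness on `session_key`, `driver`, `lap` and schema validations passed for all three sessions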
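
Because the fetch log reports a pit stop file for only one of the three sessions, it can help to confirm which of the listed outputs are actually on disk before moving to the next notebook. Here is a small sketch using only `pathlib`; `missing_outputs` is a hypothetical helper, and the suffix list is taken from the Outputs section.

```python
from pathlib import Path

# Per-session file suffixes, as listed in the Outputs section.
EXPECTED_SUFFIXES = ("_laps.parquet", "_pitstops.csv", "_weather.csv")


def missing_outputs(raw_dir, session_keys):
    """Return {session_key: [missing file names]} for sessions with gaps."""
    raw_dir = Path(raw_dir)
    report = {}
    for key in session_keys:
        missing = [
            f"{key}{suffix}"
            for suffix in EXPECTED_SUFFIXES
            if not (raw_dir / f"{key}{suffix}").exists()
        ]
        if missing:
            report[key] = missing
    return report


for key, files in missing_outputs("data/raw", ["2023_1_R", "2023_2_R", "2023_3_R"]).items():
    print(f"{key}: missing {', '.join(files)}")
```

Sessions with all three files present are left out of the report, so an empty result means nothing is missing.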