# 01 - Ingest FastF1 Data

## Overview
Fetch F1 race data from the FastF1 API and save it to the raw data directory (`data/raw/`).

## Inputs
- List of target races as `(season, round_number)` tuples

## Outputs
- data/raw/sessions.csv
- data/raw/{session_key}_laps.parquet
- data/raw/{session_key}_pitstops.csv
- data/raw/{session_key}_weather.csv
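
The per-session file names follow a `{session_key}` pattern, and the log output later in this notebook shows keys such as `2023_1_R`. As a minimal sketch, assuming the key is simply `{season}_{round}_{session_code}` (the helper names below are hypothetical and not part of `f1ts`), the expected output paths could be derived like this:

```python
from pathlib import Path

RAW_DIR = Path("data/raw")  # assumed relative location of the raw directory


def make_session_key(season: int, round_num: int, session_code: str = "R") -> str:
    """Hypothetical helper: build a key such as '2023_1_R'."""
    return f"{season}_{round_num}_{session_code}"


def expected_outputs(session_key: str, raw_dir: Path = RAW_DIR) -> dict:
    """Map each per-session artifact to the path listed under Outputs."""
    return {
        "laps": raw_dir / f"{session_key}_laps.parquet",
        "pitstops": raw_dir / f"{session_key}_pitstops.csv",
        "weather": raw_dir / f"{session_key}_weather.csv",
    }


key = make_session_key(2023, 1)
for name, path in expected_outputs(key).items():
    print(f"{name}: {path.as_posix()}")
```

A mapping like this can be handy for checking which files exist after a run, since the log output suggests the pit stop file is not written for every session.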

In [1]:
import sys
from pathlib import Path
sys.path.insert(0, str(Path.cwd().parent / 'src'))

import pandas as pd
import numpy as np
from f1ts import config, io_flat, ingest, validation

config.ensure_dirs()
print(f"Working with project root: {config.PROJECT_ROOT}")

Working with project root: /workspaces/f1-track-strategy


## Load: Define Target Races

In [2]:
# Define races to ingest
# Format: (season, round_number)
# Expanded to include rounds 1-10 for better model performance
TARGET_RACES = [
    (2023, 1),   # Bahrain
    (2023, 2),   # Saudi Arabia
    (2023, 3),   # Australia
    (2023, 4),   # Azerbaijan
    (2023, 5),   # Miami
    (2023, 6),   # Monaco
    (2023, 7),   # Spain
    (2023, 8),   # Canada
    (2023, 9),   # Austria
    (2023, 10),  # Britain
]

print(f"Target races: {len(TARGET_RACES)}")
for season, round_num in TARGET_RACES:
    print(f"  - {season} Round {round_num}")

Target races: 10
  - 2023 Round 1
  - 2023 Round 2
  - 2023 Round 3
  - 2023 Round 4
  - 2023 Round 5
  - 2023 Round 6
  - 2023 Round 7
  - 2023 Round 8
  - 2023 Round 9
  - 2023 Round 10


## Transform: Fetch and Save Races

In [3]:
# Fetch and save each race
ingest.fetch_and_save_races(TARGET_RACES, session_code='R')

print("\n✓ All races fetched and saved")

Fetching races:   0%|          | 0/10 [00:00<?, ?it/s]

Fetching 2023_1_R...


core           INFO 	Loading data for Bahrain Grand Prix - Race [v3.2.0]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for lap_count
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...
req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '11', '16', '55', '14', '63', '44', '18', '31', '27', '4', '77', '24', '22', '23', '2', '20', '81', '21', '10']
Fetching races:  10%|█         | 1/10 [00:05<00:52,  5.86s/it

✓ Saved session info to sessions.csv
✓ Saved 1,035 laps to 2023_1_R_laps.parquet
✓ Saved 161 weather records to 2023_1_R_weather.csv
✓ Completed 2023_1_R

Fetching 2023_2_R...


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['11', '14', '63', '55', '18', '31', '44', '81', '10', '27', '24', '16', '20', '77', '1', '22', '23', '21', '4', '2']
Fetching races:  20%|██        | 2/10 [00:08<00:32,  4.02s/it]core           INFO 	Loading data for Australian Grand Prix - Race [v3.2.0]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for lap_count
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data

✓ Saved session info to sessions.csv
✓ Saved 904 laps to 2023_2_R_laps.parquet
✓ Saved 148 weather records to 2023_2_R_weather.csv
✓ Completed 2023_2_R

Fetching 2023_3_R...


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '63', '44', '14', '55', '18', '16', '23', '10', '27', '31', '22', '4', '20', '21', '81', '24', '2', '77', '11']
Fetching races:  30%|███       | 3/10 [00:12<00:26,  3.85s/it]core           INFO 	Loading data for Azerbaijan Grand Prix - Race [v3.2.0]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for lap_count
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data

✓ Saved session info to sessions.csv
✓ Saved 882 laps to 2023_3_R_laps.parquet
✓ Saved 14 pit stops to 2023_3_R_pitstops.csv
✓ Saved 222 weather records to 2023_3_R_weather.csv
✓ Completed 2023_3_R

Fetching 2023_4_R...


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['16', '1', '11', '55', '44', '14', '4', '22', '18', '81', '63', '23', '77', '2', '24', '20', '10', '21', '31', '27']
Fetching races:  40%|████      | 4/10 [00:14<00:20,  3.40s/it]core           INFO 	Loading data for Miami Grand Prix - Race [v3.2.0]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for lap_count
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...


✓ Saved session info to sessions.csv
✓ Saved 910 laps to 2023_4_R_laps.parquet
✓ Saved 160 weather records to 2023_4_R_weather.csv
✓ Completed 2023_4_R

Fetching 2023_5_R...


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['11', '14', '55', '20', '10', '63', '16', '31', '1', '77', '23', '27', '44', '24', '21', '4', '22', '18', '81', '2']
Fetching races:  50%|█████     | 5/10 [00:17<00:16,  3.20s/it]core           INFO 	Loading data for Monaco Grand Prix - Race [v3.2.0]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for lap_count
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...


✓ Saved session info to sessions.csv
✓ Saved 1,118 laps to 2023_5_R_laps.parquet
✓ Saved 155 weather records to 2023_5_R_weather.csv
✓ Completed 2023_5_R

Fetching 2023_6_R...


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '14', '31', '55', '44', '16', '10', '63', '22', '4', '81', '21', '23', '18', '77', '2', '20', '27', '24', '11']
Fetching races:  60%|██████    | 6/10 [00:21<00:13,  3.34s/it]core           INFO 	Loading data for Spanish Grand Prix - Race [v3.2.0]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for lap_count
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...

✓ Saved session info to sessions.csv
✓ Saved 1,492 laps to 2023_6_R_laps.parquet
✓ Saved 1 pit stops to 2023_6_R_pitstops.csv
✓ Saved 176 weather records to 2023_6_R_weather.csv
✓ Completed 2023_6_R

Fetching 2023_7_R...


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '55', '4', '44', '18', '31', '27', '14', '81', '10', '11', '63', '24', '21', '22', '77', '20', '23', '16', '2']
Fetching races:  70%|███████   | 7/10 [00:24<00:09,  3.18s/it]core           INFO 	Loading data for Canadian Grand Prix - Race [v3.2.0]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for lap_count
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data..

✓ Saved session info to sessions.csv
✓ Saved 1,292 laps to 2023_7_R_laps.parquet
✓ Saved 154 weather records to 2023_7_R_weather.csv
✓ Completed 2023_7_R

Fetching 2023_8_R...


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '14', '44', '63', '27', '31', '4', '81', '23', '16', '55', '11', '20', '77', '10', '18', '21', '2', '22', '24']
Fetching races:  80%|████████  | 8/10 [00:27<00:06,  3.07s/it]core           INFO 	Loading data for Austrian Grand Prix - Race [v3.2.0]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for lap_count
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data..

✓ Saved session info to sessions.csv
✓ Saved 1,294 laps to 2023_8_R_laps.parquet
✓ Saved 162 weather records to 2023_8_R_weather.csv
✓ Completed 2023_8_R

Fetching 2023_9_R...


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '16', '55', '4', '44', '18', '14', '27', '10', '23', '63', '31', '81', '77', '11', '22', '24', '2', '20', '21']
Fetching races:  90%|█████████ | 9/10 [00:30<00:03,  3.10s/it]core           INFO 	Loading data for British Grand Prix - Race [v3.2.0]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for lap_count
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...

✓ Saved session info to sessions.csv
✓ Saved 1,332 laps to 2023_9_R_laps.parquet
✓ Saved 3 pit stops to 2023_9_R_pitstops.csv
✓ Saved 153 weather records to 2023_9_R_weather.csv
✓ Completed 2023_9_R

Fetching 2023_10_R...


req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '4', '81', '16', '55', '63', '44', '23', '14', '10', '27', '18', '31', '2', '11', '22', '24', '21', '20', '77']
Fetching races: 100%|██████████| 10/10 [00:33<00:00,  3.33s/it]

✓ Saved session info to sessions.csv
✓ Saved 946 laps to 2023_10_R_laps.parquet
✓ Saved 151 weather records to 2023_10_R_weather.csv
✓ Completed 2023_10_R


✓ All races fetched and saved
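
For orientation, one iteration of the fetch loop could look roughly like the sketch below. It is not the `f1ts.ingest` implementation, which is not shown in this notebook. The helper `fetch_race_frames`, its `get_session` parameter, and the `session_key` format are assumptions made for illustration; `fastf1.get_session`, `Session.load`, `Session.laps`, and `Session.weather_data` are FastF1 calls. Unlike the run logged above, the sketch skips car telemetry to keep the load light.

```python
def fetch_race_frames(season, round_num, session_code="R", get_session=None):
    """Hypothetical helper: load one session and return its laps and weather.

    `get_session` can be replaced with a stub for offline testing; by default
    the real FastF1 loader is used.
    """
    if get_session is None:
        # Imported lazily: only needed when actually talking to FastF1.
        import fastf1

        get_session = fastf1.get_session

    session = get_session(season, round_num, session_code)
    # Load lap timing and weather only; skip car telemetry and race control messages.
    session.load(laps=True, telemetry=False, weather=True, messages=False)

    session_key = f"{season}_{round_num}_{session_code}"  # assumed key format
    laps = session.laps.copy()
    laps["session_key"] = session_key
    weather = session.weather_data.copy()
    weather["session_key"] = session_key
    return {"session_key": session_key, "laps": laps, "weather": weather}
```

Passing the loader in as a parameter is a design choice of this sketch: it lets the function be exercised with a fake session object, without network access or a populated FastF1 cache.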





## Validate: Check Data Quality

In [4]:
# Load sessions file
sessions_file = config.paths()['data_raw'] / 'sessions.csv'
sessions = pd.read_csv(sessions_file)

print(f"\nSessions loaded: {len(sessions)}")
print(sessions[['session_key', 'circuit_name', 'date']])

# Validate each race
for _, session in sessions.iterrows():
    session_key = session['session_key']
    
    # Check laps file
    laps_file = config.paths()['data_raw'] / f"{session_key}_laps.parquet"
    if laps_file.exists():
        laps = pd.read_parquet(laps_file)
        
        # Validate minimum lap count
        n_laps = len(laps)
        expected_min_laps = config.MIN_LAPS_PER_RACE
        
        status = "✓" if n_laps >= expected_min_laps else "✗"
        print(f"\n{status} {session_key}: {n_laps} laps (min: {expected_min_laps})")
        
        # Check uniqueness
        validation.validate_uniqueness(laps, ['session_key', 'driver', 'lap'], session_key)
        
        # Check for required columns
        required_cols = ['session_key', 'driver', 'lap', 'lap_time_ms', 'compound']
        validation.validate_schema(laps, required_cols, name=session_key)
    else:
        print(f"✗ {session_key}: Laps file not found")

print("\n✓ Validation complete")


Sessions loaded: 10
  session_key              circuit_name                 date
0    2023_1_R        Bahrain Grand Prix  2023-03-05 00:00:00
1    2023_2_R  Saudi Arabian Grand Prix  2023-03-19 00:00:00
2    2023_3_R     Australian Grand Prix  2023-04-02 00:00:00
3    2023_4_R     Azerbaijan Grand Prix  2023-04-30 00:00:00
4    2023_5_R          Miami Grand Prix  2023-05-07 00:00:00
5    2023_6_R         Monaco Grand Prix  2023-05-28 00:00:00
6    2023_7_R        Spanish Grand Prix  2023-06-04 00:00:00
7    2023_8_R       Canadian Grand Prix  2023-06-18 00:00:00
8    2023_9_R       Austrian Grand Prix  2023-07-02 00:00:00
9   2023_10_R        British Grand Prix  2023-07-09 00:00:00

✓ 2023_1_R: 1035 laps (min: 50)
✓ Uniqueness validation passed for 2023_1_R on ['session_key', 'driver', 'lap']
✓ Schema validation passed for 2023_1_R

✓ 2023_2_R: 904 laps (min: 50)
✓ Uniqueness validation passed for 2023_2_R on ['session_key', 'driver', 'lap']
✓ Schema validation passed for 2023_2_R

✓ 

## Save: Already saved during ingestion

## Repro Notes

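- Fetched 10 races (2023 season, rounds 1-10, session code `R`) using the FastF1 API (v3.2.0 per the log output)
- Data saved to `data/raw/`
- All races meet the minimum lap count threshold (`config.MIN_LAPS_PER_RACE`, 50 in this run)
- Uniqueness and schema validations passed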
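
The uniqueness and schema checks named above can be approximated in plain pandas. This is a sketch under assumptions, not the `f1ts.validation` code: the function names are hypothetical, the sample rows are made up, and only the key columns and required columns are taken from the validation cell.

```python
import pandas as pd


def find_duplicate_keys(df: pd.DataFrame, keys: list) -> pd.DataFrame:
    """Return every row whose key combination occurs more than once."""
    return df[df.duplicated(subset=keys, keep=False)]


def missing_columns(df: pd.DataFrame, required: list) -> list:
    """Return the required columns that are absent from the frame."""
    return [col for col in required if col not in df.columns]


# Tiny illustrative frame (made-up values, not real lap data).
laps = pd.DataFrame(
    {
        "session_key": ["2023_1_R"] * 3,
        "driver": ["VER", "VER", "PER"],
        "lap": [1, 1, 1],
        "lap_time_ms": [98000, 98000, 99000],
    }
)

dupes = find_duplicate_keys(laps, ["session_key", "driver", "lap"])
absent = missing_columns(
    laps, ["session_key", "driver", "lap", "lap_time_ms", "compound"]
)
print(f"duplicate rows: {len(dupes)}, missing columns: {absent}")
```

Returning the offending rows and column names, rather than only a pass or fail flag, makes it easier to see what went wrong when a check fails.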

## Telemetry Summaries Ingestion (per session)

This section fetches and saves per-lap telemetry summaries for each ingested race so downstream feature engineering can join telemetry features (`avg_throttle`, `avg_brake`, `avg_speed`, `max_speed`, `corner_time_frac`, `gear_shift_rate`, `drs_usage_ratio`).

In [None]:
# Fetch and persist telemetry summaries for each (season, round).
# Re-imported here so the cell also runs on its own.
from f1ts import ingest, config, io_flat

paths = config.paths()
telemetry_dir = paths['data_raw'] / 'telemetry'
telemetry_dir.mkdir(parents=True, exist_ok=True)

def save_telemetry(season: int, round_num: int, session_code: str = 'R') -> None:
    """Fetch the per-lap telemetry summary for one session and write it to parquet."""
    df = ingest.fetch_telemetry_summary(season, round_num, session_code=session_code)
    session_key = config.get_session_key(season, round_num, session_code)
    outp = telemetry_dir / f"{session_key}_telemetry_summary.parquet"
    io_flat.write_parquet(df, outp)
    print(f"✓ Saved telemetry summary: {outp}")

# Iterate over the same races configured above in TARGET_RACES
for season, round_num in TARGET_RACES:
    try:
        save_telemetry(season, round_num, session_code='R')
    except Exception as e:
        # Keep going so one failed session does not abort the whole run
        print(f"Warning: telemetry fetch failed for {season}_{round_num}_R: {e}")
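
Downstream, the saved summaries are meant to be joined onto lap data. A minimal sketch of such a join follows, assuming the telemetry summary shares the `session_key`, `driver`, and `lap` columns used as the uniqueness key in the validation cell. That shared key and the sample values are assumptions, not something the source confirms.

```python
import pandas as pd

JOIN_KEYS = ["session_key", "driver", "lap"]  # assumed shared key columns

# Made-up sample rows standing in for one session's laps and telemetry summary.
laps = pd.DataFrame(
    {
        "session_key": ["2023_1_R", "2023_1_R"],
        "driver": ["VER", "VER"],
        "lap": [1, 2],
        "lap_time_ms": [98000, 97500],
    }
)
telemetry = pd.DataFrame(
    {
        "session_key": ["2023_1_R"],
        "driver": ["VER"],
        "lap": [1],
        "avg_speed": [201.3],
        "max_speed": [318.0],
    }
)

# A left join keeps every lap; validate raises if either side has duplicate keys.
merged = laps.merge(telemetry, on=JOIN_KEYS, how="left", validate="one_to_one")

# Laps without a telemetry summary end up with NaN feature values.
n_missing = int(merged["avg_speed"].isna().sum())
print(f"{len(merged)} laps, {n_missing} without telemetry")
```

Using a left join with `validate="one_to_one"` preserves the lap count and fails loudly if a duplicated key would otherwise multiply rows.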