# 01 - Ingest FastF1 Data

## Overview
Fetch Formula 1 race data through the FastF1 API and save it to the raw data directory (`data/raw/`).

## Inputs
- List of target races, each a `(season, round_number)` tuple

## Outputs
- data/raw/sessions.csv
- data/raw/{session_key}_laps.parquet
- data/raw/{session_key}_pitstops.csv
- data/raw/{session_key}_weather.csv
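
The per-session file names above all follow the same `{session_key}_<kind>` pattern. A minimal sketch of how those paths could be derived is shown below. `raw_output_paths` is a hypothetical helper and `"2023_1_R"` is a made-up key used only for illustration; the real key format is whatever `f1ts.ingest` writes into `sessions.csv`.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: raw files live under data/raw, as listed in Outputs.
RAW_DIR = Path("data") / "raw"


def raw_output_paths(session_key: str, raw_dir: Path = RAW_DIR) -> dict[str, Path]:
    """Return the expected per-session output paths (hypothetical helper)."""
    return {
        "laps": raw_dir / f"{session_key}_laps.parquet",
        "pitstops": raw_dir / f"{session_key}_pitstops.csv",
        "weather": raw_dir / f"{session_key}_weather.csv",
    }


# "2023_1_R" is an illustrative key, not the format used by f1ts.
paths = raw_output_paths("2023_1_R")
for kind, path in paths.items():
    print(f"{kind}: {path.as_posix()}")
```

Keeping the naming rule in one function would let the ingest and validation steps agree on file locations without repeating the f-strings.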

In [None]:
import sys
from pathlib import Path
sys.path.insert(0, str(Path.cwd().parent / 'src'))

import pandas as pd
import numpy as np
from f1ts import config, io_flat, ingest, validation

config.ensure_dirs()
print(f"Working with project root: {config.PROJECT_ROOT}")

## Load: Define Target Races

In [None]:
# Define races to ingest
# Format: (season, round_number)
TARGET_RACES = [
    (2023, 1),  # Bahrain
    (2023, 2),  # Saudi Arabia
    (2023, 3),  # Australia
]

print(f"Target races: {len(TARGET_RACES)}")
for season, round_num in TARGET_RACES:
    print(f"  - {season} Round {round_num}")
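
Since every later step depends on the `(season, round_number)` format, it can help to check the list before calling the API. The sketch below is one possible pre-flight check, not part of `f1ts`: `check_targets` is a hypothetical helper that rejects malformed entries and drops duplicates while preserving order.

```python
def check_targets(targets):
    """Return de-duplicated (season, round_number) pairs, raising on bad entries.

    Hypothetical helper for illustration; f1ts may validate differently.
    """
    seen = set()
    cleaned = []
    for entry in targets:
        try:
            season, round_num = entry
        except (TypeError, ValueError):
            raise ValueError(f"Expected (season, round_number), got {entry!r}") from None
        if not (isinstance(season, int) and isinstance(round_num, int)):
            raise ValueError(f"Season and round must be integers, got {entry!r}")
        if round_num < 1:
            raise ValueError(f"Round numbers start at 1, got {entry!r}")
        key = (season, round_num)
        if key not in seen:  # keep only the first occurrence
            seen.add(key)
            cleaned.append(key)
    return cleaned


# The repeated (2023, 1) entry is dropped; order is preserved.
example_targets = [(2023, 1), (2023, 2), (2023, 1), (2023, 3)]
checked = check_targets(example_targets)
print(checked)
```

Failing fast here avoids a partially completed download caused by a typo in the list.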

## Transform: Fetch and Save Races

In [None]:
# Fetch and save each race
ingest.fetch_and_save_races(TARGET_RACES, session_code='R')

print("\n✓ All races fetched and saved")

## Validate: Check Data Quality

In [None]:
# Load sessions file
sessions_file = config.paths()['data_raw'] / 'sessions.csv'
sessions = pd.read_csv(sessions_file)

print(f"\nSessions loaded: {len(sessions)}")
print(sessions[['session_key', 'circuit_name', 'date']])

# Validate each race, tracking failures so the final message reflects the outcome
raw_dir = config.paths()['data_raw']
required_cols = ['session_key', 'driver', 'lap', 'lap_time_ms', 'compound']
expected_min_laps = config.MIN_LAPS_PER_RACE
failed_sessions = []

for _, session in sessions.iterrows():
    session_key = session['session_key']

    # Check that the laps file exists before trying to read it
    laps_file = raw_dir / f"{session_key}_laps.parquet"
    if not laps_file.exists():
        print(f"\n✗ {session_key}: Laps file not found")
        failed_sessions.append(session_key)
        continue

    laps = pd.read_parquet(laps_file)

    # Validate minimum lap count
    n_laps = len(laps)
    status = "✓" if n_laps >= expected_min_laps else "✗"
    print(f"\n{status} {session_key}: {n_laps} laps (min: {expected_min_laps})")
    if n_laps < expected_min_laps:
        failed_sessions.append(session_key)

    # Check uniqueness of the (session_key, driver, lap) key
    validation.validate_uniqueness(laps, ['session_key', 'driver', 'lap'], session_key)

    # Check for required columns
    validation.validate_schema(laps, required_cols, name=session_key)

if failed_sessions:
    print(f"\n✗ Validation finished with problems in: {failed_sessions}")
else:
    print("\n✓ Validation complete")
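
The uniqueness and schema checks above are delegated to `f1ts.validation`, whose implementation is not shown in this notebook. As a rough illustration of what such checks typically involve, the sketch below uses plain pandas. `find_duplicate_keys` and `missing_columns` are hypothetical stand-ins, not the `f1ts` functions, and the small `laps_demo` frame is made-up data.

```python
from __future__ import annotations

import pandas as pd


def find_duplicate_keys(df: pd.DataFrame, keys: list[str]) -> pd.DataFrame:
    """Return every row whose key combination appears more than once."""
    return df[df.duplicated(subset=keys, keep=False)]


def missing_columns(df: pd.DataFrame, required: list[str]) -> list[str]:
    """Return the required column names absent from df, in the given order."""
    return [col for col in required if col not in df.columns]


# Made-up data: VER has lap 1 recorded twice and 'compound' is absent.
laps_demo = pd.DataFrame(
    {
        "session_key": ["demo", "demo", "demo"],
        "driver": ["VER", "VER", "HAM"],
        "lap": [1, 1, 1],
        "lap_time_ms": [95000, 95000, 96000],
    }
)

dupes = find_duplicate_keys(laps_demo, ["session_key", "driver", "lap"])
missing = missing_columns(
    laps_demo, ["session_key", "driver", "lap", "lap_time_ms", "compound"]
)
print(f"duplicate rows: {len(dupes)}, missing columns: {missing}")
```

Returning the offending rows and column names, rather than only a pass/fail flag, makes it easier to see which driver and lap caused a failure.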

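## Save: Already saved during ingestion

No separate save step is needed: `ingest.fetch_and_save_races` writes its outputs to `data/raw/` as it fetches each race.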

## Repro Notes

- Fetched the race sessions (`session_code='R'`) for 2023 rounds 1 to 3 through the FastF1 API
- Data saved to `data/raw/`
- All races meet minimum lap count threshold
- Uniqueness and schema validations passed