# Notebook 01: Data Extraction for F1 Tyre Prediction (Colab)

This notebook extracts raw F1 race data using FastF1. It is designed to run in Google Colab and relies on scripts and configuration files stored in Google Drive.

**Workflow:**
1.  **Mount Google Drive**: Access project files.
2.  **Set Project Path**: Navigate to the correct project directory on Drive.
3.  **Install Dependencies**: Install necessary Python libraries from `requirements.txt`.
4.  **Verify Configuration**: Check `colab_path_config.yaml` and `colab_data_extraction_config.yaml` before running the extraction.
5.  **Run Data Extraction Script**: Execute `colab_fetch_fastf1_data.py`.
6.  **Review Outputs**: Check logs and confirm data is saved to Google Drive.

## 1. Mount Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## 2. Set Project Path

Change the working directory to the `FASTF1/F1/colab/` directory in your Google Drive.
**IMPORTANT:** You MUST update the `%cd` path below so that it points to the actual location of `FASTF1/F1/colab/` in your Google Drive.

In [None]:
# USER ACTION REQUIRED: Update the path on the %cd line below!
# Example: %cd "/content/drive/My Drive/Colab Notebooks/FASTF1/F1/colab/"
# Keep comments off the %cd line itself: line magics treat trailing text as part of the argument.
%cd "/content/drive/My Drive/path/to/your/FASTF1/F1/colab/"
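
A quick sanity check after `%cd` can catch a wrong path before the later cells fail. The sketch below is not part of the project; the helper name `find_missing_entries` is hypothetical, and the list of expected entries is simply the set of files this notebook refers to relative to `F1/colab/`.

```python
from pathlib import Path

# Files this notebook references relative to F1/colab/ (taken from the steps above and below).
EXPECTED_ENTRIES = [
    "requirements.txt",
    "configs/colab_path_config.yaml",
    "configs/colab_data_extraction_config.yaml",
    "src/colab_fetch_fastf1_data.py",
]


def find_missing_entries(base_dir=".", expected=EXPECTED_ENTRIES):
    """Return the expected relative paths that do not exist under base_dir."""
    base = Path(base_dir)
    return [rel for rel in expected if not (base / rel).exists()]


# Check the current working directory (the one set by %cd).
missing = find_missing_entries()
if missing:
    print("Missing from the current directory:", missing)
else:
    print("Project layout looks correct.")
```

If anything is reported as missing, the `%cd` path most likely points to the wrong directory.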

## 3. Install Dependencies

This cell installs the libraries listed in `F1/colab/requirements.txt`.

In [None]:
!pip install -r requirements.txt

## 4. Verify Configuration Files

Before running the extraction, please ensure the following configuration files are correctly set up in your Google Drive within the `F1/colab/configs/` directory:

1.  **`colab_path_config.yaml`**: 
    *   Crucially, the `base_project_drive_path` must point to the root of your `FASTF1` project on Google Drive (e.g., `/content/drive/My Drive/FASTF1`).
    *   Other paths within this file are derived from `base_project_drive_path`.

2.  **`colab_data_extraction_config.yaml`**:
    *   Set `years_to_fetch` so that only a small number of races is processed for initial testing (e.g., one or two years; if the script supports it, also limit the number of GPs per year).
    *   Check `fastf1_cache_path`: use the `fastf1_cache_path_drive` value from `colab_path_config.yaml` for persistent caching on Drive, or a local Colab path such as `/tmp/ff1_cache` for ephemeral caching.
    *   Review the remaining parameters, such as `columns_to_extract` and `max_nan_percentage_threshold`.

**You can edit these YAML files directly in a text editor or through the Colab file browser if needed.**

In [None]:
# Optional: Display the content of config files to verify paths
# Make sure you have navigated to F1/colab/ first using %cd

print("--- Contents of F1/colab/configs/colab_path_config.yaml ---")
!cat "configs/colab_path_config.yaml"
print("\n--- Contents of F1/colab/configs/colab_data_extraction_config.yaml ---")
!cat "configs/colab_data_extraction_config.yaml"
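
Beyond reading the files by eye, the key settings can be checked programmatically. This is a minimal sketch under several assumptions: the keys named above sit at the top level of each YAML file, `years_to_fetch` is a list of years (or a single year), and PyYAML is available. The helpers `load_yaml` and `check_configs` are hypothetical and not part of the project.

```python
from pathlib import Path


def load_yaml(path):
    """Load a YAML file into a dict (PyYAML is imported lazily)."""
    import yaml  # PyYAML; assumed to be installed in the Colab runtime

    with open(path, "r", encoding="utf-8") as fh:
        return yaml.safe_load(fh) or {}


def check_configs(path_cfg, extraction_cfg):
    """Return a list of human-readable problems found in the two config dicts."""
    problems = []

    # ASSUMPTION: base_project_drive_path is a top-level key.
    base = path_cfg.get("base_project_drive_path")
    if not base:
        problems.append("base_project_drive_path is not set")
    elif not Path(base).is_dir():
        problems.append(f"base_project_drive_path does not exist: {base}")

    # ASSUMPTION: years_to_fetch is a list of years or a single integer year.
    years = extraction_cfg.get("years_to_fetch")
    if isinstance(years, int):
        years = [years]
    if not years:
        problems.append("years_to_fetch is empty or missing")
    elif len(years) > 2:
        problems.append(
            f"years_to_fetch has {len(years)} entries; consider 1-2 for a first test run"
        )

    if not extraction_cfg.get("fastf1_cache_path"):
        problems.append("fastf1_cache_path is not set")
    return problems


# Run the check only when both files are present in the current directory.
path_file = Path("configs/colab_path_config.yaml")
extraction_file = Path("configs/colab_data_extraction_config.yaml")
if path_file.exists() and extraction_file.exists():
    for problem in check_configs(load_yaml(path_file), load_yaml(extraction_file)):
        print("WARNING:", problem)
```

An empty result only means these three settings look plausible; it does not validate the rest of the configuration.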

## 5. Run Data Extraction Script

This command executes the `colab_fetch_fastf1_data.py` script located in `F1/colab/src/`.
Output and logs from the script will be displayed below.

In [None]:
# Ensure you are in the F1/colab/ directory for the script path to be correct
!python src/colab_fetch_fastf1_data.py

## 6. Review Outputs and Logs

After the script finishes:
1.  **Check Script Output**: Review the print statements and any error messages from the cell above.
2.  **Check `data_download_log.csv`**: Navigate to the `F1/colab/drive/` directory (as configured in `colab_path_config.yaml`) on your Google Drive and open `data_download_log.csv`. Verify that entries for the processed races have `Status: Success` (or an expected error/skip status) and that `FilePath` points to a `.parquet` file within the `raw_data` subdirectory.
3.  **Verify Parquet Files**: Check that the actual `.parquet` files exist in the `F1/colab/drive/raw_data/{year}/{event_name}/` directories on Google Drive.
4.  **FastF1 Cache (Optional)**: If `fastf1_cache_path_drive` was configured, check the `F1/colab/drive/fastf1_cache_colab/` directory for cache files.
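
The log and Parquet checks in steps 2 and 3 can be partly automated. The sketch below assumes the log has `Status` and `FilePath` columns as described above, that a successful row has the literal value `Success`, and that `FilePath` is either absolute or relative to the current directory. The helper `summarize_download_log` is hypothetical, and the log location used here is a guess based on the `F1/colab/drive/` directory mentioned above.

```python
import csv
from collections import Counter
from pathlib import Path


def summarize_download_log(log_path):
    """Count rows per Status and list 'Success' rows whose FilePath is missing on disk."""
    status_counts = Counter()
    missing_files = []
    with open(log_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            status = (row.get("Status") or "").strip()
            status_counts[status] += 1
            file_path = (row.get("FilePath") or "").strip()
            # Only successful rows are expected to have a file on disk.
            if status == "Success" and not Path(file_path).is_file():
                missing_files.append(file_path)
    return status_counts, missing_files


# ASSUMPTION: the log lives at drive/data_download_log.csv relative to F1/colab/;
# adjust this to the location configured in colab_path_config.yaml.
log_file = Path("drive/data_download_log.csv")
if log_file.exists():
    counts, missing = summarize_download_log(log_file)
    print("Rows per status:", dict(counts))
    print("Successful rows with no file on disk:", missing)
```

A non-empty list of missing files suggests that the log and the `raw_data` directory are out of sync, for example because the configured paths changed between runs.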