# F1 Weather Dashboard

To do later:
- add forecasts in the future (source from OpenMeteo)
- zoom to track when you click on it and zoom back out when you click off
  - add weather rasters upon zoom
- add arrows to transition from one track to the next in schedule order

## Imports

Navigate to the `final_project` home directory and start the conda session with `conda activate ~/Documents/INFO609/final_project/fp.conda`.

~~OG plan: import matplotlib, openweatherdata client, fastf1, geopandas(?), some library for the Seasonal Kendall.~~

Just kidding! OpenWeatherData's free tier doesn't have access to historical data past 1 year. Switching to Open-Meteo.org. Love Europeans.

Also just used pandas-geojson instead of geopandas.

In [None]:
import fastf1 as ff1
import openmeteo_requests

import requests_cache
from retry_requests import retry
import unicodedata
import re
import os

import pandas as pd
import matplotlib.pyplot as plt
import pandas_geojson as pgj
import datetime as dt
from dateutil import parser
from pathlib import Path

## Credentials

Create a cached, retrying session for the [Open-Meteo API](https://open-meteo.com/en/docs/historical-weather-api). No login or API key is involved in this notebook.

In [2]:
cache_session = requests_cache.CachedSession('.cache', expire_after = -1)
retry_session = retry(cache_session, retries = 5, backoff_factor = 0.2)
om = openmeteo_requests.Client(session = retry_session)

## Define functions

Collect all functions to be used later here.

In [None]:
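# define functions

def stripper(df, date_col):
    """Parse the date strings in df[date_col] and drop the timezone (keeps local wall-clock time)."""
    return df[date_col].apply(lambda x: parser.parse(x).replace(tzinfo = None))

def get_weather_data(url, params):
    """Thin wrapper around the Open-Meteo client; returns the list of responses."""
    response = om.weather_api(url, params = params)
    return response

def remove_diacritics(text):
    """Strip combining accents, e.g. 'São Paulo' -> 'Sao Paulo'."""
    nfd_form = unicodedata.normalize('NFD', text)
    return re.sub(r'[\u0300-\u036f]', '', nfd_form)

def split_date_time(df, date_col = 'date', month_col = 'month', day_col = 'day', hour_col = 'hour'):
    """Add month, day, and hour display columns derived from a datetime column."""
    df[month_col] = df[date_col].dt.strftime('%b') # short month name (4 -> 'Apr')
    df[day_col] = df[date_col].dt.day # day of month as an integer, no leading 0 (01 -> 1)
    # 12-hour clock (14 -> '2 PM'); the '%-I' code is platform-dependent and may fail on Windows
    df[hour_col] = df[date_col].dt.strftime('%-I %p')
    return df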

## Fetch F1 2025 schedule & locations

I might just assemble this manually. Download the 2025 schedule w/ dates and place names from FastF1. Or just assemble lat & long from Wikipedia. 

Columns: 
- Place (Address?)
- Long
- Lat
- Date(time)

Do I do all three days? The core is really race day, but rain during practice is consequential too... I think all 3 days, since drivers can ruin their race by crashing in practice as well. Maybe for the plot, I average each metric across all three days, then use that as a point.
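
A minimal sketch of the "average each metric across all three days" idea. The readings below are invented for illustration, and `average_weekend` is a hypothetical helper, not something the notebook defines.

```python
from statistics import mean

# Hypothetical per-day readings for one track (values invented for illustration)
weekend = {
    "friday":   {"temperature_2m": 21.0, "precipitation": 0.0},
    "saturday": {"temperature_2m": 23.0, "precipitation": 1.2},
    "sunday":   {"temperature_2m": 22.0, "precipitation": 0.3},
}

def average_weekend(days):
    """Average every metric across the days of a race weekend."""
    metrics = next(iter(days.values())).keys()
    return {m: mean(day[m] for day in days.values()) for m in metrics}

# One averaged point per track, ready to plot
weekend_point = average_weekend(weekend)
print(weekend_point)
```

Averaging hides a wet Friday behind a dry Sunday, so it may be worth keeping the per-day values around as well.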

In [None]:
# get schedule from FastF1 & create CSV

schedule_25_raw = ff1.get_event_schedule(2025)
schedule_25_raw.to_csv('f1_schedule_25.csv') # add the .csv extension so the file opens as a spreadsheet

Once all data was exported as a CSV, I manually added long/lat **and** decimal coordinates columns sourced from Wikipedia's GeoHack tool. Unless I do a bunch of Excel wizardry every time I open the CSV, my date columns come out as strings. ~~Instead I translate the strings into datetime objects as well.~~ That turns out to be what I needed to do. 
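
A rough sketch of that string-to-datetime step using only the standard library. The sample strings are assumptions about what the date column looks like after the CSV round trip, and `to_naive_local` is a hypothetical name; the notebook's own `stripper` does the same job with `dateutil`.

```python
from datetime import datetime

# Sample strings in the assumed format (ISO date, time, and UTC offset)
raw_dates = ["2025-03-16 15:00:00+11:00", "2025-11-22 20:00:00-08:00"]

def to_naive_local(text):
    """Parse an ISO-style string and drop the UTC offset, keeping local wall-clock time."""
    return datetime.fromisoformat(text).replace(tzinfo=None)

parsed = [to_naive_local(s) for s in raw_dates]
print(parsed[0].month, parsed[0].day, parsed[0].hour)
```

Dropping the offset keeps the local start hour, which is what the month/day/hour columns need.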

In [None]:
# read CSV back in after editing

s25 = pd.read_csv('s25_geo.csv')

# translate string date series(eses) into python datetime objects -- unnecessary

# session_numbers = ['Session1Date', 'Session2Date', 'Session3Date', 'Session4Date', 'Session5Date']

In [None]:
# testing stripper

s25_select = s25[['Country', 'Location', 'DecimalXY', 'DecimalX', 'DecimalY', 'Session5Date', 'CourseName']].copy()
print(s25_select['Session5Date'].head())
print(s25_select['Session5Date'].dtype)

s25_select['Session5Date'] = stripper(s25_select, 'Session5Date')

print(s25_select['Session5Date'].head())
print(s25_select['Session5Date'].dtype)

In [None]:
# export select columns from s25 as GeoJSON

s25_select = s25[['Country', 'Location', 'DecimalXY', 'DecimalX', 'DecimalY', 'Session5Date', 'CourseName']].copy()
s25_select['Session5Date'] = stripper(s25_select, 'Session5Date')
s25_select['type'] = 'Point'
s25_select['month'] = ''
s25_select['day'] = 0
s25_select['hour'] = ''
s25_select = split_date_time(s25_select, date_col = 'Session5Date', month_col = 'month', day_col = 'day', hour_col = 'hour')
s25_select['xy_list'] = s25_select.apply(lambda row: [row['DecimalY'], row['DecimalX']], axis = 1)

s25_gj = pgj.GeoJSON.from_dataframe(s25_select, geometry_type_col = 'type', coordinate_col = 'xy_list', property_col_list = ['Country', 'Location', 'CourseName', 'month', 'day', 'hour'])
pgj.save_geojson(s25_gj, 'tracks.geojson')

## Fetch weather data

We need to do the same dates on each year. So my request consists of: 
- long (in decimal)
- lat (in decimal)
- assemble datetime: 
  - month, day, and time from Session5 (race) date
  - year from range (1950 - 2024) in loop function

We're starting in 1950 since that's the year that F1 debuted and OpenMeteo has it.

Get, for the hour after the race starts:
- precipitation level (derive precipitation yes/no?)
- temperature
- barometric pressure
- humidity
- wind speed
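
The request assembly described above could be sketched like this, with no network call. `build_params` and `was_wet` are hypothetical helpers, the coordinates are rough placeholders rather than values from the CSV, and the wet/dry threshold is an assumption.

```python
def build_params(lat, lon, month, day, year, hourly_vars):
    """Build the query parameters for one location and one historical year."""
    date_str = f"{year}-{month:02d}-{day:02d}"
    return {
        "latitude": lat,
        "longitude": lon,
        "start_date": date_str,  # single-day request: start and end match
        "end_date": date_str,
        "hourly": list(hourly_vars),
    }

def was_wet(precip_mm, threshold=0.0):
    """Derive a yes/no precipitation flag; the threshold is an assumption."""
    return precip_mm > threshold

hourly_vars = ["precipitation", "temperature_2m", "pressure_msl",
               "relative_humidity_2m", "wind_speed_10m"]

# Same month and day every year, only the year changes
requests_by_year = [build_params(-37.85, 144.97, 3, 16, y, hourly_vars)
                    for y in range(1950, 2025)]
print(len(requests_by_year), requests_by_year[0]["start_date"])
```

A race date of 29 February would need special handling in non-leap years; this sketch does not cover that case.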

In [7]:
# 2025 race start hours

melbourne_race_time = 15
shanghai_race_time = 15
suzuka_race_time = 14
sakhir_race_time = 18
jeddah_race_time = 20
miami_race_time = 16
imola_race_time = 15
monaco_race_time = 15
barcelona_race_time = 15
montréal_race_time = 14
spielberg_race_time = 15
silverstone_race_time = 15
spa_francorchamps_race_time = 15
budapest_race_time = 15
zandvoort_race_time = 15
monza_race_time = 15
baku_race_time = 15
marina_bay_race_time = 20
austin_race_time = 14
mexico_city_race_time = 14
são_paulo_race_time = 14
las_vegas_race_time = 20
lusail_race_time = 19
yas_island_race_time = 17

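The raw API returns a JSON object whose "hourly" property holds named arrays. Through the `openmeteo_requests` client, each response instead exposes an `Hourly()` block whose variables come back in the same order they were requested. Pull each one out into its own named column so they line up with the timestamps.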
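
A small sketch of that positional-to-named mapping. `FakeVariable` and `FakeHourly` are stand-ins invented here so the example runs without the API; they only mimic the `Variables(i).ValuesAsNumpy()` calls used in this notebook, and `hourly_to_columns` is a hypothetical helper.

```python
class FakeVariable:
    """Stand-in for one returned variable (hypothetical, for illustration only)."""
    def __init__(self, values):
        self._values = values

    def ValuesAsNumpy(self):
        return self._values

class FakeHourly:
    """Stand-in for the object returned by response.Hourly()."""
    def __init__(self, columns):
        self._columns = [FakeVariable(c) for c in columns]

    def Variables(self, i):
        return self._columns[i]

def hourly_to_columns(hourly, requested_vars):
    """Pair each requested name with the array at the same position."""
    return {name: hourly.Variables(i).ValuesAsNumpy()
            for i, name in enumerate(requested_vars)}

requested = ["temperature_2m", "precipitation"]
fake = FakeHourly([[18.5, 19.1], [0.0, 0.4]])
columns = hourly_to_columns(fake, requested)
print(columns)
```

Driving the mapping from the request list means adding or reordering a variable only has to happen in one place.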

In [None]:
# main weather fetching loop – uncomment bottom to export csvs with all hours

url = "https://archive-api.open-meteo.com/v1/archive"
years = range(1950, 2025)
weather_vars = [
    "temperature_2m", "relative_humidity_2m", "pressure_msl",
    "precipitation", "rain", "snowfall", "wind_speed_10m"
]
city_weather_dfs = []

output_dir = "weather_data_csvs"
os.makedirs(output_dir, exist_ok = True)

for event in s25.itertuples():
    # DecimalX is sent as latitude and DecimalY as longitude, matching the
    # [DecimalY, DecimalX] (lon, lat) order used for the GeoJSON export
    dx = event.DecimalX
    dy = event.DecimalY
    location = event.Location
    # Session5Date is still a string after read_csv, so parse it before using .month/.day
    race_date = pd.to_datetime(event.Session5Date)
    race_month = '{:02d}'.format(race_date.month)
    race_day = '{:02d}'.format(race_date.day)

    all_years = []

    for year in years:
        date_str = f"{year}-{race_month}-{race_day}"

        # no timezone parameter is sent, so the returned timestamps are UTC
        params = {
            "latitude": dx,
            "longitude": dy,
            "start_date": date_str,
            "end_date": date_str,
            "hourly": weather_vars
        }

        responses = om.weather_api(url, params=params)
        if not responses:
            continue

        response = responses[0]
        hourly = response.Hourly()

        time_range = pd.date_range(
            start=pd.to_datetime(hourly.Time(), unit="s", utc=True),
            end=pd.to_datetime(hourly.TimeEnd(), unit="s", utc=True),
            freq=pd.Timedelta(seconds=hourly.Interval()),
            inclusive="left"
        )

        # variables come back in the same order as weather_vars
        data = {"datetime": time_range}
        for i, var in enumerate(weather_vars):
            data[var] = hourly.Variables(i).ValuesAsNumpy()

        all_years.append(pd.DataFrame(data))

    # build one dataframe per city, once all of its years are collected
    if not all_years:
        continue
    all_df = pd.concat(all_years, ignore_index = True)
    city_weather_dfs.append(all_df.assign(location = location))

    # filename = f"{location.lower().replace(' ', '_').replace('/', '_').replace('-', '_')}_weather_data.csv"
    # filepath = os.path.join(output_dir, filename)
    # all_df.to_csv(filepath, index=False)

# stack all cities into one long dataframe, labelled by location
city_weather = pd.concat(city_weather_dfs, ignore_index = True)

In [None]:
# select only the rows that match the start hour specified above & export csvs

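# collect the *_race_time variables defined above into a dict keyed by city;
# list() takes a snapshot so the globals dict cannot change size mid-iteration
race_times = {
    var.replace("_race_time", ""): val
    for var, val in list(globals().items())
    if var.endswith("_race_time")
}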

# Set directories
input_dir = Path("weather_data_csvs")
output_dir = Path("weather_averages")
output_dir.mkdir(exist_ok = True)

# Process each file
for file in input_dir.glob("*.csv"):
    city = file.stem.replace("_weather_data", "")
    race_hour = race_times.get(city)

    if race_hour is None:
        print(f"No race time defined for {city}, skipping.")
        continue

    # Load and parse datetime
    df = pd.read_csv(file)
    df["datetime"] = pd.to_datetime(df["datetime"], errors="coerce")
    df = df.dropna(subset=["datetime"])

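    # Filter by race hour. Note: the weather was requested without a timezone,
    # so these timestamps are UTC, while the start hours above look like local times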
    df["hour"] = df["datetime"].dt.hour
    race_df = df[df["hour"] == race_hour].copy()

    # Save to new CSV
    output_path = output_dir / f"{city}_race_hour.csv"
    race_df.to_csv(output_path, index=False)
    print(f"Saved {city} race-hour data to {output_path}")
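
The filter above compares the hour of each `datetime` value with the race start hour. If the timestamps are in UTC and the start hours are local, a conversion along these lines may be needed first. This is only a sketch: `local_hour` is a hypothetical helper, and the fixed per-track UTC offset is an assumption that ignores daylight-saving changes across the years.

```python
from datetime import datetime, timedelta, timezone

def local_hour(utc_timestamp, utc_offset_hours):
    """Return the local clock hour for a UTC timestamp and a fixed UTC offset."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    return utc_timestamp.astimezone(local_tz).hour

# Hypothetical example: 04:00 UTC at a track that is 11 hours ahead of UTC
stamp = datetime(2024, 3, 16, 4, 0, tzinfo=timezone.utc)
print(local_hour(stamp, 11))  # prints 15
```

With a real timezone database the offset could vary by date instead of being fixed; that is left out here to keep the sketch dependency-free.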

In [None]:
# test core loop that gets lat/long and dates, then generates 1 dataframe per city

url = "https://archive-api.open-meteo.com/v1/archive"

dx = 0.0
dy = 0.0
years = range(1950, 2025)
weather_vars = ["temperature_2m", "relative_humidity_2m", "pressure_msl", "precipitation", "rain", "snowfall", "wind_speed_10m"]

for event in s25.itertuples():
    
    dx = event.DecimalX
    dy = event.DecimalY
    location = event.Location
    race_date = pd.to_datetime(event.Session5Date) # parse the CSV string before using .month/.day
    race_month = '{:02d}'.format(race_date.month)
    race_day = '{:02d}'.format(race_date.day)
    race_time = race_date.time()

    for year in years:

        # start = dt.datetime(year, race_month, race_day, race_time.hour, race_time.minute, 0) # just get it on race day
        start = f"{year}-{race_month}-{race_day}" # race day in each historical year, not a fixed 2024
        end = start # single-day request: the end date equals the start date

        params = {
            "latitude": dx,
            "longitude": dy,
            "start_date": start,
            "end_date": end,
            "hourly": weather_vars
        }

        responses = om.weather_api(url, params = params)

In [None]:
# bug fix don't worry about it

it = s25.itertuples()
for row in it:
    print(row)

In [None]:
# average data – did not use

from pathlib import Path

# Set the input and output directories
input_dir = Path("weather_data_csvs")  # your folder with the CSVs
output_dir = Path("weather_averages")
output_dir.mkdir(exist_ok = True)

# Iterate through all CSV files
for file in input_dir.glob("*.csv"):
    # Load CSV
    df = pd.read_csv(file)
    
    # Convert datetime column to proper datetime format
    df["datetime"] = pd.to_datetime(df["datetime"], errors="coerce")
    
    # Drop rows with invalid datetime just in case
    df = df.dropna(subset=["datetime"])

    # Extract year
    df["year"] = df["datetime"].dt.year

    # Group by year and compute mean for selected columns
    avg_df = df.groupby("year")[["rain", "snowfall"]].mean().reset_index()

    # Save output CSV
    city_name = file.stem.replace("_weather_data", "")
    output_path = output_dir / f"{city_name}_yearly_averages.csv"
    avg_df.to_csv(output_path, index=False)

    print(f"Saved yearly averages for {city_name} to {output_path}")

## Visualize the trends

matplotlib up some scatter plots with a trend line and an average indicator.
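
For reference, a plain-Python sketch of what the trend line and average indicator compute. The readings are invented, `linear_trend` is a hypothetical helper, and the plotting cells use `np.polyfit` rather than this hand-rolled version.

```python
def linear_trend(xs, ys):
    """Ordinary least-squares slope and intercept, skipping missing (None) values."""
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Invented readings: one value per year, with one gap
sample_years = [1950, 1951, 1952, 1953, 1954]
sample_temps = [20.0, 20.5, None, 21.5, 22.0]

slope, intercept = linear_trend(sample_years, sample_temps)
valid = [t for t in sample_temps if t is not None]
average = sum(valid) / len(valid)
print(f"trend: {slope:.2f} per year, average: {average:.2f}")
```

The slope is the change per year, so multiplying by 10 gives a per-decade figure that may be easier to read on a 75-year axis.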

In [None]:
# plotting – linear regression, y-axes differ per location

import numpy as np

input_dir = 'weather_averages' # csv source
output_dir = 'weather_averages/plots' # svg destination
os.makedirs(output_dir, exist_ok=True)

weather_vars = [
    "temperature_2m", "relative_humidity_2m", "pressure_msl",
    "precipitation", "wind_speed_10m"
]

# Define your colors here (hex or named colors)
color_map = {
    "temperature_2m": {"dots": "#ed2136", "trend": "#f9a353"},
    "relative_humidity_2m": {"dots": "#358259", "trend": "#8ec99a"},
    "pressure_msl": {"dots": "#3c58b5", "trend": "#5279FA"},
    "precipitation": {"dots": "#4888c4", "trend": "#69a4db"},
    "wind_speed_10m": {"dots": "#9FB3B1", "trend": "#B6CCCA"},
}

# Loop over each CSV
for filename in os.listdir(input_dir):
    if filename.endswith('_race_hour.csv'):
        city_name = filename.replace('_race_hour.csv', '')
        df = pd.read_csv(os.path.join(input_dir, filename))

        # Extract year from date
        df['year'] = pd.to_datetime(df['datetime']).dt.year

        for col in weather_vars:
            if col not in df.columns:
                continue

            x = df['year']
            y = df[col]

            fig, ax = plt.subplots(figsize=(12, 4))  # 3:1 aspect ratio

            # Colors for this variable
            dot_color = color_map[col]["dots"]
            trend_color = color_map[col]["trend"]

            # Scatter plot
            ax.scatter(x, y, color=dot_color)

            # Trend line (linear regression), skipping missing values so polyfit does not fail
            mask = y.notna()
            if mask.sum() >= 2:
                z = np.polyfit(x[mask], y[mask], 1)
                p = np.poly1d(z)
                ax.plot(x, p(x), color=trend_color, linestyle="-", label="Trend")

            # Average line
            avg = y.mean()
            ax.axhline(avg, color='gray', linestyle='dotted', label=f'Average ({avg:.2f})')

            # X-axis formatting
            ax.set_xticks(np.arange(x.min(), x.max()+1, 5))
            ax.set_xlabel('Year')
            ax.set_ylabel(col.replace('_', ' ').capitalize())
            ax.legend()
            ax.spines[['top', 'right']].set_visible(False)
            ax.grid(axis='x')

            # Save plot
            out_path = os.path.join(output_dir, f'{city_name}_{col}_race_hour.svg')
            plt.tight_layout()
            plt.savefig(out_path, format='svg')
            plt.close()

In [None]:
# plotting - exponential smoothing, y-axes shared across locations for each variable

import numpy as np # used for the x-axis ticks below
from statsmodels.tsa.holtwinters import ExponentialSmoothing

input_dir = 'weather_averages'  # csv source
output_dir = 'weather_averages/plots_smoothing'  # svg destination
os.makedirs(output_dir, exist_ok=True)

weather_vars = [
    "temperature_2m", "relative_humidity_2m", "pressure_msl",
    "precipitation", "wind_speed_10m"
]

# Define your colors here (hex or named colors)
color_map = {
    "temperature_2m": {"dots": "#ed2136", "trend": "#f9a353"},
    "relative_humidity_2m": {"dots": "#358259", "trend": "#8ec99a"},
    "pressure_msl": {"dots": "#3c58b5", "trend": "#5279FA"},
    "precipitation": {"dots": "#4888c4", "trend": "#69a4db"},
    "wind_speed_10m": {"dots": "#9FB3B1", "trend": "#B6CCCA"},
}

# Step 1: Determine global y-limits for each variable
y_limits = {var: {"min": float('inf'), "max": float('-inf')} for var in weather_vars}

for filename in os.listdir(input_dir):
    if filename.endswith('_race_hour.csv'):
        df = pd.read_csv(os.path.join(input_dir, filename))
        df['year'] = pd.to_datetime(df['datetime']).dt.year

        for var in weather_vars:
            if var in df.columns:
                y = df[var]
                y_limits[var]["min"] = min(y_limits[var]["min"], y.min())
                y_limits[var]["max"] = max(y_limits[var]["max"], y.max())

# Step 2: Plot with consistent y-limits and exponential smoothing
for filename in os.listdir(input_dir):
    if filename.endswith('_race_hour.csv'):
        city_name = filename.replace('_race_hour.csv', '')
        city_name = city_name.replace('_', ' ')
        city_name = remove_diacritics(city_name)
        df = pd.read_csv(os.path.join(input_dir, filename))
        df['year'] = pd.to_datetime(df['datetime']).dt.year

        for col in weather_vars:
            if col not in df.columns:
                continue

            x = df['year']
            y = df[col]

            fig, ax = plt.subplots(figsize=(12, 4))  # 3:1 aspect ratio

            dot_color = color_map[col]["dots"]
            trend_color = color_map[col]["trend"]

            # Scatter plot
            ax.scatter(x, y, color=dot_color)

            # Exponential Smoothing
            df_sorted = df.sort_values('year')  # Ensure chronological order
            y_sorted = df_sorted[col]
            x_sorted = df_sorted['year']

            # Fit model only if there are enough points
            if len(y_sorted) > 5:
                model = ExponentialSmoothing(y_sorted, trend=None, seasonal=None)
                fit = model.fit(smoothing_level=0.1, optimized=False)
                ax.plot(x_sorted, fit.fittedvalues, color=trend_color, label='Smoothed Trend')

            # Average line
            avg = y.mean()
            ax.axhline(avg, color='gray', linestyle='dotted', label=f'Average ({avg:.2f})')

            # X-axis formatting
            ax.set_xticks(np.arange(x.min(), x.max()+1, 5))
            ax.set_xlabel('Year')
            ax.set_ylabel(col.replace('_', ' ').capitalize())
            ax.set_ylim(y_limits[col]["min"], y_limits[col]["max"])
            ax.legend()
            ax.spines[['top', 'right']].set_visible(False)
            ax.grid(axis='x')

            # Save plot
            out_path = os.path.join(output_dir, f'{city_name}_{col}_race_hour.svg')
            plt.tight_layout()
            plt.savefig(out_path, format='svg')
            plt.close()
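
For intuition about what a smoothing level like 0.1 does, here is a sketch of the textbook simple exponential smoothing recursion. Starting the level at the first observation is an assumption, and statsmodels may initialise the level differently, so its fitted values need not match this exactly.

```python
def simple_exponential_smoothing(values, alpha=0.1):
    """Textbook recursion: level_t = alpha * y_t + (1 - alpha) * level_{t-1}."""
    level = values[0]  # assumption: start the level at the first observation
    smoothed = [level]
    for y in values[1:]:
        level = alpha * y + (1 - alpha) * level
        smoothed.append(level)
    return smoothed

readings = [10.0, 20.0, 10.0, 20.0]  # invented values
result = simple_exponential_smoothing(readings, alpha=0.5)
print(result)  # prints [10.0, 15.0, 12.5, 16.25]
```

A smaller `alpha` weights history more heavily, which is why 0.1 gives a slow-moving line through noisy yearly readings.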


## Pull Forecasts

Open-meteo also has forecast and climate change models – are these the same?

## Tutorial data

Just trying to work out what the data normally looks like, using the example request from Open-Meteo's website.

In [None]:
# dummy tutorial request to see the data

import openmeteo_requests

import pandas as pd
import requests_cache
from retry_requests import retry

# Setup the Open-Meteo API client with cache and retry on error
cache_session = requests_cache.CachedSession('.cache', expire_after = -1)
retry_session = retry(cache_session, retries = 5, backoff_factor = 0.2)
openmeteo = openmeteo_requests.Client(session = retry_session)

# Make sure all required weather variables are listed here
# The order of variables in hourly or daily is important to assign them correctly below
url = "https://archive-api.open-meteo.com/v1/archive"
params = {
	"latitude": 52.52,
	"longitude": 13.41,
	"start_date": "2025-04-25",
	"end_date": "2025-05-09",
	"hourly": ["temperature_2m", "relative_humidity_2m", "precipitation"]
}
responses = openmeteo.weather_api(url, params=params)

# Process first location. Add a for-loop for multiple locations or weather models
response = responses[0]
print(f"Coordinates {response.Latitude()}°N {response.Longitude()}°E")
print(f"Elevation {response.Elevation()} m asl")
print(f"Timezone {response.Timezone()}{response.TimezoneAbbreviation()}")
print(f"Timezone difference to GMT+0 {response.UtcOffsetSeconds()} s")

# Process hourly data. The order of variables needs to be the same as requested.
hourly = response.Hourly()
hourly_temperature_2m = hourly.Variables(0).ValuesAsNumpy()
hourly_relative_humidity_2m = hourly.Variables(1).ValuesAsNumpy()
hourly_precipitation = hourly.Variables(2).ValuesAsNumpy()

hourly_data = {"date": pd.date_range(
	start = pd.to_datetime(hourly.Time(), unit = "s", utc = True),
	end = pd.to_datetime(hourly.TimeEnd(), unit = "s", utc = True),
	freq = pd.Timedelta(seconds = hourly.Interval()),
	inclusive = "left"
)}

hourly_data["temperature_2m"] = hourly_temperature_2m
hourly_data["relative_humidity_2m"] = hourly_relative_humidity_2m
hourly_data["precipitation"] = hourly_precipitation

hourly_dataframe = pd.DataFrame(data = hourly_data)
print(hourly_dataframe)