Code to fetch the latest historical weather data, from 2025-01-01 up to 'now'

open-meteo.com provides high-resolution hourly weather data with hourly updates: forecasts up to 16 days into the future, as well as historical data.

You can access past weather data dating back to 1940 with the historical weather API. *However, that data has a 5-day delay*. If you want information for the most recent days, you can use the forecast API and adjust the Past Days setting.
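
The Past Days setting corresponds to the `past_days` request parameter of the forecast API. Below is a minimal sketch of parameters that would cover the archive's 5-day gap. The helper name `recent_days_params` and the default of 7 days are assumptions for illustration, and nothing is sent over the network here.

```python
def recent_days_params(latitude, longitude, hourly, past_days=7):
    """Build forecast-API parameters that also return the last `past_days` days.

    Hypothetical helper: it only builds the dict. Pass the result to
    openmeteo.weather_api("https://api.open-meteo.com/v1/forecast", params=...).
    """
    if past_days < 0:
        raise ValueError("past_days must be non-negative")
    return {
        "latitude": latitude,
        "longitude": longitude,
        "hourly": list(hourly),
        "models": "knmi_seamless",
        "past_days": past_days,  # days of history before today
        "forecast_days": 1,      # keep the forecast horizon short
    }


params = recent_days_params(52.12949, 5.20514, ["temperature_2m", "cloud_cover"])
print(params["past_days"])
```

The code cells further down use `start_date` and `end_date` instead, which is the better fit when the window has to start at the last stored date.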


https://open-meteo.com/en/docs/knmi-api?models=knmi_seamless (forecast API): OBSERVED weather data from up to 3 months ago through the 'current' (actual) weather
https://open-meteo.com/en/docs/historical-weather-api: weather data older than 3 months (secondary; only needs to be run when weather data has not been fetched for more than 3 months)
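
The split between the two APIs can be sketched as a small decision function. This is an illustration only: `choose_endpoint` is a hypothetical name, and "3 months" is taken as 90 days, which is an assumption rather than a documented limit.

```python
from datetime import date, timedelta

ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"
FORECAST_URL = "https://api.open-meteo.com/v1/forecast"
MAX_FORECAST_LOOKBACK = timedelta(days=90)  # assumption: "3 months" as 90 days


def choose_endpoint(last_fetched: date, today: date) -> str:
    """Return the URL to use for filling the gap since `last_fetched`."""
    if today - last_fetched > MAX_FORECAST_LOOKBACK:
        # Gap too old for the forecast API: fall back to the historical API
        return ARCHIVE_URL
    return FORECAST_URL


print(choose_endpoint(date(2025, 6, 11), date(2025, 6, 16)))
```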


In [1]:
# install related packages

%pip install openmeteo-requests
%pip install requests-cache retry-requests numpy pandas

Note: you may need to restart the kernel to use updated packages.


# weather features and model involved
temperature_2m 🌡️
wind_speed_10m 💨
wind_speed_100m 💨 (not requested in the code below)
cloud_cover ☁️
snowfall ☃
apparent_temperature 🥵
radiation features (3x): diffuse_radiation, direct_normal_irradiance, shortwave_radiation
(Diffuse Solar Radiation DHI, Direct Normal Irradiance DNI, Shortwave Solar Radiation GHI)

Location: De Bilt; model used: "knmi_seamless"
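
The client returns hourly variables by position, in the order they were requested, so the list of names is the single source of truth. Below is a sketch that pairs names with positions in a loop instead of one hand-written line per variable. `HOURLY_VARIABLES` and `columns_by_name` are hypothetical names, and the function assumes only the `Variables(i).ValuesAsNumpy()` interface used in the code cells of this notebook.

```python
HOURLY_VARIABLES = [
    "temperature_2m", "wind_speed_10m", "apparent_temperature", "cloud_cover",
    "snowfall", "diffuse_radiation", "direct_normal_irradiance",
    "shortwave_radiation",
]


def columns_by_name(hourly, variables):
    """Map each requested variable name to its values, relying on request order."""
    return {
        name: hourly.Variables(index).ValuesAsNumpy()
        for index, name in enumerate(variables)
    }
```

With this in place, the same list could be passed as `params["hourly"]`, and the returned dict merged into `hourly_data` with `hourly_data.update(...)`, so a reordered or extended list cannot drift out of sync with the column names.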

In [2]:

# historical API to fetch historical data (run once, then commented out)

import openmeteo_requests
import pandas as pd
import requests_cache
from retry_requests import retry
from datetime import datetime, timezone, timedelta

# define now
now = datetime.now(timezone.utc)
# calculate 5 days back from now
end_date = now - timedelta(days=5)
end_date_date_str = end_date.strftime("%Y-%m-%d")
print(f"5 days back from today: {end_date_date_str}")


# Setup the Open-Meteo API client with cache and retry on error
cache_session = requests_cache.CachedSession('.cache', expire_after = 3600)
retry_session = retry(cache_session, retries = 5, backoff_factor = 0.2)
openmeteo = openmeteo_requests.Client(session = retry_session)

# Make sure all required weather variables are listed here
# The order of variables in hourly or daily is important to assign them correctly below
url = "https://archive-api.open-meteo.com/v1/archive"
params = {
    "latitude": 52.12949,
    "longitude": 5.20514,
    "hourly": ["temperature_2m", "wind_speed_10m", "apparent_temperature", "cloud_cover", "snowfall",
               "diffuse_radiation", "direct_normal_irradiance", "shortwave_radiation"],
    "models": "knmi_seamless",
    "start_date": "2025-01-01",
    "end_date": end_date_date_str,
}
responses = openmeteo.weather_api(url, params=params)

# Process single location. (would have to add a for-loop for multiple locations or weather models)
response = responses[0]
print(f"Coordinates {response.Latitude()}°N {response.Longitude()}°E")
print(f"Elevation {response.Elevation()} m asl")
print(f"Timezone {response.Timezone()}{response.TimezoneAbbreviation()}")
print(f"Timezone difference to GMT+0 {response.UtcOffsetSeconds()} s")

# Process hourly data. The order of variables needs to be the same as requested.
hourly = response.Hourly()
hourly_temperature_2m = hourly.Variables(0).ValuesAsNumpy()
hourly_wind_speed_10m = hourly.Variables(1).ValuesAsNumpy()
hourly_apparent_temperature = hourly.Variables(2).ValuesAsNumpy()
hourly_cloud_cover = hourly.Variables(3).ValuesAsNumpy()
hourly_snowfall = hourly.Variables(4).ValuesAsNumpy()
hourly_diffuse_radiation = hourly.Variables(5).ValuesAsNumpy()
hourly_direct_normal_irradiance = hourly.Variables(6).ValuesAsNumpy()
hourly_shortwave_radiation = hourly.Variables(7).ValuesAsNumpy()

hourly_data = {"date": pd.date_range(
	start = pd.to_datetime(hourly.Time(), unit = "s", utc = True),
	end = pd.to_datetime(hourly.TimeEnd(), unit = "s", utc = True),
	freq = pd.Timedelta(seconds = hourly.Interval()),
	inclusive = "left"
)}

hourly_data["temperature_2m"] = hourly_temperature_2m
hourly_data["wind_speed_10m"] = hourly_wind_speed_10m
hourly_data["apparent_temperature"] = hourly_apparent_temperature
hourly_data["cloud_cover"] = hourly_cloud_cover
hourly_data["snowfall"] = hourly_snowfall
hourly_data["diffuse_radiation"] = hourly_diffuse_radiation
hourly_data["direct_normal_irradiance"] = hourly_direct_normal_irradiance
hourly_data["shortwave_radiation"] = hourly_shortwave_radiation

weather_dataframe_obs = pd.DataFrame(data = hourly_data)
print(weather_dataframe_obs)

import sqlite3

# Connect to the SQLite database
db_path = '../data/WARP.db'
conn = sqlite3.connect(db_path)

# Write the DataFrame to the database table 'raw_weather_obs'
# If table exists, replace it. If not, create new table
weather_dataframe_obs.to_sql('raw_weather_obs', conn, if_exists='replace', index=False)

# Close the connection
conn.close()

print("Data successfully written to database table 'raw_weather_obs'")

5 days back from today: 2025-06-11
Coordinates 52.13199996948242°N 5.190999984741211°E
Elevation 5.0 m asl
Timezone NoneNone
Timezone difference to GMT+0 0 s
                          date  temperature_2m  wind_speed_10m  \
0    2025-01-01 00:00:00+00:00          7.4325       22.680000   
1    2025-01-01 01:00:00+00:00          7.6325       20.880001   
2    2025-01-01 02:00:00+00:00          7.2825       28.799999   
3    2025-01-01 03:00:00+00:00          7.7825       29.519999   
4    2025-01-01 04:00:00+00:00          7.6825       28.799999   
...                        ...             ...             ...   
3883 2025-06-11 19:00:00+00:00         18.4825        7.920000   
3884 2025-06-11 20:00:00+00:00         16.9825        7.559999   
3885 2025-06-11 21:00:00+00:00         15.5825        5.760000   
3886 2025-06-11 22:00:00+00:00         14.7825       11.159999   
3887 2025-06-11 23:00:00+00:00         14.5325       13.320000   

      apparent_temperature  cloud_cover  snowfall
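
A quick sanity check on the printed frame: an hourly series without gaps has 24 rows per day, so the index range 0 to 3887 can be verified from the two dates alone. A small sketch follows; the function name `expected_hourly_rows` is an assumption.

```python
from datetime import date


def expected_hourly_rows(start: str, end: str) -> int:
    """Number of hourly rows from start 00:00 to end 23:00, both dates inclusive."""
    days = (date.fromisoformat(end) - date.fromisoformat(start)).days + 1
    return days * 24


print(expected_hourly_rows("2025-01-01", "2025-06-11"))  # 162 days * 24 = 3888
```

Comparing this number with `len(weather_dataframe_obs)` would flag a response with missing hours before it is written to the database.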

In [3]:
import sqlite3
import pandas as pd

# Connect to the SQLite database (one connection is enough for this query)
db_path = '../data/WARP.db'
conn = sqlite3.connect(db_path)

# Query the last 5 rows from raw_weather_obs table
df_head = pd.read_sql_query("SELECT * FROM raw_weather_obs ORDER BY date DESC LIMIT 5", conn)

# Close the connection
conn.close()

# Display the results
print(df_head)

# Connect to the SQLite database
# collect the date column from the raw_weather_obs table
conn = sqlite3.connect(db_path)
hist_weather_dates = pd.read_sql_query("SELECT date FROM raw_weather_obs", conn)
conn.close()

# Convert to datetime and find max date
hist_weather_dates['date'] = pd.to_datetime(hist_weather_dates['date'])
most_recent_date = hist_weather_dates['date'].max().strftime('%Y-%m-%d')
print(f"The most recent date in raw_weather_obs is: {most_recent_date}")

                        date  temperature_2m  wind_speed_10m  \
0  2025-06-11 23:00:00+00:00         14.5325       13.320000   
1  2025-06-11 22:00:00+00:00         14.7825       11.159999   
2  2025-06-11 21:00:00+00:00         15.5825        5.760000   
3  2025-06-11 20:00:00+00:00         16.9825        7.559999   
4  2025-06-11 19:00:00+00:00         18.4825        7.920000   

   apparent_temperature  cloud_cover  snowfall  diffuse_radiation  \
0             12.592342          0.0       0.0           0.000000   
1             13.343218          0.0       0.0           0.000000   
2             14.857220          0.0       0.0           0.000000   
3             15.729401          0.0       0.0          21.677280   
4             17.213360          0.0       0.0          78.916222   

   direct_normal_irradiance  shortwave_radiation  
0                  0.000000                  0.0  
1                  0.000000                  0.0  
2                  0.000000                  0.
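
The cell above loads every date into pandas just to find the newest one. As a sketch of an alternative, the maximum can be computed by SQLite itself. This assumes the `date` column is stored as text in one uniform format (as the printed rows suggest), so that string order equals chronological order. `most_recent_day` is a hypothetical helper, and the demo uses an in-memory database with two rows copied from the output above.

```python
import sqlite3


def most_recent_day(conn, table="raw_weather_obs"):
    """Return the latest 'date' value as YYYY-MM-DD, or None for an empty table."""
    # The table name is interpolated, so only pass trusted, hard-coded names
    row = conn.execute(f"SELECT MAX(date) FROM {table}").fetchone()
    return row[0][:10] if row[0] is not None else None


# Demo on an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_weather_obs (date TEXT, temperature_2m REAL)")
conn.executemany(
    "INSERT INTO raw_weather_obs VALUES (?, ?)",
    [("2025-06-11 22:00:00+00:00", 14.7825),
     ("2025-06-11 23:00:00+00:00", 14.5325)],
)
latest = most_recent_day(conn)
conn.close()
print(latest)
```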

In [4]:
''' does not work:# forecast API to fetch forecast data (py file based)
! python3 ingest_open-meteo_obs.py'''

' does not work:# forecast API to fetch forecast data (py file based)\n! python3 ingest_open-meteo_obs.py'

In [5]:
# forecast API to fetch 'recent' historical data

import openmeteo_requests
import pandas as pd
import requests_cache
from retry_requests import retry
from datetime import datetime, timezone, timedelta

# define now
now = datetime.now(timezone.utc)
# Convert current time to YYYY-MM-DD format
now_date_str = now.strftime("%Y-%m-%d")

# Setup the Open-Meteo API client with cache and retry on error
cache_session = requests_cache.CachedSession('.cache', expire_after = 3600)
retry_session = retry(cache_session, retries = 5, backoff_factor = 0.2)
openmeteo = openmeteo_requests.Client(session = retry_session)

# Make sure all required weather variables are listed here
# The order of variables in hourly or daily is important to assign them correctly below
url = "https://api.open-meteo.com/v1/forecast"
params = {
    "latitude": 52.12949,
    "longitude": 5.20514,
    "hourly": ["temperature_2m", "wind_speed_10m", "apparent_temperature", "cloud_cover", "snowfall",
               "diffuse_radiation", "direct_normal_irradiance", "shortwave_radiation"],
    "models": "knmi_seamless",
    "start_date": most_recent_date,
    "end_date": now_date_str,
}
responses = openmeteo.weather_api(url, params=params)

# Process single location. (would have to add a for-loop for multiple locations or weather models)
response = responses[0]
print(f"Coordinates {response.Latitude()}°N {response.Longitude()}°E")
print(f"Elevation {response.Elevation()} m asl")
print(f"Timezone {response.Timezone()}{response.TimezoneAbbreviation()}")
print(f"Timezone difference to GMT+0 {response.UtcOffsetSeconds()} s")

# Process hourly data. The order of variables needs to be the same as requested.
hourly = response.Hourly()
hourly_temperature_2m = hourly.Variables(0).ValuesAsNumpy()
hourly_wind_speed_10m = hourly.Variables(1).ValuesAsNumpy()
hourly_apparent_temperature = hourly.Variables(2).ValuesAsNumpy()
hourly_cloud_cover = hourly.Variables(3).ValuesAsNumpy()
hourly_snowfall = hourly.Variables(4).ValuesAsNumpy()
hourly_diffuse_radiation = hourly.Variables(5).ValuesAsNumpy()
hourly_direct_normal_irradiance = hourly.Variables(6).ValuesAsNumpy()
hourly_shortwave_radiation = hourly.Variables(7).ValuesAsNumpy()

hourly_data = {"date": pd.date_range(
	start = pd.to_datetime(hourly.Time(), unit = "s", utc = True),
	end = pd.to_datetime(hourly.TimeEnd(), unit = "s", utc = True),
	freq = pd.Timedelta(seconds = hourly.Interval()),
	inclusive = "left"
)}

hourly_data["temperature_2m"] = hourly_temperature_2m
hourly_data["wind_speed_10m"] = hourly_wind_speed_10m
hourly_data["apparent_temperature"] = hourly_apparent_temperature
hourly_data["cloud_cover"] = hourly_cloud_cover
hourly_data["snowfall"] = hourly_snowfall
hourly_data["diffuse_radiation"] = hourly_diffuse_radiation
hourly_data["direct_normal_irradiance"] = hourly_direct_normal_irradiance
hourly_data["shortwave_radiation"] = hourly_shortwave_radiation

recent_obs_dataframe = pd.DataFrame(data = hourly_data)
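# Filter out future dates (greater than 'now'), so dataset is not contaminated with predictive data
# .copy() makes the filtered frame independent, so later column assignments do not trigger SettingWithCopyWarning
recent_obs_dataframe = recent_obs_dataframe[recent_obs_dataframe['date'] <= now].copy()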
print(recent_obs_dataframe)


Coordinates 52.13199996948242°N 5.190999984741211°E
Elevation 5.0 m asl
Timezone NoneNone
Timezone difference to GMT+0 0 s
                         date  temperature_2m  wind_speed_10m  \
0   2025-06-11 00:00:00+00:00          8.7825            4.68   
1   2025-06-11 01:00:00+00:00          7.9325            6.12   
2   2025-06-11 02:00:00+00:00          7.2825            4.32   
3   2025-06-11 03:00:00+00:00          8.8325            5.04   
4   2025-06-11 04:00:00+00:00          7.5325            2.52   
..                        ...             ...             ...   
119 2025-06-15 23:00:00+00:00         15.2325            2.88   
120 2025-06-16 00:00:00+00:00         14.3325            6.12   
121 2025-06-16 01:00:00+00:00         14.0825            5.40   
122 2025-06-16 02:00:00+00:00         13.9325            5.40   
123 2025-06-16 03:00:00+00:00         13.5825            4.68   

     apparent_temperature  cloud_cover  snowfall  diffuse_radiation  \
0                7.545195

In [6]:
import sqlite3 
import pandas as pd

# Connect to the SQLite database
db_path = '../data/WARP.db'
conn = sqlite3.connect(db_path)

# Read existing data from raw_weather_obs
existing_data = pd.read_sql_query("SELECT * FROM raw_weather_obs", conn)

# Convert date columns to datetime for both dataframes
existing_data['date'] = pd.to_datetime(existing_data['date'])
recent_obs_dataframe['date'] = pd.to_datetime(recent_obs_dataframe['date'])

# Concatenate, then remove duplicates based on date; keep='last' keeps the freshly fetched values
merged_df_weather_obs = pd.concat([existing_data, recent_obs_dataframe])
initial_rows = len(merged_df_weather_obs)
merged_df_weather_obs = merged_df_weather_obs.drop_duplicates(subset='date', keep='last')
removed_rows = initial_rows - len(merged_df_weather_obs)
print(f"Removed {removed_rows} duplicate rows")

# Sort by date
merged_df_weather_obs = merged_df_weather_obs.sort_values('date')
merged_df_weather_obs.columns = merged_df_weather_obs.columns.map(str)

# Write the merged dataframe back to the database
merged_df_weather_obs.to_sql('raw_weather_obs', conn, if_exists='replace', index=False)

# Close the connection
conn.close()

print(f"Data successfully merged. Total rows: {len(merged_df_weather_obs)}")
print(f"Date range: {merged_df_weather_obs['date'].min()} to {merged_df_weather_obs['date'].max()}")

Removed 24 duplicate rows
Data successfully merged. Total rows: 3988
Date range: 2025-01-01 00:00:00+00:00 to 2025-06-16 03:00:00+00:00
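
The numbers add up: 3888 stored rows plus 124 fetched rows minus 24 overlapping hours gives 3988. The cell rewrites the whole table on every run, though. An alternative sketch is shown below, under the assumption that the table is created with `date` as its primary key (which `to_sql(..., if_exists='replace')` does not do). SQLite's `INSERT OR REPLACE` then overwrites overlapping hours and appends new ones. The table name `weather_obs_demo` and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical demo table; the primary key on 'date' is what enables the upsert
conn.execute(
    "CREATE TABLE weather_obs_demo (date TEXT PRIMARY KEY, temperature_2m REAL)"
)


def upsert_rows(conn, rows):
    """Insert rows; a row whose date already exists replaces the stored one."""
    conn.executemany(
        "INSERT OR REPLACE INTO weather_obs_demo (date, temperature_2m) VALUES (?, ?)",
        rows,
    )
    conn.commit()


# Existing data, then a newer fetch that overlaps on one hour
upsert_rows(conn, [("2025-06-11 22:00:00+00:00", 14.0),
                   ("2025-06-11 23:00:00+00:00", 14.5)])
upsert_rows(conn, [("2025-06-11 23:00:00+00:00", 15.0),
                   ("2025-06-12 00:00:00+00:00", 13.0)])

stored = conn.execute(
    "SELECT date, temperature_2m FROM weather_obs_demo ORDER BY date"
).fetchall()
conn.close()
print(len(stored))  # 3 rows: the overlapping hour is stored once
```

The trade-off is that the table schema has to be created explicitly once, instead of being inferred by pandas on each write.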


In [7]:
# Print column names and data types
print("Features in merged_df_weather_obs:")
print("\nColumns and their data types:")
print(merged_df_weather_obs.dtypes)

print("\nSample of non-null values for each column:")
# info() prints its summary itself and returns None, so it is not wrapped in print()
merged_df_weather_obs.info()

Features in merged_df_weather_obs:

Columns and their data types:
date                        datetime64[ns, UTC]
temperature_2m                          float64
wind_speed_10m                          float64
apparent_temperature                    float64
cloud_cover                             float64
snowfall                                float64
diffuse_radiation                       float64
direct_normal_irradiance                float64
shortwave_radiation                     float64
dtype: object

Sample of non-null values for each column:
<class 'pandas.core.frame.DataFrame'>
Index: 3988 entries, 0 to 123
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype              
---  ------                    --------------  -----              
 0   date                      3988 non-null   datetime64[ns, UTC]
 1   temperature_2m            3988 non-null   float64            
 2   wind_speed_10m            3988 non-null   float64            
 3   appa