### ERA5 Dataset 
The server splits the data you requested into separate files based on the type of variable and the model stream it comes from.
Your request included variables from different categories, which the system packages separately. Here's the breakdown:

`data_stream-oper_stepType-instant.nc` (Atmospheric Instantaneous): This file contains variables that represent a snapshot in time.
Variables: `10m_u_component_of_wind`, `10m_v_component_of_wind`, `mean_sea_level_pressure`, `sea_surface_temperature`, `total_cloud_cover`.

`data_stream-oper_stepType-accum.nc` (Atmospheric Accumulated): This file contains variables that are accumulated over a time period. `total_precipitation` is not an instantaneous value. For the ERA5 reanalysis it is the depth of water (in metres) that fell during the hour ending at the timestamp (e.g., the value stamped 06:00 covers 05:00 to 06:00).
Variables: `total_precipitation`.

`data_stream-wave_stepType-instant.nc` (Wave Model Instantaneous): This file contains variables generated by a separate ocean wave model (WAM), not the primary atmospheric model.
Variables: `significant_height_of_combined_wind_waves_and_swell`.

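A minimal sketch of handling accumulated precipitation, assuming the ERA5 reanalysis convention that each `tp` value is the water depth in metres accumulated over the hour ending at its timestamp. The data are synthetic (a hypothetical constant 0.5 mm per hour at one grid point). The sketch converts to millimetres, relabels each value to the start of its accumulation hour so that the value stamped 00:00 counts toward the previous day, and then forms daily totals. It also shows that if only the 00, 06, 12 and 18 UTC steps are kept, summing them captures 4 of every 24 hours rather than the full day.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic hourly total precipitation at one grid point (hypothetical values),
# in metres, assumed accumulated over the hour ending at each timestamp.
times = pd.date_range("2024-01-01T00:00", periods=48, freq=pd.Timedelta(hours=1))
tp = xr.DataArray(
    np.full(48, 0.0005),  # 0.0005 m = 0.5 mm in every hour
    coords={"valid_time": times},
    dims="valid_time",
    name="tp",
)

# Metres of water -> millimetres.
tp_mm = tp * 1000.0

# Relabel each value with the start of its accumulation hour, so the value stamped
# 00:00 (covering 23:00-00:00) is counted in the previous calendar day.
tp_start = tp_mm.assign_coords(valid_time=tp_mm["valid_time"].values - np.timedelta64(1, "h"))

# Daily totals: the sum of the one-hour accumulations in each calendar day.
daily_mm = tp_start.resample(valid_time="1D").sum()
jan1_total = float(daily_mm.sel(valid_time=pd.Timestamp("2024-01-01")))

# Keeping only the 00/06/12/18 UTC steps leaves 4 one-hour accumulations per day,
# so their sum is far smaller than the true total.
is_six_hourly = tp_mm["valid_time"].dt.hour.isin([0, 6, 12, 18]).values
six_hourly_sum = float(tp_mm.isel(valid_time=is_six_hourly).sum())
full_sum = float(tp_mm.sum())

print(f"1 Jan total: {jan1_total:.1f} mm")
print(f"48 h total: {full_sum:.1f} mm, from 6-hourly steps only: {six_hourly_sum:.1f} mm")
```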
```python
import os
import zipfile

import xarray as xr

YEAR = "2024"  # year whose monthly downloads are processed below
```

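```python
# Each monthly download is a ZIP archive even though it was saved with a .nc
# extension: the server packages the separate stream files together.
for month_num in range(1, 13):
    month_str = f"{month_num:02d}"
    extract_path = os.path.join(YEAR, month_str)
    archive_path = os.path.join(extract_path, f"surf_data_{YEAR}_{month_str}.nc")
    try:
        # Unpack the data_stream-*.nc files next to the archive.
        with zipfile.ZipFile(archive_path, "r") as zip_file:
            zip_file.extractall(path=extract_path)
        print(f"  Extracted: {os.path.basename(archive_path)}")
    except (OSError, zipfile.BadZipFile) as e:
        # Missing or unreadable downloads, and files that are not ZIP archives, are reported and skipped.
        print(f"Error extracting {os.path.basename(archive_path)}: {e}")
```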

```
  Extracted: surf_data_2024_01.nc
  Extracted: surf_data_2024_02.nc
  Extracted: surf_data_2024_03.nc
  Extracted: surf_data_2024_04.nc
  Extracted: surf_data_2024_05.nc
  Extracted: surf_data_2024_06.nc
  Extracted: surf_data_2024_07.nc
  Extracted: surf_data_2024_08.nc
  Extracted: surf_data_2024_09.nc
  Extracted: surf_data_2024_10.nc
  Extracted: surf_data_2024_11.nc
  Extracted: surf_data_2024_12.nc
```

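Before extracting, it can help to confirm that a download really is a ZIP archive and to see which stream files it holds. A minimal sketch, assuming every member follows the `data_stream-<stream>_stepType-<stepType>.nc` naming shown in the breakdown; `describe_download` and `STREAM_FILE_RE` are hypothetical names, not part of any library.

```python
import re
import zipfile
from pathlib import Path

# Assumed member-name pattern, e.g. "data_stream-oper_stepType-instant.nc".
STREAM_FILE_RE = re.compile(r"data_stream-(?P<stream>[^_]+)_stepType-(?P<step_type>[^.]+)\.nc$")


def describe_download(path):
    """Map each stream member of a CDS download to its (stream, stepType) pair.

    Returns an empty dict if the file is not a ZIP archive (for example, if it is
    assumed to be a single plain netCDF file) or contains no matching members.
    """
    path = Path(path)
    if not zipfile.is_zipfile(path):
        return {}
    result = {}
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            match = STREAM_FILE_RE.search(name)
            if match:
                result[name] = (match["stream"], match["step_type"])
    return result
```

For example, `describe_download(os.path.join(YEAR, "01", f"surf_data_{YEAR}_01.nc"))` would map each member name to a pair such as `("oper", "accum")`. An empty result suggests the file is not an archive and may be a plain netCDF file.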

```python
list_of_monthly_datasets = []

for month_num in range(1, 13):
    month_str = f"{month_num:02d}"
    month_directory = os.path.join(YEAR, month_str)

    print(f"  Processing directory: {month_directory}")

    # One file per stream and step type, as described in the breakdown above.
    files_to_merge = [
        os.path.join(month_directory, "data_stream-oper_stepType-instant.nc"),
        os.path.join(month_directory, "data_stream-oper_stepType-accum.nc"),
        os.path.join(month_directory, "data_stream-wave_stepType-instant.nc"),
    ]

    try:
        # Open whichever stream files exist for this month (lazily, without loading the data).
        datasets = [xr.open_dataset(path) for path in files_to_merge if os.path.exists(path)]

        if not datasets:
            print("    -> No data_stream files found to merge. Skipping.")
            continue

        # compat="override" keeps shared non-index variables (e.g. number, expver)
        # from the first file instead of checking that every file agrees on them.
        combined_ds = xr.merge(datasets, compat="override")
        # Drop timestamps at which every variable is missing.
        cleaned_ds = combined_ds.dropna(dim="valid_time", how="all")
        list_of_monthly_datasets.append(cleaned_ds)

    except Exception as e:
        print(f"An error occurred while processing {month_directory}: {e}")
```

```
  Processing directory: 2024\01
  Processing directory: 2024\02
  Processing directory: 2024\03
  Processing directory: 2024\04
  Processing directory: 2024\05
  Processing directory: 2024\06
  Processing directory: 2024\07
  Processing directory: 2024\08
  Processing directory: 2024\09
  Processing directory: 2024\10
  Processing directory: 2024\11
  Processing directory: 2024\12
```

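The merge-and-clean step can be illustrated on small in-memory datasets. This is a sketch with synthetic values: the variable names mirror ERA5 short names, but the numbers and the conflicting `expver` labels are made up. It shows that `compat="override"` keeps the first dataset's copy of a shared non-index coordinate without comparing it, and that `dropna(how="all")` removes only the timestamp at which every variable is missing everywhere.

```python
import numpy as np
import xarray as xr

# Three 6-hourly timestamps on a hypothetical 2 x 2 grid.
times = np.array(["2024-01-01T00", "2024-01-01T06", "2024-01-01T12"], dtype="datetime64[ns]")
coords = {"valid_time": times, "latitude": [5.0, 5.25], "longitude": [78.0, 78.25]}
dims = ("valid_time", "latitude", "longitude")

u10 = np.ones((3, 2, 2), dtype="float32")
swh = np.ones((3, 2, 2), dtype="float32")
u10[1] = np.nan  # 06:00 missing only in the atmospheric stream
u10[2] = np.nan  # 12:00 missing in both streams
swh[2] = np.nan

oper = xr.Dataset({"u10": (dims, u10)}, coords=coords)
wave = xr.Dataset({"swh": (dims, swh)}, coords=coords)
# A shared non-index coordinate whose values differ between the two files (made up).
oper = oper.assign_coords(expver=("valid_time", ["0001"] * 3))
wave = wave.assign_coords(expver=("valid_time", ["0005"] * 3))

# With a stricter setting such as compat="no_conflicts", the differing expver values
# would raise a MergeError; "override" simply takes the copy from `oper`.
combined = xr.merge([oper, wave], compat="override")

# how="all": drop a timestamp only if every data variable is NaN at every grid point.
cleaned = combined.dropna(dim="valid_time", how="all")

print(cleaned["valid_time"].values)
print(cleaned["expver"].values)
```

The 06:00 step survives because `swh` still has values there; `u10` simply stays NaN at that time.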

```python
# Stack the monthly datasets along the time dimension.
try:
    ds = xr.concat(list_of_monthly_datasets, dim="valid_time")
    print(ds)
except Exception as e:
    print(e)
```

```
<xarray.Dataset> Size: 16MB
Dimensions:     (valid_time: 1464, latitude: 21, longitude: 21)
Coordinates:
  * valid_time  (valid_time) datetime64[ns] 12kB 2024-01-01 ... 2024-12-31T18...
  * latitude    (latitude) float64 168B 5.0 5.25 5.5 5.75 ... 9.25 9.5 9.75 10.0
  * longitude   (longitude) float64 168B 78.0 78.25 78.5 ... 82.5 82.75 83.0
    number      int64 8B 0
    expver      (valid_time) <U4 23kB '0001' '0001' '0001' ... '0001' '0001'
Data variables:
    u10         (valid_time, latitude, longitude) float32 3MB -4.375 ... -6.738
    v10         (valid_time, latitude, longitude) float32 3MB -1.539 ... -6.051
    msl         (valid_time, latitude, longitude) float32 3MB 1.011e+05 ... 1...
    shts        (valid_time, latitude, longitude) float32 3MB 1.488 ... 1.261
    mpts        (valid_time, latitude, longitude) float32 3MB 7.378 ... 7.853
    mdts        (valid_time, latitude, longitude) float32 3MB 79.67 ... 120.2
Attributes:
    GRIB_centre:             ecmf
    ...
```

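After concatenating the months, a quick check of the time axis can catch out-of-order, duplicated, or missing steps. A minimal sketch (`check_time_axis` is a hypothetical helper): for a complete 6-hourly year such as the 2024 dataset above, one would expect `sorted` to be `True`, no duplicates, a single step of 6 hours, and 366 × 4 = 1464 time steps.

```python
import numpy as np
import pandas as pd
import xarray as xr


def check_time_axis(ds, dim="valid_time"):
    """Report length, ordering, duplicates and the distinct step sizes of a time coordinate."""
    index = ds.indexes[dim]  # a pandas DatetimeIndex
    steps = np.unique(np.diff(index.values))  # distinct gaps between consecutive timestamps
    return {
        "length": len(index),
        "sorted": bool(index.is_monotonic_increasing),
        "duplicates": int(index.duplicated().sum()),
        "steps": [pd.Timedelta(step) for step in steps],
    }


# Example on a small synthetic 6-hourly series.
times = pd.date_range("2024-01-01", periods=4, freq=pd.Timedelta(hours=6))
demo = xr.Dataset({"u10": ("valid_time", np.arange(4.0))}, coords={"valid_time": times})
print(check_time_axis(demo))
```

If `duplicates` is non-zero, `ds.isel(valid_time=~ds.indexes["valid_time"].duplicated())` keeps the first copy of each timestamp, and `ds.sortby("valid_time")` restores chronological order. More than one entry in `steps` points to gaps or overlaps.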
```python
# Save the combined dataset to a single NetCDF file
output_filename = f"surf_data_{YEAR}.nc"
ds.to_netcdf(output_filename)
```
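If a flat table is needed for tools that do not read NetCDF, the dataset can also be written as CSV. A minimal sketch (`dataset_to_csv` is a hypothetical helper): `to_dataframe` turns each (time, latitude, longitude) combination into one row, so the 1464 × 21 × 21 dataset above would give 645,624 rows, and the text file will typically be larger than the NetCDF version.

```python
import io

import numpy as np
import pandas as pd
import xarray as xr


def dataset_to_csv(ds, path_or_buf, dim_order=("valid_time", "latitude", "longitude")):
    """Write one CSV row per (time, latitude, longitude) point and return the table."""
    # to_dataframe() puts the dimensions in a MultiIndex; reset_index() turns them into columns.
    df = ds.to_dataframe(dim_order=list(dim_order)).reset_index()
    df.to_csv(path_or_buf, index=False)
    return df


# Example with a tiny synthetic dataset written to an in-memory buffer.
times = pd.date_range("2024-01-01", periods=2, freq=pd.Timedelta(hours=6))
demo = xr.Dataset(
    {"u10": (("valid_time", "latitude", "longitude"), np.zeros((2, 2, 3), dtype="float32"))},
    coords={"valid_time": times, "latitude": [5.0, 5.25], "longitude": [78.0, 78.25, 78.5]},
)
buffer = io.StringIO()
table = dataset_to_csv(demo, buffer)
print(buffer.getvalue().splitlines()[0])
```

For the notebook's dataset, `dataset_to_csv(ds, f"surf_data_{YEAR}.csv")` would write the file; non-index coordinates such as `number` and `expver` should appear as extra columns.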