### ERA5 Dataset 
The server splits the requested data into separate files, one for each combination of model stream and step type.
This request included variables from several categories, so the system packaged them separately. Here is the breakdown:

data_stream-oper_stepType-instant.nc (Atmospheric Instantaneous): This file contains variables that represent a snapshot in time.
Variables: 10m_u_component_of_wind, 10m_v_component_of_wind, mean_sea_level_pressure, sea_surface_temperature, total_cloud_cover.

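data_stream-oper_stepType-accum.nc (Atmospheric Accumulated): This file contains variables that are accumulated over a time period rather than sampled at an instant. total_precipitation is not an instantaneous value; it is the total precipitation that fell during the accumulation period ending at the timestamp. Assuming this is the hourly ERA5 reanalysis product, that period is the preceding hour, so the value stamped 06:00 covers 05:00 to 06:00 even when only every sixth hour is downloaded.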
Variables: total_precipitation.
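
ERA5 reports total_precipitation as a depth in metres, which is why the values in the merged output further down look so small. A minimal sketch of the conversion to millimetres (the helper name `metres_to_millimetres` is made up for illustration):

```python
def metres_to_millimetres(depth_m):
    """Convert a precipitation depth from metres to millimetres."""
    return depth_m * 1000.0


# Example: a 0.002 m accumulation expressed in millimetres
print(metres_to_millimetres(0.002))
```

On the merged dataset the same scaling could be applied to the whole variable, for example `ds["tp"] * 1000`.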

data_stream-wave_stepType-instant.nc (Wave Model Instantaneous): This file contains variables generated by a separate wave model (WAM), not the primary atmospheric model.
Variables: significant_height_of_combined_wind_waves_and_swell.
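
The naming pattern of these files can be read mechanically. As a minimal sketch, the stream and step type can be pulled out of a file name with a regular expression. The helper `parse_stream_filename` and its pattern are hypothetical, inferred only from the three file names listed above, so other requests may produce names it does not cover.

```python
import re

# Pattern inferred from the three file names listed above (an assumption,
# not an official naming specification).
_PATTERN = re.compile(
    r"^data_stream-(?P<stream>[^_]+)_stepType-(?P<step_type>[^.]+)\.nc$"
)


def parse_stream_filename(filename):
    """Return (stream, step_type) for a data_stream file name, or None."""
    match = _PATTERN.match(filename)
    if match is None:
        return None
    return match.group("stream"), match.group("step_type")


for name in (
    "data_stream-oper_stepType-instant.nc",
    "data_stream-oper_stepType-accum.nc",
    "data_stream-wave_stepType-instant.nc",
):
    print(name, "->", parse_stream_filename(name))
```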

In [1]:
import xarray as xr
import zipfile
import os
import glob

YEAR = "2020"

In [2]:
# EXTRACT ALL MONTHLY ARCHIVES
for month_num in range(1, 13):
    month_str = f"{month_num:02d}"
    # Each monthly download lives in <YEAR>/<MM>/. Despite its .nc extension it
    # is a ZIP archive, and its members are extracted into the same directory.
    extract_path = os.path.join(YEAR, month_str)
    archive_path = os.path.join(extract_path, f"surf_data_{YEAR}_{month_str}.nc")
    try:
        with zipfile.ZipFile(archive_path, 'r') as zip_file:
            zip_file.extractall(path=extract_path)
        print(f"  Extracted: {os.path.basename(archive_path)}")
    except (OSError, zipfile.BadZipFile) as e:
        # Missing or unreadable file, or a file that is not a valid ZIP archive
        print(f"Error extracting {os.path.basename(archive_path)}: {e}")

  Extracted: surf_data_2020_01.nc
  Extracted: surf_data_2020_02.nc
  Extracted: surf_data_2020_03.nc
  Extracted: surf_data_2020_04.nc
  Extracted: surf_data_2020_05.nc
  Extracted: surf_data_2020_06.nc
  Extracted: surf_data_2020_07.nc
  Extracted: surf_data_2020_08.nc
  Extracted: surf_data_2020_09.nc
  Extracted: surf_data_2020_10.nc
  Extracted: surf_data_2020_11.nc
  Extracted: surf_data_2020_12.nc
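
Because the archives carry a `.nc` extension, it can help to confirm that a file really is a ZIP container before extracting it. The following is a minimal sketch using the standard library's `zipfile.is_zipfile`. The helper name `extract_if_zip` and the demo file names are made up for illustration and are not part of the original workflow.

```python
import os
import tempfile
import zipfile


def extract_if_zip(archive_path, extract_path):
    """Extract archive_path into extract_path if it is a ZIP file.

    Returns the list of member names, or an empty list when the file is
    not a ZIP archive (for example, a plain NetCDF file).
    """
    if not zipfile.is_zipfile(archive_path):
        return []
    with zipfile.ZipFile(archive_path, "r") as zip_file:
        zip_file.extractall(path=extract_path)
        return zip_file.namelist()


# Demo with a throwaway archive; the names below are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    demo_archive = os.path.join(tmp, "demo.nc")
    with zipfile.ZipFile(demo_archive, "w") as zf:
        zf.writestr("data_stream-oper_stepType-instant.nc", b"placeholder")
    demo_members = extract_if_zip(demo_archive, tmp)
    print(demo_members)
```

A file that is already a plain NetCDF file returns an empty list here, so the caller can decide whether to open it directly instead.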


In [3]:
# MERGE EXTRACTED FILES
list_of_monthly_datasets = []

for month_num in range(1, 13):
    month_str = f"{month_num:02d}"
    month_directory = os.path.join(YEAR, month_str)
    
    print(f"  Processing directory: {month_directory}")

    # Define the expected data_stream files within the directory
    files_to_merge = [
        os.path.join(month_directory, "data_stream-oper_stepType-instant.nc"),
        os.path.join(month_directory, "data_stream-oper_stepType-accum.nc"),
        os.path.join(month_directory, "data_stream-wave_stepType-instant.nc"),
    ]

    try:
        datasets = []
        # Load each data_stream file that exists
        for file_path in files_to_merge:
            if os.path.exists(file_path):
                ds_part = xr.open_dataset(file_path)
                datasets.append(ds_part)
        
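        if not datasets:
            print("    -> No data_stream files found to merge. Skipping.")
            continue

        # compat='override' skips conflict checks and takes any shared
        # variables or coordinates from the first dataset in the list.
        combined_ds = xr.merge(datasets, compat='override')
        # Drop time steps at which every variable is entirely NaN.
        cleaned_ds = combined_ds.dropna(dim="valid_time", how="all")
        list_of_monthly_datasets.append(cleaned_ds)

    except Exception as e:
        print(f"An error occurred while processing {month_directory}: {e}")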

  Processing directory: 2020\01
  Processing directory: 2020\02
  Processing directory: 2020\03
  Processing directory: 2020\04
  Processing directory: 2020\05
  Processing directory: 2020\06
  Processing directory: 2020\07
  Processing directory: 2020\08
  Processing directory: 2020\09
  Processing directory: 2020\10
  Processing directory: 2020\11
  Processing directory: 2020\12


In [4]:
try:
    # Concatenate all the monthly datasets into one for the entire year
    ds = xr.concat(list_of_monthly_datasets, dim="valid_time")
    print(ds)
except Exception as e:
    print(e)

<xarray.Dataset> Size: 5MB
Dimensions:     (valid_time: 1464, latitude: 11, longitude: 11)
Coordinates:
    number      int64 8B 0
  * valid_time  (valid_time) datetime64[ns] 12kB 2020-01-01 ... 2020-12-31T18...
  * latitude    (latitude) float64 88B 10.0 9.5 9.0 8.5 8.0 ... 6.5 6.0 5.5 5.0
  * longitude   (longitude) float64 88B 78.0 78.5 79.0 79.5 ... 82.0 82.5 83.0
    expver      (valid_time) <U4 23kB '0001' '0001' '0001' ... '0001' '0001'
Data variables:
    u10         (valid_time, latitude, longitude) float32 709kB 0.3915 ... -4...
    v10         (valid_time, latitude, longitude) float32 709kB -1.084 ... -2...
    msl         (valid_time, latitude, longitude) float32 709kB 1.013e+05 ......
    sst         (valid_time, latitude, longitude) float32 709kB nan ... 302.0
    tcc         (valid_time, latitude, longitude) float32 709kB 0.9592 ... 0....
    tp          (valid_time, latitude, longitude) float32 709kB 1.907e-06 ......
    swh         (valid_time, latitude, longitude) flo
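
The `valid_time` length of 1464 in this output is consistent with four time steps per day over a leap year. A quick arithmetic check follows. It assumes the request was for 6-hourly data (00:00, 06:00, 12:00, 18:00), which the final timestamp `2020-12-31T18...` suggests but the notebook does not state outright.

```python
import calendar

YEAR = 2020
STEPS_PER_DAY = 4  # assumption: 6-hourly sampling

# 2020 is a leap year, so it has 366 days
days_in_year = 366 if calendar.isleap(YEAR) else 365
expected_steps = days_in_year * STEPS_PER_DAY
print(expected_steps)  # 1464
```

If `ds.sizes["valid_time"]` differs from this number, a month was probably skipped during merging or some time steps were removed by `dropna`.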

In [9]:
# Save the merged dataset to a single NetCDF file
output_filename = f"surf_data_{YEAR}.nc"
ds.to_netcdf(output_filename)