# Meteorological time-series extraction: Part C

Author: Thiago Nascimento (thiago.nascimento@eawag.ch)

This notebook is part of the EStreams publication and was used to extract and aggregate the meteorological time-series from the E-OBS dataset. In Part C, we export the preprocessed daily data in the final time-series format.

* This code enables both the replication of the current database and its extension to new catchment areas.
* Before running this code, download the original raw data and place it in the folder of the same name.
* The original third-party data are not included in this repository due to redistribution and storage-space constraints.

## Requirements
**Python:**

* Python>=3.6
* Jupyter
* geopandas=0.10.2
* glob
* netCDF4
* numpy
* os
* pandas
* tqdm

Check the GitHub repository for an environment.yml (for conda environments) or requirements.txt (pip) file.

**Files:**

* data/meteorology/eobs/preprocessing/{rr, tg, tn, tx, pp, hu, fg, qq, pet, pet_iceland}
* data/shapefiles/estreams_catchments.shp

**Directory:**

* Clone the GitHub repository locally
* Update ONLY the "PATH" variable in the "Configurations" section, setting it to your relative path to the EStreams directory.
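
Before running the notebook, it can help to confirm that `PATH` points at a directory that actually contains the inputs listed under "Files". The following is a minimal sketch, not part of the original workflow; the helper name `missing_inputs` is hypothetical.

```python
import os

# Inputs listed under "Files" above, relative to the EStreams directory.
EOBS_VARIABLES = ["rr", "tg", "tn", "tx", "pp", "hu", "fg", "qq", "pet", "pet_iceland"]
REQUIRED = [
    os.path.join("data", "meteorology", "eobs", "preprocessing", v)
    for v in EOBS_VARIABLES
]
REQUIRED.append(os.path.join("data", "shapefiles", "estreams_catchments.shp"))


def missing_inputs(root):
    """Return the required inputs that do not exist under `root` (hypothetical helper)."""
    return [p for p in REQUIRED if not os.path.exists(os.path.join(root, p))]
```

Calling `missing_inputs(PATH)` before `os.chdir(PATH)` returns an empty list when everything is in place, and otherwise names the folders or files that still need to be added.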

## References

* Cornes, R., G. van der Schrier, E.J.M. van den Besselaar, and P.D. Jones. 2018: An Ensemble Version of the E-OBS Temperature and Precipitation Datasets, J. Geophys. Res. Atmos., 123. doi:10.1029/2017JD028200

## Licenses
* E-OBS: "The ECA&D data policy applies. These observational data are strictly for use in non-commercial research and non-commercial education projects only. Scientific results based on these data must be submitted for publication in the open literature without any delay linked to commercial objectives" https://www.ecad.eu/download/ensembles/download.php#guidance (Last access: 27 November 2023)

## Observations
#### E-OBS filenames

* rr = Total daily precipitation
* tg = Mean daily temperature
* tn = Minimum daily temperature
* tx = Maximum daily temperature
* pp = Mean daily air pressure at sea level
* hu = Mean daily relative humidity
* fg = Mean wind speed at 10 meters
* qq = Total daily global radiation

# Import modules

In [1]:
import os
import numpy as np
import pandas as pd
import geopandas as gpd
import tqdm
import time
import glob

# Configurations

In [2]:
# Only editable variables:
# Relative path to your local directory
PATH = "../../.."

* #### Users should NOT change anything in the code below.

In [3]:
# Non-editable variables:
PATH_preprocessing = "data/meteorology/eobs/preprocessing/"
PATH_OUTPUT = "results/timeseries/meteorology/catchments"
PATH_OUTPUT_2 = "results/timeseries/meteorology"
PATH_shapefile = "data/shapefiles/estreams_catchments.shp"
variables = ["rr", "tg", "tn", "tx", "pp", "hu", "fg", "qq", "pet", "pet_iceland"] # Eobs variables

# Set the directory:
os.chdir(PATH)

# Import data
## Catchment boundaries

In [4]:
catchment_boundaries = gpd.read_file(PATH_shapefile)
catchment_boundaries.head()

,id,area_km2,outlet_lat,outlet_lng,name,area_offic,layer,path,area_diff,area_calc,basin_id,geometry
0,HUGR020,9600,46.785,21.142,6444410,9011,HUGR020,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,6.536,9595.794,HUGR020,"POLYGON ((21.13208 46.77291, 21.13208 46.77375..."
1,HUGR021,189000,46.423,18.896,6442080,189538,HUGR021,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,-0.284,188597.11,HUGR021,"POLYGON ((18.91708 46.41791, 18.91708 46.41625..."
2,HUGR022,28500,48.126,22.34,6444304,29057,HUGR022,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,-1.917,28507.473,HUGR022,"POLYGON ((22.32875 48.10875, 22.32791 48.10875..."
3,HUGR023,188000,46.627,18.869,6442060,189092,HUGR023,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,-0.577,188286.167,HUGR023,"POLYGON ((18.89041 46.62875, 18.88875 46.62708..."
4,HUGR025,1210,47.662,19.683,6444240,1222,HUGR025,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,-0.982,1206.441,HUGR025,"POLYGON ((19.68124 47.66875, 19.68291 47.66875..."


In [5]:
print("The total number of catchments to be processed is:", len(catchment_boundaries))

The total number of catchments to be processed is: 33


# Reproject to WGS-84

In [6]:
# Reproject the catchment geometries to EPSG:4326 (WGS 84)
catchment_boundaries = catchment_boundaries.to_crs(epsg=4326)

# Data organization and export
* #### This part should only be run after the time-series of all E-OBS variables have been extracted.

* Each catchment is exported as a single CSV-file containing its nine variables.
* variables = ["rr", "tg", "tn", "tx", "pp", "hu", "fg", "qq", "pet"].
* The processing is split in two: first all catchments except those in Iceland, then the Icelandic catchments only.

In [7]:
# Here we can check the folders again:
folders = glob.glob(PATH_preprocessing+ "/*")
folders

['data/meteorology/eobs/preprocessing/pp',
 'data/meteorology/eobs/preprocessing/rr',
 'data/meteorology/eobs/preprocessing/tx',
 'data/meteorology/eobs/preprocessing/pet',
 'data/meteorology/eobs/preprocessing/hu',
 'data/meteorology/eobs/preprocessing/fg',
 'data/meteorology/eobs/preprocessing/pet_iceland',
 'data/meteorology/eobs/preprocessing/tn',
 'data/meteorology/eobs/preprocessing/tg',
 'data/meteorology/eobs/preprocessing/qq']
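
Listing the folders confirms that they exist, but not that every catchment has a file for every variable. The sketch below illustrates one way to check this before exporting. It assumes the `<variable>_<catchment>.csv` naming used by the export loops in this notebook; both helper names are hypothetical.

```python
import glob
import os


def catchments_with_data(preprocessing_dir, variable):
    """Return the set of catchment IDs that have a csv-file for `variable`."""
    pattern = os.path.join(preprocessing_dir, variable, "*.csv")
    prefix = variable + "_"
    ids = set()
    for path in glob.glob(pattern):
        stem = os.path.splitext(os.path.basename(path))[0]
        if stem.startswith(prefix):
            ids.add(stem[len(prefix):])
    return ids


def missing_per_variable(preprocessing_dir, variables, expected_ids):
    """Map each variable to the sorted expected catchments lacking a csv-file."""
    expected = set(expected_ids)
    return {
        v: sorted(expected - catchments_with_data(preprocessing_dir, v))
        for v in variables
    }
```

For example, `missing_per_variable(PATH_preprocessing, variables, catchmentnames)` would return an empty list for each variable when the extraction is complete.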

## No Iceland 

In [8]:
# Select all catchments whose basin_id does not contain "ISGR" (i.e., all except Iceland)
subset_catchment = catchment_boundaries[~catchment_boundaries['basin_id'].str.contains('ISGR', case=False)]

catchmentnames = subset_catchment.basin_id.tolist()
len(catchmentnames)

33
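
The selection above relies on a boolean mask built with `str.contains`, inverted with `~`. A minimal sketch on a toy table shows how the same mask yields both subsets; the Icelandic ID used here is made up for illustration.

```python
import pandas as pd

# Toy table; "ISGR0001" is a hypothetical ID used only for this example.
catchments = pd.DataFrame({"basin_id": ["HUGR020", "HUGR021", "ISGR0001"]})

is_iceland = catchments["basin_id"].str.contains("ISGR", case=False)

iceland = catchments[is_iceland]        # rows whose ID contains "ISGR"
non_iceland = catchments[~is_iceland]   # "~" inverts the boolean mask

print(non_iceland["basin_id"].tolist())
print(iceland["basin_id"].tolist())
```

Because the two subsets come from one mask and its negation, every catchment ends up in exactly one of them.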

In [9]:
# Now we may organize our data:
variables = ["rr", "tg", "tn", "tx", "pp", "hu", "fg", "qq", "pet"]

# The loop goes over each catchment and variable, and makes one export per catchment. At the end we will have 
# 15,047 csv-files, each with 9 columns and a datetime index.
for catchment in tqdm.tqdm(catchmentnames):
    
    timeseries_variables = pd.DataFrame(index=pd.date_range('01-01-1950', '06-30-2023', freq='D'), columns = ["rr", "tg", "tn", "tx", "pp", "hu", "fg", "qq", "pet"])
    
    for variable in variables:
        
        # Wind speed (fg) covers 1980-2023, while all other variables cover 1950-2023. 
        # Therefore we apply an if to deal with the situation: 
        if variable == "fg":
            timeseries_catchment = pd.read_csv('data/meteorology/eobs/preprocessing/'+variable+"/"+variable+"_"+catchment+".csv", 
                                               usecols=[0], header=None, names=["weighted"])
            timeseries_catchment.index = pd.date_range('01-01-1980', '06-30-2023', freq='D')
            timeseries_variables[variable] = timeseries_catchment
            
        else:
            timeseries_catchment = pd.read_csv('data/meteorology/eobs/preprocessing/'+variable+"/"+variable+"_"+catchment+".csv", 
                                               usecols=[0], header=None, names=["weighted"])
            timeseries_catchment.index = pd.date_range('01-01-1950', '06-30-2023', freq='D')
            timeseries_variables[variable] = timeseries_catchment
    
    # Here we rename our columns:
    timeseries_variables.columns = ["p_mean", "t_mean", "t_min", "t_max", "sp_min", "rh_mean", "ws_mean","swr_mean", "pet_mean"]
    timeseries_variables = timeseries_variables.round(2)
    timeseries_variables.index.name = "date"
    timeseries_variables.to_csv(PATH_OUTPUT + "/estreams_meteorology_"+catchment+".csv")



100%|██████████| 33/33 [00:03<00:00,  8.38it/s]
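
The shorter wind-speed record fits into the 1950-2023 table because pandas aligns on the date index when a column is assigned. The sketch below illustrates this with short toy ranges standing in for the real periods; the values are invented.

```python
import pandas as pd

# Short toy ranges stand in for 1950-2023 (full index) and 1980-2023 (fg).
full_index = pd.date_range("2000-01-01", "2000-01-05", freq="D")
table = pd.DataFrame(index=full_index, columns=["rr", "fg"], dtype=float)

rr = pd.Series([0.0, 1.2, 3.4, 0.0, 0.5], index=full_index)
fg = pd.Series(
    [2.1, 2.4, 1.9],
    index=pd.date_range("2000-01-03", "2000-01-05", freq="D"),
)

# Assignment aligns on the date index: days absent from `fg` stay NaN.
table["rr"] = rr
table["fg"] = fg

print(table)
```

In the exported files this means the `ws_mean` column is empty for dates before the wind-speed record starts.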


## Iceland 

In [None]:
# Select only the catchments whose basin_id contains "ISGR" (i.e., Iceland)
subset_catchment = catchment_boundaries[catchment_boundaries['basin_id'].str.contains('ISGR', case=False)]

catchmentnames = subset_catchment.basin_id.tolist()
len(catchmentnames)

In [None]:
# Now we may organize our data:
variables = ["rr", "tg", "tn", "tx", "pp", "hu", "fg", "qq", "pet_iceland"]

# The loop goes over each catchment and variable, and makes one export per catchment. At the end we will have 
# 15,047 csv-files, each with 9 columns and a datetime index.
for catchment in tqdm.tqdm(catchmentnames):
    
    timeseries_variables = pd.DataFrame(index=pd.date_range('01-01-1950', '06-30-2023', freq='D'), columns = ["rr", "tg", "tn", "tx", "pp", "hu", "fg", "qq", "pet_iceland"])
    
    for variable in variables:
        
        # Note: a catchment may have no csv-file (areas not covered by E-OBS). No try/except is
        # used here, so in that case pd.read_csv raises a FileNotFoundError.
        
        # Wind speed (fg) covers 1980-2023, while all other variables cover 1950-2023. 
        # Therefore we apply an if to deal with the situation: 
        if variable == "fg":
            timeseries_catchment = pd.read_csv('data/meteorology/eobs/preprocessing/'+variable+"/"+variable+"_"+catchment+".csv", 
                                               usecols=[0], header=None, names=["weighted"])
            timeseries_catchment.index = pd.date_range('01-01-1980', '06-30-2023', freq='D')
            timeseries_variables[variable] = timeseries_catchment
            
        else:
            timeseries_catchment = pd.read_csv('data/meteorology/eobs/preprocessing/'+variable+"/"+variable+"_"+catchment+".csv", 
                                               usecols=[0], header=None, names=["weighted"])
            timeseries_catchment.index = pd.date_range('01-01-1950', '06-30-2023', freq='D')
            timeseries_variables[variable] = timeseries_catchment
    
    # Here we rename our columns:
    timeseries_variables.columns = ["p_mean", "t_mean", "t_min", "t_max", "sp_min", "rh_mean", "ws_mean","swr_mean", "pet_mean"]
    timeseries_variables = timeseries_variables.round(2)
    timeseries_variables.index.name = "date"
    timeseries_variables.to_csv(PATH_OUTPUT + "/estreams_meteorology_"+catchment+".csv")
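
Catchments outside the E-OBS coverage may have no csv-file for a given variable. One way to handle this is sketched below, under the assumption that a missing file should simply leave the column empty. The helper `read_catchment_series` is hypothetical and not part of the original notebook.

```python
import pandas as pd


def read_catchment_series(path, index):
    """Read a one-column, header-less csv-file indexed by `index`.

    Returns an all-NaN series if the file does not exist (hypothetical helper).
    """
    try:
        values = pd.read_csv(path, usecols=[0], header=None, names=["weighted"])["weighted"]
    except FileNotFoundError:
        return pd.Series(float("nan"), index=index, name="weighted")
    values.index = index
    return values
```

Inside the loop, the result could be assigned with `timeseries_variables[variable] = read_catchment_series(...)`, so a missing file leaves that column as NaN instead of stopping the export.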

# CSV-files for hydro-climatic signatures and indexes  
Instead of one file per catchment, here we export each variable as a single csv-file with one column per catchment, because this format is useful for computing the streamflow signatures. Only these three files are exported because they are the only ones used for the signatures and indexes. 

* estreams_meteorology_precipitation.csv
* estreams_meteorology_temperature.csv
* estreams_meteorology_pet.csv

### Precipitation

In [16]:
filenames = glob.glob(PATH_preprocessing + "rr/*.csv")
len(filenames)

timeseries_p = pd.DataFrame(index=pd.date_range('01-01-1950', '06-30-2023', freq='D'))

for filename in tqdm.tqdm(filenames):

    catchmentname = os.path.basename(filename)
    catchmentname = catchmentname.split("_", 1)[1]
    catchmentname = catchmentname.replace(".csv", "")

    timeseries_catchment = pd.read_csv(filename, usecols=[0], header=None, names=["weighted"])
    timeseries_catchment.index = pd.date_range('01-01-1950', '06-30-2023', freq='D')

    catchment_values = timeseries_catchment["weighted"].values
    
    timeseries_p[str(catchmentname)+"a"] = catchment_values

# The code only worked after appending "a" to each column name, so we now remove that trailing "a"
# (slicing avoids deleting any "a" that is part of a catchment ID):
timeseries_p.columns = timeseries_p.columns.str[:-1]
timeseries_p = timeseries_p.round(2)
timeseries_p = timeseries_p.sort_index(axis=1)

# Save the data:
# Export the final dataset:
timeseries_p.to_csv(PATH_OUTPUT_2 + "/estreams_meteorology_precipitation.csv")

# Check it out:
timeseries_p

100%|██████████████████████████████████████████| 33/33 [00:00<00:00, 224.92it/s]


,HUGR019,HUGR020,HUGR021,HUGR022,HUGR023,HUGR024,HUGR025,HUGR026,HUGR027,HUGR028,...,HUGR042,HUGR043,HUGR044,HUGR045,HUGR046,HUGR047,HUGR048,HUGR049,HUGR050,HUGR051
1950-01-01,0.01,0.00,0.00,0.02,0.00,0.00,0.00,0.00,0.00,0.00,...,0.00,0.00,0.01,0.00,0.00,0.00,0.02,0.00,0.00,0.00
1950-01-02,1.79,0.00,7.83,0.33,7.84,4.79,3.95,0.22,4.79,0.00,...,0.00,0.20,1.80,0.03,0.03,0.00,0.33,0.25,2.94,0.79
1950-01-03,3.88,3.00,4.14,2.02,4.15,5.23,4.20,2.60,6.07,0.28,...,2.71,4.10,3.88,3.66,3.10,3.42,2.35,3.22,5.90,1.98
1950-01-04,3.19,1.95,6.12,3.11,6.13,2.61,0.00,0.00,3.92,1.76,...,2.12,2.58,3.02,2.58,1.73,2.18,3.12,0.01,3.64,6.31
1950-01-05,1.20,0.87,5.11,1.14,5.12,2.27,1.99,0.38,1.76,3.29,...,0.80,1.50,1.22,1.38,0.98,1.02,1.12,1.31,0.49,3.34
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-06-26,0.36,0.00,1.04,0.00,1.04,3.16,1.05,0.00,0.60,0.00,...,0.00,0.00,0.35,0.00,0.00,0.00,0.00,0.00,0.02,0.22
2023-06-27,4.47,3.55,2.73,4.29,2.72,6.20,4.82,5.85,5.80,7.29,...,2.73,3.22,4.54,3.77,3.80,3.96,4.23,4.26,3.91,13.29
2023-06-28,2.33,1.32,0.78,2.40,0.77,3.45,2.63,1.40,3.32,1.44,...,1.40,1.26,2.31,1.58,1.34,1.32,2.28,1.19,2.14,0.42
2023-06-29,0.00,0.00,0.68,0.00,0.68,0.00,0.00,0.00,0.00,0.00,...,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
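
As an alternative to adding one column at a time, the wide table can be assembled in a single step from a dictionary of per-catchment series. This is only a sketch on toy data: the notebook does not say which failure the "a" suffix worked around, so whether this route avoids it is an assumption to verify.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=3, freq="D")

# Toy per-catchment series; in the notebook these come from the csv-files.
series_by_catchment = {
    "HUGR021": pd.Series([0.0, 7.83, 4.14], index=index),
    "HUGR020": pd.Series([0.0, 0.0, 3.0], index=index),
}

# One column per catchment, built in a single step and sorted by ID.
wide = pd.concat(series_by_catchment, axis=1).sort_index(axis=1).round(2)
wide.index.name = "date"

print(wide)
```

With `pd.concat`, the dictionary keys become the column names directly, so no renaming step is needed afterwards.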


### Temperature

In [17]:
filenames = glob.glob(PATH_preprocessing + "tg/*.csv")
len(filenames)

timeseries_tmean = pd.DataFrame(index=pd.date_range('01-01-1950', '06-30-2023', freq='D'))

for filename in tqdm.tqdm(filenames):

    catchmentname = os.path.basename(filename)
    catchmentname = catchmentname.split("_", 1)[1]
    catchmentname = catchmentname.replace(".csv", "")

    timeseries_catchment = pd.read_csv(filename, usecols=[0], header=None, names=["weighted"])
    timeseries_catchment.index = pd.date_range('01-01-1950', '06-30-2023', freq='D')

    catchment_values = timeseries_catchment["weighted"].values
    
    timeseries_tmean[str(catchmentname)+"a"] = catchment_values

# The code only worked after appending "a" to each column name, so we now remove that trailing "a"
# (slicing avoids deleting any "a" that is part of a catchment ID):
timeseries_tmean.columns = timeseries_tmean.columns.str[:-1]
timeseries_tmean = timeseries_tmean.round(2)
timeseries_tmean = timeseries_tmean.sort_index(axis=1)

# Save the data:
# Export the final dataset:
timeseries_tmean.to_csv(PATH_OUTPUT_2 + "/estreams_meteorology_temperature.csv")

# Check it out:
timeseries_tmean

100%|██████████████████████████████████████████| 33/33 [00:00<00:00, 222.03it/s]


,HUGR019,HUGR020,HUGR021,HUGR022,HUGR023,HUGR024,HUGR025,HUGR026,HUGR027,HUGR028,...,HUGR042,HUGR043,HUGR044,HUGR045,HUGR046,HUGR047,HUGR048,HUGR049,HUGR050,HUGR051
1950-01-01,-9.44,-6.83,-5.54,-11.11,-5.54,-7.04,-4.91,-4.80,-8.87,-3.28,...,-6.74,-7.61,-9.25,-8.95,-7.00,-7.29,-10.71,-5.74,-8.92,-5.45
1950-01-02,-5.63,-4.31,-3.12,-5.91,-3.12,-5.13,-3.59,-2.27,-6.02,-3.03,...,-4.52,-4.48,-5.53,-5.31,-4.28,-4.38,-5.78,-3.24,-5.92,-5.47
1950-01-03,-3.11,-1.94,0.64,-4.04,0.64,-1.13,1.10,0.94,-3.26,2.03,...,-2.21,-1.39,-2.92,-3.16,-1.83,-2.07,-3.76,-0.09,-3.35,-0.86
1950-01-04,-4.01,-2.64,-1.09,-4.58,-1.09,-3.12,-0.73,0.17,-4.70,0.80,...,-2.72,-2.97,-3.86,-4.30,-2.67,-2.99,-4.39,-1.34,-4.01,-2.49
1950-01-05,-7.07,-4.53,-1.79,-7.84,-1.79,-5.54,-3.01,-2.65,-7.24,-2.67,...,-4.47,-5.43,-6.90,-6.27,-4.72,-4.85,-7.59,-4.17,-7.39,-3.87
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-06-26,19.31,20.51,20.38,18.17,20.38,19.55,20.95,22.21,19.03,21.44,...,20.40,21.19,19.45,19.48,20.58,20.32,18.52,22.00,19.80,18.62
2023-06-27,17.51,20.42,17.93,17.49,17.92,16.85,19.22,21.24,15.42,20.92,...,20.37,20.20,17.63,19.14,20.34,20.19,17.80,21.10,16.78,17.46
2023-06-28,14.83,17.13,15.63,14.27,15.63,14.66,16.74,18.77,13.65,18.59,...,17.00,17.25,14.98,15.53,17.12,16.91,14.57,18.88,14.60,14.35
2023-06-29,15.91,16.71,17.82,14.27,17.81,17.54,18.61,19.04,16.69,19.34,...,16.48,17.18,16.06,15.50,16.74,16.61,14.61,18.12,16.75,15.73


### Potential evapotranspiration

In [18]:
# For the general catchments:
filenames = glob.glob(PATH_preprocessing + "pet/*.csv")

timeseries_pet = pd.DataFrame(index=pd.date_range('01-01-1950', '06-30-2023', freq='D'))

for filename in tqdm.tqdm(filenames):

    catchmentname = os.path.basename(filename)
    catchmentname = catchmentname.split("_", 1)[1]
    catchmentname = catchmentname.replace(".csv", "")

    timeseries_catchment = pd.read_csv(filename, usecols=[0], header=None, names=["weighted"])
    timeseries_catchment.index = pd.date_range('01-01-1950', '06-30-2023', freq='D')

    catchment_values = timeseries_catchment["weighted"].values
    
    timeseries_pet[str(catchmentname)+"a"] = catchment_values

# For the catchments in Iceland:
filenames_iceland = glob.glob(PATH_preprocessing + "pet_iceland/*.csv")

timeseries_pet_iceland = pd.DataFrame(index=pd.date_range('01-01-1950', '06-30-2023', freq='D'))

for filename in tqdm.tqdm(filenames_iceland):

    catchmentname = os.path.basename(filename)
    catchmentname = catchmentname.split("_", 2)[2]
    catchmentname = catchmentname.replace(".csv", "")

    timeseries_catchment = pd.read_csv(filename, usecols=[0], header=None, names=["weighted"])
    timeseries_catchment.index = pd.date_range('01-01-1950', '06-30-2023', freq='D')

    catchment_values = timeseries_catchment["weighted"].values
    
    timeseries_pet_iceland[str(catchmentname)+"a"] = catchment_values


timeseries_pet = pd.concat([timeseries_pet, timeseries_pet_iceland], axis=1)
    
# The code only worked after appending "a" to each column name, so we now remove that trailing "a"
# (slicing avoids deleting any "a" that is part of a catchment ID):
timeseries_pet.columns = timeseries_pet.columns.str[:-1]
timeseries_pet = timeseries_pet.round(2)
timeseries_pet = timeseries_pet.sort_index(axis=1)

# Save the data:
# Export the final dataset:
timeseries_pet.to_csv(PATH_OUTPUT_2 + "/estreams_meteorology_pet.csv")

# Check it out:
timeseries_pet

100%|██████████████████████████████████████████| 33/33 [00:00<00:00, 221.72it/s]
0it [00:00, ?it/s]


,HUGR019,HUGR020,HUGR021,HUGR022,HUGR023,HUGR024,HUGR025,HUGR026,HUGR027,HUGR028,...,HUGR042,HUGR043,HUGR044,HUGR045,HUGR046,HUGR047,HUGR048,HUGR049,HUGR050,HUGR051
1950-01-01,0.15,0.23,0.25,0.12,0.25,0.19,0.21,0.23,0.15,0.31,...,0.24,0.19,0.16,0.17,0.22,0.21,0.13,0.23,0.16,0.30
1950-01-02,0.30,0.39,0.40,0.32,0.40,0.28,0.34,0.40,0.26,0.44,...,0.39,0.36,0.30,0.35,0.38,0.38,0.32,0.40,0.26,0.37
1950-01-03,0.24,0.34,0.33,0.26,0.33,0.25,0.32,0.36,0.19,0.53,...,0.34,0.32,0.25,0.29,0.33,0.33,0.26,0.34,0.20,0.45
1950-01-04,0.27,0.30,0.25,0.26,0.25,0.26,0.31,0.33,0.23,0.43,...,0.31,0.28,0.27,0.25,0.29,0.29,0.26,0.29,0.28,0.34
1950-01-05,0.25,0.33,0.29,0.26,0.29,0.21,0.24,0.27,0.20,0.36,...,0.34,0.29,0.25,0.29,0.32,0.32,0.27,0.28,0.21,0.33
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-06-26,5.00,5.22,5.91,4.71,5.91,5.37,5.82,6.04,4.97,5.78,...,5.15,5.37,5.04,4.89,5.25,5.14,4.79,5.67,5.05,5.37
2023-06-27,4.33,5.40,3.98,4.59,3.98,4.00,4.44,5.24,3.65,4.28,...,5.47,4.96,4.35,4.96,5.30,5.28,4.61,5.22,4.03,3.96
2023-06-28,3.73,3.88,4.18,3.49,4.18,3.97,4.28,4.41,3.73,4.18,...,3.84,3.94,3.76,3.61,3.90,3.85,3.53,4.26,3.83,3.72
2023-06-29,4.38,4.97,5.48,4.14,5.48,4.89,5.33,5.42,4.32,5.40,...,4.97,4.87,4.43,4.50,4.94,4.87,4.21,5.23,4.33,5.01
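
The two loops above parse the catchment ID differently (`split("_", 1)[1]` for `pet`, `split("_", 2)[2]` for `pet_iceland`) because the variable name `pet_iceland` itself contains an underscore. A prefix-based helper handles both cases the same way. The sketch below assumes the `<variable>_<catchment>.csv` naming; the function name and the Icelandic ID are hypothetical.

```python
import os


def catchment_id(filename, variable):
    """Strip the '<variable>_' prefix and '.csv' suffix from a file name."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    prefix = variable + "_"
    if not stem.startswith(prefix):
        raise ValueError(f"{filename!r} does not start with {prefix!r}")
    return stem[len(prefix):]


# The notebook's split-based parsing, for comparison:
print("rr_HUGR020.csv".split("_", 1)[1])             # HUGR020.csv
print("pet_iceland_ISGR0001.csv".split("_", 2)[2])   # ISGR0001.csv

# The prefix-based helper gives the bare ID in both cases:
print(catchment_id("rr_HUGR020.csv", "rr"))
print(catchment_id("pet_iceland_ISGR0001.csv", "pet_iceland"))
```

Raising an error on an unexpected prefix makes a misplaced file visible instead of silently producing a wrong column name.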


# End