# Meteorological time-series extraction: Part C

Author: Thiago Nascimento (thiago.nascimento@eawag.ch)

This notebook is part of the EStreams publication and was used to extract and aggregate the meteorological time-series from the E-OBS dataset. In Part C, we export the preprocessed daily data in the final time-series format. 

* Note that this code enables not only the replication of the current database but also its extension to new catchment areas. 
* Additionally, the user should download the original raw data and place them in the folder of the same name before running this code. 
* The original third-party data used were not made available in this repository due to redistribution and storage-space reasons.  

## Requirements
**Python:**

* Python>=3.6
* Jupyter
* geopandas=0.10.2
* glob
* netCDF4
* numpy
* os
* pandas
* tqdm

Check the GitHub repository for an environment.yml (for conda environments) or requirements.txt (pip) file.

**Files:**

* data/meteorology/eobs/preprocessing/{rr, tg, tn, tx, pp, hu, fg, qq, pet, pet_iceland}
* data/shapefiles/estreams_catchments.shp

**Directory:**

* Clone the GitHub repository locally
* ONLY update the "PATH" variable in the section "Configurations" with the relative path to your local EStreams directory. 

## References

* Cornes, R., G. van der Schrier, E.J.M. van den Besselaar, and P.D. Jones. 2018: An Ensemble Version of the E-OBS Temperature and Precipitation Datasets, J. Geophys. Res. Atmos., 123. doi:10.1029/2017JD028200

## Licenses
* E-OBS: "The ECA&D data policy applies. These observational data are strictly for use in non-commercial research and non-commercial education projects only. Scientific results based on these data must be submitted for publication in the open literature without any delay linked to commercial objectives" https://www.ecad.eu/download/ensembles/download.php#guidance (Last access: 27 November 2023)

## Observations
#### E-OBS filenames

* rr = Total daily precipitation
* tg = Mean daily temperature
* tn = Minimum daily temperature
* tx = Maximum daily temperature
* pp = Mean daily air pressure at sea level
* hu = Mean daily relative humidity
* fg = Mean daily wind speed at 10 meters
* qq = Total daily global radiation

# Import modules

In [None]:
import os
import numpy as np
import pandas as pd
import geopandas as gpd
import tqdm
import time
import glob

# Configurations

In [None]:
# Only editable variables:
# Relative path to your local directory
PATH = "../../.."

* #### Users should NOT change anything in the code below this point.

In [None]:
# Non-editable variables:
PATH_preprocessing = "data/meteorology/eobs/preprocessing/"
PATH_OUTPUT = "results/timeseries/meteorology/catchments"
PATH_OUTPUT_2 = "results/timeseries/meteorology"
PATH_shapefile = "data/shapefiles/estreams_catchments.shp"
variables = ["rr", "tg", "tn", "tx", "pp", "hu", "fg", "qq", "pet", "pet_iceland"] # E-OBS variables

# Set the directory:
os.chdir(PATH)
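
Because every later cell relies on the preprocessing folders being present, it can help to check for them right after changing the working directory. The following is only a minimal sketch: the helper `missing_folders` is a hypothetical name that is not part of the original notebook, and the demonstration uses a temporary directory rather than the real data.

```python
import os
import tempfile


def missing_folders(base_dir, names):
    """Return the entries of `names` that are not sub-folders of `base_dir`."""
    return [name for name in names
            if not os.path.isdir(os.path.join(base_dir, name))]


# Demonstration on a temporary directory (made-up layout, not the real data).
with tempfile.TemporaryDirectory() as tmp:
    for name in ["rr", "tg"]:
        os.makedirs(os.path.join(tmp, name))
    missing = missing_folders(tmp, ["rr", "tg", "fg"])

print("Missing folders:", missing)
```

In the notebook itself, one could presumably call `missing_folders(PATH_preprocessing, variables)` and stop early if the returned list is not empty.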

# Import data
## Catchment boundaries

In [None]:
catchment_boundaries = gpd.read_file(PATH_shapefile)
catchment_boundaries.head()

In [None]:
print("The total number of catchments to be processed is:", len(catchment_boundaries))

# Reproject to WGS-84

In [None]:
# Reproject the catchment geometries to EPSG:4326 (WGS 84)
catchment_boundaries = catchment_boundaries.to_crs(epsg=4326)

# Data organization and export
* #### This part should only be run after the time-series of all E-OBS variables have been extracted. 

* Each catchment will be exported as a single CSV file containing its nine variables.
* variables = ["rr", "tg", "tn", "tx", "pp", "hu", "fg", "qq", "pet"].
* We split the processing in two: first all catchments except those in Iceland, then only the catchments in Iceland, which use "pet_iceland" in place of "pet".

In [None]:
# Here we can check the folders again:
folders = glob.glob(PATH_preprocessing+ "/*")
folders

## No Iceland 

In [None]:
# Select all catchments whose basin_id does not contain "IS00" (i.e., all catchments except Iceland)
subset_catchment = catchment_boundaries[~catchment_boundaries['basin_id'].str.contains('IS00', case=False)]

catchmentnames = subset_catchment.basin_id.tolist()
len(catchmentnames)
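
The split between Icelandic and non-Icelandic catchments relies on a boolean mask and its negation (`~`). A small sketch on a toy table illustrates the idea; the basin identifiers below are made up for illustration and are not taken from the shapefile.

```python
import pandas as pd

# Toy table with made-up basin identifiers (illustrative only).
toy = pd.DataFrame({"basin_id": ["AT000001", "IS000001", "FR000002", "IS000002"]})

# True where the identifier contains "IS00" (case-insensitive), as in the notebook.
is_iceland = toy["basin_id"].str.contains("IS00", case=False)

iceland_ids = toy.loc[is_iceland, "basin_id"].tolist()
other_ids = toy.loc[~is_iceland, "basin_id"].tolist()

print(iceland_ids)
print(other_ids)
```

Note that `str.contains` matches the pattern anywhere in the identifier. If identifiers could contain "IS00" in another position, `str.startswith("IS00")` might be a stricter alternative, at the cost of being case-sensitive.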

In [None]:
# Now we may organize our data:
variables = ["rr", "tg", "tn", "tx", "pp", "hu", "fg", "qq", "pet"]

# The loop goes over each catchment and variable, and makes one export per catchment. At the end we will have 
# 15,047 csv-files, each with 9 columns and a datetime index.
for catchment in tqdm.tqdm(catchmentnames):
    
    timeseries_variables = pd.DataFrame(index=pd.date_range('01-01-1950', '06-30-2023', freq='D'), columns = ["rr", "tg", "tn", "tx", "pp", "hu", "fg", "qq", "pet"])
    
    for variable in variables:
        
        # Wind speed (fg) covers 1980-2023, while all other variables cover 1950-2023. 
        # Therefore we apply an if statement to handle the two date ranges: 
        if variable == "fg":
            timeseries_catchment = pd.read_csv('data/meteorology/eobs/preprocessing/'+variable+"/"+variable+"_"+catchment+".csv", 
                                               usecols=[0], header=None, names=["weighted"])
            timeseries_catchment.index = pd.date_range('01-01-1980', '06-30-2023', freq='D')
            timeseries_variables[variable] = timeseries_catchment
            
        else:
            timeseries_catchment = pd.read_csv('data/meteorology/eobs/preprocessing/'+variable+"/"+variable+"_"+catchment+".csv", 
                                               usecols=[0], header=None, names=["weighted"])
            timeseries_catchment.index = pd.date_range('01-01-1950', '06-30-2023', freq='D')
            timeseries_variables[variable] = timeseries_catchment
    
    # Here we rename our columns:
    timeseries_variables.columns = ["p_mean", "t_mean", "t_min", "t_max", "sp_mean", "rh_mean", "ws_mean","swr_mean", "pet_mean"]
    timeseries_variables = timeseries_variables.round(2)
    timeseries_variables.index.name = "date"
    timeseries_variables.to_csv(PATH_OUTPUT + "/estreams_meteorology_"+catchment+".csv")
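
The loop above depends on pandas index alignment: a series that starts later than the table (such as wind speed, which starts in 1980) is matched by date, and the earlier dates are left as NaN. The sketch below shows this behavior on a toy five-day range; the dates and values are invented for illustration.

```python
import pandas as pd

# Full date range of the output table (toy range instead of 1950-2023).
full_index = pd.date_range("2000-01-01", "2000-01-05", freq="D")
table = pd.DataFrame(index=full_index, columns=["rr", "fg"])

# "rr" covers the full range; "fg" only covers the last two days.
rr = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=full_index)
fg = pd.Series([7.0, 8.0], index=pd.date_range("2000-01-04", "2000-01-05", freq="D"))

# Assignment aligns on the date index, so missing dates become NaN.
table["rr"] = rr
table["fg"] = fg

n_missing = int(table["fg"].isna().sum())
print(table)
print("Days without fg:", n_missing)
```

The same mechanism is what lets the `fg` branch assign a 1980-2023 series into a 1950-2023 table without any explicit padding.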



## Iceland 

In [None]:
# Select only the catchments whose basin_id contains "IS00" (i.e., the catchments in Iceland)
subset_catchment = catchment_boundaries[catchment_boundaries['basin_id'].str.contains('IS00', case=False)]

catchmentnames = subset_catchment.basin_id.tolist()
len(catchmentnames)

In [None]:
# Now we may organize our data:
variables = ["rr", "tg", "tn", "tx", "pp", "hu", "fg", "qq", "pet_iceland"]

# The loop goes over each catchment and variable, and makes one export per catchment. At the end we will have 
# 15,047 csv-files, each with 9 columns and a datetime index.
for catchment in tqdm.tqdm(catchmentnames):
    
    timeseries_variables = pd.DataFrame(index=pd.date_range('01-01-1950', '06-30-2023', freq='D'), columns = ["rr", "tg", "tn", "tx", "pp", "hu", "fg", "qq", "pet_iceland"])
    
    for variable in variables:
        
        # Note: if a catchment has no csv-file for a variable (areas not covered by E-OBS),
        # pd.read_csv raises FileNotFoundError and the loop stops at that catchment.
        
        # Wind speed (fg) covers 1980-2023, while all other variables cover 1950-2023. 
        # Therefore we apply an if statement to handle the two date ranges: 
        if variable == "fg":
            timeseries_catchment = pd.read_csv('data/meteorology/eobs/preprocessing/'+variable+"/"+variable+"_"+catchment+".csv", 
                                               usecols=[0], header=None, names=["weighted"])
            timeseries_catchment.index = pd.date_range('01-01-1980', '06-30-2023', freq='D')
            timeseries_variables[variable] = timeseries_catchment
            
        else:
            timeseries_catchment = pd.read_csv('data/meteorology/eobs/preprocessing/'+variable+"/"+variable+"_"+catchment+".csv", 
                                               usecols=[0], header=None, names=["weighted"])
            timeseries_catchment.index = pd.date_range('01-01-1950', '06-30-2023', freq='D')
            timeseries_variables[variable] = timeseries_catchment
    
    # Here we rename our columns:
    timeseries_variables.columns = ["p_mean", "t_mean", "t_min", "t_max", "sp_mean", "rh_mean", "ws_mean","swr_mean", "pet_mean"]
    timeseries_variables = timeseries_variables.round(2)
    timeseries_variables.index.name = "date"
    timeseries_variables.to_csv(PATH_OUTPUT + "/estreams_meteorology_"+catchment+".csv")
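
Some catchments may have no CSV file for a given variable (areas not covered by E-OBS). One possible way to handle this is a small reader that falls back to an all-NaN series when the file is absent. This is a sketch under assumptions, not the method used in the original notebook: `read_series_or_nan` is a hypothetical helper, and the file names in the demonstration are invented.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def read_series_or_nan(path, index):
    """Read the first CSV column onto `index`; return all-NaN if the file is absent."""
    try:
        frame = pd.read_csv(path, usecols=[0], header=None, names=["weighted"])
    except FileNotFoundError:
        return pd.Series(np.nan, index=index)
    return pd.Series(frame["weighted"].to_numpy(), index=index)


index = pd.date_range("2000-01-01", "2000-01-03", freq="D")

with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "rr_X.csv")  # made-up file name
    with open(existing, "w") as handle:
        handle.write("1.5\n2.5\n3.5\n")
    found = read_series_or_nan(existing, index)
    absent = read_series_or_nan(os.path.join(tmp, "rr_Y.csv"), index)

print(found)
print(absent)
```

Only `FileNotFoundError` is caught here, so a file with the wrong number of rows would still raise an error rather than being silently replaced by NaN.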

# CSV-files for hydro-climatic signatures and indexes  
Here, instead of exporting each catchment individually, we export these variables as three CSV files (one per variable, with one column per catchment), because this format is useful for computing the streamflow signatures. We export only these three files because they are the only ones used for the signatures and indexes. 

* estreams_meteorology_precipitation.csv
* estreams_meteorology_temperature.csv
* estreams_meteorology_pet.csv

### Precipitation

In [None]:
filenames = glob.glob(PATH_preprocessing+"rr/" +  "/*.csv")
len(filenames)

timeseries_p = pd.DataFrame(index=pd.date_range('01-01-1950', '06-30-2023', freq='D'))

for filename in tqdm.tqdm(filenames):

    catchmentname = os.path.basename(filename)
    catchmentname = catchmentname.split("_", 1)[1]
    catchmentname = catchmentname.replace(".csv", "")

    timeseries_catchment = pd.read_csv(filename, usecols=[0], header=None, names=["weighted"])
    timeseries_catchment.index = pd.date_range('01-01-1950', '06-30-2023', freq='D')

    catchment_values = timeseries_catchment["weighted"].values
    
    timeseries_p[str(catchmentname)+"a"] = catchment_values

# The code only worked after appending a temporary "a" to each column name, so we now remove it.
# Only the trailing character is stripped, so any other "a" inside an identifier is preserved:
timeseries_p.columns = timeseries_p.columns.str[:-1]
timeseries_p = timeseries_p.round(2)
timeseries_p = timeseries_p.sort_index(axis=1)

# Export the final dataset:
timeseries_p.to_csv(PATH_OUTPUT_2 + "/estreams_meteorology_precipitation.csv")

# Check it out:
timeseries_p
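
An alternative way to build such a wide table is to collect one series per catchment in a dictionary and concatenate them once, which avoids adding columns one at a time. The sketch below is an assumption about how this could be done, not the approach used above; the basin identifiers and values are made up.

```python
import pandas as pd

index = pd.date_range("2000-01-01", "2000-01-03", freq="D")

# Hypothetical per-catchment values keyed by made-up basin identifiers.
per_catchment = {
    "FR000002": [0.0, 1.234, 2.0],
    "AT000001": [3.0, 4.0, 5.678],
}

# One series per catchment; the dictionary keys become the column names.
columns = {name: pd.Series(values, index=index)
           for name, values in per_catchment.items()}

# Concatenate once, then round and sort the columns as in the notebook.
wide = pd.concat(columns, axis=1).round(2).sort_index(axis=1)

print(wide)
```

In the notebook, the dictionary would presumably be filled inside the loop over `filenames`, with the catchment name parsed from each file name as the key.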

### Temperature

In [None]:
filenames = glob.glob(PATH_preprocessing+"tg/" +  "/*.csv")
len(filenames)

timeseries_tmean = pd.DataFrame(index=pd.date_range('01-01-1950', '06-30-2023', freq='D'))

for filename in tqdm.tqdm(filenames):

    catchmentname = os.path.basename(filename)
    catchmentname = catchmentname.split("_", 1)[1]
    catchmentname = catchmentname.replace(".csv", "")

    timeseries_catchment = pd.read_csv(filename, usecols=[0], header=None, names=["weighted"])
    timeseries_catchment.index = pd.date_range('01-01-1950', '06-30-2023', freq='D')

    catchment_values = timeseries_catchment["weighted"].values
    
    timeseries_tmean[str(catchmentname)+"a"] = catchment_values

# The code only worked after appending a temporary "a" to each column name, so we now remove it.
# Only the trailing character is stripped, so any other "a" inside an identifier is preserved:
timeseries_tmean.columns = timeseries_tmean.columns.str[:-1]
timeseries_tmean = timeseries_tmean.round(2)
timeseries_tmean = timeseries_tmean.sort_index(axis=1)

# Export the final dataset:
timeseries_tmean.to_csv(PATH_OUTPUT_2 + "/estreams_meteorology_temperature.csv")

# Check it out:
timeseries_tmean
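
The catchment name is recovered from each file name by dropping the variable prefix and the ".csv" extension. Because "pet_iceland" itself contains an underscore, splitting on "_" needs a different count for that folder. A prefix-based helper is one way to treat both cases uniformly. The function name and example paths below are hypothetical.

```python
import os


def catchment_from_filename(path, variable):
    """Strip the '<variable>_' prefix and the extension from a file name."""
    stem, _ = os.path.splitext(os.path.basename(path))
    prefix = variable + "_"
    if not stem.startswith(prefix):
        raise ValueError("Unexpected file name: " + path)
    return stem[len(prefix):]


# Made-up paths following the '<variable>/<variable>_<basin_id>.csv' pattern.
name_rr = catchment_from_filename("preprocessing/rr/rr_AT000001.csv", "rr")
name_pet_is = catchment_from_filename(
    "preprocessing/pet_iceland/pet_iceland_IS000001.csv", "pet_iceland")

print(name_rr, name_pet_is)
```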

### Potential evapotranspiration

In [None]:
# For the general catchments:
filenames = glob.glob(PATH_preprocessing+"pet/" +  "/*.csv")

timeseries_pet = pd.DataFrame(index=pd.date_range('01-01-1950', '06-30-2023', freq='D'))

for filename in tqdm.tqdm(filenames):

    catchmentname = os.path.basename(filename)
    catchmentname = catchmentname.split("_", 1)[1]
    catchmentname = catchmentname.replace(".csv", "")

    timeseries_catchment = pd.read_csv(filename, usecols=[0], header=None, names=["weighted"])
    timeseries_catchment.index = pd.date_range('01-01-1950', '06-30-2023', freq='D')

    catchment_values = timeseries_catchment["weighted"].values
    
    timeseries_pet[str(catchmentname)+"a"] = catchment_values

# For the catchments in Iceland:
filenames_iceland = glob.glob(PATH_preprocessing+"pet_iceland/" +  "/*.csv")

timeseries_pet_iceland = pd.DataFrame(index=pd.date_range('01-01-1950', '06-30-2023', freq='D'))

for filename in tqdm.tqdm(filenames_iceland):

    catchmentname = os.path.basename(filename)
    catchmentname = catchmentname.split("_", 2)[2]
    catchmentname = catchmentname.replace(".csv", "")

    timeseries_catchment = pd.read_csv(filename, usecols=[0], header=None, names=["weighted"])
    timeseries_catchment.index = pd.date_range('01-01-1950', '06-30-2023', freq='D')

    catchment_values = timeseries_catchment["weighted"].values
    
    timeseries_pet_iceland[str(catchmentname)+"a"] = catchment_values


timeseries_pet = pd.concat([timeseries_pet, timeseries_pet_iceland], axis=1)
    
# The code only worked after appending a temporary "a" to each column name, so we now remove it.
# Only the trailing character is stripped, so any other "a" inside an identifier is preserved:
timeseries_pet.columns = timeseries_pet.columns.str[:-1]
timeseries_pet = timeseries_pet.round(2)
timeseries_pet = timeseries_pet.sort_index(axis=1)

# Export the final dataset:
timeseries_pet.to_csv(PATH_OUTPUT_2 + "/estreams_meteorology_pet.csv")

# Check it out:
timeseries_pet
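
A quick way to check an exported wide file is to read it back with the first column as a parsed date index and compare its shape and dates with the table in memory. The sketch below uses an in-memory buffer and made-up values in place of the real output file.

```python
import io

import pandas as pd

index = pd.date_range("2000-01-01", "2000-01-03", freq="D")

# Toy wide table with made-up basin identifiers as columns.
wide = pd.DataFrame({"AT000001": [0.1, 0.2, 0.3],
                     "IS000001": [1.0, 1.1, 1.2]}, index=index)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
wide.to_csv(buffer)
buffer.seek(0)

# Read back: the first column holds the dates and becomes the index.
restored = pd.read_csv(buffer, index_col=0, parse_dates=True)

same_dates = list(restored.index) == list(index)
print(restored.shape, same_dates)
```

For the real files, the buffer would be replaced by the path of the exported CSV, for example the PET file written above.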

# End