# Meteorological stations coverage extraction


Author: Thiago Nascimento (thiago.nascimento@eawag.ch)

This notebook is part of the EStreams publication and was used to extract and aggregate weather station coverage information from the E-OBS dataset.

* This code makes it possible both to reproduce the current database and to extend it to new catchment areas.
* Before running this code, the user should download the original raw data and place them in the folder of the same name.
* The original third-party data are not made available in this repository due to redistribution and storage-space restrictions.

## Requirements
**Python:**

* Python>=3.6
* Jupyter
* geopandas=0.10.2
* numpy
* os
* pandas
* shapely
* tqdm

Check the GitHub repository for an environment.yml (for conda environments) or requirements.txt (pip) file.

**Files:**

* data/shapefiles/estreams_catchments.shp
* data/eobs_stations/stations_info_{rr, tg, tn, tx, pp, hu, fg, qq}_v28.0e.txt. https://www.ecad.eu/download/ensembles/download.php (Last access: 27 November 2023)

**Directory:**

* Clone the GitHub repository locally.
* Place any third-party data in their respective directories.
* ONLY update the "PATH" variable in the section "Configurations" with the relative path to your local EStreams directory.
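
Because the raw data must be downloaded separately, it can help to confirm that every expected input is in place before running the notebook. The following is a minimal sketch, not part of the original workflow: the helper name `missing_inputs` is hypothetical, and the file names are taken from the "Files" list above.

```python
import os

# Variable codes listed under "Files" above
VARIABLES = ["rr", "tg", "tn", "tx", "pp", "hu", "fg", "qq"]


def missing_inputs(root, variables=VARIABLES, version="v28.0e"):
    """Return the expected input files that are not present under `root`.

    Hypothetical helper for illustration; `root` plays the role of PATH.
    """
    expected = [os.path.join("data", "shapefiles", "estreams_catchments.shp")]
    expected += [
        os.path.join("data", "eobs_stations", f"stations_info_{var}_{version}.txt")
        for var in variables
    ]
    # Keep only the paths that do not exist as files
    return [path for path in expected if not os.path.isfile(os.path.join(root, path))]
```

Calling `missing_inputs("../../..")` with the same value as `PATH` returns an empty list when all nine inputs are in place.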

## References
* Cornes, R., G. van der Schrier, E.J.M. van den Besselaar, and P.D. Jones. 2018: An Ensemble Version of the E-OBS Temperature and Precipitation Datasets, J. Geophys. Res. Atmos., 123. doi:10.1029/2017JD028200

## Licenses
* E-OBS: "The ECA&D data policy applies. These observational data are strictly for use in non-commercial research and non-commercial education projects only. Scientific results based on these data must be submitted for publication in the open literature without any delay linked to commercial objectives" https://www.ecad.eu/download/ensembles/download.php#guidance (Last access: 27 November 2023)

# Import modules

In [None]:
import geopandas as gpd
import os
import pandas as pd
import numpy as np
from shapely.geometry import Point, Polygon
import tqdm
from utils.hydrology import count_geometries_in_polygons

# Configurations

In [None]:
# Only editable variables:
# Relative path to your local directory
PATH = "../../.."
# Do not change the order of the variables:
filenames = ['data/eobs_stations/stations_info_qq_v28.0e.txt', 'data/eobs_stations/stations_info_pp_v28.0e.txt',
             'data/eobs_stations/stations_info_tg_v28.0e.txt','data/eobs_stations/stations_info_tx_v28.0e.txt',
             'data/eobs_stations/stations_info_fg_v28.0e.txt','data/eobs_stations/stations_info_rr_v28.0e.txt',
             'data/eobs_stations/stations_info_tn_v28.0e.txt','data/eobs_stations/stations_info_hu_v28.0e.txt']

* #### Users should NOT change anything in the code below this point.


In [None]:
# Non-editable variables:
PATH_OUTPUT = "results/staticattributes/"
# Set the directory:
os.chdir(PATH)

# Import data
## Catchment boundaries

In [None]:
catchment_boundaries = gpd.read_file('data/shapefiles/estreams_catchments.shp')
catchment_boundaries

In [None]:
print("The total number of catchments to be processed is:", len(catchment_boundaries))

In [None]:
# Here you can check the crs of the datasets:
print("CRS of catchment_boundaries:", catchment_boundaries.crs)

In [None]:
# Define the target CRS to ETRS89 LAEA (3035)
target_crs = 'EPSG:3035'  

# Reproject the GeoDataFrame to the target CRS
catchment_boundaries_reprojected = catchment_boundaries.to_crs(target_crs)

## E-OBS stations

In [None]:
# Here we analyse the precipitation gauges:
filename = filenames[5]
filename

In [None]:
# Use read_csv with the '|' delimiter
eobs_stations = pd.read_csv(filename, delimiter='|', encoding='latin1')
eobs_stations.columns = ['STATION', 'NAME','COUNTRY', 'LAT', 'LON', 'ELEV',
       'START', 'STOP']
eobs_stations

In [None]:
# Variable codes, in the same order as `filenames`
selected_variables = ['qq', 'pp', 'tg', 'tx', 'fg', 'rr', 'tn', 'hu']

# Store one dataframe per variable in a dictionary
dataframes = {}

for variable, filename in zip(selected_variables, filenames):
    # Use read_csv with the '|' delimiter
    eobs_stations = pd.read_csv(filename, delimiter='|', encoding='latin1')
    eobs_stations.columns = ['STATION', 'NAME', 'COUNTRY', 'LAT', 'LON', 'ELEV',
                             'START', 'STOP']

    dataframes[variable] = eobs_stations

In [None]:
dataframes[selected_variables[5]]

In [None]:
# Here we convert the dataframes to geodataframes and set the projected system.
# The stations are reprojected to the same projected coordinate system (in meters)
# as the catchments, so that they can be matched against the buffered boundaries.

for variable in selected_variables:
    # Convert the DataFrame to a GeoDataFrame
    geometry = [Point(lon, lat) for lon, lat in zip(dataframes[variable]['LON'], dataframes[variable]['LAT'])]
    dataframes[variable] = gpd.GeoDataFrame(dataframes[variable], geometry=geometry)

    # Set the coordinate reference system (CRS) for WGS-84
    dataframes[variable].crs = 'EPSG:4326'
    
    # Define the target CRS to ETRS89 LAEA (3035)
    target_crs = 'EPSG:3035'  

    # Reproject the GeoDataFrame to the target CRS
    dataframes[variable] = dataframes[variable].to_crs(target_crs)
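
A projected system is needed because a distance in meters does not correspond to a fixed number of degrees. The sketch below illustrates this for a 10 km distance along a parallel. It assumes a spherical Earth with a mean radius of 6371 km, which is a simplification introduced here for illustration and is not used anywhere in the notebook.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius (spherical approximation)


def meters_per_degree_longitude(latitude_deg):
    """Length in meters of one degree of longitude at the given latitude."""
    return math.radians(1.0) * EARTH_RADIUS_M * math.cos(math.radians(latitude_deg))


def distance_in_degrees_longitude(distance_m, latitude_deg):
    """Express an east-west distance in meters as degrees of longitude."""
    return distance_m / meters_per_degree_longitude(latitude_deg)


# The same 10 km spans more degrees of longitude the further north we go
for lat in (40.0, 55.0, 70.0):
    print(lat, round(distance_in_degrees_longitude(10_000, lat), 3))
```

Buffering in EPSG:4326 would therefore give a buffer whose width in meters depends on latitude, whereas in EPSG:3035 the buffer distance is expressed directly in meters.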

## Buffer of the catchment boundaries
* This may take several minutes to run

In [None]:
subset_catchment = catchment_boundaries_reprojected.copy()

# First we make a buffer of 10 km around the catchment shapefiles 
buffer_distance = 10000
buffered_catchment_boundaries_reprojected = subset_catchment.copy()
buffered_catchment_boundaries_reprojected['geometry'] = subset_catchment['geometry'].buffer(buffer_distance)

## Computation

In [None]:
selected_variables = ['qq', 'pp', 'tg', 'tx', 'fg', 'rr', 'tn', 'hu']

# First we create an empty dataframe:
num_stations = pd.DataFrame()
num_stations["area"] = buffered_catchment_boundaries_reprojected.set_index("basin_id", inplace = False).area_calc

for variable in tqdm.tqdm(selected_variables):
    
    # Here we use utils.hydrology.count_geometries_in_polygons function
    num_stations["stations_num_"+variable] = count_geometries_in_polygons(dataframes[variable], 
                                                                 buffered_catchment_boundaries_reprojected, "basin_id", 
                                                                 new_column="num")
    
    num_stations["stations_dens_"+variable] = num_stations["stations_num_"+variable] / num_stations["area"]
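
The implementation of `count_geometries_in_polygons` is not shown in this notebook. As a rough, dependency-free sketch of the underlying idea (count the points that fall inside each polygon, then divide by an area), the example below uses a ray-casting test. The function names, the two square "catchments", the station coordinates and the areas are all hypothetical and chosen only for illustration; this is not the author's implementation.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the polygon `vertices`."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does a horizontal ray from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_in_polygons(points, polygons):
    """Count, for each polygon id, how many points fall inside it."""
    return {
        polygon_id: sum(point_in_polygon(x, y, vertices) for x, y in points)
        for polygon_id, vertices in polygons.items()
    }


# Hypothetical data: two square polygons and four stations
polygons = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(20, 20), (30, 20), (30, 30), (20, 30)],
}
stations = [(1, 1), (5, 5), (25, 25), (50, 50)]
areas = {"A": 100.0, "B": 100.0}  # hypothetical areas, standing in for area_calc

counts = count_points_in_polygons(stations, polygons)
densities = {pid: counts[pid] / areas[pid] for pid in counts}
print(counts, densities)
```

In the notebook the counts come from the buffered polygons, while the density divides by the `area_calc` attribute of each catchment.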

In [None]:
num_stations

In [None]:
num_stations_coverage = num_stations.iloc[:, 1:]
num_stations_coverage

In [None]:
num_stations_coverage.columns

In [None]:
# List of values for replacement
old_values = ['_qq', '_pp', '_tg', '_tx', '_fg', '_rr', '_tn', '_hu']
new_values = ['_swr_mean', '_sp_mean', '_t_mean', '_t_max', '_ws_mean', '_p_mean', '_t_min', '_rh_mean']

# Create a mapping dictionary
column_name_mapping = {old: new for old, new in zip(old_values, new_values)}

# Replace the specified patterns in column names
num_stations_coverage.columns = num_stations_coverage.columns.to_series().replace(column_name_mapping, regex=True).values
num_stations_coverage
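
The renaming step relies on pattern replacement of the E-OBS variable codes. A minimal standalone sketch of the same mapping with the standard `re` module is given below. Anchoring the pattern at the end of the name is a choice made here for illustration, and the helper name `rename_column` is hypothetical.

```python
import re

# E-OBS variable code -> attribute suffix, as in the lists above
SUFFIX_MAP = {
    "qq": "swr_mean", "pp": "sp_mean", "tg": "t_mean", "tx": "t_max",
    "fg": "ws_mean", "rr": "p_mean", "tn": "t_min", "hu": "rh_mean",
}

# Match "_<code>" only at the end of the column name
PATTERN = re.compile(r"_(" + "|".join(SUFFIX_MAP) + r")$")


def rename_column(name):
    """Replace a trailing E-OBS variable code with its descriptive suffix."""
    return PATTERN.sub(lambda match: "_" + SUFFIX_MAP[match.group(1)], name)


print([rename_column(c) for c in ["stations_num_rr", "stations_dens_tx"]])
```

Column names that do not end in one of the codes are returned unchanged.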

In [None]:
# Here we sort the index:
num_stations_coverage = num_stations_coverage.sort_index(axis=0)
num_stations_coverage

In [None]:
# Round the data to 3 decimals
num_stations_coverage = num_stations_coverage.astype(float).round(3)
num_stations_coverage

## Data export

In [None]:
# Export the final analysis:
num_stations_coverage.to_csv(os.path.join(PATH_OUTPUT, "estreams_meteorology_density.csv"))

# End