# Hydrological attributes extraction


Author: Thiago Nascimento (thiago.nascimento@eawag.ch)

This notebook is part of the EStreams publication and was used to extract and aggregate the hydrological attributes from the GeoDAR and HydroLAKES databases.

* Note that this code supports not only the reproduction of the current database but also its extension to new catchment areas.
* Before running this code, the user should download the original raw data and place it in the folder of the same name.
* The original third-party data are not included in this repository because of redistribution and storage-space constraints.

## Requirements
**Python:**

* Python>=3.6
* Jupyter
* geopandas=0.10.2
* pandas
* numpy
* tqdm
* os
* osgeo

Check the GitHub repository for an environment.yml (for conda environments) or requirements.txt (pip) file.

**Files:**

* data/hydrology/GeoDAR_v11_dams.shp. Available at: https://doi.org/10.5281/zenodo.6163413 (Last access: 23 November 2023)
* data/hydrology/GeoDAR_v11_reservoirs.shp. Available at: https://doi.org/10.5281/zenodo.6163413 (Last access: 23 November 2023)
* data/hydrology/GRanD_v13_issues.csv. Available at: https://doi.org/10.5281/zenodo.6163413 (Last access: 23 November 2023)
* data/hydrology/HydroLAKES_polys_v10.shp. Available at: https://www.hydrosheds.org/products/hydrolakes (Last access: 05 December 2023)
* data/shapefiles/estreams_boundaries.shp

**Directory:**

* Clone the GitHub directory locally
* Place any third-party data in their respective directories.
* Update ONLY the "PATH" variable in the "Configurations" section, setting it to the relative path to your local EStreams directory.

## References

* GeoDAR Dataset:  Jida Wang, Blake A. Walter, Fangfang Yao, Chunqiao Song, Meng Ding, Abu S. Maroof, Jingying Zhu, Chenyu Fan, Jordan M. McAlister, Md Safat Sikder, Yongwei Sheng, George H. Allen, Jean-François Crétaux, & Yoshihide Wada. (2022). GeoDAR: Georeferenced global Dams And Reservoirs dataset for bridging attributes and geolocations [Data set]. In Earth System Science Data (v1.1; v1.0, Vol. 14, Number 4, pp. 1869–1899). Zenodo. https://doi.org/10.5281/zenodo.6163413 (Last access: 23 November 2023)

* Wang, J. et al. GeoDAR: georeferenced global dams and reservoirs dataset for bridging attributes and geolocations. Earth Syst Sci Data 14, 1869–1899 (2022).

* Messager, M.L., Lehner, B., Grill, G., Nedeva, I., Schmitt, O. (2016). Estimating the volume and age of water stored in global lakes using a geo-statistical approach. Nature Communications, 7: 13603. https://doi.org/10.1038/ncomms13603

## Licenses
* GeoDAR: CC BY 4.0. https://doi.org/10.5281/zenodo.6163413 (Last access: 27 November 2023)
* HydroLAKES: CC BY 4.0. https://www.hydrosheds.org/products/hydrolakes (Last access: 27 November 2023)

# Import modules

In [None]:
import geopandas as gpd
import pandas as pd
import numpy as np
import tqdm as tqdm
import os
from utils.hydrology import count_geometries_in_polygons
from osgeo import gdal

# Configurations

In [None]:
# Only editable variables:
# Relative path to your local directory
PATH = "../../.."

* #### The users should NOT change anything in the code below here.


In [None]:
# Non-editable variables:
PATH_OUTPUT = "results/staticattributes/"
# Set SHAPE_RESTORE_SHX config option to avoid problems when SHX is missing.
gdal.SetConfigOption('SHAPE_RESTORE_SHX', 'YES')
# Set the directory:
os.chdir(PATH)
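
Before the import cells run, it can help to confirm that the raw files are in place, since they are not shipped with the repository. The following is a minimal sketch and not part of the original workflow: the helper name `find_missing_files` is hypothetical, and the list only mirrors the hydrology paths read by the cells in this notebook, relative to the working directory set above.

```python
from pathlib import Path

# Hydrology inputs as read by this notebook, relative to the working directory.
REQUIRED_FILES = [
    "data/hydrology/GeoDAR_v11_dams.shp",
    "data/hydrology/GeoDAR_v11_reservoirs.shp",
    "data/hydrology/GRanD_v13_issues.csv",
    "data/hydrology/HydroLAKES_polys_v10.shp",
]


def find_missing_files(paths, base_dir="."):
    """Return the entries of `paths` that are not files under `base_dir`.

    Hypothetical helper, introduced here for illustration only.
    """
    base = Path(base_dir)
    return [p for p in paths if not (base / p).is_file()]


missing = find_missing_files(REQUIRED_FILES)
if missing:
    print("Missing input files:")
    for path in missing:
        print("  -", path)
else:
    print("All input files found.")
```

Reporting every missing file at once avoids discovering them one at a time as each `read_file` call fails.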

# Import data
## Catchment boundaries

In [None]:
catchment_boundaries = gpd.read_file('data/shapefiles/Catchment_Boundaries_HUGR_33new.shp')
catchment_boundaries

In [None]:
print("The total number of catchments to be processed is:", len(catchment_boundaries))

## GeoDAR data

In [None]:
# Dams and reservoirs shapefiles:
GeoDAR_v11_dams = gpd.read_file('data/hydrology/GeoDAR_v11_dams.shp')
GeoDAR_v11_dams.replace(-999.0, np.nan, inplace = True)

GeoDAR_v11_reservoirs = gpd.read_file('data/hydrology/GeoDAR_v11_reservoirs.shp')
GeoDAR_v11_reservoirs.replace(-999.0, np.nan, inplace = True)

In [None]:
# GRanD file with extra information:
GRanD_v13_issues = pd.read_csv('data/hydrology/GRanD_v13_issues.csv', index_col=0)
GRanD_v13_issues

## HydroLAKES data

In [None]:
# HydroLAKES shapefiles:
hydroLAKES = gpd.read_file('data/hydrology/HydroLAKES_polys_v10.shp')
hydroLAKES.replace(-9999.0, np.nan, inplace = True)

hydroLAKES
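
The sentinel values replaced above (-999.0 for GeoDAR, -9999.0 for HydroLAKES) would otherwise be treated as real numbers in the sums, minima, and maxima computed later. The toy example below, with invented values, is a sketch of that effect; only the column names `Lake_area` and `Vol_total` are taken from this notebook.

```python
import numpy as np
import pandas as pd

# Toy lake table: -9999.0 marks a missing volume, as in the cell above.
lakes = pd.DataFrame(
    {"Lake_area": [1.5, 2.0, 0.5], "Vol_total": [10.0, -9999.0, 4.0]}
)

# Summing the raw column lets the sentinel distort the result.
raw_sum = lakes["Vol_total"].sum()

# replace() without inplace returns a new DataFrame with NaN for the sentinel.
clean = lakes.replace(-9999.0, np.nan)

# pandas skips NaN by default when summing.
clean_sum = clean["Vol_total"].sum()

print("Sum with sentinel:", raw_sum)
print("Sum after replacement:", clean_sum)
```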

## Concatenate information from GRanD_v13_issues.csv

In [None]:
# Here we create an auxiliary dataframe to help with the concatenation:
GeoDAR_v11_dams_aux = GeoDAR_v11_dams.loc[:, ["id_grd_v13", "id_v11"]].copy()
# Keep only dams with a valid GRanD id (the -999 values were replaced by NaN above):
GeoDAR_v11_dams_aux = GeoDAR_v11_dams_aux[GeoDAR_v11_dams_aux.id_grd_v13>0]
GeoDAR_v11_dams_aux.set_index("id_grd_v13", inplace = True)

# Here we retrieve the year of construction of the dam:
GeoDAR_v11_dams_aux["YEAR"] = GRanD_v13_issues.YEAR

# Now we set the id_v11 as index:
GeoDAR_v11_dams_aux.set_index("id_v11", inplace = True)

# Here we assign the YEAR value when available:
GeoDAR_v11_dams.set_index("id_v11", inplace = True)
GeoDAR_v11_dams["YEAR"] = GeoDAR_v11_dams_aux["YEAR"]
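
The lookup above relies on pandas index alignment across two re-indexing steps. An equivalent formulation maps each dam's GRanD id directly to its year, which may be easier to follow. The sketch below uses invented toy values; only the column names `id_v11`, `id_grd_v13`, and `YEAR` come from this notebook, and it assumes, as the cell above does, that the GRanD table is indexed by the GRanD id.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the GeoDAR dams table; dam 2 has no GRanD id.
dams = pd.DataFrame(
    {"id_v11": [1, 2, 3], "id_grd_v13": [101.0, np.nan, 103.0]}
).set_index("id_v11")

# Toy stand-in for the GRanD table, indexed by GRanD id.
grand = pd.DataFrame({"YEAR": [1950, 1975]}, index=[101, 103])

# Look up each GRanD id in the YEAR series; ids without a match become NaN.
dams["YEAR"] = dams["id_grd_v13"].map(grand["YEAR"])

print(dams)
```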

## Reproject the coordinates system

In [None]:
# Here you can check the crs of the datasets:
print("CRS of catchment_boundaries:", catchment_boundaries.crs)
print("CRS of GeoDAR_v11_dams:", GeoDAR_v11_dams.crs)
print("CRS of GeoDAR_v11_reservoirs:", GeoDAR_v11_reservoirs.crs)
print("CRS of hydroLAKES:", hydroLAKES.crs)

In [None]:
# Define the target CRS as WGS 84 (EPSG:4326)
target_crs = 'EPSG:4326'  

# Reproject the GeoDataFrame to the target CRS
GeoDAR_v11_reservoirs_reprojected = GeoDAR_v11_reservoirs.to_crs(target_crs)

In [None]:
# Here you can check the crs of the datasets:
print("CRS of GeoDAR_v11_reservoirs_reprojected:", GeoDAR_v11_reservoirs_reprojected.crs)

# Intersection areas

## Number of dams and reservoirs

In [None]:
# First we create an empty dataframe:
hydrology_df = pd.DataFrame()

# Here we use utils.count_geometries_in_polygons function
hydrology_df["dam_num"] = count_geometries_in_polygons(GeoDAR_v11_dams, catchment_boundaries, "basin_id", new_column="dam_num")
hydrology_df["res_num"] = count_geometries_in_polygons(GeoDAR_v11_reservoirs_reprojected, catchment_boundaries, "basin_id", new_column="res_num")

In [None]:
hydrology_df
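
The helper `count_geometries_in_polygons` is imported from `utils.hydrology`, and its body is not shown in this notebook. The sketch below is therefore not its implementation. It only illustrates, on invented toy data and assuming the spatial join has already been done, one plausible way to count matches per catchment so that catchments without any match receive 0 instead of being dropped. The names `all_basins` and `joined` are hypothetical.

```python
import pandas as pd

# Every catchment that should appear in the output, matched or not.
all_basins = pd.Index(["A", "B", "C"], name="basin_id")

# Toy result of a spatial join: one row per dam that fell inside a catchment.
joined = pd.DataFrame({"basin_id": ["A", "A", "C"], "dam": ["d1", "d2", "d3"]})

# Count rows per catchment, then reindex so unmatched catchments get 0.
dam_num = joined.groupby("basin_id").size().reindex(all_basins, fill_value=0)

print(dam_num)
```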

## First and last year of dam construction

In [None]:
# "predicate" replaces the deprecated "op" keyword (geopandas >= 0.10):
data_merged = gpd.sjoin(GeoDAR_v11_dams, catchment_boundaries, how="inner", predicate='within')

hydrology_df["dam_yr_first"] = data_merged.loc[:, ["basin_id", "YEAR"]].groupby('basin_id').agg('min').copy()
hydrology_df["dam_yr_last"] = data_merged.loc[:, ["basin_id", "YEAR"]].groupby('basin_id').agg('max').copy()
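
The two aggregations above can also be written as a single named aggregation. The sketch below uses an invented toy table in place of the spatial-join result and shows that catchments whose dams have no known year end up with NaN in both columns.

```python
import numpy as np
import pandas as pd

# Toy result of a spatial join: one row per dam, tagged with its catchment.
joined = pd.DataFrame(
    {
        "basin_id": ["A", "A", "B", "C"],
        "YEAR": [1960.0, 1985.0, 1972.0, np.nan],
    }
)

# Named aggregation builds both columns in one pass; NaN years are ignored.
dam_years = joined.groupby("basin_id")["YEAR"].agg(
    dam_yr_first="min", dam_yr_last="max"
)

print(dam_years)
```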

## Reservoir maximum capacity (total)

In [None]:
# "predicate" replaces the deprecated "op" keyword (geopandas >= 0.10):
data_merged = gpd.sjoin(GeoDAR_v11_reservoirs_reprojected, catchment_boundaries, how="inner", predicate='within')

hydrology_df["res_tot_sto"] = data_merged.loc[:, ["basin_id", "rv_mcm_v11"]].groupby('basin_id').agg('sum').copy()

# Here we set res_tot_sto to NaN (rather than 0) when no information is available.
# The result is assigned back, since an in-place replace on a .loc selection may act on a copy:
hydrology_df["res_tot_sto"] = hydrology_df["res_tot_sto"].replace(0, np.nan)
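
The replacement above turns every zero total into NaN, which also affects any catchment whose reservoirs genuinely report zero capacity. As an alternative, and not what this notebook does, the `min_count` argument of the pandas group-wise `sum` returns NaN only when a group has no valid values at all. The sketch below uses invented toy values to show the difference.

```python
import numpy as np
import pandas as pd

# Toy joined table: basin B has a reservoir without a capacity value,
# basin C has a reservoir with a reported capacity of exactly 0.
joined = pd.DataFrame(
    {
        "basin_id": ["A", "A", "B", "C"],
        "rv_mcm_v11": [12.5, 7.5, np.nan, 0.0],
    }
)

# Default sum: a group with only NaN values sums to 0.
default_sum = joined.groupby("basin_id")["rv_mcm_v11"].sum()

# min_count=1: a group needs at least one valid value, otherwise NaN.
res_tot_sto = joined.groupby("basin_id")["rv_mcm_v11"].sum(min_count=1)

print(pd.DataFrame({"default": default_sum, "min_count_1": res_tot_sto}))
```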

## Number of lakes

In [None]:
hydrology_df["lakes_num"] = count_geometries_in_polygons(hydroLAKES, catchment_boundaries, "basin_id", new_column="lakes_num")

## Total upstream lakes' area and volume

In [None]:
# "predicate" replaces the deprecated "op" keyword (geopandas >= 0.10):
data_merged = gpd.sjoin(hydroLAKES, catchment_boundaries, how="inner", predicate='within')

hydrology_df["lakes_tot_area"] = data_merged.loc[:, ["basin_id", "Lake_area"]].groupby('basin_id').agg('sum').copy()
hydrology_df["lakes_tot_vol"] = data_merged.loc[:, ["basin_id", "Vol_total"]].groupby('basin_id').agg('sum').copy()

## Check the output

In [None]:
hydrology_df

In [None]:
# Here we sort the index:
hydrology_df = hydrology_df.sort_index(axis=0)
hydrology_df

In [None]:
# Round the data to 3 decimals
hydrology_df = hydrology_df.astype(float).round(3)
hydrology_df

# Data export

In [None]:
# Export the final dataset:
hydrology_df.to_csv(PATH_OUTPUT+"estreams_hydrology_attributes.csv")
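
The export above assumes that the output folder already exists. The following is a minimal sketch of creating the folder first and reading the file back as a check. It writes to a temporary directory so that it can run anywhere, and the toy table `attributes` is an invented stand-in for `hydrology_df`.

```python
import os
import tempfile

import pandas as pd

# Toy attribute table standing in for hydrology_df.
attributes = pd.DataFrame(
    {"dam_num": [2.0, 0.0]}, index=pd.Index(["A", "B"], name="basin_id")
)

# A temporary folder stands in for PATH_OUTPUT in this sketch.
output_dir = os.path.join(tempfile.mkdtemp(), "results", "staticattributes")

# Create missing folders; no error is raised if they already exist.
os.makedirs(output_dir, exist_ok=True)

output_file = os.path.join(output_dir, "estreams_hydrology_attributes.csv")
attributes.to_csv(output_file)

# Read the file back with basin_id as the index to confirm the round trip.
restored = pd.read_csv(output_file, index_col="basin_id")
print(restored)
```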

# End