# Hydrological attributes extraction


Author: Thiago Nascimento (thiago.nascimento@eawag.ch)

This notebook is part of the EStreams publication and was used to extract and aggregate the hydrological attributes from the GeoDAR and HydroLAKES databases.

* Note that this code enables not only the replication of the current database but also its extension to new catchment areas.
* Additionally, the user should download the original raw data and place it in the folder of the same name before running this code.
* The original third-party data are not included in this repository due to redistribution and storage-space constraints.

## Requirements
**Python:**

* Python>=3.6
* Jupyter
* geopandas=0.10.2
* pandas
* numpy
* tqdm
* os (Python standard library)
* osgeo (GDAL Python bindings)

Check the GitHub repository for an environment.yml (for conda environments) or requirements.txt (pip) file.

**Files:**

* data/hydrology/GeoDAR_v11_dams.shp. Available at: https://doi.org/10.5281/zenodo.6163413 (Last access: 23 November 2023)
* data/hydrology/GeoDAR_v11_reservoirs.shp. Available at: https://doi.org/10.5281/zenodo.6163413 (Last access: 23 November 2023)
* data/hydrology/GRanD_v13_issues.csv. Available at: https://doi.org/10.5281/zenodo.6163413 (Last access: 23 November 2023)
* data/hydrology/HydroLAKES_polys_v10.shp. Available at: https://www.hydrosheds.org/products/hydrolakes (Last access: 05 December 2023)
* data/shapefiles/estreams_catchments.shp

**Directory:**

* Clone the GitHub repository locally.
* Place any third-party data files in their respective directories.
* ONLY update the "PATH" variable in the section "Configurations" with the relative path to your local EStreams directory.
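
Before running the notebook, it can help to confirm that the input files are in place. The following is a minimal sketch using only the standard library; the helper name `find_missing_files` is hypothetical, and the paths are an assumption taken from the loading cells later in this notebook.

```python
from pathlib import Path

# Input files expected by the loading cells, relative to the EStreams directory.
REQUIRED_FILES = [
    "data/hydrology/GeoDAR_v11_dams.shp",
    "data/hydrology/GeoDAR_v11_reservoirs.shp",
    "data/hydrology/GRanD_v13_issues.csv",
    "data/hydrology/HydroLAKES_polys_v10.shp",
    "data/shapefiles/estreams_catchments.shp",
]


def find_missing_files(root, required=REQUIRED_FILES):
    """Return the required relative paths that do not exist under `root` (hypothetical helper)."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).is_file()]


# Same relative path as the PATH variable set in "Configurations".
missing = find_missing_files("../../..")
if missing:
    print("Missing input files:", *missing, sep="\n  ")
```

An empty `missing` list means all inputs were found under the given directory.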

## References

* GeoDAR Dataset:  Jida Wang, Blake A. Walter, Fangfang Yao, Chunqiao Song, Meng Ding, Abu S. Maroof, Jingying Zhu, Chenyu Fan, Jordan M. McAlister, Md Safat Sikder, Yongwei Sheng, George H. Allen, Jean-François Crétaux, & Yoshihide Wada. (2022). GeoDAR: Georeferenced global Dams And Reservoirs dataset for bridging attributes and geolocations [Data set]. In Earth System Science Data (v1.1; v1.0, Vol. 14, Number 4, pp. 1869–1899). Zenodo. https://doi.org/10.5281/zenodo.6163413 (Last access: 23 November 2023)

* Wang, J. et al. GeoDAR: georeferenced global dams and reservoirs dataset for bridging attributes and geolocations. Earth Syst Sci Data 14, 1869–1899 (2022).

* Messager, M.L., Lehner, B., Grill, G., Nedeva, I., Schmitt, O. (2016). Estimating the volume and age of water stored in global lakes using a geo-statistical approach. Nature Communications, 7: 13603. https://doi.org/10.1038/ncomms13603

## Licenses
* GeoDAR: CC BY 4.0. https://doi.org/10.5281/zenodo.6163413 (Last access: 27 November 2023)
* HydroLAKES: CC BY 4.0. https://www.hydrosheds.org/products/hydrolakes (Last access: 27 November 2023)

# Import modules

In [1]:
import geopandas as gpd
import pandas as pd
import numpy as np
import tqdm as tqdm
import os
from utils.hydrology import count_geometries_in_polygons
from osgeo import gdal

# Configurations

In [2]:
# Only editable variables:
# Relative path to your local directory
PATH = "../../.."

#### Users should NOT change anything in the code below.


In [3]:
# Non-editable variables:
PATH_OUTPUT = "results/staticattributes/"
# Set SHAPE_RESTORE_SHX config option to avoid problems when SHX is missing.
gdal.SetConfigOption('SHAPE_RESTORE_SHX', 'YES')
# Set the directory:
os.chdir(PATH)

# Import data
## Catchment boundaries

In [5]:
catchment_boundaries = gpd.read_file('data/shapefiles/estreams_catchments.shp')
catchment_boundaries

Unnamed: 0,id,area_km2,outlet_lat,outlet_lng,name,area_offic,layer,path,Code,basin_id,area_calc,geometry
0,FR003159,37,47.488,7.393,A100003001,38.6,FR003159,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,FR003159,FR003159,37.183,"POLYGON ((7.30374 47.49375, 7.30708 47.49375, ..."
1,FR003160,227,47.626,7.239,A105003001,233.0,FR003160,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,FR003160,FR003160,226.962,"POLYGON ((7.22291 47.63458, 7.22374 47.63458, ..."
2,FR003161,14,47.586,7.384,A106000101,15.0,FR003161,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,FR003161,FR003161,13.595,"POLYGON ((7.38791 47.59041, 7.39874 47.59041, ..."
3,FR003162,70,47.622,7.275,A107020001,70.0,FR003162,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,FR003162,FR003162,70.152,"POLYGON ((7.28375 47.60958, 7.28291 47.60958, ..."
4,FR003163,330,47.653,7.265,A108003001,325.0,FR003163,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,FR003163,FR003163,330.158,"POLYGON ((7.22958 47.65291, 7.23208 47.65291, ..."
...,...,...,...,...,...,...,...,...,...,...,...,...
1967,HR000314,135,44.202,16.069,7267,,HR000314,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,HR000314,HR000314,135.462,"POLYGON ((16.01458 44.21375, 16.01375 44.21375..."
1968,HR000315,458,44.162,15.858,7236,,HR000315,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,HR000315,HR000315,457.864,"POLYGON ((15.89625 44.07791, 15.89374 44.07791..."
1969,HR000316,514,44.162,15.849,7237,,HR000316,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,HR000316,HR000316,514.369,"POLYGON ((15.84208 44.15458, 15.84208 44.15458..."
1970,HR000317,185,45.334,14.452,6077,,HR000317,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,HR000317,HR000317,184.733,"POLYGON ((14.51875 45.36708, 14.51875 45.36791..."


In [6]:
print("The total number of catchments to be processed is:", len(catchment_boundaries))

The total number of catchments to be processed is: 1972


## GeoDAR data

In [7]:
# Dams and reservoirs shapefiles:
GeoDAR_v11_dams = gpd.read_file('data/hydrology/GeoDAR_v11_dams.shp')
GeoDAR_v11_dams.replace(-999.0, np.nan, inplace = True)

GeoDAR_v11_reservoirs = gpd.read_file('data/hydrology/GeoDAR_v11_reservoirs.shp')
GeoDAR_v11_reservoirs.replace(-999.0, np.nan, inplace = True)
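
The sentinel replacement above matters because a value such as -999 would otherwise be treated as real data in later sums. A small sketch on a toy table (the values are made up; only the column names follow GeoDAR) illustrates the effect:

```python
import numpy as np
import pandas as pd

# Toy table with one sentinel value standing in for "no data".
toy = pd.DataFrame({"id_v11": [1, 2, 3], "rv_mcm_v11": [12.5, -999.0, 3.0]})

# Same pattern as in the cell above, but returning a new frame instead of inplace.
cleaned = toy.replace(-999.0, np.nan)

raw_total = toy["rv_mcm_v11"].sum()        # distorted by the sentinel
clean_total = cleaned["rv_mcm_v11"].sum()  # NaN is skipped by default
n_missing = int(cleaned["rv_mcm_v11"].isna().sum())
```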

In [8]:
# GRanD file with extra information:
GRanD_v13_issues = pd.read_csv('data/hydrology/GRanD_v13_issues.csv', index_col=0)
GRanD_v13_issues

GRAND_ID,RES_NAME_c,DAM_NAME_c,ALT_NAME_c,RIVER_c,ALT_RIVER_c,MAIN_BASIN,SUB_BASIN,NEAR_CITY_c,ALT_CITY_c,ADMIN_UNIT_c,...,EDITOR,LONG_DD,LAT_DD,POLY_SRC,Issue,Description,Lat_corrected,Lon_corrected,Correction_source,Harmonized
4759,,Thakurwadi,,Indrayani,,Krishna-Godavari,,Murbad,Vadegaon,Maharashtra,...,McGill-BL,73.568285,18.865823,SWBD,Attributes mixed with another dam,"Original GRanD location is Thokarwadi Dam, not...",,,NRLD; Google Maps,"Yes, with original coordinates"
4920,Wadgaon Lake,Wadgaon,Vadgaon,,,Krishna-Godavari,,Ashti,,Maharashtra,...,McGill-BL,79.041410,20.824677,SWBD,Misplaced; Attributes mixed up with another dam,Original GRanD location is Lower Wunna Dam (Wa...,18.777926,75.270254,NRLD; Google Maps,"Yes, with corrected coordinates"
274,,Brenda,,Macdonald,,,,Peachland,,British Columbia,...,McGill-BL,-119.965095,49.830734,CanVec,Misplaced,Original GRanD location is actually Peachland ...,,,CanVec; Google Maps,"No, unable to locate"
294,Spectacle Lake,Conconully,Conconully Reservoir,Salmon Creek,Offstream Okanogan River,,,,,Washington,...,UNH,-119.747523,48.537299,SWBD,Misplaced,Original GRanD location is Conconully Dam/Rese...,48.814534,-119.523232,NID; Google Maps,"Yes, with corrected coordinates"
343,,Opportunity Tailings Ponds,,Offstream Silver Bow Creek,,,,,,Montana,...,UNH,-112.782348,46.159241,SWBD,Misplaced,Original GRanD location is Warm Springs Tailin...,46.145259,-112.801936,NID,"Yes, with corrected coordinates"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7316,Sardis Lake,Sardis Lake,Sardis Lake,Jackfork Creek,,Atchafalaya River,,Clayton,,Oklahoma,...,McGill-PB,-95.350711,34.629812,JRC,,,,,,
7317,Tims Ford Lake,Tims Ford,Tims Ford Lake,Elk River,,Mississippi,,Fayetteville,,Tennessee,...,McGill-PB,-86.276254,35.197234,JRC,,,,,,
7318,Cordell Hull Lake,Cordell Hull Dam,Cordell Hull Lake,Cumberland River,,Mississippi,,Carthage,,Tennessee,...,McGill-PB,-85.944131,36.292220,JRC,,,,,,
7319,,Merwin,Ariel Dam,Lewis River,,Columbia,,Woodland,,Washington,...,McGill-PB,-122.555376,45.957492,JRC,,,,,,


## HydroLAKES data

In [9]:
# HydroLAKES shapefiles:
hydroLAKES = gpd.read_file('data/hydrology/HydroLAKES_polys_v10.shp')
hydroLAKES.replace(-9999.0, np.nan, inplace = True)

hydroLAKES

Unnamed: 0,Hylak_id,Lake_name,Country,Continent,Poly_src,Lake_type,Grand_id,Lake_area,Shore_len,Shore_dev,...,Vol_src,Depth_avg,Dis_avg,Res_time,Elevation,Slope_100,Wshd_area,Pour_long,Pour_lat,geometry
0,1,Caspian Sea,Russia,Europe,SWBD,1,0,377001.91,15829.37,7.27,...,1,200.5,8110.642,107883.0,-29,-1.00,1404108.0,47.717708,45.591934,"POLYGON ((49.96181 37.43847, 49.96457 37.44022..."
1,2,Great Bear,Canada,North America,CanVec,1,0,30450.64,5331.72,8.62,...,1,72.2,535.187,47577.7,145,-1.00,147665.4,-123.505546,65.138384,"POLYGON ((-119.78782 67.03574, -119.78637 67.0..."
2,3,Great Slave,Canada,North America,CanVec,1,0,26734.29,9814.16,16.93,...,1,59.1,4350.692,4203.2,148,-1.00,995312.3,-117.617115,61.311727,"POLYGON ((-109.93976 62.95851, -109.93831 62.9..."
3,4,Winnipeg,Canada,North America,CanVec,3,709,23923.04,4018.32,7.33,...,1,11.9,2244.727,1464.3,215,-1.00,919611.5,-97.863542,53.696359,"POLYGON ((-98.80636 53.88021, -98.80578 53.880..."
4,5,Superior,United States of America,North America,SWBD,1,0,81843.92,5248.36,5.18,...,1,146.7,2869.953,48410.3,179,-1.00,209219.5,-84.460547,46.468593,"POLYGON ((-90.72250 46.65740, -90.72458 46.657..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1427683,1427684,,New Zealand,Oceania,SWBD,1,0,0.29,2.30,1.20,...,3,5.1,0.125,139.6,1,4.83,4.4,169.147152,-52.592307,"POLYGON ((169.14763 -52.59291, 169.14750 -52.5..."
1427684,1427685,,Australia,Oceania,SWBD,1,0,0.10,1.66,1.52,...,3,2.1,,,1,1.04,,73.305108,-52.973622,"POLYGON ((73.30605 -52.97318, 73.30540 -52.974..."
1427685,1427686,,Australia,Oceania,SWBD,1,0,0.24,1.92,1.10,...,3,3.7,0.013,792.4,205,2.63,0.2,158.892278,-54.529970,"POLYGON ((158.89124 -54.53117, 158.88696 -54.5..."
1427686,1427687,,Australia,Oceania,SWBD,1,0,0.34,2.95,1.44,...,3,6.5,0.020,1263.9,163,7.58,0.4,158.889583,-54.594300,"POLYGON ((158.88858 -54.59767, 158.88833 -54.5..."


## Concatenate information from GRanD_v13_issues.csv

In [10]:
# Here we create an auxiliary dataframe to help with the concatenation:
GeoDAR_v11_dams_aux = GeoDAR_v11_dams.loc[:, ["id_grd_v13", "id_v11"]].copy()
# Keep only dams with a valid GRanD ID (missing values were set to NaN above):
GeoDAR_v11_dams_aux = GeoDAR_v11_dams_aux[GeoDAR_v11_dams_aux.id_grd_v13 > 0]
GeoDAR_v11_dams_aux.set_index("id_grd_v13", inplace = True)

# Here we retrieve the year of construction of the dam:
GeoDAR_v11_dams_aux["YEAR"] = GRanD_v13_issues.YEAR

# Now we set the id_v11 as index:
GeoDAR_v11_dams_aux.set_index("id_v11", inplace = True)

# Here we assign the YEAR value when available:
GeoDAR_v11_dams.set_index("id_v11", inplace = True)
GeoDAR_v11_dams["YEAR"] = GeoDAR_v11_dams_aux["YEAR"]
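
The cell above relies on pandas index alignment: assigning a Series to a column matches rows by index label rather than by position. A minimal sketch with toy tables (the IDs and years are made up; the column names follow the cell above) shows the two alignment steps:

```python
import pandas as pd

# Toy stand-in for the GeoDAR dams table; -999 marks a dam without a GRanD ID.
dams = pd.DataFrame({"id_v11": [10, 11, 12], "id_grd_v13": [501, -999, 502]})
# Toy stand-in for the GRanD table, indexed by GRAND_ID.
grand = pd.DataFrame({"YEAR": [1965, 1988]}, index=pd.Index([501, 502], name="GRAND_ID"))

# Step 1: index the auxiliary table by GRanD ID so YEAR aligns on that key.
aux = dams.loc[dams["id_grd_v13"] > 0, ["id_grd_v13", "id_v11"]].copy()
aux = aux.set_index("id_grd_v13")
aux["YEAR"] = grand["YEAR"]

# Step 2: re-index by the GeoDAR ID so YEAR aligns with the dams table.
aux = aux.set_index("id_v11")
dams = dams.set_index("id_v11")
dams["YEAR"] = aux["YEAR"]  # dam 11 has no GRanD match and stays NaN
```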

## Reproject the coordinates system

In [11]:
# Here you can check the crs of the datasets:
print("CRS of catchment_boundaries:", catchment_boundaries.crs)
print("CRS of GeoDAR_v11_dams:", GeoDAR_v11_dams.crs)
print("CRS of GeoDAR_v11_reservoirs:", GeoDAR_v11_reservoirs.crs)
print("CRS of hydroLAKES:", hydroLAKES.crs)

CRS of catchment_boundaries: epsg:4326
CRS of GeoDAR_v11_dams: epsg:4326
CRS of GeoDAR_v11_reservoirs: PROJCS["World_Cylindrical_Equal_Area",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0],UNIT["Degree",0.0174532925199433],AUTHORITY["EPSG","4326"]],PROJECTION["Cylindrical_Equal_Area"],PARAMETER["standard_parallel_1",0],PARAMETER["central_meridian",0],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["ESRI","54034"]]
CRS of hydroLAKES: epsg:4326


In [12]:
# Define the target CRS as WGS 84 (EPSG:4326), matching the other datasets
target_crs = 'EPSG:4326'

# Reproject the GeoDataFrame to the target CRS
GeoDAR_v11_reservoirs_reprojected = GeoDAR_v11_reservoirs.to_crs(target_crs)

In [13]:
# Here you can check the crs of the datasets:
print("CRS of GeoDAR_v11_reservoirs_reprojected:", GeoDAR_v11_reservoirs_reprojected.crs)

CRS of GeoDAR_v11_reservoirs_reprojected: EPSG:4326
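
As a sanity check on the reprojection, the sketch below converts one toy point from World Cylindrical Equal Area (ESRI:54034, the CRS reported for the reservoirs) to EPSG:4326. The point is made up, and the sketch assumes geopandas with pyproj is installed and that the PROJ database provides ESRI:54034.

```python
import math

import geopandas as gpd
from shapely.geometry import Point

# Toy point on the equator, in metres. With a standard parallel of 0,
# x = a * longitude (radians), so this x corresponds to 10 degrees east.
x_m = 6378137.0 * math.radians(10)
toy = gpd.GeoDataFrame({"name": ["toy"]}, geometry=[Point(x_m, 0.0)], crs="ESRI:54034")

# Reproject only when the data are not already in the target CRS.
if toy.crs.to_epsg() != 4326:
    toy = toy.to_crs("EPSG:4326")

lon = toy.geometry.iloc[0].x
lat = toy.geometry.iloc[0].y
```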


# Intersection areas

## Number of dams and reservoirs

In [14]:
# First we create an empty dataframe:
hydrology_df = pd.DataFrame()

# Here we use the count_geometries_in_polygons function from utils.hydrology
hydrology_df["dam_num"] = count_geometries_in_polygons(GeoDAR_v11_dams, catchment_boundaries, "basin_id", new_column="dam_num")
hydrology_df["res_num"] = count_geometries_in_polygons(GeoDAR_v11_reservoirs_reprojected, catchment_boundaries, "basin_id", new_column="res_num")
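
The implementation of `count_geometries_in_polygons` lives in `utils.hydrology` and is not shown here. The sketch below is one way such a count could be computed with a spatial join; the helper `count_within` and the toy geometries are hypothetical and may differ from the actual function.

```python
import geopandas as gpd
from shapely.geometry import Point, box


def count_within(geometries, polygons, id_col):
    """Count geometries lying within each polygon (hypothetical helper)."""
    joined = gpd.sjoin(
        geometries, polygons[[id_col, "geometry"]], how="inner", predicate="within"
    )
    counts = joined.groupby(id_col).size()
    # Polygons without any match get an explicit 0 instead of being dropped.
    return counts.reindex(polygons[id_col], fill_value=0)


# Two toy catchments and three toy dams; the last dam lies outside both.
catchments = gpd.GeoDataFrame(
    {"basin_id": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(2, 2, 3, 3)],
    crs="EPSG:4326",
)
dams = gpd.GeoDataFrame(
    geometry=[Point(0.2, 0.2), Point(0.8, 0.5), Point(5, 5)], crs="EPSG:4326"
)

counts = count_within(dams, catchments, "basin_id")
```

Both inputs must share the same CRS, which is why the reservoirs are reprojected before counting.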

In [15]:
hydrology_df

basin_id,dam_num,res_num
FR003159,0.0,0.0
FR003160,0.0,0.0
FR003161,0.0,0.0
FR003162,0.0,0.0
FR003163,0.0,0.0
...,...,...
HR000314,0.0,0.0
HR000315,0.0,0.0
HR000316,0.0,0.0
HR000317,1.0,1.0


## First and last year of dam construction

In [16]:
# "predicate" replaces the deprecated "op" argument (geopandas >= 0.10)
data_merged = gpd.sjoin(GeoDAR_v11_dams, catchment_boundaries, how="inner", predicate='within')

hydrology_df["dam_yr_first"] = data_merged.loc[:, ["basin_id", "YEAR"]].groupby('basin_id').agg('min').copy()
hydrology_df["dam_yr_last"] = data_merged.loc[:, ["basin_id", "YEAR"]].groupby('basin_id').agg('max').copy()
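
The aggregation above can be illustrated on a toy table. The sketch assumes the spatial join has already produced one row per dam tagged with its catchment; the IDs and years are made up.

```python
import numpy as np
import pandas as pd

# Toy result of a spatial join: one row per dam, tagged with its catchment.
joined = pd.DataFrame({
    "basin_id": ["A", "A", "B", "C"],
    "YEAR": [1950.0, 1972.0, np.nan, 1990.0],
})
# Target table indexed by every catchment, including "D", which has no dams.
out = pd.DataFrame(index=pd.Index(["A", "B", "C", "D"], name="basin_id"))

# One groupby yields both statistics; assignment aligns on basin_id.
years = joined.groupby("basin_id")["YEAR"].agg(["min", "max"])
out["dam_yr_first"] = years["min"]
out["dam_yr_last"] = years["max"]
```

Catchments without dams ("D") and catchments whose dams have no recorded year ("B") both end up as NaN.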



## Reservoir maximum capacity (total)

In [17]:
# "predicate" replaces the deprecated "op" argument (geopandas >= 0.10)
data_merged = gpd.sjoin(GeoDAR_v11_reservoirs_reprojected, catchment_boundaries, how="inner", predicate='within')

hydrology_df["res_tot_sto"] = data_merged.loc[:, ["basin_id", "rv_mcm_v11"]].groupby('basin_id').agg('sum').copy()

# Here we set res_tot_sto to NaN (instead of 0) when no information is available.
# Assigning the result back avoids relying on an inplace change of a column selection.
hydrology_df["res_tot_sto"] = hydrology_df["res_tot_sto"].replace(0, np.nan)
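
The correction above is needed because a grouped sum over only-NaN values returns 0 by default. As a hedged alternative, not the approach used in this notebook, pandas can return NaN directly via `min_count`. The toy values below are made up.

```python
import numpy as np
import pandas as pd

# Toy join result: catchment "B" has one reservoir with unknown capacity.
joined = pd.DataFrame({
    "basin_id": ["A", "A", "B"],
    "rv_mcm_v11": [10.0, 2.5, np.nan],
})

# Default behaviour: an all-NaN group sums to 0.
default_sum = joined.groupby("basin_id")["rv_mcm_v11"].sum()

# With min_count=1, a group needs at least one valid value, otherwise NaN.
nan_aware_sum = joined.groupby("basin_id")["rv_mcm_v11"].sum(min_count=1)
```

Unlike replacing 0 afterwards, this keeps a genuine total of 0 distinct from missing information.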



## Number of lakes

In [18]:
hydrology_df["lakes_num"] = count_geometries_in_polygons(hydroLAKES, catchment_boundaries, "basin_id", new_column="lakes_num")

## Total upstream lakes' area and volume

In [19]:
# "predicate" replaces the deprecated "op" argument (geopandas >= 0.10)
data_merged = gpd.sjoin(hydroLAKES, catchment_boundaries, how="inner", predicate='within')

hydrology_df["lakes_tot_area"] = data_merged.loc[:, ["basin_id", "Lake_area"]].groupby('basin_id').agg('sum').copy()
hydrology_df["lakes_tot_vol"] = data_merged.loc[:, ["basin_id", "Vol_total"]].groupby('basin_id').agg('sum').copy()



## Check the output

In [20]:
hydrology_df

basin_id,dam_num,res_num,dam_yr_first,dam_yr_last,res_tot_sto,lakes_num,lakes_tot_area,lakes_tot_vol
FR003159,0.0,0.0,,,,0.0,,
FR003160,0.0,0.0,,,,0.0,,
FR003161,0.0,0.0,,,,0.0,,
FR003162,0.0,0.0,,,,0.0,,
FR003163,0.0,0.0,,,,0.0,,
...,...,...,...,...,...,...,...,...
HR000314,0.0,0.0,,,,0.0,,
HR000315,0.0,0.0,,,,0.0,,
HR000316,0.0,0.0,,,,0.0,,
HR000317,1.0,1.0,,,,0.0,,


In [21]:
# Here we sort the index:
hydrology_df = hydrology_df.sort_index(axis=0)
hydrology_df

basin_id,dam_num,res_num,dam_yr_first,dam_yr_last,res_tot_sto,lakes_num,lakes_tot_area,lakes_tot_vol
FR003159,0.0,0.0,,,,0.0,,
FR003160,0.0,0.0,,,,0.0,,
FR003161,0.0,0.0,,,,0.0,,
FR003162,0.0,0.0,,,,0.0,,
FR003163,0.0,0.0,,,,0.0,,
...,...,...,...,...,...,...,...,...
HR000313,0.0,0.0,,,,0.0,,
HR000314,0.0,0.0,,,,0.0,,
HR000315,0.0,0.0,,,,0.0,,
HR000316,0.0,0.0,,,,0.0,,


In [22]:
# Round the data to 3 decimals
hydrology_df = hydrology_df.astype(float).round(3)
hydrology_df

basin_id,dam_num,res_num,dam_yr_first,dam_yr_last,res_tot_sto,lakes_num,lakes_tot_area,lakes_tot_vol
FR003159,0.0,0.0,,,,0.0,,
FR003160,0.0,0.0,,,,0.0,,
FR003161,0.0,0.0,,,,0.0,,
FR003162,0.0,0.0,,,,0.0,,
FR003163,0.0,0.0,,,,0.0,,
...,...,...,...,...,...,...,...,...
HR000313,0.0,0.0,,,,0.0,,
HR000314,0.0,0.0,,,,0.0,,
HR000315,0.0,0.0,,,,0.0,,
HR000316,0.0,0.0,,,,0.0,,


In [None]:
# The last two columns (lakes_tot_area and lakes_tot_vol) should be 0 instead of NaN when lakes_num is equal to 0:
hydrology_df.iloc[:, -2:] = hydrology_df.iloc[:, -2:].fillna(0)
hydrology_df
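
The cell above fills every NaN in the two lake columns. If only catchments with `lakes_num` equal to 0 should be filled, a boolean mask can restrict the operation. This is a sketch on toy values, not the notebook's actual procedure:

```python
import numpy as np
import pandas as pd

# Toy attribute table: "A" has no lakes, "C" has a lake with unknown area and volume.
df = pd.DataFrame(
    {
        "lakes_num": [0.0, 2.0, 1.0],
        "lakes_tot_area": [np.nan, 3.4, np.nan],
        "lakes_tot_vol": [np.nan, 10.0, np.nan],
    },
    index=pd.Index(["A", "B", "C"], name="basin_id"),
)

cols = ["lakes_tot_area", "lakes_tot_vol"]
no_lakes = df["lakes_num"] == 0

# Fill only the rows without lakes; other NaNs keep signalling missing data.
df.loc[no_lakes, cols] = df.loc[no_lakes, cols].fillna(0)
```

Selecting the columns by name also avoids depending on their position in the table.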

# Data export

In [23]:
# Export the final dataset:
hydrology_df.to_csv(PATH_OUTPUT+"estreams_hydrology_attributes.csv")

# End