# Terrain attributes extraction

Author: Thiago Nascimento (thiago.nascimento@eawag.ch)

This notebook is part of the EStreams publication and was used to extract and aggregate the terrain characteristics from the MERIT dataset.

* This code allows users both to reproduce the current database and to extend it to new catchments.
* Before running this code, download the original raw data and place it in the folder of the same name.
* The original third-party data are not included in this repository because of redistribution restrictions and storage-space limits.

## Requirements
**Python:**

* Python>=3.6
* Jupyter
* geopandas=0.10.2
* numpy
* os
* pandas
* tqdm

Check the GitHub repository for an environment.yml (for conda environments) or requirements.txt (pip) file.

**Files:**

* data/terrain/riv_pfaf_2_MERIT_Hydro_v07_Basins_v01.shp. Available at: https://www.reachhydro.org/home/params/merit-basins (Last access: 23 November 2023)
* data/shapefiles/estreams_catchments.shp
* data/gee/terrain/EStreams_elevation_attributes_gee.csv. Elevation attributes CSV-file(s) exported from GEE.
* data/gee/terrain/EStreams_slope_attributes_gee.csv. Slope attributes CSV-file(s) exported from GEE.

**Directory:**

* Clone the GitHub repository locally.
* Place any third-party data in their respective directories.
* Update ONLY the "PATH" variable in the section "Configurations", setting it to the relative path to your local EStreams directory.

## References

* Yamazaki, D. et al. A high-accuracy map of global terrain elevations. Geophys Res Lett 44, 5844–5853 (2017).
* Yamazaki, D. et al. MERIT Hydro: A High-Resolution Global Hydrography Map Based on Latest Topography Dataset. Water Resour Res 55, 5053–5073 (2019).

## License
* MERIT: Dual-license - CC-BY-NC 4.0 & ODbL 1.0. http://hydro.iis.u-tokyo.ac.jp/~yamadai/MERIT_DEM/index.html (Last access: 27 November 2023)

## Observations
* This notebook assumes that the GEE code that exports elevation and slope descriptors from the MERIT DEM dataset (EStreams_landscape_attributes_terrain_gee.txt) has already been run on the GEE platform, and that the output CSV files are available locally.

# Import modules

In [None]:
import os
import numpy as np
import pandas as pd
import geopandas as gpd
import tqdm
from utils.terrain import *

# Configurations

In [None]:
# Only editable variables:
# Relative path to your local directory
PATH = "../../.."

* #### The users should NOT change anything in the code below here.

In [None]:
# Non-editable variables:
PATH_OUTPUT = "results/staticattributes/"

# Set the directory:
os.chdir(PATH)

# Import data
## Catchment boundaries

In [None]:
catchment_boundaries = gpd.read_file('data/shapefiles/estreams_catchments.shp')
catchment_boundaries.head()

In [None]:
print("The total number of catchments to be processed is:", len(catchment_boundaries))

## MERIT Hydro river network

In [None]:
river_net_EU_MERIT = gpd.read_file('data/terrain/riv_pfaf_2_MERIT_Hydro_v07_Basins_v01.shp')
river_net_EU_MERIT

## GEE outputs

In [None]:
# Elevation descriptors
terrain_atrributes_gee_elevation = pd.read_csv("data/gee/terrain/EStreams_elevation_attributes_gee.csv", index_col=1)
terrain_atrributes_gee_elevation.drop(["system:index", ".geo"], axis = 1, inplace = True)
terrain_atrributes_gee_elevation.columns = ["ele_mt_max", "ele_mt_mean", "ele_mt_min"]
terrain_atrributes_gee_elevation

In [None]:
# Slope descriptors
terrain_atrributes_gee_slope = pd.read_csv("data/gee/terrain/EStreams_slope_attributes_gee.csv", index_col=1)
terrain_atrributes_gee_slope.drop(["system:index", ".geo"], axis = 1, inplace = True)
terrain_atrributes_gee_slope.columns = ["flat_area_fra", "slp_dg_mean", "steep_area_fra"]
terrain_atrributes_gee_slope = terrain_atrributes_gee_slope[["slp_dg_mean", "flat_area_fra", "steep_area_fra"]]
terrain_atrributes_gee_slope

In [None]:
terrain_atrributes_df = pd.concat([terrain_atrributes_gee_elevation, terrain_atrributes_gee_slope], axis=1)
terrain_atrributes_df
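`pd.concat(..., axis=1)` aligns rows by index label rather than by position, so the two GEE tables only combine cleanly when they share the same set of unique catchment identifiers. The sketch below shows one way to check this before concatenating. The helper `concat_checked` and the identifiers "A001" and "B002" are hypothetical and only serve as an illustration; they are not part of the EStreams code.

```python
import pandas as pd


def concat_checked(*frames):
    """Concatenate frames column-wise after checking that their indexes match.

    Hypothetical helper for illustration only.
    """
    reference = frames[0].index
    for frame in frames:
        # Duplicate labels would make the label-based alignment ambiguous
        if not frame.index.is_unique:
            raise ValueError("Index contains duplicate identifiers")
        # Labels present in one table but not in the other would produce NaN rows
        mismatch = reference.symmetric_difference(frame.index)
        if len(mismatch) > 0:
            raise ValueError(f"Identifiers not shared by all tables: {list(mismatch)}")
    # Reorder every frame to the first frame's row order, then concatenate
    return pd.concat([frame.reindex(reference) for frame in frames], axis=1)


# Made-up example tables: same identifiers, different row order
elevation = pd.DataFrame(
    {"ele_mt_mean": [350.0, 1200.0]},
    index=pd.Index(["A001", "B002"], name="basin_id"),
)
slope = pd.DataFrame(
    {"slp_dg_mean": [12.5, 3.1]},
    index=pd.Index(["B002", "A001"], name="basin_id"),
)

combined = concat_checked(elevation, slope)
print(combined)
```

Values are matched by identifier, so the different row order of the two input tables does not matter.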

# Reproject to a projected coordinate system

In [None]:
# Here you can check the crs of the datasets:
print("CRS of catchment_boundaries:", catchment_boundaries.crs)
print("CRS of river_net_EU_MERIT:", river_net_EU_MERIT.crs)

In [None]:
# Define the target CRS to ETRS89 LAEA
target_crs = 'EPSG:3035'  # ETRS89 LAEA

# Reproject the GeoDataFrame to the target CRS
catchment_boundaries_reprojected = catchment_boundaries.to_crs(target_crs)
river_net_EU_MERIT_reprojected = river_net_EU_MERIT.to_crs(target_crs)

In [None]:
# Here you can check the new crs of the datasets:
print("CRS of catchment_boundaries:", catchment_boundaries_reprojected.crs)
print("CRS of river_net_EU_MERIT:", river_net_EU_MERIT_reprojected.crs)

# Compute area in sqm

In [None]:
catchment_boundaries_reprojected["area_sqm"] = catchment_boundaries_reprojected.area
catchment_boundaries_reprojected.head()

# Dissolve river network


In [None]:
river_net_EU_MERIT_dissolved = river_net_EU_MERIT_reprojected.dissolve()

# River network density

In [None]:
# Create a spatial index for the river network
sindex = river_net_EU_MERIT_reprojected.sindex

# Initialize a dictionary to store results
results = {}

# Iterate through each catchment
for catchment_id in tqdm.tqdm(catchment_boundaries_reprojected.basin_id):

    # Filter the selected catchment (the identifier column is "basin_id")
    selected_boundary = catchment_boundaries_reprojected[catchment_boundaries_reprojected['basin_id'] == catchment_id]

    # Merge the catchment geometry once, instead of once per river segment
    boundary_geometry = selected_boundary.unary_union

    # Use the spatial index to pre-select segments whose bounding boxes overlap the catchment
    total_length = 0
    boundary_bounds = selected_boundary.total_bounds
    possible_matches_index = list(sindex.intersection(boundary_bounds))
    possible_matches = river_net_EU_MERIT_reprojected.iloc[possible_matches_index]

    # Sum the length (in m) of the river segments that fall within the catchment
    for index, row in possible_matches.iterrows():
        if row['geometry'].intersects(boundary_geometry):
            total_length += row['geometry'].intersection(boundary_geometry).length

    # Store the result in the dictionary
    results[catchment_id] = total_length

# Convert the dictionary to a DataFrame
strm_dens_df = pd.DataFrame(list(results.items()), columns=['basin_id', 'totalnet_length_m'])
strm_dens_df.set_index("basin_id", inplace = True)
strm_dens_df

In [None]:
strm_dens_df["area"] = catchment_boundaries_reprojected.set_index("basin_id").area
strm_dens_df["strm_dens"] = strm_dens_df["totalnet_length_m"] / strm_dens_df["area"] 
strm_dens_df

# Elongation ratio

In [None]:
# Create a dataframe to process the computation:
terrain_atrributes_enlon_ratio = pd.DataFrame()
terrain_atrributes_enlon_ratio["basin_id"] = catchment_boundaries_reprojected.basin_id
terrain_atrributes_enlon_ratio["area"] = catchment_boundaries_reprojected.area

# Compute the x and y extents and the length of each catchment polygon
# (calculate_dimensions is imported from utils.terrain)
terrain_atrributes_enlon_ratio['x_dimns'], terrain_atrributes_enlon_ratio['y_dimns'], terrain_atrributes_enlon_ratio['length']  = calculate_dimensions(catchment_boundaries_reprojected['geometry'])
terrain_atrributes_enlon_ratio

In [None]:
# Elongation ratio computation:
terrain_atrributes_enlon_ratio['elon_ratio'] = terrain_atrributes_enlon_ratio.apply(calculate_elongation_ratio, axis=1)
terrain_atrributes_enlon_ratio.set_index("basin_id", inplace = True)
terrain_atrributes_enlon_ratio
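The helper `calculate_elongation_ratio` comes from `utils.terrain` and is not shown in this notebook. As a rough illustration only, the sketch below uses a common definition of the elongation ratio: the diameter of a circle with the same area as the catchment, divided by the catchment length. The function name `elongation_ratio` and its arguments are assumptions made for this example and may differ from the actual implementation.

```python
import math


def elongation_ratio(area_m2: float, length_m: float) -> float:
    """Equivalent-circle diameter divided by catchment length.

    Illustrative sketch; not necessarily the formula used in utils.terrain.
    """
    if area_m2 <= 0 or length_m <= 0:
        return float("nan")
    # Diameter of the circle that has the same area as the catchment
    equivalent_diameter = 2.0 * math.sqrt(area_m2 / math.pi)
    return equivalent_diameter / length_m


# Circular catchment of radius 1 km, with length equal to its diameter:
# the ratio is close to 1
circle_area = math.pi * 1000.0 ** 2
print(elongation_ratio(circle_area, 2000.0))

# Elongated 4 km x 1 km catchment: the ratio is well below 1
print(elongation_ratio(4000.0 * 1000.0, 4000.0))
```

Under this definition, values near 1 indicate compact, near-circular catchments, and smaller values indicate more elongated shapes.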

# Final aggregation

In [None]:
# First we create an empty data frame to assign the values to
terrain_df = pd.DataFrame(index = catchment_boundaries_reprojected.basin_id)

# Now we proceed with the concatenation:
terrain_df = pd.concat([terrain_df, terrain_atrributes_df, terrain_atrributes_enlon_ratio.elon_ratio, 
                        strm_dens_df.strm_dens], axis=1)

terrain_df

In [None]:
# Here we sort the rows by basin_id:
terrain_df = terrain_df.sort_index(axis=0)
terrain_df

In [None]:
# Convert strm_dens from m/m2 to km/km2 (1 m/m2 = 1000 km/km2)
terrain_df.strm_dens = terrain_df.strm_dens*1000
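The factor of 1000 follows from the units: the river length is in m and the catchment area is in m2, so converting the length to km (divide by 1,000) and the area to km2 (divide by 1,000,000) multiplies the ratio by 1000. A minimal worked example, using the hypothetical helper `stream_density_km_per_km2` and made-up numbers:

```python
def stream_density_km_per_km2(total_length_m: float, area_m2: float) -> float:
    """Stream density in km/km2 from a length in m and an area in m2.

    Hypothetical helper for illustration only.
    """
    length_km = total_length_m / 1_000.0
    area_km2 = area_m2 / 1_000_000.0
    return length_km / area_km2


# Made-up catchment: 5 km of river network in a 10 km2 area
density = stream_density_km_per_km2(5_000.0, 10_000_000.0)

# Same result via the shortcut used above: (m / m2) * 1000
shortcut = (5_000.0 / 10_000_000.0) * 1000

print(density, shortcut)
```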

In [None]:
# Name the index "basin_id":
terrain_df.index.name = "basin_id"

In [None]:
# Convert the flat and steep area shares from fraction (0-1) to percentage (0-100):
terrain_df.flat_area_fra = terrain_df.flat_area_fra * 100
terrain_df.steep_area_fra = terrain_df.steep_area_fra * 100
terrain_df

In [None]:
# Round the data to 3 decimals
terrain_df = terrain_df.astype(float).round(3)
terrain_df

# Data export

In [None]:
# Export the final dataset:
terrain_df.to_csv(PATH_OUTPUT+"estreams_terrain_attributes.csv")

# End