# Geological attributes extraction


Author: Thiago Nascimento (thiago.nascimento@eawag.ch)

This notebook is part of the EStreams update and was used to extract the geological attributes from the International Hydrogeological Map of Europe (IHME) shapefile (version 1.2, scale 1:1,500,000) and aggregate them to the catchment boundaries.

* Note that this code enables not only the replicability of the current database but also the extrapolation to new catchment areas. 
* Additionally, the user should download the original raw data and place it in the folders listed under **Files** below before running this code.
* The original third-party data are not made available in this repository for redistribution and storage-space reasons.

## Requirements
**Python:**
* Python>=3.6
* Jupyter
* geopandas=0.10.2
* numpy
* os
* pandas
* rasterio
* time
* tqdm

Check the GitHub repository for an environment.yml (conda) or requirements.txt (pip) file.

**Files:**
* data/geology/ihme1500_litho12345_ec4060_v12_poly.shp. Available at: www.bgr.bund.de (Last access 23 November 2024)
* data/shapefiles/estreams_boundaries.shp

**Directory:**

* Clone the GitHub repository locally.
* Place the third-party data in their respective directories.
* ONLY update the "PATH" variable in the section "Configurations", setting it to the relative path of the EStreams directory.
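Before running the notebook, it may help to confirm that the third-party inputs are in place. The helper below is a minimal sketch: the function name `missing_inputs` is hypothetical, and the shapefile list is an assumption based on the paths that the loading cells of this notebook read (`estreams_catchments.shp` is the name used in the code). Since a shapefile is a bundle of files, it checks the mandatory `.shp`, `.shx` and `.dbf` components.

```python
import os

# Assumed inputs: the shapefiles read by this notebook, relative to the EStreams root (PATH).
REQUIRED_SHAPEFILES = [
    "data/geology/ihme1500_litho12345_ec4060_v12_poly.shp",
    "data/shapefiles/estreams_catchments.shp",
]

# A shapefile is a bundle; .shp, .shx and .dbf are its mandatory components.
MANDATORY_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_inputs(root, shapefiles=REQUIRED_SHAPEFILES):
    """Hypothetical helper: list mandatory shapefile components not found under `root`."""
    missing = []
    for rel_path in shapefiles:
        stem, _ = os.path.splitext(rel_path)
        for ext in MANDATORY_EXTENSIONS:
            if not os.path.isfile(os.path.join(root, stem + ext)):
                missing.append(stem + ext)
    return missing
```

Called as `missing_inputs(PATH)` from the notebook, an empty list means all mandatory components were found.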

## References

* Günther, A., Duscher, K., 2019. Extended vector data of the International Hydrogeological Map of Europe 1:1,500,000 (Version IHME1500 v1.2).

* Duscher, K., Günther, A., Richts, A., Clos, P., Philipp, U., Struckmeier, W., 2019. The GIS layers of the "International Hydrogeological Map of Europe 1:1,500,000" in a vector format. https://doi.org/10.1007/s10040-015-1296-4


## Licenses

* IHME: Usage of the IHME dataset is free for any purpose; please consider the AGB (terms and conditions) notes distributed with the downloaded data. (Last access 14 April 2024)

## Observations

#### IHME lithological classes

1. Plutonic rocks
2. Volcanic rocks 
3. Inland water
4. Snow field / ice field
5. Clays
6. Quartzites
7. Shales
8. Claystone & clays 
9. Marbles
10. Marls
11. Marlstones
12. Marlstones & clays
13. Marlstones & marls
14. Phyllites
15. Schists
16. Gneisses
17. Silts
18. Conglomerates & clays
19. Limestones
20. Limestones & sands
21. Sandstones & clays
22. Sandstones & marls
23. Limestones & clays
24. Limestones & marls
25. Marlstones & sands 
26. Conglomerates
27. Conglomerates & sands
28. Gravels
29. Sands
30. Sandstones
31. Sandstones & sands

# Import modules

In [None]:
import geopandas as gpd
import pandas as pd
import tqdm
import os
import numpy as np
import rasterio
import time
from rasterio.features import geometry_mask

# Configurations

In [None]:
# Only editable variables:
# Relative path to your local directory
PATH = "../../.."

* #### Users should NOT change anything in the code below this point.


In [None]:
# Non-editable variables
PATH_OUTPUT = "results/staticattributes/"
# Set the working directory to the PATH defined in "Configurations":
os.chdir(PATH)

# Import data
## Catchment boundaries

In [None]:
catchment_boundaries = gpd.read_file('data/shapefiles/estreams_catchments.shp')
catchment_boundaries.set_index("basin_id", inplace = True)
catchment_boundaries.head()

In [None]:
print("The total number of catchments to be processed is:", len(catchment_boundaries))

## IHME shapefile
IHME original layer.

In [None]:
IHME = gpd.read_file('data/geology/ihme1500_litho12345_ec4060_v12_poly.shp')
IHME

To speed up the process, the polygon geometries are dissolved before intersecting the areas. Here we dissolve them by the attribute field holding the unique ID of each lithological class (`LEVEL3`).

In [None]:
# Check for and fix invalid geometries
if not IHME.geometry.is_valid.all():
    IHME.geometry = IHME.geometry.buffer(0)

In [None]:
attribute_field = 'LEVEL3'
IHME_dissolved = IHME.dissolve(by=attribute_field)

# Store the lithology class (the dissolve key, now the index) as a regular column:
IHME_dissolved["class"] = IHME_dissolved.index
IHME_dissolved

## Reproject to a projected coordinate system

In [None]:
# Here you can check the crs of the datasets:
print("CRS of catchment_boundaries:", catchment_boundaries.crs)
print("CRS of IHME:", IHME_dissolved.crs)

In [None]:
# Set the target CRS to ETRS89 / LAEA Europe (EPSG:3035), an equal-area projection suited to area calculations
target_crs = 'EPSG:3035'  

# Reproject the GeoDataFrame to the target CRS
catchment_boundaries_reprojected = catchment_boundaries.to_crs(target_crs)
IHME_reprojected = IHME_dissolved.to_crs(target_crs)

In [None]:
# Here you can check the crs of the datasets:
print("CRS of catchment_boundaries:", catchment_boundaries_reprojected.crs)
print("CRS of IHME:", IHME_reprojected.crs)

In [None]:
# Make sure to have the basin_id as one of the attributes
catchment_boundaries_reprojected["basin_id"] = catchment_boundaries_reprojected.index

# Intersection areas

In [None]:
subset_catchment=catchment_boundaries_reprojected.copy()
subset_catchment

In [None]:
# Record the start time
start_time = time.time()

geology_overlap = gpd.overlay(df1=subset_catchment, df2=IHME_reprojected, how='intersection')

# Record the end time
end_time = time.time()

# Print the elapsed time in seconds when done:
print("Elapsed time: {:.1f} seconds".format(end_time - start_time))
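The timing cell above measures the overlay with `time.time()`. For measuring durations, Python's `time.perf_counter()` is the clock intended for that purpose (monotonic, highest available resolution), and a small context manager avoids repeating the start/end bookkeeping in each cell. A sketch; the helper name `timed` is hypothetical:

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label="Elapsed time"):
    """Hypothetical helper: print the wall-clock duration of the enclosed block."""
    start = time.perf_counter()  # monotonic, high-resolution clock for intervals
    try:
        yield
    finally:
        # Runs even if the block raises, so the duration is always reported.
        elapsed = time.perf_counter() - start
        print("{}: {:.1f} seconds".format(label, elapsed))


# Usage, mirroring the overlay cell:
# with timed("Overlay"):
#     geology_overlap = gpd.overlay(df1=subset_catchment, df2=IHME_reprojected, how="intersection")
```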

In [None]:
# Area of each overlapping polygon in km² (EPSG:3035 units are metres, so m² / 1e6).
# Note: despite its name, the 'area_sqm' column holds km².
geology_overlap['area_sqm'] = geology_overlap['geometry'].area / 1000000
geology_overlap

# Pivot table

In [None]:
# Finally, we create a pivot table with the area (km²) of each lithological class per catchment:

geology_areas = pd.pivot_table(
    geology_overlap,
    values='area_sqm',     # Area of each overlapping polygon (km²)
    index='basin_id',      # Rows are based on 'basin_id'
    columns='class',       # Columns are the lithological classes (LEVEL3)
    aggfunc='sum',         # Sum the areas for each combination
    fill_value=0           # Replace NaN with 0
)

# Sum over the classes to get the area of each catchment covered by the IHME polygons:
geology_areas.loc[:, "totalarea"] = geology_areas.sum(axis = 1)
geology_areas
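To see what the pivot does, here is a toy run on a hand-made overlap table; the basin IDs, class codes and areas are invented and stand in for `geology_overlap`. If a (basin, class) pair appears in more than one row, `aggfunc="sum"` adds the pieces together, and `fill_value=0` turns classes absent from a basin into 0 instead of NaN.

```python
import pandas as pd

# Invented overlap table standing in for `geology_overlap` (areas in km²).
overlap = pd.DataFrame({
    "basin_id": ["A", "A", "A", "B", "B"],
    "class": [3, 7, 7, 7, 19],
    "area_sqm": [2.0, 5.0, 1.0, 4.0, 6.0],
})

areas = pd.pivot_table(
    overlap,
    values="area_sqm",
    index="basin_id",
    columns="class",
    aggfunc="sum",   # the two class-7 pieces of basin A are added together
    fill_value=0,    # class 19 is absent from basin A, so it gets 0
)
# Area of each catchment lying inside the IHME polygons.
areas["totalarea"] = areas.sum(axis=1)
print(areas)
```

Here basin A gets 2.0 km² of class 3 and 6.0 km² of class 7, for a covered area of 8.0 km².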

## Catchment covered by shapefile
* Here we compute each catchment's full area and the fraction of it covered by the geology shapefile:
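In formula form, with $A_{b,c}$ the overlap area of lithological class $c$ in catchment $b$ (the pivot-table cells) and $A_b$ the full catchment area (`area_calc`), the exported attributes are:

$$
\mathrm{tot\_area}_b = 100\,\frac{\sum_{c} A_{b,c}}{A_b},
\qquad
\mathrm{lit\_fra}_{b,c} = 100\,\frac{A_{b,c}}{\sum_{c'} A_{b,c'}},
\qquad
\mathrm{lit\_dom}_b = \arg\max_{c}\, \mathrm{lit\_fra}_{b,c}
$$

The class percentages are therefore relative to the covered part of each catchment, not to its full area, and sum to 100 for every catchment.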


In [None]:
catchment_boundaries_reprojected.set_index('basin_id', inplace = True)

In [None]:
geology_areas['area_calc'] = catchment_boundaries_reprojected.area / 1000000
geology_areas

In [None]:
geology_areas['tot_area'] = geology_areas.totalarea / geology_areas.area_calc
geology_areas
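`area_calc` is assigned from a Series indexed by `basin_id`, so pandas aligns it on the index of `geology_areas`. One consequence: a catchment that does not intersect the IHME layer at all never becomes a row of the pivot table and is therefore absent from the exported file. The toy sketch below (invented areas; variable names are hypothetical) reproduces the coverage ratio and shows one optional way, not used in this notebook, to keep such catchments with zero coverage by reindexing.

```python
import pandas as pd

# Toy covered area (km²) per basin, as in `geology_areas["totalarea"]`.
covered = pd.Series({"A": 8.0, "B": 10.0}, name="totalarea")

# Toy full catchment areas (km²), as in `catchment_boundaries_reprojected.area / 1e6`.
# Basin "C" has no overlap with the geology layer.
full_area = pd.Series({"A": 8.0, "B": 12.5, "C": 3.0}, name="area_calc")

# Mirrors the notebook: only basins present in the pivot table get a ratio.
coverage = covered / full_area.loc[covered.index]

# Optional alternative: keep every basin, reporting 0 coverage where there is no overlap.
coverage_all = covered.reindex(full_area.index, fill_value=0.0) / full_area
print(coverage)
print(coverage_all)
```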

# Data organization

In [None]:
# Here we compute the geology percentages from each class:
geology_df = (geology_areas.iloc[:, :].div(geology_areas['totalarea'], axis=0))*100
geology_df = geology_df.iloc[:, 0:-3]
geology_df
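Because the division uses `axis=0`, each row is divided by its own `totalarea`, so the class percentages are relative to the part of the catchment covered by IHME and sum to 100 for every basin; `iloc[:, 0:-3]` then drops the three bookkeeping columns `totalarea`, `area_calc` and `tot_area`. A toy version with invented numbers:

```python
import pandas as pd

# Toy area table (km²) with the same trailing bookkeeping columns as `geology_areas`.
areas = pd.DataFrame(
    {3: [2.0, 0.0], 7: [6.0, 4.0], 19: [0.0, 6.0],
     "totalarea": [8.0, 10.0], "area_calc": [8.0, 12.5], "tot_area": [1.0, 0.8]},
    index=pd.Index(["A", "B"], name="basin_id"),
)

# Row-wise division: each basin's class areas divided by that basin's covered area.
percent = areas.div(areas["totalarea"], axis=0) * 100
# Keep only the class columns (drop totalarea, area_calc, tot_area).
percent = percent.iloc[:, :-3]
print(percent)
```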

In [None]:
# Dominant lithology: the class column with the largest percentage in each row
# (idxmax over all class columns, so the last class is also eligible)
geology_df['lit_dom'] = geology_df.idxmax(axis=1)

# Add "lit_fra_" as a prefix to all column names
geology_df = geology_df.add_prefix('lit_fra_')
geology_df = geology_df.rename(columns={'lit_fra_lit_dom': 'lit_dom'})

# Percentage of the catchment area covered by the geology shapefile
geology_df['tot_area'] = (geology_areas.tot_area)*100
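`DataFrame.idxmax(axis=1)` returns, for each row, the label of the column with the largest value (the first such column on ties), so applying it to all class columns gives the dominant lithology. The toy example below (invented percentages and class codes) walks through the dominant-class, prefix and rename steps:

```python
import pandas as pd

# Toy class percentages per basin (columns are lithology class codes).
percent = pd.DataFrame(
    {3: [25.0, 0.0], 7: [75.0, 40.0], 19: [0.0, 60.0]},
    index=pd.Index(["A", "B"], name="basin_id"),
)

# Dominant class over *all* class columns; the last column (19) must be eligible too.
percent["lit_dom"] = percent.idxmax(axis=1)

# Prefix every column, then restore the plain name of the dominant-class column.
percent = percent.add_prefix("lit_fra_").rename(columns={"lit_fra_lit_dom": "lit_dom"})
print(percent)
```

Basin B is dominated by class 19, the last class column, which a slice such as `iloc[:, 0:-1]` would have skipped.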

In [None]:
# Here we sort the index:
geology_df = geology_df.sort_index(axis=0)
geology_df

In [None]:
# Round the numeric columns (lit_fra_* and tot_area) to 3 decimals;
# DataFrame.round leaves non-numeric columns (e.g. a string-valued lit_dom) unchanged:
geology_df = geology_df.round(3)
geology_df
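An optional sanity check on the final table, as a sketch: the helper `check_geology_table` and its tolerance are hypothetical, not part of the EStreams workflow. After rounding each `lit_fra_*` value to 3 decimals, the row sums can deviate slightly from 100, so the tolerance is loose.

```python
import pandas as pd


def check_geology_table(df, tol=0.05):
    """Return basins whose lit_fra_* percentages do not sum to 100 within `tol`.

    Hypothetical check: `tol` absorbs the error from rounding each class
    percentage to 3 decimals (at most 0.0005 per column).
    """
    frac_cols = [c for c in df.columns if str(c).startswith("lit_fra_")]
    row_sums = df[frac_cols].sum(axis=1)
    off = row_sums[(row_sums - 100).abs() > tol]
    return off.index.tolist()
```

For example, `check_geology_table(geology_df)` should return an empty list; any basin it reports is worth inspecting.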

# Data export

In [None]:
# Export the final dataset (create the output folder if needed):
os.makedirs(PATH_OUTPUT, exist_ok=True)
geology_df.to_csv(os.path.join(PATH_OUTPUT, "estreams_geology_continental_attributes.csv"))

# End