# Lithology attributes extraction


Author: Thiago Nascimento (thiago.nascimento@eawag.ch)

This notebook is part of the EStreams publication and was used to extract and aggregate the lithological attributes from the GLiM shapefile to the catchment boundaries.

* Note that this code enables not only the replication of the current database but also its extension to new catchment areas.
* Additionally, the user should download the original raw data and place it in the folder of the same name before running this code.
* The original third-party data are not made available in this repository due to redistribution and storage-space constraints.

## Requirements
**Python:**
* Python>=3.6
* Jupyter
* geopandas=0.10.2
* numpy
* os
* pandas
* rasterio
* time
* tqdm

Check the GitHub repository for an environment.yml (for conda environments) or requirements.txt (pip) file.

**Files:**
* data/lithology/GLiM.shp. Available at: http://dx.doi.org/10.1594/PANGAEA.788537 (Last access 23 November 2023)
* data/lithology/average_soil_and_sedimentary-deposit_thickness.tif. Available at: https://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=1304 (Last access 23 November 2023)
* data/shapefiles/estreams_catchments.shp

**Directory:**

* Clone the GitHub repository locally.
* Place any third-party data in their respective directories.
* ONLY update the "PATH" variable in the section "Configurations" with the relative path to your local EStreams directory.
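
Before running the notebook, it can help to confirm that the input files are in place. The following is a minimal sketch, assuming the relative paths read by the code cells of this notebook; `REQUIRED_INPUTS` and `missing_inputs` are hypothetical names used only for illustration and are not part of the EStreams code.

```python
from pathlib import Path

# Relative paths as read by the code cells of this notebook (assumed layout)
REQUIRED_INPUTS = [
    "data/lithology/GLiM.shp",
    "data/lithology/average_soil_and_sedimentary-deposit_thickness.tif",
    "data/shapefiles/estreams_catchments.shp",
]


def missing_inputs(base_dir, required=REQUIRED_INPUTS):
    """Return the required files that are not found under base_dir."""
    base = Path(base_dir)
    return [rel for rel in required if not (base / rel).is_file()]


# Example: report what is missing relative to the current directory
for rel in missing_inputs("."):
    print("Missing input file:", rel)
```

Here `"."` stands in for the directory that `PATH` points to; an empty result means all inputs were found.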

## References

* Hartmann, J., and Moosdorf, N. (2012), The new global lithological map database GLiM: A representation of rock properties at the Earth surface, Geochem. Geophys. Geosyst., 13, Q12004, https://doi.org/10.1029/2012GC004370

* Pelletier, J.D., P.D. Broxton, P. Hazenberg, X. Zeng, P.A. Troch, G. Niu, Z.C. Williams, M.A. Brunke, and D. Gochis. 2016. Global 1-km Gridded Thickness of Soil, Regolith, and Sedimentary Deposit Layers. ORNL DAAC, Oak Ridge, Tennessee, USA. https://doi.org/10.3334/ORNLDAAC/1304


## Licenses

* GLiM: Creative Commons Attribution 3.0 Unported (CC-BY-3.0). http://dx.doi.org/10.1594/PANGAEA.788537 (Last access 27 November 2023)
* Depth to bedrock: Open-access. https://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=1304 (Last access 27 November 2023)


## Observations

#### GLiM lithological classes

1. nd: No Data
2. su: Unconsolidated Sediments
3. ss: Siliciclastic Sedimentary Rocks
4. sm: Mixed Sedimentary Rocks
5. sc: Carbonate Sedimentary Rocks
6. py: Pyroclastics
7. ev: Evaporites
8. mt: Metamorphic Rocks
9. pa: Acid Plutonic Rocks
10. pi: Intermediate Plutonic Rocks
11. pb: Basic Plutonic Rocks
12. va: Acid Volcanic Rocks
13. vi: Intermediate Volcanic Rocks
14. vb: Basic Volcanic Rocks
15. ig: Ice and Glaciers
16. wb: Water bodies
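
The class codes above can be kept in a small lookup table, for example to turn the two-letter codes in the output into readable names. This is only a sketch: `GLIM_CLASSES` and `describe_class` are hypothetical names that do not appear elsewhere in this notebook.

```python
# Lookup table for the GLiM class codes listed above
GLIM_CLASSES = {
    "nd": "No Data",
    "su": "Unconsolidated Sediments",
    "ss": "Siliciclastic Sedimentary Rocks",
    "sm": "Mixed Sedimentary Rocks",
    "sc": "Carbonate Sedimentary Rocks",
    "py": "Pyroclastics",
    "ev": "Evaporites",
    "mt": "Metamorphic Rocks",
    "pa": "Acid Plutonic Rocks",
    "pi": "Intermediate Plutonic Rocks",
    "pb": "Basic Plutonic Rocks",
    "va": "Acid Volcanic Rocks",
    "vi": "Intermediate Volcanic Rocks",
    "vb": "Basic Volcanic Rocks",
    "ig": "Ice and Glaciers",
    "wb": "Water bodies",
}


def describe_class(code):
    """Return the full class name for a GLiM code, or the code itself if unknown."""
    return GLIM_CLASSES.get(code, code)


print(describe_class("sc"))
```

The same dictionary could be passed to the pandas `Series.map` method to label a column of class codes.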

# Import modules

In [1]:
import geopandas as gpd
import pandas as pd
import tqdm as tqdm
import os
import numpy as np
import rasterio
import time
from rasterio.features import geometry_mask

# Configurations

In [2]:
# Only editable variables:
# Relative path to your local directory
PATH = "../../.."

#### Users should NOT change anything in the code below this point.


In [3]:
# Non-editable variables
# (PATH is defined once, in the editable cell above, so that user changes are not overwritten)
PATH_OUTPUT = "results/staticattributes/"
# Set the directory:
os.chdir(PATH)

# Import data
## Catchment boundaries

In [4]:
catchment_boundaries = gpd.read_file('data/shapefiles/estreams_catchments.shp')
catchment_boundaries

,basin_id,gauge_id,gauge_coun,area,area_calc,area_flag,area_perc,start_date,end_date,geometry
0,AT000001,200014,AT,4647.9,4668.379,0,-0.440608,1996-01-01,2019-12-31,"POLYGON Z ((9.69406 46.54322 0.00000, 9.69570 ..."
1,AT000002,200048,AT,102.0,102.287,0,-0.281373,1958-10-01,2019-12-31,"POLYGON Z ((10.13650 47.02949 0.00000, 10.1349..."
2,AT000003,231662,AT,535.2,536.299,0,-0.205344,1985-01-02,2019-12-31,"POLYGON Z ((10.11095 46.89437 0.00000, 10.1122..."
3,AT000004,200592,AT,66.6,66.286,0,0.471471,1998-01-02,2019-12-31,"POLYGON Z ((10.14189 47.09706 0.00000, 10.1404..."
4,AT000005,200097,AT,72.2,72.448,0,-0.343490,1990-01-01,2019-12-31,"POLYGON Z ((9.67851 47.06249 0.00000, 9.67888 ..."
...,...,...,...,...,...,...,...,...,...,...
15042,UAGR0017,6682300,UA,321.0,325.370,0,-1.361371,1978-01-01,1987-12-31,"POLYGON Z ((33.96791 44.63291 0.00000, 33.9679..."
15043,UAGR0018,6682500,UA,49.7,47.594,0,4.237425,1978-01-01,1987-12-31,"POLYGON Z ((34.19958 44.58291 0.00000, 34.2029..."
15044,UAGR0019,6683010,UA,261.0,244.731,1,6.233333,1978-01-01,1987-12-31,"POLYGON Z ((34.19624 44.88375 0.00000, 34.1962..."
15045,UAGR0020,6683200,UA,760.0,731.073,0,3.806184,1978-01-01,1987-12-31,"POLYGON Z ((35.78708 47.28708 0.00000, 35.7870..."


In [5]:
print("The total number of catchments to be processed is:", len(catchment_boundaries))

The total number of catchments to be processed is: 15047


## Depth to bedrock raster

In [6]:
# Specify the path to the raster file you want to open
raster_depthtobedrock = "data/lithology/average_soil_and_sedimentary-deposit_thickness.tif"

## GLiM shapefile

This is the original GLiM layer. Note that you can read an already clipped version instead to speed up the processing.


In [7]:
GLiM = gpd.read_file('data/lithology/GLiM.shp')
GLiM

,OBJECTID,IDENTITY_,Litho,xx,Shape_Leng,Shape_Area,geometry
0,376748.0,AFR450,scpu__,sc,7.800228e+04,1.827193e+08,"POLYGON ((-464622.193 4521108.834, -468132.358..."
1,376749.0,AFR453,scpu__,sc,2.893601e+04,4.228265e+07,"POLYGON ((-473117.863 4537597.462, -473479.644..."
2,376750.0,AFR455,scpu__,sc,1.142850e+04,5.778357e+06,"POLYGON ((-472163.272 4545164.006, -474340.951..."
3,376751.0,AFR459,ssmx__,ss,6.327710e+04,1.145809e+08,"POLYGON ((-472163.272 4545164.006, -472602.054..."
4,376752.0,AFR469,scpu__,sc,1.491184e+04,1.588138e+07,"POLYGON ((-479841.649 4541182.667, -480124.048..."
...,...,...,...,...,...,...,...
181638,1235255.0,GBR11805,scpu__,sc,2.710534e+06,1.426631e+10,"POLYGON ((54463.500 6363726.441, 54920.596 636..."
181639,1235256.0,FRA3164,scpymt,sc,1.105171e+04,4.455177e+06,"POLYGON ((1269.453 6002985.657, 4201.495 60012..."
181640,1235257.0,FRA4299,ssmxmt,ss,6.539316e+03,1.874973e+06,"POLYGON ((2006.125 5929658.691, 1600.227 59296..."
181641,1235258.0,FRA6341,scpymt,sc,5.792466e+04,3.288585e+07,"POLYGON ((5615.290 5879472.201, 7525.896 58786..."


To optimize the process, it is important to dissolve the polygon geometries before intersecting the areas. Here we dissolve them by the attribute field ('xx') that holds the unique ID of each lithological class.

In [8]:
attribute_field = 'xx'
GLiM_dissolved = GLiM.dissolve(by=attribute_field)

# Now we create a new feature with the lithology class:
GLiM_dissolved["class"] = GLiM_dissolved.index
GLiM_dissolved

xx,geometry,OBJECTID,IDENTITY_,Litho,Shape_Leng,Shape_Area,class
ev,"MULTIPOLYGON (((796612.255 4303804.029, 794661...",934081.0,AFR523,ev____,50473.09,65132900.0,ev
ig,"MULTIPOLYGON (((575196.412 5659697.793, 575336...",921749.0,GRL535,ig____,64672700.0,1697807000000.0,ig
mt,"MULTIPOLYGON (((-764553.738 4693621.364, -7645...",376782.0,AFR591,mt____,138785.6,520521300.0,mt
nd,"MULTIPOLYGON (((700251.625 6039527.249, 700077...",1110327.0,TUR4883,nd____,1914.59,227226.5,nd
pa,"MULTIPOLYGON (((-809472.759 4869757.421, -8094...",376864.0,AFR771,pa____,699787.1,2273725000.0,pa
pb,"MULTIPOLYGON (((-647661.709 4211164.200, -6493...",376807.0,AFR669,pb__vr,166687.7,718559100.0,pb
pi,"MULTIPOLYGON (((-731155.004 4704819.954, -7317...",392850.0,FRA4091,pi__mt,49414.81,35766860.0,pi
py,"MULTIPOLYGON (((-547556.865 4767878.253, -5472...",409551.0,ESP12223,py____,11462.99,8731417.0,py
sc,"MULTIPOLYGON (((-760132.638 4215995.713, -7604...",376748.0,AFR450,scpu__,78002.28,182719300.0,sc
sm,"MULTIPOLYGON (((-713312.656 4211530.170, -7144...",376753.0,AFR475,smmx__,52941.67,161098400.0,sm


## Reproject to a projected coordinate system

In [9]:
# Here you can check the crs of the datasets:
print("CRS of catchment_boundaries:", catchment_boundaries.crs)
print("CRS of GLiM:", GLiM_dissolved.crs)

CRS of catchment_boundaries: epsg:4326
CRS of GLiM: PROJCS["World_Eckert_IV",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0],UNIT["Degree",0.0174532925199433]],PROJECTION["Eckert_IV"],PARAMETER["central_meridian",0],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["ESRI","54012"]]


In [10]:
# Define the target CRS as ETRS89-LAEA (EPSG:3035)
target_crs = 'EPSG:3035'

# Reproject the GeoDataFrame to the target CRS
catchment_boundaries_reprojected = catchment_boundaries.to_crs(target_crs)
GLiM_reprojected = GLiM_dissolved.to_crs(target_crs)

In [11]:
# Here you can check the crs of the datasets:
print("CRS of catchment_boundaries:", catchment_boundaries_reprojected.crs)
print("CRS of GLiM:", GLiM_reprojected.crs)

CRS of catchment_boundaries: EPSG:3035
CRS of GLiM: EPSG:3035
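
The catchments are stored in geographic coordinates (EPSG:4326), where polygon areas would come out in squared degrees, and one degree of longitude covers less ground toward the poles. Areas are therefore computed after reprojecting to ETRS89-LAEA, an equal-area projection with units in metres. The sketch below illustrates the effect with a simple spherical approximation; the function name `cell_area_km2` and the mean Earth radius are assumptions made for this illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation, assumed)


def cell_area_km2(lat_south, lat_north, lon_west, lon_east):
    """Area of a latitude-longitude cell on a sphere, in km^2."""
    dlon = math.radians(lon_east - lon_west)
    dsin = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return EARTH_RADIUS_KM ** 2 * dlon * dsin


# The same 1 x 1 degree cell covers very different areas at different latitudes
area_equator = cell_area_km2(0, 1, 0, 1)
area_60n = cell_area_km2(60, 61, 0, 1)
print(f"1 degree cell at the equator: {area_equator:.0f} km2")
print(f"1 degree cell at 60 N: {area_60n:.0f} km2")
```

At 60° N the cell covers roughly half the area it covers at the equator, which is why areas in degrees cannot be compared across catchments.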


# Bedrock depth extraction

In [12]:
# For the bedrock depth we keep the catchments in their original CRS (EPSG:4326):
subset_catchment = catchment_boundaries.copy()

In [13]:
# Create a list to store the results
avg_values = []
with rasterio.open(raster_depthtobedrock) as src:
    # Read the raster only once, outside the loop; nodata cells are masked
    raster_values = src.read(1, masked=True)

    for idx, row in tqdm.tqdm(subset_catchment.iterrows()):
        # Create a mask that is True for the cells inside the catchment geometry
        mask = geometry_mask([row['geometry']], out_shape=src.shape, transform=src.transform, invert=True)

        # Keep only the valid (non-masked) raster values within the geometry
        values = raster_values[mask].compressed()

        # Calculate the mean only if there are valid values in the 'values' array
        if values.size > 0:
            avg_value = values.mean()

        else:
            # No valid values within the catchment: set the result to NaN
            avg_value = np.nan

        # Store the result in the list
        avg_values.append(avg_value)

# Create a DataFrame to store the results for this file
data = {
    'basin_id': subset_catchment['basin_id'],
    'bedrk_dep': avg_values,
}
bedrk_dep_df = pd.DataFrame(data)
bedrk_dep_df.set_index("basin_id", inplace=True)


15047it [10:24:54,  2.49s/it]
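
The extraction above is a zonal mean: a boolean mask selects the raster cells inside a catchment, nodata cells are ignored, and the remaining values are averaged. The toy example below shows the same logic with NumPy only; the array values, the nodata value of -9999 and the helper name `masked_zonal_mean` are made up for this sketch and are not taken from the depth-to-bedrock raster.

```python
import numpy as np


def masked_zonal_mean(raster, inside, nodata):
    """Mean of the raster cells where `inside` is True, ignoring nodata cells."""
    values = np.ma.masked_equal(raster, nodata)[inside].compressed()
    return float(values.mean()) if values.size > 0 else float("nan")


# Toy raster with one nodata cell (hypothetical values)
raster = np.array([[1.0, 2.0, -9999.0],
                   [4.0, 5.0, 6.0]])

# Toy "catchment": the first row of the raster
inside = np.array([[True, True, True],
                   [False, False, False]])

toy_mean = masked_zonal_mean(raster, inside, nodata=-9999.0)
empty_mean = masked_zonal_mean(raster, np.zeros_like(inside), nodata=-9999.0)
print(toy_mean, empty_mean)
```

The nodata cell is excluded from the average, and a catchment that touches no valid cell yields NaN rather than an error.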


In [14]:
bedrk_dep_df

basin_id,bedrk_dep
AT000001,1.119727
AT000002,0.564972
AT000003,0.583790
AT000004,0.564103
AT000005,0.421488
...,...
UAGR0017,0.920755
UAGR0018,0.807692
UAGR0019,0.942065
UAGR0020,4.069021


# Intersection areas

In [15]:
subset_catchment=catchment_boundaries_reprojected.copy()
subset_catchment

,basin_id,gauge_id,gauge_coun,area,area_calc,area_flag,area_perc,start_date,end_date,geometry
0,AT000001,200014,AT,4647.9,4668.379,0,-0.440608,1996-01-01,2019-12-31,"POLYGON Z ((4297509.670 2603371.652 0.000, 429..."
1,AT000002,200048,AT,102.0,102.287,0,-0.281373,1958-10-01,2019-12-31,"POLYGON Z ((4331385.008 2657338.699 0.000, 433..."
2,AT000003,231662,AT,535.2,536.299,0,-0.205344,1985-01-02,2019-12-31,"POLYGON Z ((4329462.635 2642327.328 0.000, 432..."
3,AT000004,200592,AT,66.6,66.286,0,0.471471,1998-01-02,2019-12-31,"POLYGON Z ((4331781.145 2664845.037 0.000, 433..."
4,AT000005,200097,AT,72.2,72.448,0,-0.343490,1990-01-01,2019-12-31,"POLYGON Z ((4296557.010 2661047.164 0.000, 429..."
...,...,...,...,...,...,...,...,...,...,...
15042,UAGR0017,6682300,UA,321.0,325.370,0,-1.361371,1978-01-01,1987-12-31,"POLYGON Z ((6189293.912 2695103.052 0.000, 618..."
15043,UAGR0018,6682500,UA,49.7,47.594,0,4.237425,1978-01-01,1987-12-31,"POLYGON Z ((6208308.751 2695649.654 0.000, 620..."
15044,UAGR0019,6683010,UA,261.0,244.731,1,6.233333,1978-01-01,1987-12-31,"POLYGON Z ((6197896.604 2727593.845 0.000, 619..."
15045,UAGR0020,6683200,UA,760.0,731.073,0,3.806184,1978-01-01,1987-12-31,"POLYGON Z ((6228035.222 3023493.220 0.000, 622..."


In [16]:
# Record the start time
start_time = time.time()

lithology_overlap = gpd.overlay(df1=subset_catchment, df2=GLiM_reprojected, how='intersection')

# Record the end time
end_time = time.time()

# Print the elapsed time in seconds when done:
print("Elapsed time: {:.1f} seconds".format(end_time - start_time))

Elapsed time: 12109.5 seconds


In [17]:
# Calculate the areas of the overlapping polygons and add them as a new column.
# The CRS units are metres, so dividing by 1e6 gives km² (despite the column name 'area_sqm'):
lithology_overlap['area_sqm'] = lithology_overlap['geometry'].area/1000000
lithology_overlap

,basin_id,gauge_id,gauge_coun,area,area_calc,area_flag,area_perc,start_date,end_date,OBJECTID,IDENTITY_,Litho,Shape_Leng,Shape_Area,class,geometry,area_sqm
0,AT000001,200014,AT,4647.9,4668.379,0,-0.440608,1996-01-01,2019-12-31,921749.0,GRL535,ig____,6.467270e+07,1.697807e+12,ig,MULTIPOLYGON Z (((4325971.339 2638701.453 0.00...,25.809165
1,AT000003,231662,AT,535.2,536.299,0,-0.205344,1985-01-02,2019-12-31,921749.0,GRL535,ig____,6.467270e+07,1.697807e+12,ig,MULTIPOLYGON Z (((4328724.218 2637115.380 0.00...,0.599216
2,AT000007,231688,AT,1118.6,1143.768,0,-2.249955,1985-01-01,2019-12-31,921749.0,GRL535,ig____,6.467270e+07,1.697807e+12,ig,MULTIPOLYGON Z (((4328724.218 2637115.380 0.00...,0.599216
3,AT000009,200147,AT,1281.0,1281.796,0,-0.062139,1951-01-01,2019-12-31,921749.0,GRL535,ig____,6.467270e+07,1.697807e+12,ig,MULTIPOLYGON Z (((4328724.218 2637115.380 0.00...,0.599216
4,AT000013,200196,AT,6301.1,6305.855,0,-0.075463,1951-01-01,2019-12-31,921749.0,GRL535,ig____,6.467270e+07,1.697807e+12,ig,MULTIPOLYGON Z (((4328724.218 2637115.380 0.00...,26.408381
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
54814,TRGR0012,6688500,TR,2918.0,6128.701,999,-110.030877,1977-10-01,1987-09-30,934081.0,AFR523,ev____,5.047309e+04,6.513290e+07,ev,MULTIPOLYGON Z (((6479122.393 2158968.622 0.00...,188.019976
54815,TRGR0013,6688600,TR,75121.0,77073.993,1,-2.599796,1975-10-01,1986-09-30,934081.0,AFR523,ev____,5.047309e+04,6.513290e+07,ev,MULTIPOLYGON Z (((6697278.071 2306131.774 0.00...,13045.033847
54816,TRGR0014,6688650,TR,35958.0,36076.460,0,-0.329440,1975-10-01,1986-09-30,934081.0,AFR523,ev____,5.047309e+04,6.513290e+07,ev,MULTIPOLYGON Z (((6721566.426 2337579.367 0.00...,1693.619139
54817,TRGR0024,6695200,TR,63835.0,63332.606,0,0.787020,1975-10-01,1980-09-30,934081.0,AFR523,ev____,5.047309e+04,6.513290e+07,ev,MULTIPOLYGON Z (((6636301.174 2240850.412 0.00...,2706.492389


# Pivot table

In [18]:
# Finally we can create a pivot table with the area of each lithological class per catchment:

lithology_areas = pd.pivot_table(
    lithology_overlap,
    values='area_sqm',     # Overlap area (in km²) computed in the previous cell
    index='basin_id',      # Rows are based on 'basin_id'
    columns='class',       # Columns are based on 'class' (the lithological class)
    aggfunc='sum',         # Sum the areas for each combination
    fill_value=0           # Replace NaN with 0
)

# Here we can sum to compute the total area of each catchment: 
lithology_areas.loc[:, "totalarea"] = lithology_areas.sum(axis = 1)
lithology_areas

basin_id,ev,ig,mt,nd,pa,pb,pi,py,sc,sm,ss,su,va,vb,vi,wb,totalarea
AT000001,0.0,25.809165,1095.402759,0.0,124.040398,1.974315,8.873785,0.0,1809.131014,257.755663,181.510178,1030.039178,36.224212,86.728318,0.0,10.889636,4668.378620
AT000002,0.0,0.000000,90.573811,0.0,0.000000,0.000000,0.000000,0.0,4.276571,7.436500,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,102.286881
AT000003,0.0,0.599216,453.992107,0.0,0.000000,0.000000,0.000000,0.0,70.759037,10.557070,0.000000,0.373020,0.000000,0.018081,0.0,0.000000,536.298531
AT000004,0.0,0.000000,39.731817,0.0,0.000000,0.000000,0.000000,0.0,26.553875,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,66.285692
AT000005,0.0,0.000000,11.288462,0.0,0.000000,0.000000,0.000000,0.0,59.454580,0.641132,0.000000,0.000000,0.000000,0.000000,0.0,1.063555,72.447730
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
UAGR0017,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,139.011977,0.000000,186.357541,0.000000,0.000000,0.000000,0.0,0.000000,325.369518
UAGR0018,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,37.818684,3.018128,6.757615,0.000000,0.000000,0.000000,0.0,0.000000,47.594427
UAGR0019,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,210.041087,0.000000,34.689920,0.000000,0.000000,0.000000,0.0,0.000000,244.731007
UAGR0020,0.0,0.000000,163.509650,0.0,317.825665,0.000000,0.000000,0.0,0.000000,249.737418,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,731.072732
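
To make the reshaping step concrete, the sketch below applies the same `pd.pivot_table` call to a tiny, made-up overlap table. The basin IDs, classes and areas are hypothetical; only the pattern mirrors the cell above.

```python
import pandas as pd

# Hypothetical overlap records: one row per catchment/lithology polygon piece
overlap = pd.DataFrame({
    "basin_id": ["A", "A", "A", "B"],
    "class": ["sc", "sc", "mt", "ss"],
    "area_km2": [10.0, 5.0, 5.0, 8.0],
})

# One row per catchment, one column per class, summing the pieces of each class
areas = pd.pivot_table(
    overlap,
    values="area_km2",
    index="basin_id",
    columns="class",
    aggfunc="sum",
    fill_value=0,
)

# Total area covered by any lithology class in each catchment
areas["totalarea"] = areas.sum(axis=1)
print(areas)
```

The two "sc" pieces of catchment A are summed into a single cell, and classes that are absent from a catchment are filled with 0.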


## Catchment covered by shapefile
* Here we compute the fraction of each catchment area that is covered by the lithology shapefile:


In [19]:
catchment_boundaries_reprojected.set_index('basin_id', inplace = True)

In [20]:
lithology_areas['area_calc'] = catchment_boundaries_reprojected.area / 1000000
lithology_areas

basin_id,ev,ig,mt,nd,pa,pb,pi,py,sc,sm,ss,su,va,vb,vi,wb,totalarea,area_calc
AT000001,0.0,25.809165,1095.402759,0.0,124.040398,1.974315,8.873785,0.0,1809.131014,257.755663,181.510178,1030.039178,36.224212,86.728318,0.0,10.889636,4668.378620,4668.378620
AT000002,0.0,0.000000,90.573811,0.0,0.000000,0.000000,0.000000,0.0,4.276571,7.436500,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,102.286881,102.286881
AT000003,0.0,0.599216,453.992107,0.0,0.000000,0.000000,0.000000,0.0,70.759037,10.557070,0.000000,0.373020,0.000000,0.018081,0.0,0.000000,536.298531,536.298531
AT000004,0.0,0.000000,39.731817,0.0,0.000000,0.000000,0.000000,0.0,26.553875,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,66.285692,66.285692
AT000005,0.0,0.000000,11.288462,0.0,0.000000,0.000000,0.000000,0.0,59.454580,0.641132,0.000000,0.000000,0.000000,0.000000,0.0,1.063555,72.447730,72.447730
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
UAGR0017,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,139.011977,0.000000,186.357541,0.000000,0.000000,0.000000,0.0,0.000000,325.369518,325.369518
UAGR0018,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,37.818684,3.018128,6.757615,0.000000,0.000000,0.000000,0.0,0.000000,47.594427,47.594427
UAGR0019,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,210.041087,0.000000,34.689920,0.000000,0.000000,0.000000,0.0,0.000000,244.731007,244.731007
UAGR0020,0.0,0.000000,163.509650,0.0,317.825665,0.000000,0.000000,0.0,0.000000,249.737418,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,731.072732,731.072732


In [21]:
lithology_areas['tot_area'] = lithology_areas.totalarea / lithology_areas.area_calc
lithology_areas

basin_id,ev,ig,mt,nd,pa,pb,pi,py,sc,sm,ss,su,va,vb,vi,wb,totalarea,area_calc,tot_area
AT000001,0.0,25.809165,1095.402759,0.0,124.040398,1.974315,8.873785,0.0,1809.131014,257.755663,181.510178,1030.039178,36.224212,86.728318,0.0,10.889636,4668.378620,4668.378620,1.0
AT000002,0.0,0.000000,90.573811,0.0,0.000000,0.000000,0.000000,0.0,4.276571,7.436500,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,102.286881,102.286881,1.0
AT000003,0.0,0.599216,453.992107,0.0,0.000000,0.000000,0.000000,0.0,70.759037,10.557070,0.000000,0.373020,0.000000,0.018081,0.0,0.000000,536.298531,536.298531,1.0
AT000004,0.0,0.000000,39.731817,0.0,0.000000,0.000000,0.000000,0.0,26.553875,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,66.285692,66.285692,1.0
AT000005,0.0,0.000000,11.288462,0.0,0.000000,0.000000,0.000000,0.0,59.454580,0.641132,0.000000,0.000000,0.000000,0.000000,0.0,1.063555,72.447730,72.447730,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
UAGR0017,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,139.011977,0.000000,186.357541,0.000000,0.000000,0.000000,0.0,0.000000,325.369518,325.369518,1.0
UAGR0018,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,37.818684,3.018128,6.757615,0.000000,0.000000,0.000000,0.0,0.000000,47.594427,47.594427,1.0
UAGR0019,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,210.041087,0.000000,34.689920,0.000000,0.000000,0.000000,0.0,0.000000,244.731007,244.731007,1.0
UAGR0020,0.0,0.000000,163.509650,0.0,317.825665,0.000000,0.000000,0.0,0.000000,249.737418,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,731.072732,731.072732,1.0
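
The ratio above relies on pandas aligning both operands on the `basin_id` index, so the row order of the two tables does not matter. The sketch below uses made-up numbers (the basin IDs and areas are hypothetical) to show a catchment that is only partly covered by the lithology layer.

```python
import pandas as pd

# Area covered by lithology polygons, per catchment (hypothetical values, km2)
covered = pd.Series({"A": 20.0, "B": 6.0}, name="totalarea")

# Full catchment area, deliberately listed in a different order
catchment_area = pd.Series({"B": 8.0, "A": 20.0}, name="area_calc")

# Division is aligned on the index labels, not on the row position
coverage_pct = (covered / catchment_area * 100).round(3)
print(coverage_pct)
```

Catchment A is fully covered, while only three quarters of catchment B overlap the lithology layer.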


# Data organization

In [30]:
# Here we compute the percentage of each lithology class (the first 16 columns) relative to the total covered area:
lithology_df = (lithology_areas.iloc[:, :16].div(lithology_areas['totalarea'], axis=0))*100
# Drop the last column (wb, water bodies):
lithology_df = lithology_df.iloc[:, 0:-1]
lithology_df

basin_id,ev,ig,mt,nd,pa,pb,pi,py,sc,sm,ss,su,va,vb,vi
AT000001,0.0,0.552851,23.464308,0.0,2.657034,0.042291,0.190083,0.0,38.752877,5.521310,3.888077,22.064174,0.775948,1.857782,0.0
AT000002,0.0,0.000000,88.548805,0.0,0.000000,0.000000,0.000000,0.0,4.180957,7.270238,0.000000,0.000000,0.000000,0.000000,0.0
AT000003,0.0,0.111732,84.652872,0.0,0.000000,0.000000,0.000000,0.0,13.193964,1.968506,0.000000,0.069555,0.000000,0.003371,0.0
AT000004,0.0,0.000000,59.940261,0.0,0.000000,0.000000,0.000000,0.0,40.059739,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
AT000005,0.0,0.000000,15.581526,0.0,0.000000,0.000000,0.000000,0.0,82.065484,0.884958,0.000000,0.000000,0.000000,0.000000,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
UAGR0017,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,42.724339,0.000000,57.275661,0.000000,0.000000,0.000000,0.0
UAGR0018,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,79.460320,6.341348,14.198332,0.000000,0.000000,0.000000,0.0
UAGR0019,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,85.825286,0.000000,14.174714,0.000000,0.000000,0.000000,0.0
UAGR0020,0.0,0.000000,22.365716,0.0,43.473878,0.000000,0.000000,0.0,0.000000,34.160407,0.000000,0.000000,0.000000,0.000000,0.0


In [31]:
# Create a new column with the name of the column with the majority class
lithology_df['lit_dom'] = lithology_df.apply(lambda row: row.idxmax(), axis=1)

# Add "lit_fra_" as a prefix to all column names
lithology_df = lithology_df.add_prefix('lit_fra_')
lithology_df = lithology_df.rename(columns={'lit_fra_lit_dom': 'lit_dom'})

# Percentage of the catchment area covered by the lithology shapefile
lithology_df['tot_area'] = (lithology_areas.tot_area)*100

# Add the bedrock depth (aligned on the basin_id index):
lithology_df["bedrk_dep"] = bedrk_dep_df["bedrk_dep"]
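
The percentage and dominant-class steps can be sketched on a small, made-up table. This variant computes the dominant class before adding the prefix, which avoids renaming the column afterwards; the basin IDs and areas are hypothetical, and this is an alternative ordering rather than the code used for the published dataset.

```python
import pandas as pd

# Hypothetical class areas per catchment (km2)
areas = pd.DataFrame(
    {"mt": [5.0, 0.0], "sc": [15.0, 2.0], "ss": [0.0, 6.0]},
    index=pd.Index(["A", "B"], name="basin_id"),
)

# Percentage of each class relative to the row total
fractions = areas.div(areas.sum(axis=1), axis=0) * 100

# Column label of the largest percentage in each row
dominant = fractions.idxmax(axis=1)

# Prefix the percentage columns, then attach the dominant class
result = fractions.add_prefix("lit_fra_")
result["lit_dom"] = dominant
print(result)
```

`idxmax(axis=1)` returns the column label of the largest value in each row, so it has to run while the table still holds only the numeric class columns.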

In [32]:
# Here we sort the index:
lithology_df = lithology_df.sort_index(axis=0)
lithology_df

basin_id,lit_fra_ev,lit_fra_ig,lit_fra_mt,lit_fra_nd,lit_fra_pa,lit_fra_pb,lit_fra_pi,lit_fra_py,lit_fra_sc,lit_fra_sm,lit_fra_ss,lit_fra_su,lit_fra_va,lit_fra_vb,lit_fra_vi,lit_dom,tot_area,bedrk_dep
AT000001,0.0,0.552851,23.464308,0.0,2.657034,0.042291,0.190083,0.0,38.752877,5.521310,3.888077,22.064174,0.775948,1.857782,0.0,sc,100.0,1.119727
AT000002,0.0,0.000000,88.548805,0.0,0.000000,0.000000,0.000000,0.0,4.180957,7.270238,0.000000,0.000000,0.000000,0.000000,0.0,mt,100.0,0.564972
AT000003,0.0,0.111732,84.652872,0.0,0.000000,0.000000,0.000000,0.0,13.193964,1.968506,0.000000,0.069555,0.000000,0.003371,0.0,mt,100.0,0.583790
AT000004,0.0,0.000000,59.940261,0.0,0.000000,0.000000,0.000000,0.0,40.059739,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,mt,100.0,0.564103
AT000005,0.0,0.000000,15.581526,0.0,0.000000,0.000000,0.000000,0.0,82.065484,0.884958,0.000000,0.000000,0.000000,0.000000,0.0,sc,100.0,0.421488
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
UAGR0017,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,42.724339,0.000000,57.275661,0.000000,0.000000,0.000000,0.0,ss,100.0,0.920755
UAGR0018,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,79.460320,6.341348,14.198332,0.000000,0.000000,0.000000,0.0,sc,100.0,0.807692
UAGR0019,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,85.825286,0.000000,14.174714,0.000000,0.000000,0.000000,0.0,sc,100.0,0.942065
UAGR0020,0.0,0.000000,22.365716,0.0,43.473878,0.000000,0.000000,0.0,0.000000,34.160407,0.000000,0.000000,0.000000,0.000000,0.0,pa,100.0,4.069021


In [33]:
# Round the data to 3 decimals:
lithology_df.iloc[:, 0:-3] = lithology_df.iloc[:, 0:-3].round(3)
lithology_df.iloc[:, -2:] = lithology_df.iloc[:, -2:].round(3)

lithology_df

basin_id,lit_fra_ev,lit_fra_ig,lit_fra_mt,lit_fra_nd,lit_fra_pa,lit_fra_pb,lit_fra_pi,lit_fra_py,lit_fra_sc,lit_fra_sm,lit_fra_ss,lit_fra_su,lit_fra_va,lit_fra_vb,lit_fra_vi,lit_dom,tot_area,bedrk_dep
AT000001,0.0,0.553,23.464,0.0,2.657,0.042,0.19,0.0,38.753,5.521,3.888,22.064,0.776,1.858,0.0,sc,100.0,1.120
AT000002,0.0,0.000,88.549,0.0,0.000,0.000,0.00,0.0,4.181,7.270,0.000,0.000,0.000,0.000,0.0,mt,100.0,0.565
AT000003,0.0,0.112,84.653,0.0,0.000,0.000,0.00,0.0,13.194,1.969,0.000,0.070,0.000,0.003,0.0,mt,100.0,0.584
AT000004,0.0,0.000,59.940,0.0,0.000,0.000,0.00,0.0,40.060,0.000,0.000,0.000,0.000,0.000,0.0,mt,100.0,0.564
AT000005,0.0,0.000,15.582,0.0,0.000,0.000,0.00,0.0,82.065,0.885,0.000,0.000,0.000,0.000,0.0,sc,100.0,0.421
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
UAGR0017,0.0,0.000,0.000,0.0,0.000,0.000,0.00,0.0,42.724,0.000,57.276,0.000,0.000,0.000,0.0,ss,100.0,0.921
UAGR0018,0.0,0.000,0.000,0.0,0.000,0.000,0.00,0.0,79.460,6.341,14.198,0.000,0.000,0.000,0.0,sc,100.0,0.808
UAGR0019,0.0,0.000,0.000,0.0,0.000,0.000,0.00,0.0,85.825,0.000,14.175,0.000,0.000,0.000,0.0,sc,100.0,0.942
UAGR0020,0.0,0.000,22.366,0.0,43.474,0.000,0.00,0.0,0.000,34.160,0.000,0.000,0.000,0.000,0.0,pa,100.0,4.069


# Data export

In [34]:
# Export the final dataset:
lithology_df.to_csv(PATH_OUTPUT+"estreams_geology_attributes.csv")

# End