# Landcover attributes extraction

Author: Thiago Nascimento (thiago.nascimento@eawag.ch)

This notebook is part of the EStreams publication and was used to extract and aggregate the landcover characteristics from the Corine dataset.

* This code allows the current database to be reproduced and can also be applied to new catchment areas.
* Before running this code, the user must download the original raw data and place it in the folder of the same name.
* The original third-party data are not available in this repository due to redistribution and storage-space constraints.


## Requirements
**Python:**
* Python>=3.6
* Jupyter
* geopandas=0.10.2
* glob
* numpy
* os
* pandas
* tqdm

Check the GitHub repository for an environment.yml (for conda environments) or requirements.txt (pip) file.

**Files:**
* data/gee/landcover/EStreams_corine_{1990, 2000, 2006, 2012, 2018}_gee_{}.csv. Landcover attribute CSV files exported from GEE (file names as they appear in the directory listing further below)
* data/shapefiles/estreams_catchments.shp

**Directory:**
* Clone the GitHub repository locally.
* Place any third-party data in its respective directory.
* ONLY update the "PATH" variable in the "Configurations" section with the relative path to the EStreams directory.

## References

* CORINE Land Cover — Copernicus Land Monitoring Service. European Environment Agency [data set], Copenhagen, Denmark https://land.copernicus.eu/en/products/corine-land-cover.

## Licenses
* Corine: Open access. "The Copernicus land monitoring products and services are made available on a principle of full, open and free access, as established by the Commission Delegated Regulation (EU) No 1159/2013 of 12 July 2013." https://land.copernicus.eu/en/data-policy (Last access 27 November 2023)

## Observations
* This notebook assumes that the GEE code that exports the landcover descriptors from the Corine dataset (EStreams_landscape_attributes_landcover_gee.txt) has already been run on the GEE platform and that all the output CSV files are available locally.
* There may be more than one CSV file per year if the user split the catchments into smaller groups to speed up the export.
* All landcover CSV files must be placed together in a single folder.
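
Since several CSV files can belong to the same year, it may help to check which files map to which year before processing them. The following is a minimal sketch, assuming the files follow the `EStreams_corine_<year>_gee_<start>_<end>.csv` pattern seen in the directory listing of this notebook. The helper `group_files_by_year` is hypothetical and not part of the EStreams code.

```python
import os
import re
from collections import defaultdict

# Assumed filename pattern: EStreams_corine_<year>_gee_<start>_<end>.csv
FILENAME_PATTERN = re.compile(r"EStreams_corine_(\d{4})_gee_(\d+)_(\d+)\.csv$")


def group_files_by_year(filenames):
    """Group CSV paths by the year encoded in their filename (hypothetical helper)."""
    grouped = defaultdict(list)
    for path in filenames:
        match = FILENAME_PATTERN.search(os.path.basename(path))
        if match is None:
            # Skip files that do not follow the assumed naming pattern
            continue
        grouped[int(match.group(1))].append(path)
    return dict(grouped)


example_files = [
    "data/gee/landcover/EStreams_corine_1990_gee_0_499.csv",
    "data/gee/landcover/EStreams_corine_1990_gee_1100_1149.csv",
    "data/gee/landcover/EStreams_corine_2018_gee_0_499.csv",
]
files_by_year = group_files_by_year(example_files)
for year, paths in sorted(files_by_year.items()):
    print(year, len(paths))
```

In the notebook itself, the list returned by `glob.glob` could be passed to this helper in place of `example_files`.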

# Import modules

In [1]:
import os
import numpy as np
import pandas as pd
import geopandas as gpd
import tqdm
import glob
from utils.landcover import *

# Configurations

In [2]:
# Only editable variables:
# Relative path to your local directory
PATH = "../../.."

#### Users should NOT change anything in the code below.

In [3]:
# Non-editable variables:
PATH_LULC="data/gee/landcover"
PATH_OUTPUT = "results/staticattributes/"

# Set the directory:
os.chdir(PATH)

# Import data
## Catchment boundaries

In [4]:
catchment_boundaries = gpd.read_file('data/shapefiles/estreams_catchments.shp')
catchment_boundaries.head()

Unnamed: 0,id,area_km2,outlet_lat,outlet_lng,name,area_offic,layer,path,Code,basin_id,area_calc,geometry
0,FR003159,37,47.488,7.393,A100003001,38.6,FR003159,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,FR003159,FR003159,37.183,"POLYGON ((7.30374 47.49375, 7.30708 47.49375, ..."
1,FR003160,227,47.626,7.239,A105003001,233.0,FR003160,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,FR003160,FR003160,226.962,"POLYGON ((7.22291 47.63458, 7.22374 47.63458, ..."
2,FR003161,14,47.586,7.384,A106000101,15.0,FR003161,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,FR003161,FR003161,13.595,"POLYGON ((7.38791 47.59041, 7.39874 47.59041, ..."
3,FR003162,70,47.622,7.275,A107020001,70.0,FR003162,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,FR003162,FR003162,70.152,"POLYGON ((7.28375 47.60958, 7.28291 47.60958, ..."
4,FR003163,330,47.653,7.265,A108003001,325.0,FR003163,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,FR003163,FR003163,330.158,"POLYGON ((7.22958 47.65291, 7.23208 47.65291, ..."


In [5]:
print("The total number of catchments to be processed is:", len(catchment_boundaries))

The total number of catchments to be processed is: 1972


# Reproject to a projected coordinate system

In [6]:
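# Define the target CRS as ETRS89-extended / LAEA Europe (equal-area, units in metres),
# so that polygon areas are later computed in square metres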
target_crs = 'EPSG:3035' 

# Reproject the GeoDataFrame to the target CRS
catchment_boundaries_reprojected = catchment_boundaries.to_crs(target_crs)

## GEE outputs

In [7]:
# Check the files in the subdirectory:
filenames = glob.glob(PATH_LULC + "/*.csv")
filenames

['data/gee/landcover/EStreams_corine_2000_gee_1600_1649.csv',
 'data/gee/landcover/EStreams_corine_1990_gee_1100_1149.csv',
 'data/gee/landcover/EStreams_corine_2012_gee_1200_1249.csv',
 'data/gee/landcover/EStreams_corine_1990_gee_1600_1649.csv',
 'data/gee/landcover/EStreams_corine_2000_gee_1100_1149.csv',
 'data/gee/landcover/EStreams_corine_2012_gee_1500_1549.csv',
 'data/gee/landcover/EStreams_corine_2012_gee_500_999.csv',
 'data/gee/landcover/EStreams_corine_1990_gee_0_499.csv',
 'data/gee/landcover/EStreams_corine_1990_gee_1800_1849.csv',
 'data/gee/landcover/EStreams_corine_2006_gee_1250_1299.csv',
 'data/gee/landcover/EStreams_corine_2006_gee_1550_1599.csv',
 'data/gee/landcover/EStreams_corine_2000_gee_1800_1849.csv',
 'data/gee/landcover/EStreams_corine_2006_gee_1400_1449.csv',
 'data/gee/landcover/EStreams_corine_2000_gee_1950_1999.csv',
 'data/gee/landcover/EStreams_corine_1990_gee_1950_1999.csv',
 'data/gee/landcover/EStreams_corine_2018_gee_0_499.csv',
 'data/gee/landcov

In [8]:
# First we create an empty dataframe for the data:
landcover_df = pd.DataFrame()

# Loop for reading and concatenating the data:
for file in tqdm.tqdm(filenames):
    
    # First we read our data:
    landcover_file = pd.read_csv(file)
    landcover_file.drop(["system:index", ".geo"], axis = 1, inplace = True)
    landcover_file["class_name"] = "lulc_" + landcover_file["year"].astype(str) + "_" + landcover_file["class"].astype(str)
    year = landcover_file.loc[0, "year"]
    
    # Here we can create a pivot-table to organize our dataset:
    landcover_pivot = pd.pivot_table(
        landcover_file,
        values='area_sqm',          
        index='code',               # Rows are based on 'code'
        columns='class_name',       # Columns are based on 'class_name'
        fill_value=np.nan)
    
    # Total area per year, then convert each class area to a fraction of that total:
    landcover_pivot["tot_area_"+str(year)] = landcover_pivot.sum(axis = 1)
    landcover_pivot.iloc[:, :-1] = landcover_pivot.iloc[:, :-1].div(landcover_pivot["tot_area_"+str(year)], axis=0)
    
    # Now we proceed with the concatenation:
    landcover_df = pd.concat([landcover_df, landcover_pivot], axis=1)
    
    # Here we deal with the case we have more than one file for the same year:
    landcover_df = landcover_df.T.groupby(level=0).apply(lambda group: group.ffill().bfill().iloc[0]).T

100%|███████████████████████████████████████████| 98/98 [00:02<00:00, 37.69it/s]
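
The pivot and normalization step of the loop above can be followed on a toy table. This is a minimal sketch with made-up values, assuming only the column names used in the notebook (`code`, `year`, `class`, `area_sqm`). It is not the exported GEE data.

```python
import pandas as pd

# Toy stand-in for one GEE export (values are made up for illustration)
toy = pd.DataFrame({
    "code": ["A", "A", "B", "B"],
    "year": [2018, 2018, 2018, 2018],
    "class": [211, 311, 211, 311],
    "area_sqm": [30.0, 70.0, 50.0, 50.0],
})
toy["class_name"] = "lulc_" + toy["year"].astype(str) + "_" + toy["class"].astype(str)

# One row per catchment, one column per landcover class
pivot = pd.pivot_table(toy, values="area_sqm", index="code", columns="class_name")

# Total mapped area per catchment, then convert class areas to fractions
total = pivot.sum(axis=1)
fractions = pivot.div(total, axis=0)
print(fractions)
```

Note that `pd.pivot_table` aggregates with the mean by default, so a catchment and class pair that appeared twice in the same file would be averaged rather than summed.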


In [9]:
# Here we add the majority class for each basin:
landcover_df = pd.concat([landcover_df, landcover_df.apply(get_majority_columns, axis=1)], axis=1)

landcover_df

code,lulc_1990_111,lulc_1990_112,lulc_1990_121,lulc_1990_122,lulc_1990_123,lulc_1990_124,lulc_1990_131,lulc_1990_132,lulc_1990_133,lulc_1990_141,...,tot_area_1990,tot_area_2000,tot_area_2006,tot_area_2012,tot_area_2018,lulc_dom_2000,lulc_dom_2012,lulc_dom_2006,lulc_dom_2018,lulc_dom_1990
FR003265,0.001019,0.042179,0.006320,0.000762,0.0,0.002412,0.003760,0.000783,0.000623,0.000403,...,6.822673e+09,6.822673e+09,6.822673e+09,6.822673e+09,6.822673e+09,231,231,231,231,312
FR003331,0.000541,0.026790,0.003732,0.000336,0.0,0.001042,0.002105,0.000441,0.000000,0.000105,...,6.526487e+09,6.526487e+09,6.526487e+09,6.526487e+09,6.526487e+09,311,311,311,311,311
FR003335,0.000725,0.029109,0.003868,0.000374,0.0,0.001065,0.001803,0.000401,0.000000,0.000204,...,7.781937e+09,7.781937e+09,7.781937e+09,7.781937e+09,7.781937e+09,311,311,311,311,311
FR003426,0.000322,0.017936,0.001852,0.000223,0.0,0.000205,0.000571,0.000000,0.000000,0.000061,...,8.642716e+09,8.642716e+09,8.642716e+09,8.642716e+09,8.642716e+09,211,211,211,211,211
FR003501,0.000267,0.021832,0.002553,0.000287,0.0,0.001092,0.000687,0.000000,0.000121,0.000106,...,8.777684e+09,8.777684e+09,8.777684e+09,8.777684e+09,8.777684e+09,211,211,211,211,211
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
HR000310,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,...,2.589564e+08,2.589564e+08,2.589564e+08,2.589564e+08,2.589564e+08,311,311,311,311,311
HR000312,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,...,4.332656e+08,4.332656e+08,4.332656e+08,4.332656e+08,4.332656e+08,311,311,311,311,231
HR000313,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,...,2.467267e+08,2.467267e+08,2.467267e+08,2.467267e+08,2.467267e+08,311,311,311,311,311
HR000315,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,...,4.578434e+08,4.578434e+08,4.578434e+08,4.578434e+08,4.578434e+08,311,311,311,311,231
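
The function `get_majority_columns` is imported from `utils.landcover`, and its source is not shown in this notebook. The sketch below is one possible way to derive a dominant class per year from the `lulc_<year>_<class>` fraction columns. The name `get_majority_columns_sketch` and its logic are assumptions for illustration and may differ from the actual implementation.

```python
import re
import pandas as pd

# Matches class-fraction columns such as lulc_1990_211, but not lulc_dom_1990
CLASS_COLUMN = re.compile(r"lulc_(\d{4})_(\d+)")


def get_majority_columns_sketch(row):
    """Return the class code with the largest fraction per year (illustrative only)."""
    columns_by_year = {}
    for column in row.index:
        match = CLASS_COLUMN.fullmatch(str(column))
        if match:
            columns_by_year.setdefault(match.group(1), []).append(column)

    dominant = {}
    for year, columns in columns_by_year.items():
        fractions = row[columns].astype(float)
        if fractions.notna().any():
            # idxmax returns the column label holding the largest fraction
            dominant["lulc_dom_" + year] = fractions.idxmax().rsplit("_", 1)[1]
        else:
            # No Corine data for this catchment and year
            dominant["lulc_dom_" + year] = None
    return pd.Series(dominant)


toy = pd.DataFrame(
    {
        "lulc_1990_211": [0.6, 0.2],
        "lulc_1990_311": [0.4, 0.8],
        "lulc_2018_211": [0.3, 0.1],
        "lulc_2018_311": [0.7, 0.9],
        "tot_area_1990": [1.0e6, 2.0e6],
    },
    index=["A", "B"],
)
dominant_classes = toy.apply(get_majority_columns_sketch, axis=1)
print(dominant_classes)
```

This sketch returns the class codes as strings; whether the original helper returns strings or integers is not visible from the notebook output.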


In [10]:
# Here we convert the total mapped area into the fraction of each catchment area covered by Corine
# (some countries are not covered by the dataset):
columns_tot_areas = ["tot_area_1990", "tot_area_2000", "tot_area_2006", "tot_area_2012", "tot_area_2018"]

landcover_df.loc[:, columns_tot_areas] = landcover_df.loc[:, columns_tot_areas].div(catchment_boundaries_reprojected.set_index("basin_id").area, axis=0)
landcover_df

code,lulc_1990_111,lulc_1990_112,lulc_1990_121,lulc_1990_122,lulc_1990_123,lulc_1990_124,lulc_1990_131,lulc_1990_132,lulc_1990_133,lulc_1990_141,...,tot_area_1990,tot_area_2000,tot_area_2006,tot_area_2012,tot_area_2018,lulc_dom_2000,lulc_dom_2012,lulc_dom_2006,lulc_dom_2018,lulc_dom_1990
FR003265,0.001019,0.042179,0.006320,0.000762,0.0,0.002412,0.003760,0.000783,0.000623,0.000403,...,0.999994,0.999994,0.999994,0.999994,0.999994,231,231,231,231,312
FR003331,0.000541,0.026790,0.003732,0.000336,0.0,0.001042,0.002105,0.000441,0.000000,0.000105,...,0.999982,0.999982,0.999982,0.999982,0.999982,311,311,311,311,311
FR003335,0.000725,0.029109,0.003868,0.000374,0.0,0.001065,0.001803,0.000401,0.000000,0.000204,...,0.999984,0.999984,0.999984,0.999984,0.999984,311,311,311,311,311
FR003426,0.000322,0.017936,0.001852,0.000223,0.0,0.000205,0.000571,0.000000,0.000000,0.000061,...,0.999992,0.999992,0.999992,0.999992,0.999992,211,211,211,211,211
FR003501,0.000267,0.021832,0.002553,0.000287,0.0,0.001092,0.000687,0.000000,0.000121,0.000106,...,0.999993,0.999993,0.999993,0.999993,0.999993,211,211,211,211,211
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
HR000310,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.999942,0.999942,0.999942,0.999942,0.999942,311,311,311,311,311
HR000312,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.999958,0.999958,0.999958,0.999958,0.999958,311,311,311,311,231
HR000313,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.999949,0.999949,0.999949,0.999949,0.999949,311,311,311,311,311
HR000315,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.999954,0.999954,0.999954,0.999954,0.999954,311,311,311,311,231
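
The coverage step relies on pandas aligning the two operands by index label, so the catchment order in the landcover table and in the shapefile does not need to match. A minimal sketch with made-up areas follows; the 0.95 threshold used to flag partially covered catchments is an arbitrary choice for illustration, not a value from the EStreams workflow.

```python
import pandas as pd

# Hypothetical values: mapped Corine area and polygon area, both in square metres
mapped_area = pd.Series({"A": 9.0e6, "B": 4.0e6, "C": 1.0e6}, name="tot_area_2018")
polygon_area = pd.Series({"C": 2.0e6, "A": 9.0e6, "B": 4.0e6})

# Division aligns on the index labels, so the row order does not need to match
coverage = mapped_area.div(polygon_area)

# Flag catchments that are only partially covered (arbitrary 0.95 threshold)
partially_covered = coverage[coverage < 0.95].index.tolist()
print(partially_covered)
```

Values slightly above or below 1, as in the table above, are expected because the mapped pixel area and the polygon area are computed in different ways.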


In [11]:
# Here we sort the index:
landcover_df = landcover_df.sort_index(axis=0)
landcover_df.index.name = "basin_id"
landcover_df

basin_id,lulc_1990_111,lulc_1990_112,lulc_1990_121,lulc_1990_122,lulc_1990_123,lulc_1990_124,lulc_1990_131,lulc_1990_132,lulc_1990_133,lulc_1990_141,...,tot_area_1990,tot_area_2000,tot_area_2006,tot_area_2012,tot_area_2018,lulc_dom_2000,lulc_dom_2012,lulc_dom_2006,lulc_dom_2018,lulc_dom_1990
FR003159,0.000000,0.037550,0.000000,0.000000,0.0,0.00000,0.000000,0.0,0.000000,0.0,...,0.999824,0.999824,0.999824,0.999824,0.999824,313,313,313,313,311
FR003160,0.000319,0.058053,0.002711,0.000000,0.0,0.00000,0.000000,0.0,0.000000,0.0,...,0.999929,0.999929,0.999929,0.999929,0.999929,211,211,211,211,211
FR003161,0.000000,0.035716,0.000000,0.000000,0.0,0.00000,0.000000,0.0,0.000000,0.0,...,0.999941,0.999941,0.999941,0.999941,0.999941,211,211,211,211,211
FR003162,0.000000,0.033646,0.000000,0.000000,0.0,0.00000,0.000000,0.0,0.000000,0.0,...,1.000033,1.000033,1.000033,1.000033,1.000033,211,211,211,211,211
FR003163,0.000818,0.054628,0.003029,0.000000,0.0,0.00000,0.001090,0.0,0.000000,0.0,...,0.999967,0.999967,0.999967,0.999967,0.999967,211,211,211,211,211
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
HR000313,0.000000,0.000000,0.000000,0.000000,0.0,0.00000,0.000000,0.0,0.000000,0.0,...,0.999949,0.999949,0.999949,0.999949,0.999949,311,311,311,311,311
HR000314,0.000000,0.000000,0.000000,0.000000,0.0,0.00000,0.000000,0.0,0.000000,0.0,...,0.999920,0.999920,0.999920,0.999920,0.999920,311,311,311,311,311
HR000315,0.000000,0.000000,0.000000,0.000000,0.0,0.00000,0.000000,0.0,0.000000,0.0,...,0.999954,0.999954,0.999954,0.999954,0.999954,311,311,311,311,231
HR000316,0.000000,0.000000,0.000000,0.000000,0.0,0.00000,0.000000,0.0,0.000000,0.0,...,0.999945,0.999945,0.999945,0.999945,0.999945,311,311,311,311,231


In [12]:
# Round all numeric columns (everything except the last five lulc_dom_* columns) to 3 decimals:
landcover_df.iloc[:, 0:-5] = landcover_df.iloc[:, 0:-5].astype(float).round(3)
landcover_df

basin_id,lulc_1990_111,lulc_1990_112,lulc_1990_121,lulc_1990_122,lulc_1990_123,lulc_1990_124,lulc_1990_131,lulc_1990_132,lulc_1990_133,lulc_1990_141,...,tot_area_1990,tot_area_2000,tot_area_2006,tot_area_2012,tot_area_2018,lulc_dom_2000,lulc_dom_2012,lulc_dom_2006,lulc_dom_2018,lulc_dom_1990
FR003159,0.000,0.038,0.000,0.000,0.0,0.000,0.000,0.0,0.000,0.0,...,1.0,1.0,1.0,1.0,1.0,313,313,313,313,311
FR003160,0.000,0.058,0.003,0.000,0.0,0.000,0.000,0.0,0.000,0.0,...,1.0,1.0,1.0,1.0,1.0,211,211,211,211,211
FR003161,0.000,0.036,0.000,0.000,0.0,0.000,0.000,0.0,0.000,0.0,...,1.0,1.0,1.0,1.0,1.0,211,211,211,211,211
FR003162,0.000,0.034,0.000,0.000,0.0,0.000,0.000,0.0,0.000,0.0,...,1.0,1.0,1.0,1.0,1.0,211,211,211,211,211
FR003163,0.001,0.055,0.003,0.000,0.0,0.000,0.001,0.0,0.000,0.0,...,1.0,1.0,1.0,1.0,1.0,211,211,211,211,211
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
HR000313,0.000,0.000,0.000,0.000,0.0,0.000,0.000,0.0,0.000,0.0,...,1.0,1.0,1.0,1.0,1.0,311,311,311,311,311
HR000314,0.000,0.000,0.000,0.000,0.0,0.000,0.000,0.0,0.000,0.0,...,1.0,1.0,1.0,1.0,1.0,311,311,311,311,311
HR000315,0.000,0.000,0.000,0.000,0.0,0.000,0.000,0.0,0.000,0.0,...,1.0,1.0,1.0,1.0,1.0,311,311,311,311,231
HR000316,0.000,0.000,0.000,0.000,0.0,0.000,0.000,0.0,0.000,0.0,...,1.0,1.0,1.0,1.0,1.0,311,311,311,311,231
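
After rounding, the class fractions of a given year may no longer sum to exactly 1. The following sketch shows one way to inspect the deviation per catchment and year. The helper `yearly_fraction_sums` and the toy values are assumptions for illustration only.

```python
import pandas as pd


def yearly_fraction_sums(df, years):
    """Sum the lulc_<year>_<class> columns per catchment for each year (hypothetical helper)."""
    sums = {}
    for year in years:
        # The prefix lulc_<year>_ does not match the lulc_dom_<year> columns
        prefix = "lulc_" + str(year) + "_"
        year_columns = [c for c in df.columns if c.startswith(prefix)]
        sums[year] = df[year_columns].sum(axis=1)
    return pd.DataFrame(sums)


toy = pd.DataFrame(
    {
        "lulc_2018_211": [0.333, 0.5],
        "lulc_2018_311": [0.333, 0.5],
        "lulc_2018_512": [0.333, 0.0],
    },
    index=["A", "B"],
)
sums = yearly_fraction_sums(toy, [2018])

# Rounded fractions may not add up to exactly 1, so compare with a tolerance
largest_deviation = (sums - 1).abs().max().max()
print(largest_deviation)
```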


# Data export

In [13]:
# Export the final dataset:
landcover_df.to_csv(PATH_OUTPUT+"estreams_landcover_attributes.csv")
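
`to_csv` does not create missing folders, so the export fails if the output directory does not exist yet. A minimal sketch of a guard that could run before the export is shown below. The helper `ensure_output_dir` is hypothetical and not part of the EStreams code.

```python
from pathlib import Path


def ensure_output_dir(path_output):
    """Create the output folder (and its parents) if it is missing; hypothetical helper."""
    output_dir = Path(path_output)
    output_dir.mkdir(parents=True, exist_ok=True)
    return output_dir


# Usage with the notebook's variables (not executed here):
# output_dir = ensure_output_dir(PATH_OUTPUT)
# landcover_df.to_csv(output_dir / "estreams_landcover_attributes.csv")
```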

# End