# Landcover dataset extraction

Author: Martina Kauzlaric (martina.kauzlaric@unibe.ch)

This notebook is used to retrieve the landcover dataset and concatenate it into a table for publication alongside the data used.

## Requirements
**Python:**

* Python=3.13.2
* Jupyter
* os
* numpy=2.2.4
* xarray=2024.11.0
* pandas=2.2.3
* geopandas=1.0.1
* tqdm=4.67.1

Check the GitHub repository for the conda environment files environment_lancover.yml and environment_camels_chem_landcover_dwnlCLMSdata.yml (the latter additionally covers downloading the data via the CLMS API).

**Files:**

* ?


**Directory:**

* Clone the GitHub directory locally
* Place any third-party data in their respective directories.
* ONLY update the "PATH" variable in the section "Configurations" with the relative path to the EStreams directory.


## References
* https://land.copernicus.eu/en/products/corine-land-cover
## Observations
* Part of the data is interpolated. 

# Import modules

In [None]:
# Clear all variables
%reset -f
#Import necessary libraries
import os
import glob
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
import geopandas as gpd
import pandas as pd
from shapely.geometry import MultiPolygon
from shapely.geometry import box
import tqdm as tqdm

# Configurations

In [16]:
# Only editable variables:
# Set (relative) path to your local directory
# PATH = ".."
PATH = "S:\\CAMELS-CH\\CAMELS-chem"

In [None]:
# Set directories (path components are passed separately so that the paths also work outside Windows)
GIS_dir = os.path.join(PATH, "data", "GIS")
# Define shapefile with the catchments
catchments_shp = os.path.join(GIS_dir, "shapefile_catchments", "camels_ch_chem_catchment_boundaries.shp")
# Add subfolder to GIS_dir for CORINE Landcover data
GIS_dir = os.path.join(GIS_dir, "CORINE_Landcover")
PATH_OUTPUT = os.path.join(PATH, "results", "catchment_aggregated_data", "landcover")

# Create the directories if they do not exist
# (os.makedirs also creates missing parent directories such as "results")
os.makedirs(GIS_dir, exist_ok=True)
os.makedirs(PATH_OUTPUT, exist_ok=True)

# Change to the directory where the results will be stored
os.chdir(PATH_OUTPUT)

In [18]:
os.getcwd()

'S:\\CAMELS-CH\\CAMELS-chem\\results\\catchment_aggregated_data\\landuse'

# Download CORINE Landcover data
* The following code downloads the landcover data automatically.
* Note: you need to be registered on CLMS and to create an API token; please refer to https://land.copernicus.eu/en/how-to-guides/how-to-download-spatial-data/how-to-create-api-tokens
* After you have created your private key, follow the steps below.
* Alternatively, you can download the data manually from https://land.copernicus.eu/en/products/corine-land-cover
=> then skip this part and go to "Import data".

## Requirements additionally to those listed above
**Python:**

* pyjwt= 2.10.1
* cryptography = 44.0.1
* fiona = 1.10.1

In [5]:
# Additional libraries to be imported (install them first if they are not already installed)
import json
import requests
import time
import jwt  # PyJWT
from cryptography.hazmat.primitives import serialization
import zipfile
import fiona

In [155]:
# 1. Load your token using the private key JSON file
# Load credentials from file
with open(r"S:\CAMELS-CH\CAMELS-chem\privatekey_API_CMLS.json", "r") as f:
    creds = json.load(f)

# Prepare the JWT (JSON Web Token)
now = int(time.time())
payload = {
    "iss": creds["client_id"],
    "sub": creds["user_id"],
    "aud": creds["token_uri"],
    "iat": now,
    "exp": now + 3600,  # expires in 1 hour
}

In [156]:
private_key = serialization.load_pem_private_key(
    creds["private_key"].encode(), password=None
)

jwt_token = jwt.encode(payload, private_key, algorithm="RS256")

# Request the access token
response = requests.post(
    creds["token_uri"],
    data={"grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer", "assertion": jwt_token},
)

if response.ok:
    token = response.json()["access_token"]
    print("✅ Access token successfully retrieved!")
else:
    raise Exception(f"❌ Failed to get access token: {response.status_code} {response.text}")

✅ Access token successfully retrieved!


Be careful:
https://eea.github.io/clms-api-docs/download.html#download-prepackaged-files

"BoundingBox": [max.lat, max.lon, min.lat, min.lon], which is the same as [N, E, S, W].
Note: longitude is typically represented by the X coordinate and latitude by the Y coordinate.
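
As a small illustration of the ordering described above, the sketch below reorders a GeoPandas-style bounds array, which is `[minx, miny, maxx, maxy]` (that is, W, S, E, N), into `[N, E, S, W]`. The helper name `to_nesw` is hypothetical and not part of this notebook. Note that the request cell in this notebook passes the bounds in `[minx, miny, maxx, maxy]` order, so which order the endpoint actually expects should be checked against the CLMS documentation linked above.

```python
def to_nesw(bounds):
    """Reorder [minx, miny, maxx, maxy] (W, S, E, N) into [N, E, S, W]."""
    west, south, east, north = (float(v) for v in bounds)
    return [north, east, south, west]


# Rounded WGS84 bounds of the catchments, as printed in this notebook
example_bounds = [5.9035, 45.7317, 10.4920, 48.0296]
print(to_nesw(example_bounds))
```

Converting each value with `float` also avoids passing NumPy scalars to the JSON encoder.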

In [6]:
# 2. Load and reproject catchment shapefile
catchments = gpd.read_file(catchments_shp)
#catchments_3035 = catchments.to_crs("EPSG:3035")
# Convert geometry to GeoJSON format (MultiPolygon)
#bounds = catchments_3035.total_bounds  # [minx, miny, maxx, maxy]
#geometry_json = catchments_3035.geometry.union_all().__geo_interface__
# Transform the geometry back to WGS84 (EPSG:4326) 
# => this is the projection needed to download the data by a bounding box!!
catchments_wgs84 = catchments.to_crs("EPSG:4326")
bounds = catchments_wgs84.total_bounds
# Define the buffer amount in decimal degrees (we add some buffer to the bounds)
# Note: 0.1 degrees is roughly 11 km north-south and about 7.6 km east-west at 47°N
buffer = 0.1

# Add buffer to the bounds
buffered_bounds = [
    bounds[0] - buffer,  # minx - buffer
    bounds[1] - buffer,  # miny - buffer
    bounds[2] + buffer,  # maxx + buffer
    bounds[3] + buffer,  # maxy + buffer
]

# Convert NumPy float64 to standard Python float
buffered_bounds = [float(x) for x in buffered_bounds]

In [158]:
print("Bounds:", bounds)

Bounds: [ 5.90351684 45.73174765 10.4920484  48.02963712]


In [159]:
print("Buffered bounds:", buffered_bounds)

Buffered bounds: [5.803516841915284, 45.63174764839809, 10.592048402439152, 48.12963712004709]
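
To get a feeling for what a buffer in decimal degrees means on the ground, the following sketch converts a degree offset into kilometres. It assumes a spherical Earth with a mean radius of 6371 km; the function name `degrees_to_km` is hypothetical. East-west distances shrink with the cosine of the latitude, so the buffer is narrower in longitude than in latitude.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def degrees_to_km(delta_deg, latitude_deg):
    """Return (north-south km, east-west km) spanned by delta_deg at a given latitude."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0  # about 111.2 km
    north_south = delta_deg * km_per_degree
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns_km, ew_km = degrees_to_km(0.1, 47.0)
print(f"0.1 deg at 47N: {ns_km:.1f} km north-south, {ew_km:.1f} km east-west")
```

For Switzerland (around 47°N) a 0.1° buffer therefore corresponds to roughly 11 km north-south and about 7.6 km east-west.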


In [160]:
# 3. Setup the base request
url = "https://land.copernicus.eu/api/@datarequest_post"
headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}",
}

You can find the **UID** (here *DatasetID*) and the **@id** (here *DownloadID*) of the different datasets at https://eea.github.io/clms-api-docs/download.html#download-prepackaged-files
under **Find the items to be downloaded**

In [None]:
# 4. Make list with dataset info by year (UID = Dataset ID, ID = Vector GDB ID)
datasets = {
    "2000": {
        "DatasetID": "6704f90ca82e4f228a46111519f8978e",
        "DownloadID": "1009310e-2cd8-481c-b15b-aee3f0406098"
    },
    "2006": {
        "DatasetID": "d443c86fec2f49e08ff12c7decdbf2af",
        "DownloadID": "3936f6a5-9157-4e76-9fc7-4e14668c81ef"
    },
    "2012": {
        "DatasetID": "a5ee71470be04d66bcff498f94ceb5dc",
        "DownloadID": "cff14ee5-bafb-46f4-a1b2-2cd6f4049514"
    },
    "2018": {
        "DatasetID": "0407d497d3c44bcd93ce8fd5bf78596a",
        "DownloadID": "1bda2fbd-3230-42ba-98cf-69c96ac063bc"
    },
}

In [None]:
# 5. Define a polling function that checks whether a task is completed,
# then loop over each year.
# Note: polling is optional, but it helps to keep track of the download progress;
#   the requests are sent anyway (even if the download takes longer than 10 minutes)
def poll_task_status(task_id, headers, max_wait=600, interval=10):
    """Poll Copernicus API until the task is ready or timeout."""
    task_url = f"https://land.copernicus.eu/api/@tasks/{task_id}"
    waited = 0

    while waited < max_wait:
        response = requests.get(task_url, headers=headers)
        if response.ok:
            status = response.json().get("Status", "").lower()
            if status == "completed":
                download_url = response.json().get("DownloadUrl")
                return download_url
            elif status == "failed":
                raise RuntimeError(f"Task {task_id} failed.")
        time.sleep(interval)
        waited += interval

    raise TimeoutError(f"Task {task_id} did not complete in time.") 

for year, ids in datasets.items():
    payload = {
        "Datasets": [ {
            "DatasetID": ids["DatasetID"],
            "DatasetDownloadInformationID": ids["DownloadID"],
            "OutputFormat": "GDB",
            "OutputGCS": "EPSG:3035",
            "BoundingBox": list(buffered_bounds)  # "BoundingBox":  must be in WGS84!!
        }]
    }

    response = requests.post(url, headers=headers, json=payload)

    if response.status_code == 201:
        task_id = response.json()["TaskIds"][0]["TaskID"]
        print(f"[{year}] Submitted task {task_id}, polling for result...")

        try:
            download_url = poll_task_status(task_id, headers)
            if download_url:
                print(f"[{year}] Ready: {download_url}")
                response_file = requests.get(download_url)
                out_path = os.path.join(GIS_dir, f"CLC_{year}.zip")  # GIS_dir is where the unzip step looks for archives
                with open(out_path, "wb") as f:
                    f.write(response_file.content)
                print(f"[{year}] Downloaded to: {out_path}")
            else:
                print(f"[{year}] No download URL found after polling.")
        except Exception as e:
            print(f"[{year}] Error: {e}")
    else:
        print(f"[{year}] Failed request: {response.status_code} - {response.text}")

[2000] Submitted task 53895001815, polling for result...
[2000] Error: Task 53895001815 did not complete in time.
[2006] Submitted task 96036197586, polling for result...
[2006] Error: Task 96036197586 did not complete in time.
[2012] Submitted task 13820436744, polling for result...
[2012] Error: Task 13820436744 did not complete in time.
[2018] Submitted task 19954247833, polling for result...
[2018] Error: Task 19954247833 did not complete in time.


**Note: the request may be queued, depending on the load on the CLMS download service.**

 *This can take 5-10 min for a region like Switzerland (hydrological Switzerland is about 58'000 km^2).
 Note: if it takes longer, you may have to increase the expiry duration defined above (exp) or split the download process and regenerate the access token.*

 **Once you receive the email confirming that the download is ready, download the data and save them in the GIS_dir directory.**
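
When the download is split over several requests, it can help to check how much validity the token has left before sending the next one. The sketch below is a minimal illustration that assumes the access token stays valid for about the lifetime set in `exp` (3600 s above); this is an assumption, since the server may apply its own lifetime. The function names and the 300 s safety margin are hypothetical.

```python
import time


def seconds_until_expiry(issued_at, lifetime_s=3600, now=None):
    """Seconds left before a token issued at `issued_at` expires (negative if expired)."""
    if now is None:
        now = int(time.time())
    return issued_at + lifetime_s - now


def needs_refresh(issued_at, lifetime_s=3600, margin_s=300, now=None):
    """True when fewer than `margin_s` seconds of validity remain."""
    return seconds_until_expiry(issued_at, lifetime_s, now) < margin_s


# Token issued 50 minutes ago with a 1-hour lifetime: 600 s left, no refresh yet
print(needs_refresh(issued_at=0, now=3000))
# Token issued 58 minutes ago: 120 s left, regenerate before the next request
print(needs_refresh(issued_at=0, now=3480))
```

In the notebook, `issued_at` would correspond to the `now` value used when building the JWT payload.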

In [163]:
# 6. Unzip the downloaded files
import zipfile

# Folder to extract contents
extract_dir = os.path.join(GIS_dir, "CLC_downloads")
os.makedirs(extract_dir, exist_ok=True)

# Loop through and unzip
for file in os.listdir(GIS_dir):
    if file.endswith(".zip"):
        zip_path = os.path.join(GIS_dir, file)
        extract_path = os.path.join(extract_dir, file.replace(".zip", ""))
        print("Zip path:", zip_path)
        print("Extr path:", extract_path)
        os.makedirs(extract_path, exist_ok=True)

        with zipfile.ZipFile(zip_path, 'r') as zip_ref:
            zip_ref.extractall(extract_path)
        print(f"✅ Unzipped: {file}")

Zip path: S:\CAMELS-CH\CAMELS-chem\data\GIS\CORINE_Landuse\175369.zip
Extr path: S:\CAMELS-CH\CAMELS-chem\data\GIS\CORINE_Landuse\CLC_downloads\175369
✅ Unzipped: 175369.zip
Zip path: S:\CAMELS-CH\CAMELS-chem\data\GIS\CORINE_Landuse\175377.zip
Extr path: S:\CAMELS-CH\CAMELS-chem\data\GIS\CORINE_Landuse\CLC_downloads\175377
✅ Unzipped: 175377.zip
Zip path: S:\CAMELS-CH\CAMELS-chem\data\GIS\CORINE_Landuse\175386.zip
Extr path: S:\CAMELS-CH\CAMELS-chem\data\GIS\CORINE_Landuse\CLC_downloads\175386
✅ Unzipped: 175386.zip
Zip path: S:\CAMELS-CH\CAMELS-chem\data\GIS\CORINE_Landuse\175392.zip
Extr path: S:\CAMELS-CH\CAMELS-chem\data\GIS\CORINE_Landuse\CLC_downloads\175392
✅ Unzipped: 175392.zip


In [164]:
#List zipped files in GIS_dir
zipdirs = os.listdir(GIS_dir)
zipfiles = [file for file in zipdirs if file.endswith(".zip")]
print("Zip files:", zipfiles)

Zip files: ['175369.zip', '175377.zip', '175386.zip', '175392.zip']


Let us free some space

In [165]:
# Delete each zip file
# (the loop variable is not called "zipfile", which would shadow the zipfile module)
for zip_name in zipfiles:
    zip_path = os.path.join(GIS_dir, zip_name)
    os.remove(zip_path)
    print(f"✅ Deleted: {zip_name}")

✅ Deleted: 175369.zip
✅ Deleted: 175377.zip
✅ Deleted: 175386.zip
✅ Deleted: 175392.zip


The downloaded data are .gdb, so we need to extract the data (*layer*) to a shapefile:

In [166]:
# Loop through unzipped folders
for root, dirs, files in os.walk(extract_dir):
    for dir_name in dirs:
        if dir_name.endswith(".gdb"):
            gdb_path = os.path.join(root, dir_name)
            print(f"📂 GDB directory found: {gdb_path}")
            # List layers
            layers = fiona.listlayers(gdb_path)
            print(f"  📄 Available layers: {layers}")

            # Export each layer to shapefile
            for layer in layers:
                gdf = gpd.read_file(gdb_path, layer=layer)
                out_shp = os.path.join(GIS_dir, f"{layer}.shp")
                gdf.to_file(out_shp)
                print(f"✅ Exported: {out_shp}")

📂 GDB directory found: S:\CAMELS-CH\CAMELS-chem\data\GIS\CORINE_Landuse\CLC_downloads\175369\Results\U2006_CLC2000_V2020_20u1.gdb
  📄 Available layers: ['U2006_CLC2000_V2020_20u1']




✅ Exported: S:\CAMELS-CH\CAMELS-chem\data\GIS\CORINE_Landuse\U2006_CLC2000_V2020_20u1.shp
📂 GDB directory found: S:\CAMELS-CH\CAMELS-chem\data\GIS\CORINE_Landuse\CLC_downloads\175377\Results\U2012_CLC2006_V2020_20u1.gdb
  📄 Available layers: ['U2012_CLC2006_V2020_20u1']




✅ Exported: S:\CAMELS-CH\CAMELS-chem\data\GIS\CORINE_Landuse\U2012_CLC2006_V2020_20u1.shp
📂 GDB directory found: S:\CAMELS-CH\CAMELS-chem\data\GIS\CORINE_Landuse\CLC_downloads\175386\Results\U2018_CLC2012_V2020_20u1.gdb
  📄 Available layers: ['U2018_CLC2012_V2020_20u1']




✅ Exported: S:\CAMELS-CH\CAMELS-chem\data\GIS\CORINE_Landuse\U2018_CLC2012_V2020_20u1.shp
📂 GDB directory found: S:\CAMELS-CH\CAMELS-chem\data\GIS\CORINE_Landuse\CLC_downloads\175392\Results\U2018_CLC2018_V2020_20u1.gdb
  📄 Available layers: ['U2018_CLC2018_V2020_20u1']




✅ Exported: S:\CAMELS-CH\CAMELS-chem\data\GIS\CORINE_Landuse\U2018_CLC2018_V2020_20u1.shp




#### Users should NOT change anything in the code below this point.

# Import data
* Load catchments and look at full table

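*Note: the next cell loads the catchment shapefile and reprojects it to EPSG:3035. Run it even if you used the automatic download above, because `catchments_3035` is required by the processing below and is not defined earlier.*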

In [25]:
catchments = gpd.read_file(catchments_shp)
catchments_3035 = catchments.to_crs("EPSG:3035")

In [8]:
catchments["bafu_id"] = catchments["gauge_id"]
catchments

Unnamed: 0,gauge_id,sensor_id,nawaf_id,nawat_id,isot_id,chirp_id,gauge_name,water_body,gauge_east,gauge_nort,gauge_lon,gauge_lat,area,area_swiss,geometry,bafu_id
0,2009,2009.0,1837.0,1837.0,NIO04,,Porte du Scex,Rhône,557660,133280,6.89,46.35,5239.4,99.994914,"POLYGON Z ((2674253.038 1167429.881 0, 2674340...",2009
1,2011,2011.0,,4070.0,,,Sion,Rhône,593770,118630,7.36,46.22,3372.4,100.000000,"POLYGON Z ((2674253.038 1167429.881 0, 2674340...",2011
2,2016,2016.0,1833.0,1833.0,NIO02,,Brugg,Aare,657000,259360,8.19,47.48,11681.3,100.000000,"POLYGON Z ((2655969.68 1259695.589 0, 2655976....",2016
3,2018,2018.0,1835.0,1339.0,,,Mellingen,Reuss,662830,252580,8.27,47.42,3385.8,100.000000,"POLYGON Z ((2663723.38 1252919.068 0, 2663794....",2018
4,2019,2019.0,,1852.0,NIO01,,Brienzwiler,Aare,649930,177380,8.09,46.75,555.2,100.000000,"POLYGON Z ((2669196.412 1183579.51 0, 2669203....",2019
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
110,2617,2617.0,,,,,Müstair,Rom,830800,168700,10.45,46.63,128.6,42.552175,"POLYGON Z ((2820942.826 1171469.984 0, 2820953...",2617
111,2623,2623.0,,,,,Oberwald,Rhone,669900,154075,8.35,46.53,93.3,100.000000,"POLYGON Z ((2674253.038 1167429.881 0, 2674340...",2623
112,2634,2634.0,6169.0,1181.0,,,Emmen,Kleine Emme,663700,213630,8.28,47.07,478.3,100.000000,"POLYGON Z ((2653429.237 1216261.807 0, 2653439...",2634
113,2635,2635.0,,,,,"Einsiedeln, Gross",Grossbach,700710,218125,8.77,47.11,8.9,100.000000,"POLYGON Z ((2701144.527 1218073.633 0, 2701261...",2635


Now we extract landcover attributes as area percentages as we did for CAMELS-CH (see also https://github.com/camels-ch/camels-ch/blob/main/landcover_attributes/corine_landcover_CH.R)

In [19]:
# Prepare the dictionary of CORINE data shapefiles in GIS_dir
import re

clc_by_year = {}

# Loop through files and extract year
for filename in os.listdir(GIS_dir):
    if filename.endswith(".shp"):
        # Try to find the year using regex
        match = re.search(r'CLC(20\d\d)', filename)
        if match:
            year = match.group(1)
            # Remove the .shp extension
            name = os.path.splitext(filename)[0]
            clc_by_year[year] = name

print("📁 Detected CORINE files by year:")
print(clc_by_year)

📁 Detected CORINE files by year:
{'2000': 'U2006_CLC2000_V2020_20u1', '2006': 'U2012_CLC2006_V2020_20u1', '2012': 'U2018_CLC2012_V2020_20u1', '2018': 'U2018_CLC2018_V2020_20u1'}
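
The year detection above relies on the `CLC<year>` token in the layer name rather than on the leading `U<year>` token. A minimal sketch of the same regular expression on its own (the helper name `year_from_clc_name` is hypothetical):

```python
import re


def year_from_clc_name(filename):
    """Return the CORINE reference year found after 'CLC', or None if absent."""
    match = re.search(r"CLC(20\d\d)", filename)
    return match.group(1) if match else None


print(year_from_clc_name("U2006_CLC2000_V2020_20u1.shp"))  # 2000
print(year_from_clc_name("U2018_CLC2018_V2020_20u1.shp"))  # 2018
print(year_from_clc_name("catchments.shp"))                # None
```

Because the pattern is anchored on `20`, a CLC1990 layer would not be detected; the pattern would need to be widened if that inventory were added.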


Now we define the helper functions we need to extract and reclassify the data

In [None]:
# --- Helper Functions ---

def reclass_clc(clc, code_column=None):
    """Reclassify CORINE land cover codes into CAMELS-CH land use categories."""
    reclass_dict = {
        111: "urban_perc", 112: "urban_perc", 121: "urban_perc", 122: "urban_perc", 123: "urban_perc", 124: "urban_perc",
        131: "loose_rock_perc", 132: "loose_rock_perc", 133: "loose_rock_perc",
        141: "grass_perc", 142: "urban_perc",
        211: "crop_perc", 212: "crop_perc", 213: "crop_perc",
        221: "scrub_perc", 222: "scrub_perc", 223: "scrub_perc",
        231: "grass_perc",
        241: "crop_perc", 242: "crop_perc", 243: "crop_perc", 244: "scrub_perc",
        311: "dwood_perc", 312: "ewood_perc", 313: "mixed_wood_perc",
        321: "grass_perc", 322: "wetlands_perc", 323: "scrub_perc", 324: "scrub_perc",
        331: "loose_rock_perc", 332: "rock_perc", 333: "loose_rock_perc", 334: "loose_rock_perc", 335: "ice_perc",
        411: "wetlands_perc", 412: "wetlands_perc", 421: "wetlands_perc", 422: "wetlands_perc", 423: "wetlands_perc",
        511: "inwater_perc", 512: "inwater_perc", 521: "inwater_perc", 522: "inwater_perc", 523: "inwater_perc",
        999: "blank_perc", 990: "blank_perc", 995: "inwater_perc"
    }
    # Convert string codes to integers for mapping
    clc["reclass"] = clc[code_column].astype(int).map(reclass_dict).fillna("na")
    return clc

def clip_clc_to_catchments(catchments, clc, code_col="reclass", id_col="gauge_id"):
    """Clip land cover data to catchments and aggregate area per reclassified class."""
    df_all = pd.DataFrame()

    for i in tqdm.tqdm(range(len(catchments)), desc="Processing catchments"):
        catch_i = catchments.iloc[[i]]
        catch_id = catch_i[id_col].values[0]

        try:
            clc_i = gpd.overlay(clc, catch_i, how="intersection")
        except Exception:
            continue

        if clc_i.empty:
            continue

        clc_i["area"] = clc_i.geometry.area
        clc_agg = clc_i.groupby(code_col)["area"].sum().reset_index()
        clc_agg.columns = [code_col, catch_id]

        if df_all.empty:
            df_all = clc_agg
        else:
            df_all = pd.merge(df_all, clc_agg, on=code_col, how="outer")

    return df_all

def calculate_percentage_table(area_df, catchments):
    """Convert area table to percentage based on catchment area."""
    area_df = area_df.set_index("reclass")
    area_df = area_df.fillna(0)
    catchment_areas = pd.Series(catchments.geometry.area.values, index=catchments["gauge_id"].values)
    percentage_df = area_df.copy()
    for col in area_df.columns:
        percentage_df[col] = 100 * area_df[col] / catchment_areas[col]
    return percentage_df

def determine_dominant_class(percentage_df):
    """Create static attribute table with dominant land cover class per catchment."""
    dominant_class = percentage_df.idxmax()
    static_df = percentage_df.T.copy()
    static_df["dom_land_cover"] = dominant_class
    return static_df
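
The logic of the helpers above (reclassify, sum areas per class, convert to percentages, pick the dominant class) can be followed on toy numbers without any GIS data. The sketch below uses plain Python; the polygon areas and the catchment area are hypothetical, and only four entries of the reclassification dictionary are reproduced.

```python
# Toy subset of the CORINE-to-CAMELS-CH mapping used in reclass_clc
toy_reclass = {112: "urban_perc", 211: "crop_perc", 231: "grass_perc", 312: "ewood_perc"}

# Hypothetical clipped polygons for one catchment: (CORINE code, area in m^2)
polygons = [(112, 2.0e6), (211, 2.5e6), (231, 1.5e6), (312, 3.5e6), (211, 0.5e6)]
catchment_area = 10.0e6  # m^2, hypothetical

# Sum the clipped areas per reclassified class (codes not in the mapping become "na")
area_by_class = {}
for code, area in polygons:
    label = toy_reclass.get(code, "na")
    area_by_class[label] = area_by_class.get(label, 0.0) + area

# Convert to percentages of the catchment area and pick the dominant class
percentages = {label: 100.0 * area / catchment_area for label, area in area_by_class.items()}
dominant = max(percentages, key=percentages.get)

print(percentages)
print(dominant)
```

Here the two cropland polygons are merged into one class before the percentage is computed, which mirrors the `groupby(...).sum()` step in `clip_clc_to_catchments`.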

In [None]:
# --- Main Processing ---
#Preallocate table for static attributes
all_static_tables = {}

for year, filename in clc_by_year.items():
    print(f"\n🌍 Processing year {year}...")

    # Load and reclassify
    clc_fp = os.path.join(GIS_dir, f"{filename}.shp")
    clc = gpd.read_file(clc_fp)
    #clc = clc.to_crs("EPSG:3035")
    # Automatically detect code column
    code_col = next((col for col in clc.columns if col.lower().startswith("code_")), "Code")
    # Reclassify based on correct column
    clc = reclass_clc(clc, code_column=code_col)

    # Clip and aggregate
    clipped_area_df = clip_clc_to_catchments(catchments_3035, clc)
    percent_df = calculate_percentage_table(clipped_area_df, catchments_3035)

    # Save percentage table (transpose first so that the index named "gauge_id" holds the gauge ids)
    percent_T = percent_df.T
    percent_T.index.name = "gauge_id"
    percent_T.to_csv(f"clc_{year}_perc.csv", sep=";", float_format="%.2f")

    # Static table
    static_df = determine_dominant_class(percent_df)
    all_static_tables[year] = static_df

# Save final static attribute table from 2000
final_static = all_static_tables["2000"]
final_static.index.name = "gauge_id"
final_static = final_static.reset_index()

# Reorder columns to match R output (if needed)
columns_order = ['gauge_id', 'urban_perc', 'loose_rock_perc', 'grass_perc', 'crop_perc',
                 'scrub_perc', 'dwood_perc', 'ewood_perc', 'mixed_wood_perc', 'wetlands_perc',
                 'rock_perc', 'ice_perc', 'inwater_perc', 'blank_perc', 'dom_land_cover']
final_static = final_static[[col for col in columns_order if col in final_static.columns]]

# Save
final_static.to_csv("CAMELS_CH_landcover_attributes.csv", sep=";", float_format="%.2f", index=False)


🌍 Processing year 2000...


Processing catchments: 100%|██████████| 115/115 [21:03<00:00, 10.98s/it]



🌍 Processing year 2006...


Processing catchments: 100%|██████████| 115/115 [17:55<00:00,  9.36s/it]



🌍 Processing year 2012...


Processing catchments: 100%|██████████| 115/115 [17:42<00:00,  9.24s/it] 



🌍 Processing year 2018...


Processing catchments: 100%|██████████| 115/115 [17:57<00:00,  9.37s/it] 


Now we have the 6-yearly landcover for all catchments together with a static landcover table, with 2000 as the reference year (which is more or less in the middle of the full data range, 1980 to 2021).

Finally, we interpolate linearly between the available years and also generate a file per catchment (similarly to what we did for CAMELS-CH, refer to https://github.com/camels-ch/camels-ch/blob/main/landcover_attributes/annual_timeserie_CH.R).
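
As an illustration of what the linear interpolation does between two CORINE years, here is a minimal pure-Python sketch. The notebook itself uses the pandas `interpolate` method; the function `interpolate_years` and the forest percentages below are hypothetical.

```python
def interpolate_years(known):
    """Linearly interpolate a {year: value} mapping to every year in between."""
    years = sorted(known)
    result = {}
    for start, end in zip(years, years[1:]):
        span = end - start
        for year in range(start, end):
            weight = (year - start) / span  # 0 at `start`, approaching 1 at `end`
            result[year] = known[start] + weight * (known[end] - known[start])
    result[years[-1]] = known[years[-1]]  # keep the last surveyed year as is
    return result


# Hypothetical forest share (%) of one catchment in the four CORINE years
forest_perc = {2000: 30.0, 2006: 33.0, 2012: 33.0, 2018: 36.0}
series = interpolate_years(forest_perc)
print(series[2003])  # halfway between the 2000 and 2006 values
print(len(series))   # one value per year from 2000 to 2018
```

Years outside the surveyed range (before 2000 or after 2018) are not produced by this scheme, so they need to be handled separately.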

In [None]:
# Create a new directory for the interpolated time series
interpolated_dir = os.path.join(PATH_OUTPUT, "annual_timeseries")
os.makedirs(interpolated_dir, exist_ok=True)

# Automatically detect available years based on filenames
clc_files = [f for f in os.listdir(PATH_OUTPUT) if f.startswith("clc_") and f.endswith("_perc.csv")]
clc_years = sorted([int(f.split("_")[1]) for f in clc_files])

# Load all available CLC data and store by year
clc_data_by_year = {}
for year in clc_years:
    df = pd.read_csv(f"clc_{year}_perc.csv", sep=";", index_col=0)
    clc_data_by_year[year] = df

In [None]:
# Generate a full range of years from the available years
full_years = list(range(clc_years[0], clc_years[-1] + 1))       

# Get all catchments and land cover classes
all_catchments = clc_data_by_year[2000].index.tolist()
landcover_classes = clc_data_by_year[2000].columns.tolist()

# Preallocate final dataframe: MultiIndex with (gauge_id, year)
index = pd.MultiIndex.from_product([all_catchments, full_years], names=["gauge_id", "year"])
# Comment out the following line if you don't want to generate a file with annual timeseries for all catchments
landcover_timeseries = pd.DataFrame(index=index, columns=landcover_classes)

In [None]:
# Interpolate for each catchment
for gauge in tqdm.tqdm(all_catchments, desc="Interpolating time series"):
    catchment_ts = pd.DataFrame(index=clc_years, columns=landcover_classes, dtype=float)
    catchment_ts.index = catchment_ts.index.astype(int)
    catchment_ts = catchment_ts.reindex(full_years)

    for year in clc_years:
        catchment_ts.loc[year] = clc_data_by_year[year].loc[gauge].astype(float)

    # Interpolate
    catchment_ts_interp = catchment_ts.astype(float).interpolate(method="linear", axis=0).reindex(full_years)

    # Align columns before assignment
    catchment_ts_interp = catchment_ts_interp[catchment_ts.columns]

    # Store (comment out the following two lines if you don't want to generate a file with annual timeseries for all catchments)
    for year in full_years:
        landcover_timeseries.loc[(gauge, year), :] = catchment_ts_interp.loc[year]
    
    # Save individual CSV for each catchment
    catchment_ts_interp.index.name = "year"
    catchment_ts_interp.to_csv(os.path.join(interpolated_dir, f"CAMELS_CH_Chem_landcover_{gauge}_annual_timeseries.csv"), sep=";", float_format="%.2f")


Interpolating time series: 100%|██████████| 115/115 [00:01<00:00, 96.40it/s] 


Run the next lines of code if you want to generate a file with annual timeseries for all catchments, otherwise you are done, yay!

In [93]:
# Reset index and save to file
landcover_timeseries = landcover_timeseries.reset_index()
output_filename = f"CAMELS_CH_Chem_landcover_annual_timeseries_{clc_years[0]}_{clc_years[-1]}.csv"
landcover_timeseries.to_csv(output_filename, sep=";", float_format="%.2f", index=False)

Adjust the names of the files and extend each series to 2020 by repeating the values of the last available year

In [20]:
folder_2020 = "../results/landcover/annual_timeseries"
output_folder = "../results/Dataset/catchment_aggregated_data/landcover_data"
os.makedirs(output_folder, exist_ok=True)

for filename in os.listdir(folder_2020):
    if filename.startswith("CAMELS_CH_Chem_landcover_") and filename.endswith("_annual_timeseries.csv"):
        
        # Extract basin code
        parts = filename.split("_")
        basin_code = parts[4]

        # Load file
        input_path = os.path.join(folder_2020, filename)
        df = pd.read_csv(input_path, sep=";")
        df.columns = ['date', 'crop_perc', 'dwood_perc', 'ewood_perc', 'grass_perc',
                      'ice_perc', 'inwater_perc', 'loose_rock_perc', 'mixed_wood_perc',
                      'rock_perc', 'scrub_perc', 'urban_perc', 'wetlands_perc']
        # Repeat the last row (assumed to be 2018) for 2019 and 2020
        if not df.empty:
            last_row = df.iloc[-1].copy()

            for year in [2019, 2020]:
                new_row = last_row.copy()
                new_row[df.columns[0]] = year  # first column holds the year (named 'date' above)
                df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)

        # Create new filename
        new_filename = f"camels_ch_chem_landcover_{basin_code}.csv"
        output_path = os.path.join(output_folder, new_filename)

        # Save
        df.to_csv(output_path, index=False)
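
The two steps of the renaming cell, extracting the gauge id from the file name and repeating the last row for the following years, can be checked on their own with plain Python. The helper names and the sample values below are hypothetical; the file name follows the pattern written by the interpolation step.

```python
def basin_code_from_name(filename):
    """Return the fifth underscore-separated token, which holds the gauge id."""
    return filename.split("_")[4]


def pad_with_last_year(rows, extra_years):
    """Append copies of the last (year, values) row for each extra year."""
    if not rows:
        return list(rows)
    _, last_values = rows[-1]
    return list(rows) + [(year, dict(last_values)) for year in extra_years]


name = "CAMELS_CH_Chem_landcover_2009_annual_timeseries.csv"
print(basin_code_from_name(name))  # 2009

rows = [(2017, {"urban_perc": 4.0}), (2018, {"urban_perc": 4.2})]  # hypothetical values
padded = pad_with_last_year(rows, [2019, 2020])
print([year for year, _ in padded])  # [2017, 2018, 2019, 2020]
```

Repeating the 2018 values keeps land cover constant after the last CORINE inventory, which is a simplification rather than an observation.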


# End