# Vegetation attributes and time-series extraction

Author: Thiago Nascimento (thiago.nascimento@eawag.ch)

This notebook is part of the EStreams publication and was used to extract and aggregate the vegetation time-series from the MODIS dataset (i.e., LAI and NDVI).

* This code makes it possible both to reproduce the current database and to extend it to new catchments.
* Before running this code, download the original raw data and place it in the folder of the same name.
* The original third-party data are not included in this repository due to redistribution and storage-space constraints.

## Requirements
**Python:**
* Python>=3.6
* Jupyter
* geopandas=0.10.2
* glob
* numpy
* os
* pandas
* tqdm

Check the GitHub repository for an environment.yml (for conda environments) or requirements.txt (pip) file.

**Files:**
* data/shapefiles/estreams_catchments.shp
* data/gee/vegetation/LAI/EStreams_modis_LAI_mean_gee_{}.csv. LAI time-series CSV-file(s) exported from GEE.
* data/gee/vegetation/NDVI/EStreams_modis_NDVI_mean_gee_{}.csv. NDVI time-series CSV-file(s) exported from GEE.

**Directory:**
* Clone the GitHub repository locally.
* Place any third-party data in its respective directory.
* ONLY update the "PATH" variable in the section "Configurations" with the relative path to your local EStreams directory.

## References

* Didan, K. MODIS/Terra Vegetation Indices 16-Day L3 Global 500m SIN Grid V061 [Data set]. NASA EOSDIS Land Processes Distributed Active Archive Center https://doi.org/10.5067/MODIS/MOD13A1.061 (2021).
* Myneni, R., Knyazikhin, Y. & Park, T. MODIS/Terra Leaf Area Index/FPAR 8-Day L4 Global 500m SIN Grid V061 [Data set]. NASA EOSDIS Land Processes Distributed Active Archive Center https://doi.org/10.5067/MODIS/MOD15A2H.061 (2021).

## License

* LAI and NDVI: Open access: "MODIS data and products acquired through the LP DAAC have no restrictions on subsequent use, sale, or redistribution." https://lpdaac.usgs.gov/products/mod13a1v061/; https://lpdaac.usgs.gov/products/mod15a2hv061/ (last accessed 23 November 2023)

## Observations
* This notebook assumes that the GEE scripts that export the LAI and NDVI mean time-series from the MODIS dataset (EStreams_landscape_timeseries_LAI_gee.txt; EStreams_landscape_timeseries_NDVI_gee.txt) were run beforehand on the GEE platform and that the output CSV files are available locally.
* It is not possible to export all 17,130 catchments to a single CSV file, so the time-series may be stored separately across many files.

# Import modules

In [1]:
import os
import numpy as np
import pandas as pd
import geopandas as gpd
import tqdm
import glob

# Configurations

In [2]:
# Only editable variables:
# Relative path to your local directory
PATH = "../../.."

* #### Users should NOT change anything in the code below this point.

In [3]:
# Non-editable variables:
PATH_OUTPUT_TS = "results/timeseries/vegetationindices"
PATH_OUTPUT_ST = "results/staticattributes"

# Set the directory:
os.chdir(PATH)
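
Before importing the data, it can help to confirm that the expected inputs are reachable from the working directory. The sketch below is not part of the original workflow: `check_inputs` is a hypothetical helper, and the paths are the ones listed under "Files" in the Requirements section.

```python
import glob
from pathlib import Path


def check_inputs(root="."):
    """Return a list of problems found; the list is empty if all inputs exist."""
    root = Path(root)
    problems = []

    # Catchment boundaries (a single shapefile)
    shapefile = root / "data" / "shapefiles" / "estreams_catchments.shp"
    if not shapefile.is_file():
        problems.append(f"missing file: {shapefile}")

    # GEE exports (one or more CSV files per variable)
    for variable in ("LAI", "NDVI"):
        pattern = str(root / "data" / "gee" / "vegetation" / variable / "*.csv")
        if not glob.glob(pattern):
            problems.append(f"no CSV files match: {pattern}")

    return problems
```

Calling `check_inputs()` after `os.chdir(PATH)` and printing the returned list gives an early, readable message if `PATH` points to the wrong directory.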

# Import data
## Catchment boundaries

In [4]:
catchment_boundaries = gpd.read_file('data/shapefiles/estreams_catchments.shp')
catchment_boundaries

Unnamed: 0,id,area_km2,outlet_lat,outlet_lng,name,area_offic,layer,path,Code,basin_id,area_calc,geometry
0,FR003159,37,47.488,7.393,A100003001,38.6,FR003159,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,FR003159,FR003159,37.183,"POLYGON ((7.30374 47.49375, 7.30708 47.49375, ..."
1,FR003160,227,47.626,7.239,A105003001,233.0,FR003160,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,FR003160,FR003160,226.962,"POLYGON ((7.22291 47.63458, 7.22374 47.63458, ..."
2,FR003161,14,47.586,7.384,A106000101,15.0,FR003161,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,FR003161,FR003161,13.595,"POLYGON ((7.38791 47.59041, 7.39874 47.59041, ..."
3,FR003162,70,47.622,7.275,A107020001,70.0,FR003162,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,FR003162,FR003162,70.152,"POLYGON ((7.28375 47.60958, 7.28291 47.60958, ..."
4,FR003163,330,47.653,7.265,A108003001,325.0,FR003163,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,FR003163,FR003163,330.158,"POLYGON ((7.22958 47.65291, 7.23208 47.65291, ..."
...,...,...,...,...,...,...,...,...,...,...,...,...
1967,HR000314,135,44.202,16.069,7267,,HR000314,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,HR000314,HR000314,135.462,"POLYGON ((16.01458 44.21375, 16.01375 44.21375..."
1968,HR000315,458,44.162,15.858,7236,,HR000315,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,HR000315,HR000315,457.864,"POLYGON ((15.89625 44.07791, 15.89374 44.07791..."
1969,HR000316,514,44.162,15.849,7237,,HR000316,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,HR000316,HR000316,514.369,"POLYGON ((15.84208 44.15458, 15.84208 44.15458..."
1970,HR000317,185,45.334,14.452,6077,,HR000317,C:/Users/nascimth/Documents/Thiago/Eawag/Pytho...,HR000317,HR000317,184.733,"POLYGON ((14.51875 45.36708, 14.51875 45.36791..."


In [5]:
print("The total number of catchments to be processed is:", len(catchment_boundaries))

The total number of catchments to be processed is: 1972


## GEE outputs
### Leaf Area Index (LAI)

In [6]:
# Check the files in the subdirectory:
filenames = glob.glob("data/gee/vegetation/LAI/*.csv")
print("Number of files:", len(filenames))
print("First file:", filenames[0])

Number of files: 82
First file: data/gee/vegetation/LAI/EStreams_modis_LAI_mean_gee_650_674.csv
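
`glob.glob` returns the files in arbitrary order, which is why the first file listed above is not the first batch. The final result does not depend on this order because the columns are sorted later, but sorting the batches can make it easier to spot a missing export. A minimal sketch, assuming that the two trailing integers in each file name (e.g. `650_674`) mark the first and last catchment positions of the batch; `batch_range` and `sort_by_batch` are hypothetical helpers.

```python
import re
from pathlib import Path

# Assumption: file names end in "_<start>_<end>.csv", e.g. "..._gee_650_674.csv".
_RANGE = re.compile(r"_(\d+)_(\d+)\.csv$")


def batch_range(filename):
    """Return (start, end) parsed from the file name, or None if absent."""
    match = _RANGE.search(Path(filename).name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def sort_by_batch(filenames):
    """Sort file names by parsed start position; unparsable names go last."""
    def key(name):
        rng = batch_range(name)
        return (rng is None, rng or (0, 0), name)

    return sorted(filenames, key=key)
```

For example, `sort_by_batch(filenames)[0]` would then be the batch with the lowest start position rather than whichever file the file system lists first.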


In [7]:
# First, we create an empty DataFrame for the data with a datetime index:
LAI_df = pd.DataFrame(index=pd.date_range(start='2001-01-01', end='2022-12-31', freq='M'))

# Loop for reading and concatenating the data:
for file in tqdm.tqdm(filenames):
    
    # Read the data from the CSV file:
    LAI_file = pd.read_csv(file)
    LAI_file.drop(["system:index", ".geo"], axis=1, inplace=True)
    LAI_file = LAI_file.T
    
    # Set columns based on the "basin_id" row and drop it
    LAI_file.columns = LAI_file.loc["basin_id", :].tolist()
    LAI_file.drop(["basin_id"], axis=0, inplace=True)
    
    # Convert the index to integers and sort it
    LAI_file.index = LAI_file.index.astype(int)
    LAI_file.sort_index(inplace=True)
    
    # Create a new DataFrame with datetime index and assign values
    LAI_file_df = pd.DataFrame(columns=LAI_file.columns)
    LAI_file_df["dates"] = pd.date_range(start='2001-01-01', end='2022-12-31', freq='M')
    LAI_file_df.loc[:, LAI_file.columns] = LAI_file
    LAI_file_df.set_index("dates", inplace=True)
    LAI_file_df.index.name = ""
    
    # Concatenate the DataFrames along the columns (axis=1)
    LAI_df = pd.concat([LAI_df, LAI_file_df], axis=1)
    
# Apply the scale factor from Google Earth Engine (GEE)
LAI_df = LAI_df * 0.01
LAI_df

100%|██████████████████████████████████████████| 82/82 [00:00<00:00, 112.24it/s]


Unnamed: 0,FR004652,FR004884,FR004843,FR003537,FR004666,FR003276,FR004577,FR003436,FR003586,FR004241,...,FR003207,FR004721,FR004801,FR003610,FR004505,FR004698,FR004287,FR004616,FR003871,FR003206
2001-01-31,0.075199,0.052593,0.135622,0.075388,0.035576,0.104261,0.055072,0.069595,0.093917,0.06728,...,0.0697,0.051573,0.047142,0.083182,0.089889,0.071609,0.026001,0.069393,0.10833,0.072623
2001-02-28,0.070456,0.065856,0.184078,0.044445,0.050115,0.065686,0.084407,0.056433,0.041492,0.088467,...,0.056965,0.061079,0.046847,0.070878,0.07009,0.089667,0.059736,0.056934,0.076809,0.062115
2001-03-31,0.123463,0.091204,0.224975,0.118571,0.068151,0.097017,0.098307,0.103976,0.059331,0.139453,...,0.071223,0.085329,0.068666,0.086444,0.109463,0.124542,0.117746,0.102833,0.213341,0.080061
2001-04-30,0.157489,0.138229,0.236672,0.11451,0.07031,0.149869,0.163474,0.137767,0.099631,0.199594,...,0.097267,0.104671,0.096007,0.115194,0.168576,0.153171,0.176232,0.110236,0.238104,0.09338
2001-05-31,0.188619,0.288071,0.270928,0.259714,0.235712,0.277082,0.393406,0.265215,0.149302,0.19994,...,0.357072,0.286648,0.16031,0.213077,0.363995,0.283257,0.259681,0.220965,0.303327,0.324424
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-08-31,0.138412,0.312646,0.300864,0.185537,0.261817,0.124963,0.264344,0.128135,0.077496,0.167531,...,0.366863,0.374604,0.178587,0.102739,0.201435,0.284169,0.135871,0.173603,0.175677,0.367178
2022-09-30,0.15568,0.268288,0.241786,0.183469,0.291092,0.120382,0.328473,0.1477,0.073194,0.118794,...,0.325212,0.318943,0.174422,0.086977,0.173592,0.288659,0.158126,0.174375,0.262878,0.315328
2022-10-31,0.142313,0.098802,0.203781,0.165208,0.152773,0.108421,0.281399,0.136164,0.093368,0.078702,...,0.251219,0.125898,0.075184,0.081031,0.150192,0.184202,0.184483,0.167606,0.181053,0.2583
2022-11-30,0.112358,0.115674,0.211222,0.084252,0.10687,0.119239,0.139696,0.098025,0.106533,0.079451,...,0.139442,0.099877,0.081031,0.086029,0.06558,0.125352,0.153353,0.115998,0.149086,0.131993
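
The reshaping done inside the loop can be illustrated on a toy table. This is a minimal sketch, assuming (as the loop above does) that each export holds one row per catchment and one column per monthly time step labelled 0, 1, 2, and so on. The function `gee_wide_to_timeseries`, the toy values, and the catchment ids are hypothetical and not part of the notebook.

```python
import io

import pandas as pd

# Toy stand-in for one GEE export. The time-step columns are deliberately
# out of order to show why the integer sort is needed.
csv_text = """system:index,1,0,2,basin_id,.geo
0,20,10,30,AA000001,{}
1,21,11,31,AA000002,{}
"""


def gee_wide_to_timeseries(raw, start="2001-01"):
    """Turn a one-row-per-catchment table into a month-end indexed time series."""
    values = (
        raw.drop(columns=["system:index", ".geo"])
        .set_index("basin_id")
        .T
    )
    # Column labels are read as strings; sort them numerically, not as text
    values.index = values.index.astype(int)
    values = values.sort_index()

    # Month-end dates, built from periods so no date_range alias is needed
    months = pd.period_range(start=start, periods=len(values), freq="M")
    values.index = months.to_timestamp(how="end").normalize()
    values.columns.name = None
    return values.astype(float)


ts = gee_wide_to_timeseries(pd.read_csv(io.StringIO(csv_text)))
print(ts)
```

The explicit `astype(float)` is a design choice of the sketch: it guarantees numeric columns before any scale factor or mean is applied.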


In [8]:
# Add a column for each catchment that was not processed in GEE.
# New columns are filled with NaN and added only if they do not exist yet.
for col in catchment_boundaries.basin_id.tolist():
    if col not in LAI_df.columns:
        LAI_df[col] = np.nan
LAI_df

Unnamed: 0,FR004652,FR004884,FR004843,FR003537,FR004666,FR003276,FR004577,FR003436,FR003586,FR004241,...,FR003207,FR004721,FR004801,FR003610,FR004505,FR004698,FR004287,FR004616,FR003871,FR003206
2001-01-31,0.075199,0.052593,0.135622,0.075388,0.035576,0.104261,0.055072,0.069595,0.093917,0.06728,...,0.0697,0.051573,0.047142,0.083182,0.089889,0.071609,0.026001,0.069393,0.10833,0.072623
2001-02-28,0.070456,0.065856,0.184078,0.044445,0.050115,0.065686,0.084407,0.056433,0.041492,0.088467,...,0.056965,0.061079,0.046847,0.070878,0.07009,0.089667,0.059736,0.056934,0.076809,0.062115
2001-03-31,0.123463,0.091204,0.224975,0.118571,0.068151,0.097017,0.098307,0.103976,0.059331,0.139453,...,0.071223,0.085329,0.068666,0.086444,0.109463,0.124542,0.117746,0.102833,0.213341,0.080061
2001-04-30,0.157489,0.138229,0.236672,0.11451,0.07031,0.149869,0.163474,0.137767,0.099631,0.199594,...,0.097267,0.104671,0.096007,0.115194,0.168576,0.153171,0.176232,0.110236,0.238104,0.09338
2001-05-31,0.188619,0.288071,0.270928,0.259714,0.235712,0.277082,0.393406,0.265215,0.149302,0.19994,...,0.357072,0.286648,0.16031,0.213077,0.363995,0.283257,0.259681,0.220965,0.303327,0.324424
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-08-31,0.138412,0.312646,0.300864,0.185537,0.261817,0.124963,0.264344,0.128135,0.077496,0.167531,...,0.366863,0.374604,0.178587,0.102739,0.201435,0.284169,0.135871,0.173603,0.175677,0.367178
2022-09-30,0.15568,0.268288,0.241786,0.183469,0.291092,0.120382,0.328473,0.1477,0.073194,0.118794,...,0.325212,0.318943,0.174422,0.086977,0.173592,0.288659,0.158126,0.174375,0.262878,0.315328
2022-10-31,0.142313,0.098802,0.203781,0.165208,0.152773,0.108421,0.281399,0.136164,0.093368,0.078702,...,0.251219,0.125898,0.075184,0.081031,0.150192,0.184202,0.184483,0.167606,0.181053,0.2583
2022-11-30,0.112358,0.115674,0.211222,0.084252,0.10687,0.119239,0.139696,0.098025,0.106533,0.079451,...,0.139442,0.099877,0.081031,0.086029,0.06558,0.125352,0.153353,0.115998,0.149086,0.131993
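
An alternative to adding the missing columns one at a time is a single `reindex` call. The sketch below is one possible formulation, not the notebook's method: `add_missing_catchments` is a hypothetical helper. Because `Index.union` sorts its result by default, this version also returns the columns in sorted order.

```python
import pandas as pd


def add_missing_catchments(df, basin_ids):
    """Return a copy of df with an all-NaN column for every id not yet present.

    Existing columns are kept, and the resulting columns are sorted.
    """
    all_columns = df.columns.union(pd.Index(basin_ids))
    return df.reindex(columns=all_columns)
```

With the notebook's variables this would read `add_missing_catchments(LAI_df, catchment_boundaries.basin_id)`. Inserting many columns one by one may trigger a fragmentation warning in recent pandas versions, which a single `reindex` avoids.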


In [9]:
# Here we sort the columns:
LAI_df = LAI_df.sort_index(axis=1)
LAI_df

Unnamed: 0,FR003159,FR003160,FR003161,FR003162,FR003163,FR003164,FR003165,FR003166,FR003167,FR003168,...,HR000308,HR000309,HR000310,HR000311,HR000312,HR000313,HR000314,HR000315,HR000316,HR000317
2001-01-31,0.049238,0.041123,0.031329,0.025796,0.036856,0.053267,0.047154,0.043972,0.043909,0.039302,...,0.021274,0.036795,0.027624,0.025175,0.026434,0.027305,0.020895,0.026412,0.026338,0.034127
2001-02-28,0.064191,0.057416,0.04747,0.037929,0.051895,0.065298,0.061622,0.060398,0.057906,0.05347,...,0.032084,0.049283,0.027878,0.027739,0.027672,0.0278,0.023375,0.027941,0.028272,0.037808
2001-03-31,0.080194,0.067592,0.051833,0.045572,0.060757,0.067226,0.065661,0.063851,0.062273,0.06021,...,0.090745,0.104106,0.057466,0.054191,0.055573,0.057617,0.055774,0.055903,0.056178,0.045737
2001-04-30,0.081159,0.081207,0.075226,0.061873,0.076852,0.106041,0.087677,0.089134,0.097948,0.085026,...,0.146708,0.240244,0.095874,0.107466,0.102667,0.096133,0.084555,0.106218,0.110792,0.086649
2001-05-31,0.340205,0.259408,0.159404,0.141572,0.228656,0.315201,0.259433,0.248306,0.226216,0.222921,...,0.194209,0.387841,0.213918,0.202859,0.202443,0.217157,0.219702,0.205023,0.209734,0.358876
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-08-31,0.345807,0.278931,0.201344,0.182686,0.251441,0.367809,0.277049,0.256004,0.219775,0.233549,...,0.107823,0.281038,0.164397,0.128413,0.139426,0.167102,0.187651,0.137682,0.135852,0.288847
2022-09-30,0.324562,0.249841,0.181418,0.147928,0.222351,0.328964,0.247043,0.232817,0.207665,0.212262,...,0.091788,0.230261,0.150072,0.119008,0.128422,0.151995,0.165073,0.127256,0.126064,0.231349
2022-10-31,0.188505,0.15065,0.120404,0.097061,0.135776,0.195229,0.16631,0.157343,0.145455,0.138272,...,0.057179,0.062765,0.064678,0.056417,0.057835,0.065107,0.066776,0.057486,0.057682,0.140251
2022-11-30,0.102131,0.068236,0.060337,0.04469,0.060966,0.10608,0.085048,0.078336,0.069481,0.063727,...,0.053581,0.063689,0.073927,0.062236,0.067588,0.07346,0.068436,0.066898,0.065712,0.068498


In [10]:
# Resample to yearly mean
LAI_yr = LAI_df.resample('Y').mean()
LAI_yr

Unnamed: 0,FR003159,FR003160,FR003161,FR003162,FR003163,FR003164,FR003165,FR003166,FR003167,FR003168,...,HR000308,HR000309,HR000310,HR000311,HR000312,HR000313,HR000314,HR000315,HR000316,HR000317
2001-12-31,0.201293,0.162989,0.115245,0.111523,0.14887,0.192814,0.162374,0.156236,0.149372,0.147653,...,0.109675,0.204399,0.106807,0.095005,0.097371,0.108216,0.112149,0.097835,0.098446,0.177364
2002-12-31,0.196759,0.16095,0.115767,0.114173,0.148459,0.185185,0.158405,0.153117,0.150389,0.148232,...,0.105231,0.188618,0.11433,0.098926,0.102385,0.115849,0.123365,0.102495,0.102635,0.189599
2003-12-31,0.197381,0.157005,0.109401,0.106194,0.143408,0.180324,0.152886,0.146188,0.13755,0.139649,...,0.089063,0.19268,0.100288,0.089018,0.091867,0.101458,0.106729,0.092014,0.092185,0.185616
2004-12-31,0.196671,0.162564,0.12461,0.116261,0.149848,0.180502,0.162455,0.155865,0.146204,0.146817,...,0.107256,0.193769,0.113146,0.096928,0.102825,0.114343,0.117274,0.102919,0.102061,0.173299
2005-12-31,0.18666,0.159663,0.117749,0.111969,0.146816,0.187852,0.159499,0.15429,0.147354,0.145908,...,0.097635,0.184211,0.110889,0.096773,0.101885,0.111788,0.11426,0.102095,0.101723,0.186982
2006-12-31,0.204096,0.162653,0.125101,0.117255,0.149935,0.190135,0.16116,0.154414,0.149045,0.148156,...,0.100978,0.189807,0.111284,0.097985,0.102982,0.112128,0.116844,0.103026,0.102501,0.178336
2007-12-31,0.204782,0.167309,0.121093,0.117408,0.153804,0.193947,0.164857,0.158756,0.155869,0.153547,...,0.101309,0.198657,0.120865,0.10486,0.109902,0.122111,0.12928,0.109807,0.109479,0.190641
2008-12-31,0.189289,0.157504,0.118138,0.116693,0.146459,0.175028,0.15217,0.145518,0.14118,0.142871,...,0.109422,0.190894,0.114653,0.09933,0.103588,0.116045,0.123776,0.103688,0.103617,0.176487
2009-12-31,0.200402,0.166552,0.122924,0.118002,0.152982,0.191388,0.162661,0.155358,0.146556,0.148542,...,0.099724,0.194371,0.119048,0.105885,0.110035,0.120058,0.125175,0.110106,0.11034,0.203448
2010-12-31,0.173477,0.146983,0.108385,0.110227,0.137057,0.170625,0.146693,0.141203,0.13962,0.137449,...,0.103393,0.185218,0.118255,0.107795,0.111351,0.119009,0.122603,0.111933,0.112774,0.174796
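
The yearly mean skips missing months, so a year with only a few valid months still receives a value. If that is not desired, a minimum number of valid months can be required. The sketch below is an optional variant under that assumption: `yearly_mean` and its `min_months` parameter are hypothetical, and the result is indexed by integer year rather than by year-end date.

```python
import numpy as np
import pandas as pd


def yearly_mean(df, min_months=1):
    """Mean per calendar year; years with too few valid months become NaN."""
    by_year = df.groupby(df.index.year)
    means = by_year.mean()    # NaN values are skipped
    counts = by_year.count()  # number of non-NaN months per year and column
    return means.where(counts >= min_months)
```

With `min_months=1` the values match a plain yearly mean; `min_months=12` keeps only complete years.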


In [11]:
# Calculate the mean for each month across all years (month of the year)
LAI_moy = LAI_df.groupby(LAI_df.index.month).mean()

# Rename the index to the three-letter month abbreviations
LAI_moy.index = pd.to_datetime(LAI_moy.index, format='%m').strftime('%b')

LAI_moy

Unnamed: 0,FR003159,FR003160,FR003161,FR003162,FR003163,FR003164,FR003165,FR003166,FR003167,FR003168,...,HR000308,HR000309,HR000310,HR000311,HR000312,HR000313,HR000314,HR000315,HR000316,HR000317
Jan,0.072971,0.058257,0.043986,0.040098,0.052833,0.066414,0.059059,0.057284,0.055965,0.05359,...,0.02491,0.03064,0.027491,0.026959,0.027443,0.027105,0.021976,0.027547,0.027627,0.037945
Feb,0.055985,0.047785,0.036196,0.030246,0.042832,0.055928,0.052538,0.050969,0.049804,0.045327,...,0.029826,0.037476,0.02778,0.028267,0.028118,0.027564,0.023159,0.028347,0.028787,0.039384
Mar,0.088706,0.07647,0.06625,0.054893,0.070209,0.094399,0.086612,0.083868,0.079581,0.073365,...,0.069967,0.073839,0.045721,0.044834,0.045287,0.045604,0.040452,0.045641,0.046078,0.055723
Apr,0.168016,0.143477,0.115635,0.096385,0.130769,0.17234,0.154859,0.149505,0.145856,0.135921,...,0.138633,0.230417,0.08814,0.099964,0.09594,0.087686,0.075204,0.099001,0.103039,0.110732
May,0.319755,0.24092,0.158911,0.137902,0.214044,0.298835,0.249151,0.236392,0.220714,0.214972,...,0.177988,0.346516,0.210801,0.196702,0.200945,0.212271,0.2137,0.202678,0.205757,0.299919
Jun,0.361956,0.282768,0.195054,0.191191,0.259541,0.333766,0.277575,0.263979,0.247888,0.253247,...,0.201858,0.375265,0.26373,0.217206,0.231712,0.267582,0.296539,0.231035,0.229855,0.410019
Jul,0.367758,0.319068,0.237305,0.260661,0.303148,0.344581,0.295644,0.284828,0.276802,0.290133,...,0.212651,0.384646,0.228858,0.187702,0.199863,0.232351,0.258889,0.199113,0.197851,0.404514
Aug,0.349299,0.299939,0.214715,0.230837,0.281495,0.332461,0.283661,0.270872,0.261363,0.271115,...,0.176042,0.356106,0.192498,0.158322,0.168384,0.195295,0.216862,0.167758,0.166685,0.367831
Sep,0.293194,0.244279,0.170166,0.165034,0.222871,0.286738,0.241306,0.227854,0.211392,0.215488,...,0.099129,0.255181,0.159225,0.13325,0.141023,0.16114,0.173829,0.140593,0.139821,0.285027
Oct,0.135214,0.117239,0.091722,0.076892,0.105678,0.148403,0.125542,0.119132,0.110079,0.106208,...,0.058311,0.149117,0.097341,0.085877,0.089795,0.097688,0.096423,0.089682,0.089479,0.132768
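
The month-of-year mean can be checked by hand on a small example. A minimal sketch, assuming a monthly datetime index; `month_of_year_mean` is a hypothetical helper, and the fixed English labels are a choice of this sketch so that the result does not depend on the system locale.

```python
import pandas as pd

MONTH_LABELS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def month_of_year_mean(df):
    """Average each calendar month across all years."""
    moy = df.groupby(df.index.month).mean()
    # The group keys are month numbers 1..12; map them to labels
    moy.index = [MONTH_LABELS[int(month) - 1] for month in moy.index]
    return moy
```

For two years of values 0 to 23, January averages the values 0 and 12, which gives 6, and December averages 11 and 23, which gives 17.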


### Normalized Difference Vegetation Index (NDVI)

In [12]:
# Check the files in the subdirectory:
filenames = glob.glob("data/gee/vegetation/NDVI/*.csv")
print("Number of files:", len(filenames))
print("First file:", filenames[0])

Number of files: 47
First file: data/gee/vegetation/NDVI/EStreams_modis_NDVI_mean_gee_550_599.csv


In [13]:
# First, we create an empty DataFrame for the data with a datetime index:
ndvi_df = pd.DataFrame(index=pd.date_range(start='2001-01-01', end='2022-12-31', freq='M'))

# Loop for reading and concatenating the data:
for file in tqdm.tqdm(filenames):
    
    # Read the data from the CSV file:
    ndvi_file = pd.read_csv(file)
    ndvi_file.drop(["system:index", ".geo"], axis=1, inplace=True)
    ndvi_file = ndvi_file.T
    
    # Set columns based on the "basin_id" row and drop it
    ndvi_file.columns = ndvi_file.loc["basin_id", :].tolist()
    ndvi_file.drop(["basin_id"], axis=0, inplace=True)
    
    # Convert the index to integers and sort it
    ndvi_file.index = ndvi_file.index.astype(int)
    ndvi_file.sort_index(inplace=True)
    
    # Create a new DataFrame with datetime index and assign values
    ndvi_file_df = pd.DataFrame(columns=ndvi_file.columns)
    ndvi_file_df["dates"] = pd.date_range(start='2001-01-01', end='2022-12-31', freq='M')
    ndvi_file_df.loc[:, ndvi_file.columns] = ndvi_file
    ndvi_file_df.set_index("dates", inplace=True)
    ndvi_file_df.index.name = ""
    
    # Concatenate the DataFrames along the columns (axis=1)
    ndvi_df = pd.concat([ndvi_df, ndvi_file_df], axis=1)
    
# Apply the scale factor from Google Earth Engine (GEE)
ndvi_df = ndvi_df * 0.0001
ndvi_df

100%|███████████████████████████████████████████| 47/47 [00:00<00:00, 93.69it/s]


Unnamed: 0,FR003394,FR003999,FR003649,FR004253,FR004148,FR003813,FR004959,FR003893,FR003166,FR003873,...,FR004022,FR003814,FR003622,FR004172,FR004712,FR004694,FR003333,HR000267,FR004173,FR003830
2001-01-31,0.525965,0.666167,0.359644,0.556515,0.520984,0.583529,0.366066,0.686512,0.542526,0.669658,...,0.437523,0.677376,0.583012,0.531781,0.517777,0.661832,0.577888,0.321599,0.53164,0.461185
2001-02-28,0.440112,0.650053,0.431622,0.557807,0.588498,0.644574,0.398821,0.613139,0.54248,0.606583,...,0.425445,0.6426,0.582964,0.548105,0.535753,0.621204,0.611123,0.407201,0.549011,0.617699
2001-03-31,0.482036,0.595031,0.484924,0.622901,0.649398,0.591721,0.432985,0.648663,0.531911,0.612928,...,0.615901,0.676247,0.620371,0.538544,0.563554,0.675295,0.629306,0.465723,0.539377,0.704901
2001-04-30,0.573823,0.669326,0.580917,0.643642,0.729358,0.64271,0.417338,0.562531,0.535902,0.689854,...,0.647591,0.720454,0.53182,0.595581,0.648749,0.712927,0.624175,0.579584,0.596754,0.713943
2001-05-31,0.696539,0.749432,0.674709,0.654108,0.802015,0.829478,0.564454,0.821908,0.711661,0.794325,...,0.750871,0.829661,0.786257,0.762899,0.768714,0.769702,0.802079,0.817623,0.763149,0.817091
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-08-31,0.550525,0.593078,0.4726,0.654891,0.686663,0.806838,0.533087,0.685472,0.736143,0.559447,...,0.347057,0.746328,0.516136,0.753508,0.677527,0.746799,0.583954,0.711139,0.752859,0.757941
2022-09-30,0.612582,0.668386,0.558481,0.593668,0.696616,0.816604,0.528072,0.762943,0.757415,0.65228,...,0.424499,0.783867,0.638407,0.755255,0.752709,0.7908,0.731286,0.732048,0.755309,0.801069
2022-10-31,0.646198,0.391158,0.468264,0.517348,0.612277,0.732234,0.498072,0.738405,0.694148,0.71168,...,0.472496,0.74979,0.66348,0.480661,0.68197,0.727388,0.652279,0.600311,0.485341,0.772116
2022-11-30,0.691078,0.745526,0.576353,0.616394,0.671213,0.749705,0.548024,0.736176,0.652717,0.670981,...,0.548727,0.781002,0.654294,0.676552,0.660999,0.713666,0.536195,0.577666,0.677518,0.735299
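
The scale factor converts the stored values into physical units: a stored NDVI of 5259 becomes 0.5259. The sketch below additionally masks values outside a valid range before scaling, which the notebook does not do. The helper `scale_ndvi` is hypothetical, and the raw valid range of -2000 to 10000 is an assumption that should be checked against the MOD13A1 product documentation.

```python
import pandas as pd


def scale_ndvi(raw, factor=0.0001, valid_raw=(-2000, 10000)):
    """Mask out-of-range stored values, then apply the scale factor.

    The range check is done on the stored values so that floating-point
    rounding after scaling cannot push a boundary value out of range.
    """
    raw = raw.astype(float)
    in_range = (raw >= valid_raw[0]) & (raw <= valid_raw[1])
    return raw.where(in_range) * factor
```

The same pattern applies to LAI with its own factor and valid range, both of which should be taken from the documentation of the exported band.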


In [14]:
# Add a column for each catchment that was not processed in GEE.
# New columns are filled with NaN and added only if they do not exist yet.
for col in catchment_boundaries.basin_id.tolist():
    if col not in ndvi_df.columns:
        ndvi_df[col] = np.nan
ndvi_df

Unnamed: 0,FR003394,FR003999,FR003649,FR004253,FR004148,FR003813,FR004959,FR003893,FR003166,FR003873,...,FR004022,FR003814,FR003622,FR004172,FR004712,FR004694,FR003333,HR000267,FR004173,FR003830
2001-01-31,0.525965,0.666167,0.359644,0.556515,0.520984,0.583529,0.366066,0.686512,0.542526,0.669658,...,0.437523,0.677376,0.583012,0.531781,0.517777,0.661832,0.577888,0.321599,0.53164,0.461185
2001-02-28,0.440112,0.650053,0.431622,0.557807,0.588498,0.644574,0.398821,0.613139,0.54248,0.606583,...,0.425445,0.6426,0.582964,0.548105,0.535753,0.621204,0.611123,0.407201,0.549011,0.617699
2001-03-31,0.482036,0.595031,0.484924,0.622901,0.649398,0.591721,0.432985,0.648663,0.531911,0.612928,...,0.615901,0.676247,0.620371,0.538544,0.563554,0.675295,0.629306,0.465723,0.539377,0.704901
2001-04-30,0.573823,0.669326,0.580917,0.643642,0.729358,0.64271,0.417338,0.562531,0.535902,0.689854,...,0.647591,0.720454,0.53182,0.595581,0.648749,0.712927,0.624175,0.579584,0.596754,0.713943
2001-05-31,0.696539,0.749432,0.674709,0.654108,0.802015,0.829478,0.564454,0.821908,0.711661,0.794325,...,0.750871,0.829661,0.786257,0.762899,0.768714,0.769702,0.802079,0.817623,0.763149,0.817091
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-08-31,0.550525,0.593078,0.4726,0.654891,0.686663,0.806838,0.533087,0.685472,0.736143,0.559447,...,0.347057,0.746328,0.516136,0.753508,0.677527,0.746799,0.583954,0.711139,0.752859,0.757941
2022-09-30,0.612582,0.668386,0.558481,0.593668,0.696616,0.816604,0.528072,0.762943,0.757415,0.65228,...,0.424499,0.783867,0.638407,0.755255,0.752709,0.7908,0.731286,0.732048,0.755309,0.801069
2022-10-31,0.646198,0.391158,0.468264,0.517348,0.612277,0.732234,0.498072,0.738405,0.694148,0.71168,...,0.472496,0.74979,0.66348,0.480661,0.68197,0.727388,0.652279,0.600311,0.485341,0.772116
2022-11-30,0.691078,0.745526,0.576353,0.616394,0.671213,0.749705,0.548024,0.736176,0.652717,0.670981,...,0.548727,0.781002,0.654294,0.676552,0.660999,0.713666,0.536195,0.577666,0.677518,0.735299


In [15]:
# Here we sort the columns:
ndvi_df = ndvi_df.sort_index(axis=1)
ndvi_df

Unnamed: 0,FR003159,FR003160,FR003161,FR003162,FR003163,FR003164,FR003165,FR003166,FR003167,FR003168,...,HR000308,HR000309,HR000310,HR000311,HR000312,HR000313,HR000314,HR000315,HR000316,HR000317
2001-01-31,0.601317,0.532215,0.463625,0.428774,0.502899,0.600051,0.551465,0.542526,0.524682,0.5077,...,0.394033,0.492312,0.36315,0.382596,0.381067,0.362212,0.314004,0.384296,0.389257,0.370679
2001-02-28,0.593734,0.530041,0.48064,0.433147,0.502431,0.576511,0.548612,0.54248,0.518513,0.503943,...,0.381296,0.399605,0.365297,0.368573,0.36934,0.366179,0.356516,0.371127,0.373849,0.401991
2001-03-31,0.567955,0.5194,0.435721,0.418362,0.490695,0.580931,0.542322,0.531911,0.486613,0.484124,...,0.510467,0.538618,0.412413,0.426515,0.422223,0.413821,0.398967,0.426844,0.432138,0.378189
2001-04-30,0.522015,0.512898,0.45979,0.471544,0.506502,0.540475,0.522168,0.535902,0.567187,0.531297,...,0.591058,0.719619,0.520146,0.57365,0.558523,0.521522,0.486095,0.566965,0.578605,0.521977
2001-05-31,0.790268,0.704108,0.616357,0.576197,0.671979,0.770777,0.720856,0.711661,0.681201,0.672928,...,0.649462,0.792958,0.708105,0.707438,0.708262,0.714082,0.723823,0.710855,0.714217,0.818318
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-08-31,0.791697,0.748171,0.726812,0.676717,0.725596,0.799572,0.750668,0.736143,0.698688,0.709189,...,0.482464,0.70195,0.634065,0.585593,0.604701,0.637992,0.651508,0.602995,0.60227,0.725091
2022-09-30,0.798949,0.752788,0.72717,0.672061,0.729998,0.802095,0.762586,0.757415,0.729834,0.726014,...,0.531106,0.710021,0.641524,0.603676,0.616909,0.644976,0.647523,0.617006,0.618893,0.716132
2022-10-31,0.723801,0.665788,0.634535,0.578783,0.641877,0.738708,0.695114,0.694148,0.677657,0.655689,...,0.466943,0.570464,0.545085,0.537309,0.54008,0.545577,0.540573,0.541633,0.545055,0.574736
2022-11-30,0.71011,0.628246,0.617671,0.581984,0.613266,0.716575,0.660743,0.652717,0.634769,0.619527,...,0.532686,0.55178,0.562242,0.544736,0.560448,0.561189,0.530847,0.55972,0.560598,0.515546


In [16]:
# Resample to yearly mean
ndvi_yr = ndvi_df.resample('Y').mean()
ndvi_yr

Unnamed: 0,FR003159,FR003160,FR003161,FR003162,FR003163,FR003164,FR003165,FR003166,FR003167,FR003168,...,HR000308,HR000309,HR000310,HR000311,HR000312,HR000313,HR000314,HR000315,HR000316,HR000317
2001-12-31,0.672266,0.626284,0.56267,0.547994,0.605569,0.665166,0.635024,0.631199,0.624507,0.611915,...,0.495212,0.618897,0.512034,0.515867,0.514916,0.514605,0.499483,0.518084,0.522671,0.591966
2002-12-31,0.682132,0.620603,0.548686,0.536339,0.598328,0.659079,0.623209,0.622027,0.618022,0.605327,...,0.497478,0.613221,0.523583,0.527556,0.527306,0.52545,0.5103,0.530252,0.534742,0.598657
2003-12-31,0.661902,0.594355,0.522594,0.502759,0.570506,0.638242,0.605609,0.601001,0.580468,0.57299,...,0.444189,0.580317,0.469645,0.48271,0.480071,0.470342,0.451887,0.483179,0.48851,0.571197
2004-12-31,0.673064,0.610988,0.584199,0.547511,0.591239,0.652213,0.616627,0.608686,0.590424,0.587627,...,0.513869,0.60882,0.488165,0.49852,0.499268,0.48842,0.463434,0.502494,0.506252,0.562078
2005-12-31,0.656927,0.61418,0.563853,0.547086,0.595681,0.639306,0.627397,0.625153,0.608889,0.598297,...,0.474555,0.579635,0.496113,0.509027,0.510437,0.495834,0.465927,0.513521,0.51834,0.586502
2006-12-31,0.658645,0.614185,0.566108,0.547864,0.595412,0.644093,0.620332,0.617262,0.610329,0.599721,...,0.498748,0.607739,0.517402,0.520023,0.523632,0.518297,0.502398,0.525714,0.528953,0.572475
2007-12-31,0.687558,0.631756,0.564173,0.539939,0.60649,0.666847,0.634709,0.632654,0.620235,0.610028,...,0.507674,0.612166,0.533346,0.533602,0.536016,0.534527,0.524729,0.537963,0.541384,0.600957
2008-12-31,0.676524,0.618679,0.566062,0.539723,0.596118,0.655048,0.623735,0.6168,0.600334,0.595104,...,0.51724,0.585226,0.516267,0.515118,0.517384,0.517763,0.50858,0.519612,0.523013,0.575611
2009-12-31,0.66213,0.609569,0.5495,0.529412,0.587388,0.640503,0.615885,0.609092,0.592337,0.58678,...,0.474865,0.582805,0.513483,0.526203,0.525112,0.51328,0.492025,0.527928,0.533405,0.602578
2010-12-31,0.60383,0.56807,0.511094,0.495309,0.548242,0.599244,0.572256,0.568251,0.558376,0.549857,...,0.49307,0.583926,0.513036,0.53218,0.531729,0.512429,0.486511,0.535226,0.540665,0.559985


In [17]:
# Calculate the mean for each month across all years (month of the year)
ndvi_moy = ndvi_df.groupby(ndvi_df.index.month).mean()

# Rename the index to the three-letter month abbreviations
ndvi_moy.index = pd.to_datetime(ndvi_moy.index, format='%m').strftime('%b')

ndvi_moy

Unnamed: 0,FR003159,FR003160,FR003161,FR003162,FR003163,FR003164,FR003165,FR003166,FR003167,FR003168,...,HR000308,HR000309,HR000310,HR000311,HR000312,HR000313,HR000314,HR000315,HR000316,HR000317
Jan,0.519879,0.468583,0.395938,0.371609,0.442145,0.507535,0.47452,0.468599,0.458053,0.446545,...,0.351439,0.39866,0.305551,0.3475,0.337514,0.302235,0.24763,0.342179,0.350151,0.363104
Feb,0.493928,0.454126,0.395101,0.364237,0.429694,0.478249,0.465223,0.459381,0.452146,0.436481,...,0.350844,0.389869,0.287484,0.329919,0.318462,0.284448,0.228859,0.323145,0.331143,0.335462
Mar,0.571515,0.509441,0.450488,0.411578,0.481977,0.580735,0.542756,0.536108,0.514389,0.492347,...,0.449387,0.470329,0.343943,0.36962,0.363558,0.342598,0.308063,0.367421,0.373428,0.371531
Apr,0.673271,0.616778,0.567991,0.518114,0.590848,0.677564,0.645964,0.641387,0.62691,0.603698,...,0.578292,0.683028,0.511917,0.563365,0.550865,0.510123,0.472246,0.558163,0.568378,0.541003
May,0.792825,0.708918,0.629935,0.593113,0.679673,0.771697,0.731452,0.723386,0.695997,0.684731,...,0.629685,0.771465,0.70939,0.709379,0.714105,0.711384,0.713062,0.716084,0.719765,0.779129
Jun,0.805946,0.756471,0.701732,0.709446,0.745029,0.782871,0.751728,0.746491,0.737858,0.741534,...,0.661077,0.792433,0.724952,0.698354,0.710967,0.728723,0.749994,0.711304,0.711915,0.820685
Jul,0.808828,0.784622,0.728848,0.76069,0.777289,0.788827,0.764031,0.760407,0.75739,0.767518,...,0.66493,0.788404,0.671177,0.646593,0.656841,0.67482,0.694664,0.657347,0.658882,0.797281
Aug,0.808303,0.774158,0.717414,0.735916,0.762883,0.796126,0.765201,0.758354,0.747172,0.754236,...,0.594344,0.751736,0.644701,0.621851,0.631273,0.648022,0.660605,0.631774,0.633591,0.77905
Sep,0.77862,0.734004,0.677835,0.660243,0.712551,0.769218,0.738016,0.728872,0.708509,0.707795,...,0.498004,0.685816,0.646491,0.625621,0.635524,0.649419,0.653992,0.636228,0.638342,0.738609
Oct,0.701243,0.641421,0.587643,0.551714,0.615831,0.706871,0.66372,0.657382,0.634318,0.620718,...,0.462965,0.621839,0.589984,0.581944,0.589473,0.591117,0.576513,0.591052,0.594586,0.633287


# Final aggregation (static attributes)

In [18]:
# LAI:
LAI_moy_T = LAI_moy.T
LAI_moy_T.columns = pd.to_datetime(LAI_moy_T.columns, format='%b').strftime('%m')
LAI_moy_T.columns = "lai_" + LAI_moy_T.columns

LAI_moy_T["lai_mean"] = LAI_moy_T.mean(axis = 1)

LAI_moy_T

Unnamed: 0,lai_01,lai_02,lai_03,lai_04,lai_05,lai_06,lai_07,lai_08,lai_09,lai_10,lai_11,lai_12,lai_mean
FR003159,0.072971,0.055985,0.088706,0.168016,0.319755,0.361956,0.367758,0.349299,0.293194,0.135214,0.073927,0.163193,0.204164
FR003160,0.058257,0.047785,0.07647,0.143477,0.24092,0.282768,0.319068,0.299939,0.244279,0.117239,0.06008,0.116381,0.167222
FR003161,0.043986,0.036196,0.06625,0.115635,0.158911,0.195054,0.237305,0.214715,0.170166,0.091722,0.05448,0.10199,0.123867
FR003162,0.040098,0.030246,0.054893,0.096385,0.137902,0.191191,0.260661,0.230837,0.165034,0.076892,0.044328,0.089336,0.11815
FR003163,0.052833,0.042832,0.070209,0.130769,0.214044,0.259541,0.303148,0.281495,0.222871,0.105678,0.055171,0.108509,0.153925
...,...,...,...,...,...,...,...,...,...,...,...,...,...
HR000313,0.027105,0.027564,0.045604,0.087686,0.212271,0.267582,0.232351,0.195295,0.16114,0.097688,0.053597,0.033425,0.120109
HR000314,0.021976,0.023159,0.040452,0.075204,0.2137,0.296539,0.258889,0.216862,0.173829,0.096423,0.049973,0.03035,0.12478
HR000315,0.027547,0.028347,0.045641,0.099001,0.202678,0.231035,0.199113,0.167758,0.140593,0.089682,0.050214,0.032145,0.10948
HR000316,0.027627,0.028787,0.046078,0.103039,0.205757,0.229855,0.197851,0.166685,0.139821,0.089479,0.049713,0.031882,0.109715
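
The static table can also be built directly from the monthly time series, without converting the month labels to abbreviations and back. A minimal sketch under that assumption; `monthly_attributes` is a hypothetical helper and `prefix` stands for `"lai"` or `"ndvi"`.

```python
import pandas as pd


def monthly_attributes(df, prefix):
    """One row per catchment: <prefix>_01 .. <prefix>_12 and <prefix>_mean."""
    moy = df.groupby(df.index.month).mean()  # rows are month numbers
    table = moy.T                            # rows are catchments
    table.columns = [f"{prefix}_{int(month):02d}" for month in table.columns]
    # The right-hand side is evaluated first, so the mean covers only the
    # monthly columns and not the new column itself.
    table[f"{prefix}_mean"] = table.mean(axis=1)
    return table
```

Note that the mean column is the average of the monthly means. It equals the mean of the full series only when every month has the same number of valid values.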


In [19]:
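# NDVI: transpose so that catchments are rows and calendar months are columns
ndvi_moy_T = ndvi_moy.T
# Convert the month abbreviations ("Jan", ...) to zero-padded numbers ("01", ...) and add the "ndvi_" prefix
ndvi_moy_T.columns = pd.to_datetime(ndvi_moy_T.columns, format='%b').strftime('%m')
ndvi_moy_T.columns = "ndvi_" + ndvi_moy_T.columns

# Long-term mean: unweighted average of the 12 monthly means
ndvi_moy_T["ndvi_mean"] = ndvi_moy_T.mean(axis = 1)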

ndvi_moy_T

Unnamed: 0,ndvi_01,ndvi_02,ndvi_03,ndvi_04,ndvi_05,ndvi_06,ndvi_07,ndvi_08,ndvi_09,ndvi_10,ndvi_11,ndvi_12,ndvi_mean
FR003159,0.519879,0.493928,0.571515,0.673271,0.792825,0.805946,0.808828,0.808303,0.77862,0.701243,0.632045,0.556229,0.678553
FR003160,0.468583,0.454126,0.509441,0.616778,0.708918,0.756471,0.784622,0.774158,0.734004,0.641421,0.557147,0.497983,0.625304
FR003161,0.395938,0.395101,0.450488,0.567991,0.629935,0.701732,0.728848,0.717414,0.677835,0.587643,0.514846,0.453925,0.568475
FR003162,0.371609,0.364237,0.411578,0.518114,0.593113,0.709446,0.76069,0.735916,0.660243,0.551714,0.471509,0.419671,0.54732
FR003163,0.442145,0.429694,0.481977,0.590848,0.679673,0.745029,0.777289,0.762883,0.712551,0.615831,0.532074,0.475665,0.603805
...,...,...,...,...,...,...,...,...,...,...,...,...,...
HR000313,0.302235,0.284448,0.342598,0.510123,0.711384,0.728723,0.67482,0.648022,0.649419,0.591117,0.491394,0.379681,0.526164
HR000314,0.24763,0.228859,0.308063,0.472246,0.713062,0.749994,0.694664,0.660605,0.653992,0.576513,0.464734,0.328823,0.508265
HR000315,0.342179,0.323145,0.367421,0.558163,0.716084,0.711304,0.657347,0.631774,0.636228,0.591052,0.496394,0.403354,0.536204
HR000316,0.350151,0.331143,0.373428,0.568378,0.719765,0.711915,0.658882,0.633591,0.638342,0.594586,0.498856,0.408249,0.540607
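The LAI and NDVI cells above apply the same three steps: transpose, rename the month columns, and add a long-term mean. A minimal sketch of how these steps could be wrapped in one helper is given here. The function name `monthly_to_static`, the `MONTHS` list, and the toy table are hypothetical and not part of the notebook. The sketch also swaps `pd.to_datetime(..., format='%b')` for an explicit mapping, on the assumption that the month labels are always the English abbreviations "Jan" to "Dec".

```python
import pandas as pd

# English month abbreviations, in calendar order (assumed row labels of the input)
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Map "Jan" -> "01", ..., "Dec" -> "12"
MONTH_TO_NUMBER = {name: f"{i:02d}" for i, name in enumerate(MONTHS, start=1)}


def monthly_to_static(moy, prefix):
    """Turn a (month x catchment) table into a (catchment x attribute) table.

    Hypothetical helper: `moy` has month abbreviations as its index and
    catchment identifiers as its columns.
    """
    out = moy.T.copy()
    # Rename the columns to "<prefix>_01" ... "<prefix>_12"
    out.columns = [f"{prefix}_{MONTH_TO_NUMBER[m]}" for m in out.columns]
    # Unweighted average of the 12 monthly means
    out[f"{prefix}_mean"] = out.mean(axis=1)
    return out


# Toy input with two made-up catchments
toy_moy = pd.DataFrame(
    {
        "basin_a": [i / 10 for i in range(1, 13)],
        "basin_b": [0.5] * 12,
    },
    index=MONTHS,
)

toy_static = monthly_to_static(toy_moy, "lai")
print(toy_static)
```

Computing the mean before any other column is appended keeps it restricted to the 12 monthly values, which is also what the cells above rely on.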


In [20]:
# Concatenate the LAI and NDVI static attributes column-wise (rows are aligned on the catchment index):
vegetation_df = pd.concat([LAI_moy_T, ndvi_moy_T], axis=1)

vegetation_df

Unnamed: 0,lai_01,lai_02,lai_03,lai_04,lai_05,lai_06,lai_07,lai_08,lai_09,lai_10,...,ndvi_04,ndvi_05,ndvi_06,ndvi_07,ndvi_08,ndvi_09,ndvi_10,ndvi_11,ndvi_12,ndvi_mean
FR003159,0.072971,0.055985,0.088706,0.168016,0.319755,0.361956,0.367758,0.349299,0.293194,0.135214,...,0.673271,0.792825,0.805946,0.808828,0.808303,0.77862,0.701243,0.632045,0.556229,0.678553
FR003160,0.058257,0.047785,0.07647,0.143477,0.24092,0.282768,0.319068,0.299939,0.244279,0.117239,...,0.616778,0.708918,0.756471,0.784622,0.774158,0.734004,0.641421,0.557147,0.497983,0.625304
FR003161,0.043986,0.036196,0.06625,0.115635,0.158911,0.195054,0.237305,0.214715,0.170166,0.091722,...,0.567991,0.629935,0.701732,0.728848,0.717414,0.677835,0.587643,0.514846,0.453925,0.568475
FR003162,0.040098,0.030246,0.054893,0.096385,0.137902,0.191191,0.260661,0.230837,0.165034,0.076892,...,0.518114,0.593113,0.709446,0.76069,0.735916,0.660243,0.551714,0.471509,0.419671,0.54732
FR003163,0.052833,0.042832,0.070209,0.130769,0.214044,0.259541,0.303148,0.281495,0.222871,0.105678,...,0.590848,0.679673,0.745029,0.777289,0.762883,0.712551,0.615831,0.532074,0.475665,0.603805
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
HR000313,0.027105,0.027564,0.045604,0.087686,0.212271,0.267582,0.232351,0.195295,0.16114,0.097688,...,0.510123,0.711384,0.728723,0.67482,0.648022,0.649419,0.591117,0.491394,0.379681,0.526164
HR000314,0.021976,0.023159,0.040452,0.075204,0.2137,0.296539,0.258889,0.216862,0.173829,0.096423,...,0.472246,0.713062,0.749994,0.694664,0.660605,0.653992,0.576513,0.464734,0.328823,0.508265
HR000315,0.027547,0.028347,0.045641,0.099001,0.202678,0.231035,0.199113,0.167758,0.140593,0.089682,...,0.558163,0.716084,0.711304,0.657347,0.631774,0.636228,0.591052,0.496394,0.403354,0.536204
HR000316,0.027627,0.028787,0.046078,0.103039,0.205757,0.229855,0.197851,0.166685,0.139821,0.089479,...,0.568378,0.719765,0.711915,0.658882,0.633591,0.638342,0.594586,0.498856,0.408249,0.540607
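Because `pd.concat(..., axis=1)` aligns rows on the index labels, a catchment that is missing from one of the two tables would silently produce NaN values in the result. The sketch below shows one possible way to check this before concatenating. The toy tables and catchment names are made up for illustration.

```python
import pandas as pd

# Two toy attribute tables with the same catchments in a different row order
toy_lai = pd.DataFrame({"lai_mean": [0.2, 0.1]}, index=["basin_a", "basin_b"])
toy_ndvi = pd.DataFrame({"ndvi_mean": [0.6, 0.5]}, index=["basin_b", "basin_a"])

# Catchments present in only one of the two tables (empty if the sets match)
unmatched = toy_lai.index.symmetric_difference(toy_ndvi.index)
if len(unmatched) > 0:
    raise ValueError(f"Catchments missing from one table: {list(unmatched)}")

# Column-wise concatenation; rows are matched by label, not by position
toy_combined = pd.concat([toy_lai, toy_ndvi], axis=1)
print(toy_combined)
```

In this example `basin_a` receives the NDVI value 0.5 even though it is listed second in `toy_ndvi`, since the alignment is by label.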


In [21]:
# Name the index (the catchment identifiers) "basin_id":
vegetation_df.index.name = "basin_id"

In [22]:
# Name the index of each time-series table "date":
LAI_df.index.name = "date"
LAI_yr.index.name = "date"
ndvi_df.index.name = "date"
ndvi_yr.index.name = "date"

In [23]:
# Round the data to 3 decimals
LAI_df = LAI_df.astype(float).round(3)
LAI_yr = LAI_yr.astype(float).round(3)
ndvi_df = ndvi_df.astype(float).round(3)
ndvi_yr = ndvi_yr.astype(float).round(3)
vegetation_df = vegetation_df.astype(float).round(3)
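The same cast-and-round step is applied to five tables above. As a sketch only, the repetition could be written as a loop over a dictionary of tables. The dictionary keys and toy values below are hypothetical, and the string-typed input merely illustrates why `astype(float)` comes before `round`.

```python
import pandas as pd

# Toy tables; the values are stored as strings to mimic an object-typed column
toy_tables = {
    "lai": pd.DataFrame({"basin_a": ["0.072971", "0.055985"]}),
    "ndvi": pd.DataFrame({"basin_a": ["0.519879", "0.493928"]}),
}

# Cast every table to float, then round to 3 decimals
toy_rounded = {
    name: table.astype(float).round(3) for name, table in toy_tables.items()
}

for name, table in toy_rounded.items():
    print(name, table["basin_a"].tolist())
```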

# Data export

In [24]:
# Export the final datasets:
# Time-series:
LAI_df.to_csv(PATH_OUTPUT_TS+"/estreams_LAI_monthly.csv")
LAI_yr.to_csv(PATH_OUTPUT_TS+"/estreams_LAI_yearly.csv")

ndvi_df.to_csv(PATH_OUTPUT_TS+"/estreams_NDVI_monthly.csv")
ndvi_yr.to_csv(PATH_OUTPUT_TS+"/estreams_NDVI_yearly.csv")

# Static attributes:
vegetation_df.to_csv(PATH_OUTPUT_ST+"/estreams_vegetation_attributes.csv")
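The export above assumes that the output folders already exist. A minimal sketch of a more defensive export is shown below: it creates the folder if needed, writes the CSV, and reads it back to confirm that the named index survives the round trip. The helper `export_table`, the temporary folder, and the file name are hypothetical and only stand in for `PATH_OUTPUT_ST` and the real outputs.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_table(df, out_dir, filename):
    """Write `df` to `out_dir/filename`, creating `out_dir` if it is missing."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / filename
    df.to_csv(target)
    return target


# Toy static-attribute table with a named index, as in the notebook
toy_attributes = pd.DataFrame(
    {"lai_mean": [0.204, 0.167], "ndvi_mean": [0.679, 0.625]},
    index=pd.Index(["basin_a", "basin_b"], name="basin_id"),
)

with tempfile.TemporaryDirectory() as tmp:
    # The "static" sub-folder does not exist yet and is created by the helper
    path = export_table(toy_attributes, Path(tmp) / "static", "toy_attributes.csv")
    # Read the file back, using the named index column as the index again
    reloaded = pd.read_csv(path, index_col="basin_id")

print(reloaded)
```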

# End