# Hydro-meteorological signatures computation

Author: Thiago Nascimento (thiago.nascimento@eawag.ch)

This notebook is part of the EStreams publication and was used to compute the set of hydro-meteorological signatures provided in that publication.

* This code enables both the replication of the current database and its extension to new catchments.
* Before running this code, the user should download the original raw data and place it in the folder of the same name.
* The original third-party data are not included in this repository because of redistribution restrictions and storage-space limits.

## Requirements
**Python:**

* Python>=3.6
* Jupyter
* geopandas=0.10.2
* hydroanalysis https://pypi.org/project/hydroanalysis/ (Last access: 30 December 2023)
* numpy
* os
* pandas=2.1.3
* scipy=1.9.0
* tqdm

Check the GitHub repository for an environment.yml file (for conda environments) or a requirements.txt file (for pip).

**Files:**

* data/streamflow/estreams_timeseries_streamflow.csv
* results/timeseries/meteorology/estreams_meteorology_precipitation.csv
* results/timeseries/meteorology/estreams_meteorology_temperature.csv
* results/timeseries/meteorology/estreams_meteorology_pet.csv
* data/streamflow/estreams_gauging_stations.csv
* data/shapefiles/estreams_catchments.shp

**Directory:**

* Clone the GitHub directory locally
* Place any third-party data in their respective directories.
* Update ONLY the "PATH" variable in the "Configurations" section with the relative path to the EStreams directory.
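
Before running the notebook it can help to confirm that the input files are where the code expects them. The snippet below is a minimal sketch, not part of the original workflow: the helper name `find_missing_files` is hypothetical, and the list holds only two of the files named in the "Files" section above.

```python
from pathlib import Path

# Two of the inputs listed in the "Files" section; extend the list as needed.
REQUIRED_FILES = [
    "data/streamflow/estreams_gauging_stations.csv",
    "data/shapefiles/estreams_catchments.shp",
]


def find_missing_files(root, relative_paths):
    """Return the relative paths that do not exist as files under `root`.

    Hypothetical helper for illustration only.
    """
    root = Path(root)
    return [p for p in relative_paths if not (root / p).is_file()]


# "../../.." mirrors the default PATH used in the "Configurations" section.
missing = find_missing_files("../../..", REQUIRED_FILES)
if missing:
    print("Missing input files:", missing)
```

Running this check from the notebook directory before the import cells gives an early, readable message instead of a `FileNotFoundError` halfway through the data loading.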


## References
* Addor, N., Newman, A. J., Mizukami, N., and Clark, M. P.: The CAMELS data set: catchment attributes and meteorology for large-sample studies, Hydrol. Earth Syst. Sci., 21, 5293-5313, https://doi.org/10.5194/hess-21-5293-2017, 2017.
* https://github.com/naddor/camels/blob/master/clim/clim_indices.R

## Observations
* Here we compute the hydro-climatic signatures that are commonly discussed and computed in CAMELS-like publications and that are available in the HydroAnalysis module, which is based on Addor et al. (2017).

# Import modules

In [1]:
import pandas as pd
import geopandas as gpd
import numpy as np
import tqdm
import os
from utils.streamflowindices import calculate_hydro_year
from utils.general import count_num_measurements, find_first_non_nan_dates, find_last_non_nan_dates, calculate_areas_when_0, calculate_specific_discharge
import warnings
import hydroanalysis #Make sure to have this module locally installed

# Configurations

In [2]:
# Only editable variables:
# Relative path to your local directory
PATH = "../../.."
# Suppress all warnings
warnings.filterwarnings("ignore")

* #### Users should NOT change anything in the code below this point.

In [3]:
# Non-editable variables:
PATH_OUTPUT = "results/staticattributes/"

# Set the directory:
os.chdir(PATH)

# Import data

## Daily discharge
Note that this time series has already passed a quality check that removed negative values.

In [4]:
timeseries_discharge = pd.read_csv("data/streamflow/estreams_timeseries_streamflow.csv", index_col=0)
timeseries_discharge.index = pd.to_datetime(timeseries_discharge.index)
timeseries_discharge.index.name = "date"


## Precipitation

In [5]:
timeseries_precipitation = pd.read_csv("results/timeseries/meteorology/estreams_meteorology_precipitation.csv", index_col=0)
timeseries_precipitation.index = pd.to_datetime(timeseries_precipitation.index)
timeseries_precipitation.index.name = "date"
timeseries_precipitation

date,AT000001,AT000002,AT000003,AT000004,AT000005,AT000006,AT000007,AT000008,AT000009,AT000010,...,UAGR0012,UAGR0013,UAGR0014,UAGR0015,UAGR0016,UAGR0017,UAGR0018,UAGR0019,UAGR0020,UAGR0021
1950-01-01,0.00,0.00,0.00,0.0,0.00,0.00,0.00,0.0,0.00,0.00,...,0.00,0.01,0.01,0.00,0.00,2.79,2.9,3.73,0.01,0.00
1950-01-02,17.16,22.31,20.57,20.7,24.77,23.94,22.48,24.0,22.64,23.95,...,0.44,0.60,0.43,0.00,0.00,0.00,0.0,0.00,0.00,0.00
1950-01-03,25.71,21.56,22.79,18.8,27.59,23.36,24.06,27.4,24.41,19.97,...,1.75,1.61,1.77,0.14,1.35,2.86,2.6,2.40,5.19,1.78
1950-01-04,30.63,37.41,35.76,35.8,36.43,38.71,36.85,33.5,36.48,36.50,...,0.81,1.04,0.78,0.08,0.00,8.54,8.4,8.01,0.00,0.00
1950-01-05,0.07,1.97,1.38,2.7,0.12,1.12,1.08,0.0,0.97,0.78,...,1.04,1.06,1.00,0.00,0.64,1.47,1.5,2.00,0.00,2.63
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-06-26,0.00,0.00,0.00,0.0,0.00,0.00,0.00,0.0,0.00,0.00,...,9.38,9.88,9.32,0.00,0.00,,,,14.00,16.22
2023-06-27,0.00,0.00,0.00,0.0,0.00,0.00,0.00,0.0,0.00,0.00,...,3.17,3.72,3.08,7.03,0.00,,,,0.00,0.00
2023-06-28,0.00,0.00,0.00,0.0,0.00,0.00,0.00,0.0,0.00,0.00,...,4.38,5.40,4.25,2.77,0.00,,,,0.00,1.15
2023-06-29,10.72,11.08,10.62,11.6,8.92,9.46,10.13,8.3,9.93,7.26,...,4.97,4.90,4.97,3.07,3.35,,,,5.90,7.30


## PET

In [6]:
timeseries_pet = pd.read_csv("results/timeseries/meteorology/estreams_meteorology_pet.csv", index_col=0)
timeseries_pet.index = pd.to_datetime(timeseries_pet.index)
timeseries_pet.index.name = "date"
timeseries_pet

date,AT000001,AT000002,AT000003,AT000004,AT000005,AT000006,AT000007,AT000008,AT000009,AT000010,...,UAGR0012,UAGR0013,UAGR0014,UAGR0015,UAGR0016,UAGR0017,UAGR0018,UAGR0019,UAGR0020,UAGR0021
1950-01-01,0.27,0.26,0.28,0.25,0.29,0.28,0.28,0.29,0.28,0.28,...,0.05,0.04,0.05,0.09,0.08,0.34,0.32,0.30,0.15,0.02
1950-01-02,0.33,0.42,0.38,0.42,0.43,0.45,0.41,0.42,0.41,0.49,...,0.10,0.12,0.10,0.27,0.11,0.13,0.12,0.13,0.04,0.00
1950-01-03,0.27,0.29,0.27,0.28,0.29,0.31,0.28,0.28,0.28,0.36,...,0.17,0.13,0.18,0.23,0.36,0.54,0.53,0.55,0.41,0.32
1950-01-04,0.17,0.12,0.12,0.10,0.18,0.16,0.14,0.20,0.15,0.21,...,0.13,0.11,0.13,0.23,0.23,0.28,0.27,0.28,0.23,0.24
1950-01-05,0.28,0.33,0.32,0.30,0.38,0.37,0.35,0.38,0.35,0.40,...,0.11,0.08,0.11,0.20,0.25,0.35,0.35,0.35,0.28,0.17
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-06-26,4.05,3.80,3.82,3.48,4.92,4.45,4.18,5.15,4.28,5.19,...,3.32,3.45,3.32,4.67,3.44,,,,3.33,3.55
2023-06-27,2.90,3.27,3.08,3.15,3.59,3.62,3.31,3.63,3.34,4.20,...,3.83,3.92,3.83,4.10,4.13,,,,3.61,3.14
2023-06-28,3.05,2.42,2.36,2.14,3.46,2.99,2.72,3.71,2.83,3.62,...,3.68,3.62,3.69,3.74,3.91,,,,3.94,3.87
2023-06-29,3.61,4.14,4.03,4.14,4.34,4.39,4.16,4.48,4.20,5.00,...,3.89,4.03,3.88,4.23,3.78,,,,3.35,3.10


## Temperature

In [7]:
timeseries_temperature = pd.read_csv("results/timeseries/meteorology/estreams_meteorology_temperature.csv", index_col=0)
timeseries_temperature.index = pd.to_datetime(timeseries_temperature.index)
timeseries_temperature.index.name = "date"
timeseries_temperature

date,AT000001,AT000002,AT000003,AT000004,AT000005,AT000006,AT000007,AT000008,AT000009,AT000010,...,UAGR0012,UAGR0013,UAGR0014,UAGR0015,UAGR0016,UAGR0017,UAGR0018,UAGR0019,UAGR0020,UAGR0021
1950-01-01,-8.01,-7.14,-6.99,-8.05,-4.59,-5.92,-6.23,-4.31,-6.03,-5.13,...,-15.09,-15.05,-15.05,-13.15,-14.41,-9.73,-10.16,-10.80,-12.86,-17.14
1950-01-02,-5.37,-3.37,-3.95,-3.60,-2.77,-2.66,-3.39,-2.72,-3.32,-1.83,...,-12.40,-11.48,-12.49,-5.69,-12.73,-12.57,-12.78,-12.66,-15.70,-18.19
1950-01-03,-4.26,-2.62,-3.39,-3.06,-1.27,-1.39,-2.47,-1.06,-2.32,0.51,...,-6.26,-6.70,-6.16,-2.63,-2.73,-2.59,-2.87,-2.37,-4.42,-8.29
1950-01-04,-5.76,-5.76,-6.02,-6.38,-3.34,-4.20,-5.08,-2.73,-4.82,-2.04,...,-8.03,-9.56,-7.85,-2.51,-2.16,-0.98,-1.21,-0.79,-1.96,-4.71
1950-01-05,-5.80,-3.32,-4.20,-3.96,-1.09,-1.98,-3.00,-0.62,-2.74,-0.34,...,-12.62,-13.79,-12.40,-8.91,-7.32,-6.34,-6.65,-6.55,-5.74,-11.19
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-06-26,14.61,15.14,14.75,14.26,18.57,17.10,16.06,19.45,16.43,19.79,...,17.19,16.80,17.29,18.53,19.48,,,,19.58,19.38
2023-06-27,12.31,13.05,12.71,12.21,16.17,14.98,13.93,16.91,14.25,17.86,...,17.06,16.66,17.16,17.04,19.38,,,,20.15,18.85
2023-06-28,9.87,10.16,9.74,9.16,13.95,12.39,11.20,14.89,11.60,15.63,...,16.53,16.09,16.62,14.96,18.60,,,,18.84,17.72
2023-06-29,10.85,11.88,11.43,11.45,14.46,13.34,12.46,15.46,12.79,16.11,...,16.70,16.77,16.70,16.16,16.25,,,,17.08,17.19


## Streamflow gauges network

In [5]:
network_estreams = pd.read_csv('data/streamflow/estreams_gauging_stations.csv', encoding='utf-8')
network_estreams.set_index("basin_id", inplace = True)
network_estreams

basin_id,gauge_id,gauge_name,gauge_country,gauge_provider,river,lon_snap,lat_snap,lon,lat,area,...,start_date,end_date,num_years,num_months,num_days,num_days_gaps,num_continuous_days,duplicated_suspect,watershed_group,gauges_upstream
AT000001,200014,Bangs,AT,AT_EHYD,Rhein,9.534835,47.273748,9.534835,47.273748,4647.9,...,1996-01-01 00:00:00,2019-12-31 00:00:00,24,288,8766,0.0,8766,CH000197,1,13
AT000002,200048,Schruns (Vonbunweg),AT,AT_EHYD,Litz,9.913677,47.080301,9.913677,47.080301,102.0,...,1958-10-01 00:00:00,2019-12-31 00:00:00,62,735,22372,0.0,22372,CH000221,1,0
AT000003,231662,Loruens-Aeule,AT,AT_EHYD,Ill,9.847765,47.132821,9.847765,47.132821,535.2,...,1985-01-02 00:00:00,2019-12-31 00:00:00,35,420,12782,0.0,12782,CH000215,1,1
AT000004,200592,Kloesterle (OEBB),AT,AT_EHYD,Alfenz,10.061843,47.128994,10.061843,47.128994,66.6,...,1998-01-02 00:00:00,2019-12-31 00:00:00,22,264,8034,0.0,8034,CH000227,1,0
AT000005,200097,Buers (Bruecke L82),AT,AT_EHYD,Alvier,9.802668,47.150770,9.802668,47.150770,72.2,...,1990-01-01 00:00:00,2019-12-31 00:00:00,30,360,10957,0.0,10957,CH000214,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
UAGR0017,6682300,BASHTANOVKA,UA,UA_GRDC,KACHA,33.894739,44.691884,33.900000,44.683333,321.0,...,1978-01-01 00:00:00,1987-12-31 00:00:00,10,120,3652,0.0,3652,,1916,0
UAGR0018,6682500,YALTA,UA,UA_GRDC,DERE-KIOY,34.166667,44.500000,34.166667,44.500000,49.7,...,1978-01-01 00:00:00,1987-12-31 00:00:00,10,120,3652,0.0,3652,,1917,0
UAGR0019,6683010,PIONERSKOE,UA,UA_GRDC,SALHYR,34.199841,44.887685,34.200000,44.883333,261.0,...,1978-01-01 00:00:00,1987-12-31 00:00:00,10,120,3652,0.0,3652,,1918,0
UAGR0020,6683200,TOKMAK,UA,UA_GRDC,TOKMAK,35.705833,47.251389,35.705833,47.251389,760.0,...,1978-01-01 00:00:00,1987-12-31 00:00:00,10,120,3652,0.0,3652,,1919,0


## Catchment boundaries

In [9]:
catchment_boundaries = gpd.read_file('data/shapefiles/estreams_catchments.shp')
catchment_boundaries.set_index("basin_id", inplace = True)
catchment_boundaries.head()

basin_id,gauge_id,gauge_coun,area,area_calc,area_flag,area_perc,start_date,end_date,gauge_hier,watershed_,geometry
AT000001,200014,AT,4647.9,4668.379,0,-0.440608,1996-01-01,2019-12-31,14,1,"POLYGON Z ((9.69406 46.54322 0.00000, 9.69570 ..."
AT000002,200048,AT,102.0,102.287,0,-0.281373,1958-10-01,2019-12-31,1,1,"POLYGON Z ((10.13650 47.02949 0.00000, 10.1349..."
AT000003,231662,AT,535.2,536.299,0,-0.205344,1985-01-02,2019-12-31,2,1,"POLYGON Z ((10.11095 46.89437 0.00000, 10.1122..."
AT000004,200592,AT,66.6,66.286,0,0.471471,1998-01-02,2019-12-31,1,1,"POLYGON Z ((10.14189 47.09706 0.00000, 10.1404..."
AT000005,200097,AT,72.2,72.448,0,-0.34349,1990-01-01,2019-12-31,1,1,"POLYGON Z ((9.67851 47.06249 0.00000, 9.67888 ..."


In [10]:
print("The total number of catchments to be processed are:", len(catchment_boundaries))

The total number of catchments to be processed are: 15047


# Computation processing


## Computation of the specific discharge time series

In [11]:
# Adjust areas that might be zero: 
network_estreams = calculate_areas_when_0(network_estreams, catchment_boundaries)
network_estreams

basin_id,gauge_id,gauge_name,gauge_country,gauge_provider,river,lon_snap,lat_snap,lon,lat,area,...,start_date,end_date,num_years,num_months,num_days,num_days_gaps,num_continuous_days,duplicated_suspect,watershed_group,gauge_hierarchy
AT000001,200014,Bangs,AT,AT_EHYD,Rhein,9.534835,47.273748,9.534835,47.273748,4647.9,...,1996-01-01,2019-12-31,24,288,8766,0.0,8766,CH000197,1,14
AT000002,200048,Schruns (Vonbunweg),AT,AT_EHYD,Litz,9.913677,47.080301,9.913677,47.080301,102.0,...,1958-10-01,2019-12-31,62,735,22372,0.0,22372,CH000221,1,1
AT000003,231662,Loruens-Aeule,AT,AT_EHYD,Ill,9.847765,47.132821,9.847765,47.132821,535.2,...,1985-01-02,2019-12-31,35,420,12782,0.0,12782,CH000215,1,2
AT000004,200592,Kloesterle (OEBB),AT,AT_EHYD,Alfenz,10.061843,47.128994,10.061843,47.128994,66.6,...,1998-01-02,2019-12-31,22,264,8034,0.0,8034,CH000227,1,1
AT000005,200097,Buers (Bruecke L82),AT,AT_EHYD,Alvier,9.802668,47.150770,9.802668,47.150770,72.2,...,1990-01-01,2019-12-31,30,360,10957,0.0,10957,CH000214,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
UAGR0017,6682300,BASHTANOVKA,UA,UA_GRDC,KACHA,33.894739,44.691884,33.900000,44.683333,321.0,...,1978-01-01,1987-12-31,10,120,3652,0.0,3652,,1916,1
UAGR0018,6682500,YALTA,UA,UA_GRDC,DERE-KIOY,34.166667,44.500000,34.166667,44.500000,49.7,...,1978-01-01,1987-12-31,10,120,3652,0.0,3652,,1917,1
UAGR0019,6683010,PIONERSKOE,UA,UA_GRDC,SALHYR,34.199841,44.887685,34.200000,44.883333,261.0,...,1978-01-01,1987-12-31,10,120,3652,0.0,3652,,1918,1
UAGR0020,6683200,TOKMAK,UA,UA_GRDC,TOKMAK,35.705833,47.251389,35.705833,47.251389,760.0,...,1978-01-01,1987-12-31,10,120,3652,0.0,3652,,1919,1


In [12]:
# Specific discharge computation
timeseries_discharge = calculate_specific_discharge(network_estreams, timeseries_discharge)
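
The helper `calculate_specific_discharge` lives in `utils.general` and is not shown in this notebook. As a rough sketch of what such a conversion typically involves, the example below normalises discharge by catchment area. It assumes discharge in m³/s and area in km², which is an assumption rather than something stated here, and the function name `specific_discharge_mm_per_day` is hypothetical.

```python
import pandas as pd


def specific_discharge_mm_per_day(discharge_m3s, area_km2):
    """Convert discharge (m3/s) to specific discharge (mm/day).

    1 m3/s over 1 km2 equals 86.4 mm/day:
    86400 s/day * 1000 mm/m / 1e6 m2/km2 = 86.4.
    Hypothetical helper; the units are assumptions.
    """
    return discharge_m3s * 86.4 / area_km2


# Toy example: two hypothetical gauges (columns) and their catchment areas.
q = pd.DataFrame({"G1": [10.0, 20.0], "G2": [1.0, None]})
areas = pd.Series({"G1": 864.0, "G2": 86.4})

# Dividing a DataFrame by a Series aligns the Series index with the columns,
# so each gauge is divided by its own area. Missing values stay missing.
q_specific = specific_discharge_mm_per_day(q, areas)
```

Expressing discharge in mm/day puts it in the same units as precipitation, which is what makes ratios such as the runoff ratio meaningful.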

## Filtering the data

In [14]:
# Here we keep only the gauges whose area flag is neither "888" nor "999":
network_estreams_filtered = network_estreams[network_estreams.area_flag < 888.0]
network_estreams_filtered

basin_id,gauge_id,gauge_name,gauge_country,gauge_provider,river,lon_snap,lat_snap,lon,lat,area,...,start_date,end_date,num_years,num_months,num_days,num_days_gaps,num_continuous_days,duplicated_suspect,watershed_group,gauge_hierarchy
AT000001,200014,Bangs,AT,AT_EHYD,Rhein,9.534835,47.273748,9.534835,47.273748,4647.9,...,1996-01-01,2019-12-31,24,288,8766,0.0,8766,CH000197,1,14
AT000002,200048,Schruns (Vonbunweg),AT,AT_EHYD,Litz,9.913677,47.080301,9.913677,47.080301,102.0,...,1958-10-01,2019-12-31,62,735,22372,0.0,22372,CH000221,1,1
AT000003,231662,Loruens-Aeule,AT,AT_EHYD,Ill,9.847765,47.132821,9.847765,47.132821,535.2,...,1985-01-02,2019-12-31,35,420,12782,0.0,12782,CH000215,1,2
AT000004,200592,Kloesterle (OEBB),AT,AT_EHYD,Alfenz,10.061843,47.128994,10.061843,47.128994,66.6,...,1998-01-02,2019-12-31,22,264,8034,0.0,8034,CH000227,1,1
AT000005,200097,Buers (Bruecke L82),AT,AT_EHYD,Alvier,9.802668,47.150770,9.802668,47.150770,72.2,...,1990-01-01,2019-12-31,30,360,10957,0.0,10957,CH000214,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
UAGR0017,6682300,BASHTANOVKA,UA,UA_GRDC,KACHA,33.894739,44.691884,33.900000,44.683333,321.0,...,1978-01-01,1987-12-31,10,120,3652,0.0,3652,,1916,1
UAGR0018,6682500,YALTA,UA,UA_GRDC,DERE-KIOY,34.166667,44.500000,34.166667,44.500000,49.7,...,1978-01-01,1987-12-31,10,120,3652,0.0,3652,,1917,1
UAGR0019,6683010,PIONERSKOE,UA,UA_GRDC,SALHYR,34.199841,44.887685,34.200000,44.883333,261.0,...,1978-01-01,1987-12-31,10,120,3652,0.0,3652,,1918,1
UAGR0020,6683200,TOKMAK,UA,UA_GRDC,TOKMAK,35.705833,47.251389,35.705833,47.251389,760.0,...,1978-01-01,1987-12-31,10,120,3652,0.0,3652,,1919,1


In [15]:
# Here we keep only the gauges with at least 1 year (365 days) of consecutive measured days:
network_estreams_filtered = network_estreams_filtered[network_estreams_filtered.num_continuous_days >= 365]
network_estreams_filtered

basin_id,gauge_id,gauge_name,gauge_country,gauge_provider,river,lon_snap,lat_snap,lon,lat,area,...,start_date,end_date,num_years,num_months,num_days,num_days_gaps,num_continuous_days,duplicated_suspect,watershed_group,gauge_hierarchy
AT000001,200014,Bangs,AT,AT_EHYD,Rhein,9.534835,47.273748,9.534835,47.273748,4647.9,...,1996-01-01,2019-12-31,24,288,8766,0.0,8766,CH000197,1,14
AT000002,200048,Schruns (Vonbunweg),AT,AT_EHYD,Litz,9.913677,47.080301,9.913677,47.080301,102.0,...,1958-10-01,2019-12-31,62,735,22372,0.0,22372,CH000221,1,1
AT000003,231662,Loruens-Aeule,AT,AT_EHYD,Ill,9.847765,47.132821,9.847765,47.132821,535.2,...,1985-01-02,2019-12-31,35,420,12782,0.0,12782,CH000215,1,2
AT000004,200592,Kloesterle (OEBB),AT,AT_EHYD,Alfenz,10.061843,47.128994,10.061843,47.128994,66.6,...,1998-01-02,2019-12-31,22,264,8034,0.0,8034,CH000227,1,1
AT000005,200097,Buers (Bruecke L82),AT,AT_EHYD,Alvier,9.802668,47.150770,9.802668,47.150770,72.2,...,1990-01-01,2019-12-31,30,360,10957,0.0,10957,CH000214,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
UAGR0017,6682300,BASHTANOVKA,UA,UA_GRDC,KACHA,33.894739,44.691884,33.900000,44.683333,321.0,...,1978-01-01,1987-12-31,10,120,3652,0.0,3652,,1916,1
UAGR0018,6682500,YALTA,UA,UA_GRDC,DERE-KIOY,34.166667,44.500000,34.166667,44.500000,49.7,...,1978-01-01,1987-12-31,10,120,3652,0.0,3652,,1917,1
UAGR0019,6683010,PIONERSKOE,UA,UA_GRDC,SALHYR,34.199841,44.887685,34.200000,44.883333,261.0,...,1978-01-01,1987-12-31,10,120,3652,0.0,3652,,1918,1
UAGR0020,6683200,TOKMAK,UA,UA_GRDC,TOKMAK,35.705833,47.251389,35.705833,47.251389,760.0,...,1978-01-01,1987-12-31,10,120,3652,0.0,3652,,1919,1


## Preprocessing

### Filter the time series
* Here we restrict the hydro-climatic time series to the filtered gauges.

In [16]:
# Specific discharge
timeseries_discharge = timeseries_discharge.loc[:, network_estreams_filtered.index]

# Precipitation
timeseries_precipitation = timeseries_precipitation.loc[:, network_estreams_filtered.index]

# Temperature
timeseries_temperature = timeseries_temperature.loc[:, network_estreams_filtered.index]

# Potential evapotranspiration
timeseries_pet = timeseries_pet.loc[:, network_estreams_filtered.index]

### Subset the streamflow time series
* Here we subset the streamflow time series to the same time period as the meteorological time series.

In [17]:
timeseries_discharge = timeseries_discharge.loc[timeseries_precipitation.index,:]

### Compute quality masks
* We compute a mask that assigns "1" to NaNs and "0" to good-quality data.

In [18]:
# Quality-mask for joint specific discharge and precipitation (Hydrological signatures):  
quality_discharge_precipitation = (pd.isna(timeseries_discharge) | pd.isna(timeseries_precipitation)).astype(int)

# Quality-mask for joint precipitation, pet and temperature (Climatic signatures):
quality_pet_precipitation_temperature = (pd.isna(timeseries_precipitation) | pd.isna(timeseries_pet) | pd.isna(timeseries_temperature)).astype(int)
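
To make the mask convention concrete, here is a small toy example with made-up values and a single hypothetical gauge `G1`. It uses the same expression as the cell above: a day is flagged with 1 as soon as either variable is missing, so only days where both are available remain usable.

```python
import numpy as np
import pandas as pd

# Made-up daily values for one hypothetical gauge.
q = pd.DataFrame({"G1": [1.0, np.nan, 3.0, 4.0]})
p = pd.DataFrame({"G1": [0.0, 2.0, np.nan, 5.0]})

# 1 where either series is missing, 0 where both are available.
mask = (q.isna() | p.isna()).astype(int)

# Days that can be used for a joint discharge-precipitation signature.
usable_q = q["G1"][mask["G1"] == 0]
```

Only the first and last days survive here, because the second day lacks discharge and the third lacks precipitation.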

### Calculate the hydrological years

In [19]:
hydro_year = calculate_hydro_year(date=timeseries_discharge.index, first_month=10)
hydro_year

array([1950, 1950, 1950, ..., 2023, 2023, 2023], dtype=int32)
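
The implementation of `calculate_hydro_year` in `utils.streamflowindices` is not shown here. One convention that is consistent with the printed output (1950-01-01 maps to 1950 and June 2023 maps to 2023) labels each hydrological year by the calendar year in which it ends. The sketch below assumes that convention; `hydro_year_sketch` is a hypothetical name and may differ from the actual helper.

```python
import numpy as np
import pandas as pd


def hydro_year_sketch(dates, first_month=10):
    """Label each date with its hydrological year.

    Assumed convention: the hydrological year starting in `first_month`
    of calendar year Y is labelled Y + 1 (the year in which it ends).
    """
    dates = pd.DatetimeIndex(dates)
    years = dates.year.to_numpy()
    if first_month == 1:
        # The hydrological year coincides with the calendar year.
        return years
    # Months from `first_month` onwards belong to the next hydrological year.
    return years + (dates.month.to_numpy() >= first_month).astype(int)


example = hydro_year_sketch(
    ["1950-01-01", "1950-09-30", "1950-10-01", "2023-06-29"], first_month=10
)
```

With `first_month=10`, the dates up to 30 September 1950 stay in hydrological year 1950, while 1 October 1950 already counts towards 1951.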

## Signatures computation

In [20]:
hydrometeo_signatures_df = pd.DataFrame(index = network_estreams_filtered.index, 
                                        columns = ["q_corr", "q_mean", "q_runoff_ratio", "q_elas_Sawicz", 
                                                   "q_elas_Sankarasubramanian", "slope_sawicz", "slope_yadav",
                                                   "slope_mcmillan", "slope_addor", "baseflow_index", "hfd_mean",
                                                   "hfd_std", "q_5", "q_95", "hq_freq", "hq_dur", "lq_freq", 
                                                   "lq_dur", "zero_q_freq", "p_mean", "pet_mean", "aridity", 
                                                   "p_seasonality", "frac_snow", "hp_freq",
                                                   "hp_dur", "hp_time", "lp_freq", "lp_dur",
                                                   "lp_time"
                                                  ])

In [21]:
# Streamflow signatures
for gauge in tqdm.tqdm(timeseries_discharge.columns):
        
    # Correlation between runoff and precipitation
    hydrometeo_signatures_df.loc[gauge, "q_corr"] = timeseries_discharge.loc[:, gauge].corr(timeseries_precipitation.loc[:, gauge])
    
    # Runoff mean (mm/day)
    hydrometeo_signatures_df.loc[gauge, "q_mean"] = hydroanalysis.streamflow_signatures.calculate_q_mean(timeseries_discharge.loc[:, gauge].values, quality_discharge_precipitation.loc[:, gauge].values)
    
    # Runoff ratio (-)
    hydrometeo_signatures_df.loc[gauge, "q_runoff_ratio"] = hydroanalysis.streamflow_signatures.calculate_runoff_ratio(streamflow = timeseries_discharge.loc[:, gauge].values,
                                                                                              quality  = quality_discharge_precipitation.loc[:, gauge].values,
                                                                                            precipitation = timeseries_precipitation.loc[:, gauge].values)
    # Streamflow elasticity (-)
    elas_gauge = hydroanalysis.streamflow_signatures.calculate_stream_elas(streamflow = timeseries_discharge.loc[:, gauge].values,
                                                                quality  = quality_discharge_precipitation.loc[:, gauge].values,
                                                                precipitation = timeseries_precipitation.loc[:, gauge].values,
                                                                hydro_year  = hydro_year)
    try:
        hydrometeo_signatures_df.loc[gauge, ["q_elas_Sawicz", 
                                         "q_elas_Sankarasubramanian"]] = elas_gauge["Sawicz"],elas_gauge["Sankarasubramanian"]
    except Exception:
        hydrometeo_signatures_df.loc[gauge, ["q_elas_Sawicz", 
                                         "q_elas_Sankarasubramanian"]] = np.nan, np.nan
    
    # Slope (-)
    slope_gauge = hydroanalysis.streamflow_signatures.calculate_slope_fdc(streamflow = timeseries_discharge.loc[:, gauge].values,                                                                  
                                                                          quality  = quality_discharge_precipitation.loc[:, gauge].values)
    try:
        hydrometeo_signatures_df.loc[gauge, ["slope_sawicz", "slope_yadav",
                                         "slope_mcmillan", "slope_addor"]] = slope_gauge["Sawicz"],slope_gauge["Yadav"],slope_gauge["McMillan"],slope_gauge["Addor"]
    except Exception:
        hydrometeo_signatures_df.loc[gauge, ["slope_sawicz", "slope_yadav",
                                         "slope_mcmillan", "slope_addor"]] = np.nan, np.nan, np.nan, np.nan
    # Baseflow index (-)
    hydrometeo_signatures_df.loc[gauge, "baseflow_index"] = hydroanalysis.streamflow_signatures.calculate_baseflow_index(streamflow = timeseries_discharge.loc[:, gauge].values, 
                                                                                                quality = quality_discharge_precipitation.loc[:, gauge].values)
    
    # Half-flow duration (days)
    hfd_gauge = hydroanalysis.streamflow_signatures.calculate_hfd_mean(streamflow = timeseries_discharge.loc[:, gauge].values,
                                                                quality  = quality_discharge_precipitation.loc[:, gauge].values,
                                                                hydro_year = hydro_year)
    try:
        hydrometeo_signatures_df.loc[gauge, ["hfd_mean", 
                          "hfd_std"]] = hfd_gauge["hfd_mean"],hfd_gauge["hfd_std"]
    except Exception:
        hydrometeo_signatures_df.loc[gauge, ["hfd_mean", 
                          "hfd_std"]] = np.nan, np.nan
        
    # Q5 (mm/day)
    hydrometeo_signatures_df.loc[gauge, "q_5"] = hydroanalysis.streamflow_signatures.calculate_q_5(streamflow = timeseries_discharge.loc[:, gauge].values, 
                                                                          quality = quality_discharge_precipitation.loc[:, gauge].values)
    # Q95 (mm/day)
    hydrometeo_signatures_df.loc[gauge, "q_95"] = hydroanalysis.streamflow_signatures.calculate_q_95(streamflow = timeseries_discharge.loc[:, gauge].values, 
                                                                          quality = quality_discharge_precipitation.loc[:, gauge].values)
    
    # High-flow frequency (days/year) and mean duration (days)
    hq_gauge = hydroanalysis.streamflow_signatures.calculate_high_q_freq_dur(streamflow = timeseries_discharge.loc[:, gauge].values,
                                                                              quality  = quality_discharge_precipitation.loc[:, gauge].values)
    
    try: 
        hydrometeo_signatures_df.loc[gauge, ["hq_freq", 
                                         "hq_dur"]] = hq_gauge["hq_freq"],hq_gauge["hq_dur"]
    except Exception:
        hydrometeo_signatures_df.loc[gauge, ["hq_freq", 
                                         "hq_dur"]] = np.nan, np.nan

    # Low-flow frequency (days/year) and mean duration (days)
    lq_gauge = hydroanalysis.streamflow_signatures.calculate_low_q_freq_dur(streamflow = timeseries_discharge.loc[:, gauge].values,
                                                                              quality  = quality_discharge_precipitation.loc[:, gauge].values)
    
    try:
        hydrometeo_signatures_df.loc[gauge, ["lq_freq", 
                                         "lq_dur"]] = lq_gauge["lq_freq"],lq_gauge["lq_dur"]
    except Exception:
        hydrometeo_signatures_df.loc[gauge, ["lq_freq", 
                                         "lq_dur"]] = np.nan, np.nan
    
    # Zero-flow frequency (-)
    hydrometeo_signatures_df.loc[gauge, "zero_q_freq"] = hydroanalysis.streamflow_signatures.calculate_zero_q_freq(streamflow = timeseries_discharge.loc[:, gauge].values, 
                                                                          quality = quality_discharge_precipitation.loc[:, gauge].values)

hydrometeo_signatures_df = hydrometeo_signatures_df.apply(pd.to_numeric, errors='coerce')

100%|██████████| 11937/11937 [17:03<00:00, 11.66it/s]
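
To illustrate how the quality mask enters a signature, the sketch below computes a runoff ratio as mean specific discharge divided by mean precipitation over the days flagged as good (mask value 0), following the definition in Addor et al. (2017). It is a simplified stand-in, not the `hydroanalysis` implementation, and `runoff_ratio_sketch` is a hypothetical name.

```python
import numpy as np


def runoff_ratio_sketch(streamflow, precipitation, quality):
    """Mean streamflow divided by mean precipitation on good-quality days.

    `quality` follows the notebook convention: 0 = usable, 1 = missing.
    Returns NaN when no usable day exists or mean precipitation is zero.
    """
    streamflow = np.asarray(streamflow, dtype=float)
    precipitation = np.asarray(precipitation, dtype=float)
    good = np.asarray(quality) == 0
    if not good.any():
        return np.nan
    p_mean = precipitation[good].mean()
    if p_mean == 0:
        return np.nan
    return streamflow[good].mean() / p_mean


# Toy data in mm/day: only the first and last days are usable.
ratio = runoff_ratio_sketch(
    streamflow=[1.0, np.nan, 3.0, 2.0],
    precipitation=[2.0, 4.0, np.nan, 6.0],
    quality=[0, 1, 1, 0],
)
```

Because both series are expressed in mm/day, the ratio is dimensionless and can be read as the fraction of precipitation that leaves the catchment as streamflow.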


In [22]:
# Meteorological signatures

for gauge in tqdm.tqdm(timeseries_discharge.columns):
        
    # P mean (mm/day)
    hydrometeo_signatures_df.loc[gauge, "p_mean"] = hydroanalysis.meteo_indexes.calculate_p_mean(precipitation = timeseries_precipitation.loc[:, gauge].values,
                                                                        quality  = quality_pet_precipitation_temperature.loc[:, gauge].values)
    
    # PET mean (mm/day)
    hydrometeo_signatures_df.loc[gauge, "pet_mean"] = hydroanalysis.meteo_indexes.calculate_pet_mean(pet = timeseries_pet.loc[:, gauge].values,
                                                                          quality  = quality_pet_precipitation_temperature.loc[:, gauge].values)
    
    # Aridity index (-)
    hydrometeo_signatures_df.loc[gauge, "aridity"] = hydroanalysis.meteo_indexes.calculate_aridity(precipitation = timeseries_precipitation.loc[:, gauge].values,
                                                                pet = timeseries_pet.loc[:, gauge].values,
                                                                quality  = quality_pet_precipitation_temperature.loc[:, gauge].values)
    # Precipitation seasonality (-)
    hydrometeo_signatures_df.loc[gauge, "p_seasonality"] = hydroanalysis.meteo_indexes.calculate_p_seasonality(precipitation = timeseries_precipitation.loc[:, gauge].values,
                                                                                quality  = quality_pet_precipitation_temperature.loc[:, gauge].values,
                                                                                date =timeseries_precipitation.loc[:, gauge].index,
                                                                                temperature = timeseries_temperature.loc[:, gauge].values)
    # Fraction of snow (-)
    hydrometeo_signatures_df.loc[gauge, "frac_snow"] = hydroanalysis.meteo_indexes.calculate_frac_snow(precipitation = timeseries_precipitation.loc[:, gauge].values,
                                                                              temperature = timeseries_temperature.loc[:, gauge].values,
                                                                              quality  = quality_pet_precipitation_temperature.loc[:, gauge].values,
                                                                              threshold=0.0)
    
    # High-precipitation frequency (hp_freq), duration (hp_dur) and timing (hp_time)
    # The dates are taken from the same series as the precipitation values.
    high_prec_freq_time_gauge = hydroanalysis.meteo_indexes.calculate_high_prec_freq_time(
        precipitation = timeseries_precipitation.loc[:, gauge].values,
        quality = quality_pet_precipitation_temperature.loc[:, gauge].values,
        date = timeseries_precipitation.loc[:, gauge].index)
    try:
        hydrometeo_signatures_df.loc[gauge, ["hp_freq", "hp_dur", "hp_time"]] = (
            high_prec_freq_time_gauge["hp_freq"],
            high_prec_freq_time_gauge["hp_dur"],
            high_prec_freq_time_gauge["hp_time"])
    except Exception:
        # Fall back to NaN when the indices could not be retrieved for this gauge
        hydrometeo_signatures_df.loc[gauge, ["hp_freq", "hp_dur", "hp_time"]] = np.nan, np.nan, np.nan
    
    # Low-precipitation frequency (lp_freq), duration (lp_dur) and timing (lp_time)
    # The dates are taken from the same series as the precipitation values.
    low_prec_freq_time_gauge = hydroanalysis.meteo_indexes.calculate_low_prec_freq_time(
        precipitation = timeseries_precipitation.loc[:, gauge].values,
        quality = quality_pet_precipitation_temperature.loc[:, gauge].values,
        date = timeseries_precipitation.loc[:, gauge].index)
    try:
        hydrometeo_signatures_df.loc[gauge, ["lp_freq", "lp_dur", "lp_time"]] = (
            low_prec_freq_time_gauge["lp_freq"],
            low_prec_freq_time_gauge["lp_dur"],
            low_prec_freq_time_gauge["lp_time"])
    except Exception:
        # Fall back to NaN when the indices could not be retrieved for this gauge
        hydrometeo_signatures_df.loc[gauge, ["lp_freq", "lp_dur", "lp_time"]] = np.nan, np.nan, np.nan
      

100%|██████████| 11937/11937 [51:40<00:00,  3.85it/s] 
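The loop above delegates every index to the hydroanalysis module. As a rough illustration of what three of these indices measure, the sketch below follows the definitions in Addor et al. (2017): the snow fraction is the share of precipitation falling on days colder than the threshold, high-precipitation days are those with at least five times the mean daily precipitation, and dry days are those with less than 1 mm/day. The function names are hypothetical, quality flags are ignored, and this is not the hydroanalysis implementation, which may differ in its details.

```python
import numpy as np

def frac_snow_sketch(precipitation, temperature, threshold=0.0):
    """Share of total precipitation falling on days colder than `threshold` (hypothetical helper)."""
    p = np.asarray(precipitation, dtype=float)
    t = np.asarray(temperature, dtype=float)
    valid = ~np.isnan(p) & ~np.isnan(t)      # use only days where both series are available
    total = p[valid].sum()
    if total == 0:
        return np.nan
    return p[valid & (t < threshold)].sum() / total

def prec_freq_sketch(precipitation, days_per_year=365.25):
    """High- and low-precipitation frequencies in days per year (hypothetical helper)."""
    p = np.asarray(precipitation, dtype=float)
    p = p[~np.isnan(p)]
    if p.size == 0:
        return np.nan, np.nan
    high = p >= 5 * p.mean()                 # threshold assumed from Addor et al. (2017)
    low = p < 1.0                            # dry-day threshold of 1 mm/day, same source
    return high.mean() * days_per_year, low.mean() * days_per_year

# Small synthetic example (values are made up for illustration)
demo_p = [0, 10, 0, 2, 0, 0, 0, 0, 0, 0]
demo_t = [-1, -2, 3, 4, 5, 5, 5, 5, 5, 5]
print(frac_snow_sketch(demo_p, demo_t))
print(prec_freq_sketch(demo_p))
```

Expressing the frequencies in days per year keeps them comparable between gauges with records of different lengths.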


## Number of measurements used:

In [23]:
# Count the measurements available in the discharge series.
# Only the number of years (num_years_hydro) is kept; the other counts are dropped.
hydrometeo_signatures_df[["num_days", "num_months", "num_months_complete", "num_years_hydro", "num_years_complete"]] = count_num_measurements(timeseries = timeseries_discharge)
hydrometeo_signatures_df.drop(["num_days", "num_months", "num_months_complete", "num_years_complete"], axis = 1, inplace = True)

# First and last dates with a valid (non-NaN) discharge value
hydrometeo_signatures_df["start_date_hydro"] = find_first_non_nan_dates(timeseries_discharge)
hydrometeo_signatures_df["end_date_hydro"] = find_last_non_nan_dates(timeseries_discharge)
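The helpers `find_first_non_nan_dates` and `find_last_non_nan_dates` are defined earlier in the notebook. For readers adapting the workflow to new catchments, the sketch below shows one way such dates could be obtained with pandas alone, using `first_valid_index` and `last_valid_index`. The function name and the demo data are hypothetical, and the notebook's own helpers may be implemented differently.

```python
import numpy as np
import pandas as pd

def first_last_valid_dates(timeseries):
    """Return the first and last non-NaN dates of every column (hypothetical helper)."""
    start = {col: timeseries[col].first_valid_index() for col in timeseries.columns}
    end = {col: timeseries[col].last_valid_index() for col in timeseries.columns}
    return pd.DataFrame({"start_date": pd.Series(start), "end_date": pd.Series(end)})

# Synthetic daily series for two made-up gauges
dates = pd.date_range("2000-01-01", periods=5, freq="D")
demo = pd.DataFrame(
    {"A": [np.nan, 1.0, 2.0, np.nan, np.nan],
     "B": [1.0, np.nan, np.nan, np.nan, 3.0]},
    index=dates,
)
print(first_last_valid_dates(demo))
```

A column without any valid value yields `None` for both dates, so such gauges are easy to spot afterwards.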

In [24]:
# Count the measurements available in the climatic series (PET is used as the reference).
# Only the number of years (num_years_climatic) is kept; the other counts are dropped.
hydrometeo_signatures_df[["num_days", "num_months", "num_months_complete", "num_years_climatic", "num_years_complete"]] = count_num_measurements(timeseries = timeseries_pet)
hydrometeo_signatures_df.drop(["num_days", "num_months", "num_months_complete", "num_years_complete"], axis = 1, inplace = True)

# First and last dates with a valid (non-NaN) PET value
hydrometeo_signatures_df["start_date_climatic"] = find_first_non_nan_dates(timeseries_pet)
hydrometeo_signatures_df["end_date_climatic"] = find_last_non_nan_dates(timeseries_pet)

hydrometeo_signatures_df

basin_id,q_corr,q_mean,q_runoff_ratio,q_elas_Sawicz,q_elas_Sankarasubramanian,slope_sawicz,slope_yadav,slope_mcmillan,slope_addor,baseflow_index,...,hp_time,lp_freq,lp_dur,lp_time,num_years_hydro,start_date_hydro,end_date_hydro,num_years_climatic,start_date_climatic,end_date_climatic
AT000001,0.172251,2.819267,0.722440,1.113456,1.220884,1.531698,1.247084,1.531698,1.501235,0.760483,...,Summer,198.993490,3.560993,Fall,24,1996-01-01,2019-12-31,74,1950-01-01,2023-06-30
AT000002,0.177957,3.902826,1.003843,0.830220,1.217526,2.488963,1.770777,2.488963,2.442830,0.718639,...,Summer,204.041462,3.576437,Fall,62,1958-10-01,2019-12-31,74,1950-01-01,2023-06-30
AT000003,0.130088,0.914848,0.244704,1.690838,1.978182,1.007297,0.667043,1.007297,0.981304,0.688742,...,Summer,202.000503,3.592064,Fall,35,1985-01-02,2019-12-31,74,1950-01-01,2023-06-30
AT000004,0.167084,5.062633,1.302676,0.769999,0.424427,2.181063,1.677413,2.181063,2.174904,0.746374,...,Summer,205.802835,3.597573,Fall,22,1998-01-02,2019-12-31,74,1950-01-01,2023-06-30
AT000005,0.231827,3.318514,0.805982,1.168462,0.820444,1.966594,1.611727,1.966594,1.964011,0.755781,...,Summer,201.796407,3.547238,Fall,30,1990-01-01,2019-12-31,74,1950-01-01,2023-06-30
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
UAGR0017,0.294427,0.150295,0.087521,1.404136,2.386879,,,,,0.353575,...,Winter,270.637048,5.789120,Summer,10,1978-01-01,1987-12-31,71,1950-01-01,2020-10-31
UAGR0018,0.313707,0.474716,0.266377,2.377497,2.041490,,,,,0.206591,...,Winter,268.321567,5.700364,Summer,10,1978-01-01,1987-12-31,71,1950-01-01,2020-10-30
UAGR0019,0.274963,0.311569,0.194127,1.656232,2.893300,,,,,0.353629,...,Winter,273.598181,5.876036,Summer,10,1978-01-01,1987-12-31,71,1950-01-01,2020-10-31
UAGR0020,0.047618,0.075142,0.055923,2.752072,3.179944,,,,,0.517153,...,Summer,281.568233,6.161831,Summer,10,1978-01-01,1987-12-31,74,1950-01-01,2023-06-30


In [25]:
# Here we organize the data for all the catchments in the network (not only the filtered ones)
signatures_df = pd.DataFrame(columns = hydrometeo_signatures_df.columns, index = network_estreams.index)
signatures_df.loc[hydrometeo_signatures_df.index, :] =  hydrometeo_signatures_df
signatures_df

basin_id,q_corr,q_mean,q_runoff_ratio,q_elas_Sawicz,q_elas_Sankarasubramanian,slope_sawicz,slope_yadav,slope_mcmillan,slope_addor,baseflow_index,...,hp_time,lp_freq,lp_dur,lp_time,num_years_hydro,start_date_hydro,end_date_hydro,num_years_climatic,start_date_climatic,end_date_climatic
AT000001,0.172251,2.819267,0.72244,1.113456,1.220884,1.531698,1.247084,1.531698,1.501235,0.760483,...,Summer,198.99349,3.560993,Fall,24,1996-01-01 00:00:00,2019-12-31 00:00:00,74,1950-01-01 00:00:00,2023-06-30 00:00:00
AT000002,0.177957,3.902826,1.003843,0.83022,1.217526,2.488963,1.770777,2.488963,2.44283,0.718639,...,Summer,204.041462,3.576437,Fall,62,1958-10-01 00:00:00,2019-12-31 00:00:00,74,1950-01-01 00:00:00,2023-06-30 00:00:00
AT000003,0.130088,0.914848,0.244704,1.690838,1.978182,1.007297,0.667043,1.007297,0.981304,0.688742,...,Summer,202.000503,3.592064,Fall,35,1985-01-02 00:00:00,2019-12-31 00:00:00,74,1950-01-01 00:00:00,2023-06-30 00:00:00
AT000004,0.167084,5.062633,1.302676,0.769999,0.424427,2.181063,1.677413,2.181063,2.174904,0.746374,...,Summer,205.802835,3.597573,Fall,22,1998-01-02 00:00:00,2019-12-31 00:00:00,74,1950-01-01 00:00:00,2023-06-30 00:00:00
AT000005,0.231827,3.318514,0.805982,1.168462,0.820444,1.966594,1.611727,1.966594,1.964011,0.755781,...,Summer,201.796407,3.547238,Fall,30,1990-01-01 00:00:00,2019-12-31 00:00:00,74,1950-01-01 00:00:00,2023-06-30 00:00:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
UAGR0017,0.294427,0.150295,0.087521,1.404136,2.386879,,,,,0.353575,...,Winter,270.637048,5.78912,Summer,10,1978-01-01 00:00:00,1987-12-31 00:00:00,71,1950-01-01 00:00:00,2020-10-31 00:00:00
UAGR0018,0.313707,0.474716,0.266377,2.377497,2.04149,,,,,0.206591,...,Winter,268.321567,5.700364,Summer,10,1978-01-01 00:00:00,1987-12-31 00:00:00,71,1950-01-01 00:00:00,2020-10-30 00:00:00
UAGR0019,0.274963,0.311569,0.194127,1.656232,2.8933,,,,,0.353629,...,Winter,273.598181,5.876036,Summer,10,1978-01-01 00:00:00,1987-12-31 00:00:00,71,1950-01-01 00:00:00,2020-10-31 00:00:00
UAGR0020,0.047618,0.075142,0.055923,2.752072,3.179944,,,,,0.517153,...,Summer,281.568233,6.161831,Summer,10,1978-01-01 00:00:00,1987-12-31 00:00:00,74,1950-01-01 00:00:00,2023-06-30 00:00:00
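Creating an empty DataFrame and filling it with `.loc`, as done in the cell above, leaves the columns with `object` dtype, which is why the dates in this output carry a `00:00:00` suffix. A possible alternative, sketched here with made-up data and a hypothetical helper name, is `DataFrame.reindex`, which adds the missing catchments as NaN rows while keeping the column dtypes. This is only a suggestion and not the approach used for the published dataset.

```python
import pandas as pd

def expand_to_full_index(computed, all_ids):
    """Reindex the computed signatures to every catchment; missing ones become NaN (hypothetical helper)."""
    return computed.reindex(all_ids)

# Made-up example: signatures exist for two of three catchments
computed = pd.DataFrame({"q_mean": [2.819, 3.903]}, index=["AT000001", "AT000002"])
all_ids = pd.Index(["AT000001", "AT000002", "AT000003"], name="basin_id")

full = expand_to_full_index(computed, all_ids)
print(full)
print(full.dtypes)
```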


In [27]:
# Here we keep only the fields used in Addor et al. (2017), plus the record-length and date columns
signatures_df = signatures_df[["q_mean", "q_runoff_ratio", "q_elas_Sankarasubramanian", "slope_sawicz",
                               "baseflow_index", "hfd_mean", "hfd_std", "q_5", "q_95", "hq_freq", "hq_dur", "lq_freq", 
                               "lq_dur", "zero_q_freq", "p_mean", "pet_mean", "aridity", 
                               "p_seasonality", "frac_snow", "hp_freq",
                               "hp_dur", "hp_time", "lp_freq", "lp_dur", "lp_time",
                               "num_years_hydro", "start_date_hydro", "end_date_hydro",
                               "num_years_climatic", "start_date_climatic", "end_date_climatic"  
                                                  ]]
signatures_df

basin_id,q_mean,q_runoff_ratio,q_elas_Sankarasubramanian,slope_sawicz,baseflow_index,hfd_mean,hfd_std,q_5,q_95,hq_freq,...,hp_time,lp_freq,lp_dur,lp_time,num_years_hydro,start_date_hydro,end_date_hydro,num_years_climatic,start_date_climatic,end_date_climatic
AT000001,2.819267,0.72244,1.220884,1.531698,0.760483,238.478261,12.398202,1.023464,6.607176,0.083333,...,Summer,198.99349,3.560993,Fall,24,1996-01-01 00:00:00,2019-12-31 00:00:00,74,1950-01-01 00:00:00,2023-06-30 00:00:00
AT000002,3.902826,1.003843,1.217526,2.488963,0.718639,248.229508,10.832657,0.971384,10.727463,1.044878,...,Summer,204.041462,3.576437,Fall,62,1958-10-01 00:00:00,2019-12-31 00:00:00,74,1950-01-01 00:00:00,2023-06-30 00:00:00
AT000003,0.914848,0.244704,1.978182,1.007297,0.688742,233.411765,27.535231,0.404371,2.803212,6.372301,...,Summer,202.000503,3.592064,Fall,35,1985-01-02 00:00:00,2019-12-31 00:00:00,74,1950-01-01 00:00:00,2023-06-30 00:00:00
AT000004,5.062633,1.302676,0.424427,2.181063,0.746374,243.0,10.109402,1.485925,13.295115,0.045463,...,Summer,205.802835,3.597573,Fall,22,1998-01-02 00:00:00,2019-12-31 00:00:00,74,1950-01-01 00:00:00,2023-06-30 00:00:00
AT000005,3.318514,0.805982,0.820444,1.966594,0.755781,239.206897,14.641866,1.063781,7.692138,0.233344,...,Summer,201.796407,3.547238,Fall,30,1990-01-01 00:00:00,2019-12-31 00:00:00,74,1950-01-01 00:00:00,2023-06-30 00:00:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
UAGR0017,0.150295,0.087521,2.386879,,0.353575,189.111111,72.113529,0.0,0.796632,81.511158,...,Winter,270.637048,5.78912,Summer,10,1978-01-01 00:00:00,1987-12-31 00:00:00,71,1950-01-01 00:00:00,2020-10-31 00:00:00
UAGR0018,0.474716,0.266377,2.04149,,0.206591,160.444444,42.924676,0.0,3.63071,60.508283,...,Winter,268.321567,5.700364,Summer,10,1978-01-01 00:00:00,1987-12-31 00:00:00,71,1950-01-01 00:00:00,2020-10-30 00:00:00
UAGR0019,0.311569,0.194127,2.8933,,0.353629,184.666667,59.422218,0.0,1.412163,127.517456,...,Winter,273.598181,5.876036,Summer,10,1978-01-01 00:00:00,1987-12-31 00:00:00,71,1950-01-01 00:00:00,2020-10-31 00:00:00
UAGR0020,0.075142,0.055923,3.179944,,0.517153,188.0,66.869275,0.0,0.236365,167.022864,...,Summer,281.568233,6.161831,Summer,10,1978-01-01 00:00:00,1987-12-31 00:00:00,74,1950-01-01 00:00:00,2023-06-30 00:00:00


In [39]:
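# Round the numeric signatures to three decimals. The positional slices skip the
# season labels (hp_time, lp_time) and the record-length and date columns:
# columns 0 to 20 are q_mean ... hp_dur, and columns 22 to 23 are lp_freq and lp_dur.
signatures_df.iloc[:, 0:-10] = signatures_df.iloc[:, 0:-10].astype(float).round(3)
signatures_df.iloc[:, -9:-7] = signatures_df.iloc[:, -9:-7].astype(float).round(3)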
signatures_df

basin_id,q_mean,q_runoff_ratio,q_elas_Sankarasubramanian,slope_sawicz,baseflow_index,hfd_mean,hfd_std,q_5,q_95,hq_freq,...,hp_time,lp_freq,lp_dur,lp_time,num_years_hydro,start_date_hydro,end_date_hydro,num_years_climatic,start_date_climatic,end_date_climatic
AT000001,2.819,0.722,1.221,1.532,0.76,238.478,12.398,1.023,6.607,0.083,...,Summer,198.993,3.561,Fall,24,1996-01-01 00:00:00,2019-12-31 00:00:00,74,1950-01-01 00:00:00,2023-06-30 00:00:00
AT000002,3.903,1.004,1.218,2.489,0.719,248.23,10.833,0.971,10.727,1.045,...,Summer,204.041,3.576,Fall,62,1958-10-01 00:00:00,2019-12-31 00:00:00,74,1950-01-01 00:00:00,2023-06-30 00:00:00
AT000003,0.915,0.245,1.978,1.007,0.689,233.412,27.535,0.404,2.803,6.372,...,Summer,202.001,3.592,Fall,35,1985-01-02 00:00:00,2019-12-31 00:00:00,74,1950-01-01 00:00:00,2023-06-30 00:00:00
AT000004,5.063,1.303,0.424,2.181,0.746,243.0,10.109,1.486,13.295,0.045,...,Summer,205.803,3.598,Fall,22,1998-01-02 00:00:00,2019-12-31 00:00:00,74,1950-01-01 00:00:00,2023-06-30 00:00:00
AT000005,3.319,0.806,0.82,1.967,0.756,239.207,14.642,1.064,7.692,0.233,...,Summer,201.796,3.547,Fall,30,1990-01-01 00:00:00,2019-12-31 00:00:00,74,1950-01-01 00:00:00,2023-06-30 00:00:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
UAGR0017,0.15,0.088,2.387,,0.354,189.111,72.114,0.0,0.797,81.511,...,Winter,270.637,5.789,Summer,10,1978-01-01 00:00:00,1987-12-31 00:00:00,71,1950-01-01 00:00:00,2020-10-31 00:00:00
UAGR0018,0.475,0.266,2.041,,0.207,160.444,42.925,0.0,3.631,60.508,...,Winter,268.322,5.7,Summer,10,1978-01-01 00:00:00,1987-12-31 00:00:00,71,1950-01-01 00:00:00,2020-10-30 00:00:00
UAGR0019,0.312,0.194,2.893,,0.354,184.667,59.422,0.0,1.412,127.517,...,Winter,273.598,5.876,Summer,10,1978-01-01 00:00:00,1987-12-31 00:00:00,71,1950-01-01 00:00:00,2020-10-31 00:00:00
UAGR0020,0.075,0.056,3.18,,0.517,188.0,66.869,0.0,0.236,167.023,...,Summer,281.568,6.162,Summer,10,1978-01-01 00:00:00,1987-12-31 00:00:00,74,1950-01-01 00:00:00,2023-06-30 00:00:00
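The rounding above relies on positional slices, which silently shift if a column is added or reordered. A name-based variant could look like the sketch below, where the columns to skip are listed explicitly. The helper name and the demo values are assumptions for illustration only.

```python
import pandas as pd

def round_signature_columns(df, skip, decimals=3):
    """Round every column not listed in `skip` to `decimals` places (hypothetical helper)."""
    out = df.copy()
    numeric_cols = [col for col in out.columns if col not in skip]
    out[numeric_cols] = out[numeric_cols].astype(float).round(decimals)
    return out

# Made-up row mixing numeric signatures with a season label
demo = pd.DataFrame(
    {"q_mean": [2.819267], "hp_time": ["Summer"], "lp_dur": [3.560993]},
    index=["AT000001"],
)
print(round_signature_columns(demo, skip=["hp_time"]))
```

In the notebook, `skip` would hold the season labels together with the record-length and date columns.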


# Data export

In [40]:
# Export the final dataset:
signatures_df.to_csv(PATH_OUTPUT+"estreams_hydrometeo_signatures.csv")


# End