# Streamflow indices computation

Author: Thiago Nascimento (thiago.nascimento@eawag.ch)

This notebook is part of the EStreams publication and was used to compute the various streamflow indices provided in this publication.

* Note that this code enables not only the replication of the current database but also its extension to new catchments.
* Additionally, users should download the original raw data and place them in the folder of the same name before running this code.
* The original third-party data are not included in this repository due to redistribution restrictions and storage-space limitations.

## Requirements
**Python:**

* Python>=3.6
* Jupyter
* geopandas=0.10.2
* numpy
* os
* pandas
* tqdm

Check the GitHub repository for an environment.yml (conda) or requirements.txt (pip) file.

**Files:**

* data/streamflow/estreams_gauging_stations.csv
* data/streamflow/estreams_timeseries_runoff.csv
* data/shapefiles/estreams_catchments.shp

**Directory:**

* Clone the GitHub repository locally.
* Place any third-party data in their respective directories.
* ONLY update the "PATH" variable in the "Configurations" section, setting it to the relative path of your EStreams directory.

## References
* Do, H. X., Gudmundsson, L., Leonard, M. & Westra, S. The Global Streamflow Indices and Metadata Archive (GSIM)-Part 1: The production of a daily streamflow archive and metadata. Earth Syst Sci Data 10, 765–785 (2018).

* Gudmundsson, L., Do, H. X., Leonard, M. & Westra, S. The Global Streamflow Indices and Metadata Archive (GSIM)-Part 2: Quality control, time-series indices and homogeneity assessment. Earth Syst Sci Data 10, 787–804 (2018).

## Observations
* Here we compute the streamflow indices described and computed in the GSIM papers (Do et al., 2018; Gudmundsson et al., 2018).

# Import modules

In [1]:
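import os

import geopandas as gpd
import numpy as np
import pandas as pd
import tqdm

# The wildcard import provides calculate_iqr, calculate_ct, calculate_min_streamflow_day,
# calculate_max_streamflow_day and calculate_gini_coefficient used below
from utils.streamflowindices import *
from utils.general import calculate_areas_when_0, calculate_specific_discharge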


# Configurations

In [2]:
# Only editable variables:
# Relative path to your local EStreams directory
PATH = "../../.."
# Minimum number of valid (non-missing) daily values required at each time resolution:
THRESHOLD_YR = 360  # per year
THRESHOLD_MO = 25   # per month
THRESHOLD_WE = 7    # per week (i.e., complete weeks only)
THRESHOLD_SE = 80   # per season (3 months)
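Each threshold is the minimum number of valid (non-missing) daily values a period must contain before an index is reported for it; periods with fewer valid days get NaN. The sketch below illustrates that rule on synthetic data. The helper `guarded_monthly_mean`, the variable `min_valid_days` and the data are illustrative only, and month-start bins (`"MS"`) are used simply to keep the example independent of changes to pandas' month-end aliases.

```python
import numpy as np
import pandas as pd

min_valid_days = 25  # plays the role of THRESHOLD_MO; a separate name avoids overwriting the setting


def guarded_monthly_mean(series: pd.Series, threshold: int) -> pd.Series:
    """Monthly mean that is NaN for months with fewer than `threshold` valid days."""
    resampled = series.resample("MS")
    counts = resampled.count()  # number of non-NaN daily values per month
    means = resampled.mean()    # NaNs are skipped in the mean itself
    return means.where(counts >= threshold)


# Synthetic daily flow (mm/day) for January and February 2000
days = pd.date_range("2000-01-01", "2000-02-29", freq="D")
flow = pd.Series(1.0, index=days)
flow.loc["2000-02-01":"2000-02-10"] = np.nan  # February keeps 19 of its 29 days

monthly_mean = guarded_monthly_mean(flow, min_valid_days)
print(monthly_mean)  # January -> 1.0 (31 valid days); February -> NaN (19 valid days)
```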

* #### Users should NOT change anything in the code below.

In [3]:
# Non-editable variables:
PATH_YR = "results/timeseries/streamflowindices/yearly/"
PATH_MO = "results/timeseries/streamflowindices/monthly/"
PATH_WE = "results/timeseries/streamflowindices/weekly/"
PATH_SE = "results/timeseries/streamflowindices/seasonally/"

# Set the directory:
os.chdir(PATH)

# Import data

## Daily streamflow time series
Note that this time series has already been quality-checked to remove negative values. It is converted to specific discharge, in millimeters per day (mm/day), in the pre-processing step below.

In [4]:
timeseries_EU = pd.read_csv("data/streamflow/estreams_timeseries_streamflow.csv", index_col=0)
timeseries_EU.index = pd.to_datetime(timeseries_EU.index)
timeseries_EU.index.name = ""

## Streamflow gauges network

In [5]:
network_estreams = pd.read_csv('data/streamflow/estreams_gauging_stations.csv', encoding='utf-8')
network_estreams.set_index("basin_id", inplace = True)
network_estreams


basin_id,gauge_id,gauge_name,gauge_country,gauge_provider,river,lon_snap,lat_snap,lon,lat,area,...,start_date,end_date,num_years,num_months,num_days,num_days_gaps,num_continuous_days,duplicated_suspect,watershed_group,gauges_upstream
AT000001,200014,Bangs,AT,AT_EHYD,Rhein,9.534835,47.273748,9.534835,47.273748,4647.9,...,1996-01-01,2019-12-31,24,288,8766,0.0,8766,CH000197,1,13
AT000002,200048,Schruns (Vonbunweg),AT,AT_EHYD,Litz,9.913677,47.080301,9.913677,47.080301,102.0,...,1958-10-01,2019-12-31,62,735,22372,0.0,22372,CH000221,1,0
AT000003,231662,Loruens-Aeule,AT,AT_EHYD,Ill,9.847765,47.132821,9.847765,47.132821,535.2,...,1985-01-02,2019-12-31,35,420,12782,0.0,12782,CH000215,1,1
AT000004,200592,Kloesterle (OEBB),AT,AT_EHYD,Alfenz,10.061843,47.128994,10.061843,47.128994,66.6,...,1998-01-02,2019-12-31,22,264,8034,0.0,8034,CH000227,1,0
AT000005,200097,Buers (Bruecke L82),AT,AT_EHYD,Alvier,9.802668,47.150770,9.802668,47.150770,72.2,...,1990-01-01,2019-12-31,30,360,10957,0.0,10957,CH000214,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
UAGR0017,6682300,BASHTANOVKA,UA,UA_GRDC,KACHA,33.894739,44.691884,33.900000,44.683333,321.0,...,1978-01-01,1987-12-31,10,120,3652,0.0,3652,,1916,0
UAGR0018,6682500,YALTA,UA,UA_GRDC,DERE-KIOY,34.166667,44.500000,34.166667,44.500000,49.7,...,1978-01-01,1987-12-31,10,120,3652,0.0,3652,,1917,0
UAGR0019,6683010,PIONERSKOE,UA,UA_GRDC,SALHYR,34.199841,44.887685,34.200000,44.883333,261.0,...,1978-01-01,1987-12-31,10,120,3652,0.0,3652,,1918,0
UAGR0020,6683200,TOKMAK,UA,UA_GRDC,TOKMAK,35.705833,47.251389,35.705833,47.251389,760.0,...,1978-01-01,1987-12-31,10,120,3652,0.0,3652,,1919,0


## Catchment boundaries

In [6]:
catchment_boundaries = gpd.read_file('data/shapefiles/estreams_catchments.shp')
catchment_boundaries.set_index("basin_id", inplace = True)
catchment_boundaries.head()

basin_id,gauge_id,gauge_coun,area,area_calc,area_flag,area_perc,start_date,end_date,gauge_hier,watershed_,geometry
AT000001,200014,AT,4647.9,4668.379,0,-0.440608,1996-01-01,2019-12-31,14,1,"POLYGON Z ((9.69406 46.54322 0.00000, 9.69570 ..."
AT000002,200048,AT,102.0,102.287,0,-0.281373,1958-10-01,2019-12-31,1,1,"POLYGON Z ((10.13650 47.02949 0.00000, 10.1349..."
AT000003,231662,AT,535.2,536.299,0,-0.205344,1985-01-02,2019-12-31,2,1,"POLYGON Z ((10.11095 46.89437 0.00000, 10.1122..."
AT000004,200592,AT,66.6,66.286,0,0.471471,1998-01-02,2019-12-31,1,1,"POLYGON Z ((10.14189 47.09706 0.00000, 10.1404..."
AT000005,200097,AT,72.2,72.448,0,-0.34349,1990-01-01,2019-12-31,1,1,"POLYGON Z ((9.67851 47.06249 0.00000, 9.67888 ..."


# Computation processing
## Pre-processing:

### Specific discharge computation

In [7]:
# Adjust areas that might be zero: 
network_estreams = calculate_areas_when_0(network_estreams, catchment_boundaries)
network_estreams

basin_id,gauge_id,gauge_name,gauge_country,gauge_provider,river,lon_snap,lat_snap,lon,lat,area,...,start_date,end_date,num_years,num_months,num_days,num_days_gaps,num_continuous_days,duplicated_suspect,watershed_group,gauges_upstream
AT000001,200014,Bangs,AT,AT_EHYD,Rhein,9.534835,47.273748,9.534835,47.273748,4647.9,...,1996-01-01,2019-12-31,24,288,8766,0.0,8766,CH000197,1,13
AT000002,200048,Schruns (Vonbunweg),AT,AT_EHYD,Litz,9.913677,47.080301,9.913677,47.080301,102.0,...,1958-10-01,2019-12-31,62,735,22372,0.0,22372,CH000221,1,0
AT000003,231662,Loruens-Aeule,AT,AT_EHYD,Ill,9.847765,47.132821,9.847765,47.132821,535.2,...,1985-01-02,2019-12-31,35,420,12782,0.0,12782,CH000215,1,1
AT000004,200592,Kloesterle (OEBB),AT,AT_EHYD,Alfenz,10.061843,47.128994,10.061843,47.128994,66.6,...,1998-01-02,2019-12-31,22,264,8034,0.0,8034,CH000227,1,0
AT000005,200097,Buers (Bruecke L82),AT,AT_EHYD,Alvier,9.802668,47.150770,9.802668,47.150770,72.2,...,1990-01-01,2019-12-31,30,360,10957,0.0,10957,CH000214,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
UAGR0017,6682300,BASHTANOVKA,UA,UA_GRDC,KACHA,33.894739,44.691884,33.900000,44.683333,321.0,...,1978-01-01,1987-12-31,10,120,3652,0.0,3652,,1916,0
UAGR0018,6682500,YALTA,UA,UA_GRDC,DERE-KIOY,34.166667,44.500000,34.166667,44.500000,49.7,...,1978-01-01,1987-12-31,10,120,3652,0.0,3652,,1917,0
UAGR0019,6683010,PIONERSKOE,UA,UA_GRDC,SALHYR,34.199841,44.887685,34.200000,44.883333,261.0,...,1978-01-01,1987-12-31,10,120,3652,0.0,3652,,1918,0
UAGR0020,6683200,TOKMAK,UA,UA_GRDC,TOKMAK,35.705833,47.251389,35.705833,47.251389,760.0,...,1978-01-01,1987-12-31,10,120,3652,0.0,3652,,1919,0
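`calculate_areas_when_0` is imported from `utils.general` and is not shown in this notebook. Judging by its name and by the `area_calc` column of the catchment shapefile, it plausibly replaces reported areas of zero with the area computed from the catchment polygon. The helper below, `fill_zero_areas_sketch`, is a hypothetical illustration of that idea under this assumption, not the EStreams implementation.

```python
import pandas as pd


def fill_zero_areas_sketch(network: pd.DataFrame, boundaries: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical: replace reported areas of 0 km2 with the polygon-derived 'area_calc'."""
    network = network.copy()  # leave the input table untouched
    is_zero = network["area"] == 0
    # Look up the delineated area of the affected basins by basin_id (NaN if no polygon exists)
    replacement = boundaries["area_calc"].reindex(network.index[is_zero])
    network.loc[is_zero, "area"] = replacement.to_numpy()
    return network


toy_network = pd.DataFrame({"area": [102.0, 0.0]}, index=["BASIN_A", "BASIN_B"])
toy_boundaries = pd.DataFrame({"area_calc": [102.3, 55.5]}, index=["BASIN_A", "BASIN_B"])
filled = fill_zero_areas_sketch(toy_network, toy_boundaries)
print(filled["area"].tolist())  # [102.0, 55.5]
```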


In [8]:
# Specific discharge computation
timeseries_EU = calculate_specific_discharge(network_estreams, timeseries_EU)
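`calculate_specific_discharge` is imported from `utils.general` and is not reproduced here. Assuming the raw series are discharges in m³/s and `area` is in km² (both assumptions to check against the EStreams data description), the usual conversion to specific discharge in mm/day is q = Q × 86 400 s/day × 1000 mm/m / (A × 10⁶ m²/km²) = 86.4 Q / A. A minimal sketch with a hypothetical helper name:

```python
import pandas as pd

SECONDS_PER_DAY = 86_400


def to_specific_discharge_sketch(discharge: pd.DataFrame, area_km2: pd.Series) -> pd.DataFrame:
    """Convert daily discharge (m3/s, one column per gauge) to specific discharge (mm/day)."""
    # m3/s * s/day = m3/day; / (km2 * 1e6 m2/km2) = m/day; * 1000 = mm/day  -> factor 86.4 / A
    factor = SECONDS_PER_DAY * 1000 / (area_km2 * 1e6)
    # Multiplying a DataFrame by a Series aligns the Series index with the columns (gauge ids)
    return discharge * factor.reindex(discharge.columns)


q_m3s = pd.DataFrame({"G1": [1.0, 2.0], "G2": [10.0, 0.0]})
areas = pd.Series({"G1": 86.4, "G2": 864.0})
q_mm_day = to_specific_discharge_sketch(q_m3s, areas)
print(q_mm_day)  # G1: 1.0, 2.0; G2: 1.0, 0.0
```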


## Other preprocessing

In [9]:
# Data-quality mask: 1 marks missing values (NaN) and 0 marks observed values
timeseries_EU_quality = timeseries_EU.isna().astype(int)

In [10]:
# Here we use the calendar year, so the hydro_year starts on 1 January
hydro_year = np.array(timeseries_EU_quality.index.year)
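The quality mask and the year vector can be combined to count observed days per year and gauge, the kind of check a per-year minimum-data rule relies on. A small sketch on synthetic data; whether the helpers in `utils.streamflowindices` count valid days exactly this way is an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic record for one gauge spanning a year boundary (NaN = missing day)
days = pd.date_range("2000-12-30", "2001-01-03", freq="D")
flow = pd.DataFrame({"gauge_1": [1.0, np.nan, 2.0, 3.0, np.nan]}, index=days)

quality = flow.isna().astype(int)     # 1 = missing, 0 = observed
years = np.array(quality.index.year)  # calendar year of each day

# Observed days per year and gauge
valid_days = (1 - quality).groupby(years).sum()
print(valid_days)  # 2000 -> 1 valid day, 2001 -> 2 valid days
```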

In [11]:
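# Apply a 7-day trailing (backward-looking) moving average. With min_periods=1, the first six days
# of a record and days next to gaps are averaged over fewer than 7 values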
timeseries_EU_smoothed = timeseries_EU.rolling(window=7, min_periods=1).mean()
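With `min_periods=1`, early days (and days next to gaps) get means of fewer than seven values, so the 7-day minimum and maximum may come from partial windows. The sketch compares this with `min_periods=7`, a stricter alternative shown only for contrast and not what the notebook uses.

```python
import pandas as pd

days = pd.date_range("2000-01-01", periods=8, freq="D")
flow = pd.Series([7.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 7.0], index=days)

trailing = flow.rolling(window=7, min_periods=1).mean()  # as in the cell above
strict = flow.rolling(window=7, min_periods=7).mean()    # full 7-day windows only

print(trailing.round(3).tolist())  # [7.0, 3.5, 2.333, 1.75, 1.4, 1.167, 1.0, 1.0]
print(int(strict.isna().sum()))    # 6 (the first six days have no full window)
```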

## Yearly
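Among the yearly indices, the Gini coefficient measures how unevenly flow is distributed over the days of a year. It is computed by `calculate_gini_coefficient` from `utils.streamflowindices`, which is not shown here. The sketch below uses the standard rank-based estimator $G = \frac{2\sum_{i=1}^{n} i\,x_{(i)}}{n\sum_{i=1}^{n} x_i} - \frac{n+1}{n}$ for $n$ values sorted in ascending order; that the EStreams helper uses this exact estimator and missing-data handling is an assumption, and `gini_sketch` is a hypothetical name.

```python
import numpy as np


def gini_sketch(values) -> float:
    """Rank-based Gini coefficient of non-negative flows: 0 = perfectly even, near 1 = concentrated."""
    x = np.asarray(values, dtype=float)
    x = np.sort(x[~np.isnan(x)])  # drop missing days, then sort ascending
    n = x.size
    if n == 0 or x.sum() == 0:
        return np.nan             # undefined for empty or all-zero records
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * x) / (n * x.sum()) - (n + 1) / n


print(gini_sketch([1.0, 1.0, 1.0, 1.0]))  # 0.0  (flow spread evenly)
print(gini_sketch([0.0, 0.0, 0.0, 1.0]))  # 0.75 (all flow on one day)
```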

In [None]:
# All operations are organized as a "pipeline" to keep the code tidy and to visualize progress:

# List of operations with names, functions, and simplified names for file export:
operations = [
    ('Mean', lambda x: np.mean(x) if x.count() >= THRESHOLD_YR else np.nan, 'mean'),
    ('Standard Deviation', lambda x: np.std(x) if x.count() >= THRESHOLD_YR else np.nan, 'std'),
    ('Coefficient of Variation', lambda x: np.std(x) / np.mean(x) if x.count() >= THRESHOLD_YR else np.nan, 'cv'),
    ('Minimum', lambda x: np.min(x) if x.count() >= THRESHOLD_YR else np.nan, 'min'),
    ('Maximum', lambda x: np.max(x) if x.count() >= THRESHOLD_YR else np.nan, 'max'),
    ('IQR', lambda x: calculate_iqr(x, THRESHOLD_YR), 'iqr'),
    ('Percentile 10', lambda x: np.nanpercentile(x, 10) if x.count() >= THRESHOLD_YR else np.nan, 'p10'),
    ('Percentile 20', lambda x: np.nanpercentile(x, 20) if x.count() >= THRESHOLD_YR else np.nan, 'p20'),
    ('Percentile 30', lambda x: np.nanpercentile(x, 30) if x.count() >= THRESHOLD_YR else np.nan, 'p30'),
    ('Percentile 40', lambda x: np.nanpercentile(x, 40) if x.count() >= THRESHOLD_YR else np.nan, 'p40'),
    ('Percentile 50', lambda x: np.nanpercentile(x, 50) if x.count() >= THRESHOLD_YR else np.nan, 'p50'),
    ('Percentile 60', lambda x: np.nanpercentile(x, 60) if x.count() >= THRESHOLD_YR else np.nan, 'p60'),
    ('Percentile 70', lambda x: np.nanpercentile(x, 70) if x.count() >= THRESHOLD_YR else np.nan, 'p70'),
    ('Percentile 80', lambda x: np.nanpercentile(x, 80) if x.count() >= THRESHOLD_YR else np.nan, 'p80'),
    ('Percentile 90', lambda x: np.nanpercentile(x, 90) if x.count() >= THRESHOLD_YR else np.nan, 'p90'),
    ('Minimum 7-days', lambda x: np.min(x) if x.count() >= THRESHOLD_YR else np.nan, 'min7days'),
    ('Maximum 7-days', lambda x: np.max(x) if x.count() >= THRESHOLD_YR else np.nan, 'max7days'),
    ('Center Timing', None, 'ct'),
    ('DOY minimum', None, 'doymin'), 
    ('DOY maximum', None, 'doymax'),
    ('DOY minimum 7-days', None, 'doy7min'), 
    ('DOY maximum 7-days', None, 'doy7max'), 
    ('Gini coefficient', None, 'gini')
]

# Per-gauge indices computed year by year with the helpers from utils.streamflowindices:
# operation name -> (helper function, daily series the helper is applied to)
per_gauge_operations = {
    'Center Timing': (calculate_ct, timeseries_EU),
    'DOY minimum': (calculate_min_streamflow_day, timeseries_EU),
    'DOY maximum': (calculate_max_streamflow_day, timeseries_EU),
    'DOY minimum 7-days': (calculate_min_streamflow_day, timeseries_EU_smoothed),
    'DOY maximum 7-days': (calculate_max_streamflow_day, timeseries_EU_smoothed),
    'Gini coefficient': (calculate_gini_coefficient, timeseries_EU),
}

# Dictionary to store the results of each operation
results = {}

# Iterate over operations to calculate metrics and export to CSV
for op_name, op_func, op_file in tqdm.tqdm(operations, desc='Calculating Metrics'):

    if op_name in ('Minimum 7-days', 'Maximum 7-days'):
        # 7-day indices: threshold-guarded statistics of the 7-day smoothed series
        results[op_name] = timeseries_EU_smoothed.resample('Y').agg(op_func)

    elif op_name in per_gauge_operations:
        index_function, source = per_gauge_operations[op_name]
        # Empty DataFrame with one row per calendar year and one column per gauge
        op_result = pd.DataFrame(index=source.index.year.unique(), columns=source.columns)
        # Apply the helper gauge by gauge, passing the missing-data mask and the year of each day
        for gauge in source.columns:
            op_result[gauge] = index_function(
                streamflow=source.loc[:, gauge].values,
                quality=timeseries_EU_quality.loc[:, gauge].values,
                hydro_year=hydro_year
            )
        results[op_name] = op_result

    else:
        # Threshold-guarded statistics of the daily series
        results[op_name] = timeseries_EU.resample('Y').agg(op_func)

    # Retrieve the result of the current operation for export
    current_result = results[op_name]

    # Export to CSV-file
    file_name = f"yearly_streamflow_{op_file}.csv"
    file_path = os.path.join(PATH_YR, file_name)
    current_result = current_result.sort_index(axis=1) # Here we sort the columns
    current_result.to_csv(file_path)

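# Store each index in its own variable for later use
streamflow_yr_ave = results['Mean']
streamflow_yr_std = results['Standard Deviation']
streamflow_yr_cv = results['Coefficient of Variation']
streamflow_yr_min = results['Minimum']
streamflow_yr_max = results['Maximum']
streamflow_yr_min7days = results['Minimum 7-days']
streamflow_yr_max7days = results['Maximum 7-days']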
streamflow_yr_iqr = results['IQR']
streamflow_yr_p10 = results['Percentile 10']
streamflow_yr_p20 = results['Percentile 20']
streamflow_yr_p30 = results['Percentile 30']
streamflow_yr_p40 = results['Percentile 40']
streamflow_yr_p50 = results['Percentile 50']
streamflow_yr_p60 = results['Percentile 60']
streamflow_yr_p70 = results['Percentile 70']
streamflow_yr_p80 = results['Percentile 80']
streamflow_yr_p90 = results['Percentile 90']
streamflow_yr_ct = results['Center Timing']
streamflow_yr_doymin = results['DOY minimum']
streamflow_yr_doymax = results['DOY maximum']
streamflow_yr_doy7min = results['DOY minimum 7-days']
streamflow_yr_doy7max = results['DOY maximum 7-days']
streamflow_yr_gini = results['Gini coefficient']
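`calculate_ct`, `calculate_min_streamflow_day` and `calculate_max_streamflow_day` are imported from `utils.streamflowindices` and are not shown here. The sketch below illustrates one common definition of centre timing (the flow-weighted mean day of the year) and the day of year of the annual minimum and maximum. Whether the EStreams helpers use exactly these definitions, tie-breaking and missing-data rules is an assumption, and the function names are hypothetical.

```python
import numpy as np
import pandas as pd


def center_timing_sketch(year_flow: pd.Series) -> float:
    """Flow-weighted mean day of year, one common definition of centre timing."""
    q = year_flow.dropna()
    if q.empty or q.sum() == 0:
        return np.nan
    doy = q.index.dayofyear.to_numpy()
    return float(np.sum(doy * q.to_numpy()) / q.sum())


def doy_extremes_sketch(year_flow: pd.Series) -> tuple:
    """Day of year of the first minimum and first maximum flow; NaNs are skipped."""
    return year_flow.idxmin().dayofyear, year_flow.idxmax().dayofyear


days = pd.date_range("2001-01-01", periods=5, freq="D")
flow = pd.Series([0.0, 0.0, 1.0, 0.0, 0.0], index=days)

print(center_timing_sketch(flow))  # 3.0 (all flow on 3 January)
print(doy_extremes_sketch(flow))   # (1, 3)

# Applied per calendar year, one value per year
ct_per_year = flow.groupby(flow.index.year).apply(center_timing_sketch)
```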

## Monthly

In [None]:
# All operations are organized as a "pipeline" to keep the code tidy and to visualize progress:

# List of operations with names, functions, and simplified names for file export:
operations = [
    ('Mean', lambda x: np.mean(x) if x.count() >= THRESHOLD_MO else np.nan, 'mean'),
    ('Standard Deviation', lambda x: np.std(x) if x.count() >= THRESHOLD_MO else np.nan, 'std'),
    ('Coefficient of Variation', lambda x: np.std(x) / np.mean(x) if x.count() >= THRESHOLD_MO else np.nan, 'cv'),
    ('Minimum', lambda x: np.min(x) if x.count() >= THRESHOLD_MO else np.nan, 'min'),
    ('Maximum', lambda x: np.max(x) if x.count() >= THRESHOLD_MO else np.nan, 'max'),
    ('Minimum 7-days', lambda x: np.min(x) if x.count() >= THRESHOLD_MO else np.nan, 'min7days'),
    ('Maximum 7-days', lambda x: np.max(x) if x.count() >= THRESHOLD_MO else np.nan, 'max7days'),
    ('IQR', lambda x: calculate_iqr(x, THRESHOLD_MO), 'iqr')
]

# Dictionary to store the results of each operation
results = {}

# Iterate over operations to calculate metrics and export to CSV
for op_name, op_func, op_file in tqdm.tqdm(operations, desc='Calculating Metrics'):
    
    if op_name in ('Minimum 7-days', 'Maximum 7-days'):
        # 7-day indices: threshold-guarded statistics of the 7-day smoothed series
        results[op_name] = timeseries_EU_smoothed.resample('M').agg(op_func)

    else:
        # Threshold-guarded statistics of the daily series
        results[op_name] = timeseries_EU.resample('M').agg(op_func)

    # Retrieve the result of the current operation for export
    current_result = results[op_name]

    # Export to CSV-file
    file_name = f"monthly_streamflow_{op_file}.csv"
    file_path = os.path.join(PATH_MO, file_name)
    current_result = current_result.sort_index(axis=1) # Here we sort the columns
    current_result.to_csv(file_path)
    
# Store each index in its own variable for later use
streamflow_mo_ave = results['Mean']
streamflow_mo_std = results['Standard Deviation']
streamflow_mo_cv = results['Coefficient of Variation']
streamflow_mo_min = results['Minimum']
streamflow_mo_max = results['Maximum']
streamflow_mo_min7days = results['Minimum 7-days']
streamflow_mo_max7days = results['Maximum 7-days']
streamflow_mo_iqr = results['IQR']
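The lambdas above call Python once per gauge and per period, which can be slow for thousands of gauges. A vectorised alternative counts valid days once and masks every statistic with that count. This is a sketch of an alternative, not what the notebook runs: `guarded_stats_sketch` is a hypothetical name, and agreement with the lambda versions (with NaN-aware percentiles) has not been checked against the EStreams outputs.

```python
import numpy as np
import pandas as pd


def guarded_stats_sketch(df: pd.DataFrame, rule: str, threshold: int) -> dict:
    """Threshold-guarded statistics for all gauges at once, without per-group Python lambdas."""
    resampler = df.resample(rule)
    enough = resampler.count() >= threshold  # True where a period has enough valid days
    mean = resampler.mean()
    std = resampler.std(ddof=0)              # population std, like np.std in the lambdas
    stats = {
        "mean": mean,
        "std": std,
        "cv": std / mean,                    # NaN or inf where the mean is zero
        "min": resampler.min(),
        "max": resampler.max(),
        "p10": resampler.quantile(0.10),     # skips NaNs, linear interpolation
    }
    return {name: frame.where(enough) for name, frame in stats.items()}


days = pd.date_range("2000-01-01", "2000-02-29", freq="D")
toy = pd.DataFrame({"A": 2.0, "B": 2.0}, index=days)
toy.loc["2000-02-01":"2000-02-20", "B"] = np.nan  # gauge B: only 9 valid days in February

out = guarded_stats_sketch(toy, "MS", threshold=25)
print(out["cv"])  # A: 0.0 in both months; B: 0.0 in January, NaN in February
```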

## Seasonal

In [None]:
# All operations are organized as a "pipeline" to keep the code tidy and to visualize progress:

# List of operations with names, functions, and simplified names for file export:
operations = [
    ('Mean', lambda x: np.mean(x) if x.count() >= THRESHOLD_SE else np.nan, 'mean'),
    ('Standard Deviation', lambda x: np.std(x) if x.count() >= THRESHOLD_SE else np.nan, 'std'),
    ('Coefficient of Variation', lambda x: np.std(x) / np.mean(x) if x.count() >= THRESHOLD_SE else np.nan, 'cv'),
    ('Minimum', lambda x: np.min(x) if x.count() >= THRESHOLD_SE else np.nan, 'min'),
    ('Maximum', lambda x: np.max(x) if x.count() >= THRESHOLD_SE else np.nan, 'max'),
    ('IQR', lambda x: calculate_iqr(x, THRESHOLD_SE), 'iqr'),
    ('Percentile 10', lambda x: np.nanpercentile(x, 10) if x.count() >= THRESHOLD_SE else np.nan, 'p10'),
    ('Percentile 20', lambda x: np.nanpercentile(x, 20) if x.count() >= THRESHOLD_SE else np.nan, 'p20'),
    ('Percentile 30', lambda x: np.nanpercentile(x, 30) if x.count() >= THRESHOLD_SE else np.nan, 'p30'),
    ('Percentile 40', lambda x: np.nanpercentile(x, 40) if x.count() >= THRESHOLD_SE else np.nan, 'p40'),
    ('Percentile 50', lambda x: np.nanpercentile(x, 50) if x.count() >= THRESHOLD_SE else np.nan, 'p50'),
    ('Percentile 60', lambda x: np.nanpercentile(x, 60) if x.count() >= THRESHOLD_SE else np.nan, 'p60'),
    ('Percentile 70', lambda x: np.nanpercentile(x, 70) if x.count() >= THRESHOLD_SE else np.nan, 'p70'),
    ('Percentile 80', lambda x: np.nanpercentile(x, 80) if x.count() >= THRESHOLD_SE else np.nan, 'p80'),
    ('Percentile 90', lambda x: np.nanpercentile(x, 90) if x.count() >= THRESHOLD_SE else np.nan, 'p90'),
    ('Minimum 7-days', lambda x: np.min(x) if x.count() >= THRESHOLD_SE else np.nan, 'min7days'),
    ('Maximum 7-days', lambda x: np.max(x) if x.count() >= THRESHOLD_SE else np.nan, 'max7days')
]

# Dictionary to store the results of each operation
results = {}

# Iterate over operations to calculate metrics and export to CSV
for op_name, op_func, op_file in tqdm.tqdm(operations, desc='Calculating Metrics'):
    
    if op_name in ('Minimum 7-days', 'Maximum 7-days'):
        # 7-day indices: threshold-guarded statistics of the 7-day smoothed series
        results[op_name] = timeseries_EU_smoothed.resample('QS-MAR').agg(op_func)

    else:
        # Seasons start on 1 March, 1 June, 1 September and 1 December (MAM, JJA, SON, DJF)
        results[op_name] = timeseries_EU.resample('QS-MAR').agg(op_func)

    # Retrieve the result of the current operation for export
    current_result = results[op_name]

    # Drop the first row: the winter season labelled 1899-12-01 begins before the start
    # of the record and is therefore incomplete
    current_result = current_result.iloc[1:, :]
    
    # Export to CSV-file
    file_name = f"seasonally_streamflow_{op_file}.csv"
    file_path = os.path.join(PATH_SE, file_name)
    current_result = current_result.sort_index(axis=1) # Here we sort the columns
    current_result.to_csv(file_path)

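# Store each index in its own variable for later use
# (unlike the exported files, these still include the first, incomplete season)
streamflow_se_ave = results['Mean']
streamflow_se_std = results['Standard Deviation']
streamflow_se_cv = results['Coefficient of Variation']
streamflow_se_min = results['Minimum']
streamflow_se_max = results['Maximum']
streamflow_se_min7days = results['Minimum 7-days']
streamflow_se_max7days = results['Maximum 7-days']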
streamflow_se_iqr = results['IQR']
streamflow_se_p10 = results['Percentile 10']
streamflow_se_p20 = results['Percentile 20']
streamflow_se_p30 = results['Percentile 30']
streamflow_se_p40 = results['Percentile 40']
streamflow_se_p50 = results['Percentile 50']
streamflow_se_p60 = results['Percentile 60']
streamflow_se_p70 = results['Percentile 70']
streamflow_se_p80 = results['Percentile 80']
streamflow_se_p90 = results['Percentile 90']
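With `'QS-MAR'`, pandas starts seasons on 1 March, 1 June, 1 September and 1 December, so the bins correspond to MAM, JJA, SON and DJF and are labelled by their start date. A record that begins on 1 January therefore opens with a partial winter bin labelled 1 December of the previous year, which is the row the seasonal loop drops before export. The synthetic example below (not EStreams data) shows the number of days per bin; with `THRESHOLD_SE = 80`, such partial seasons would be NaN in any case.

```python
import pandas as pd

# One synthetic year of complete daily data starting on 1 January (1900 is not a leap year)
days = pd.date_range("1900-01-01", "1900-12-31", freq="D")
flow = pd.Series(1.0, index=days)

season_days = flow.resample("QS-MAR").count()
print(season_days.index[0].date())  # 1899-12-01 (Jan-Feb 1900 only: incomplete DJF)
print(season_days.tolist())         # [59, 92, 92, 91, 31] -> DJF, MAM, JJA, SON, Dec only
```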

## Weekly
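In pandas, the `'W'` alias means weeks ending on Sunday (`'W-SUN'`): each bin covers Monday to Sunday and is labelled with its Sunday. With `THRESHOLD_WE = 7`, only complete weeks without missing days receive values. A sketch on synthetic dates:

```python
import pandas as pd

# Two weeks of daily data from Monday 1 January to Sunday 14 January 2024
days = pd.date_range("2024-01-01", "2024-01-14", freq="D")
flow = pd.Series(1.0, index=days)

week_days = flow.resample("W").count()
print([d.strftime("%Y-%m-%d") for d in week_days.index])  # ['2024-01-07', '2024-01-14']
print(week_days.tolist())                                 # [7, 7]
```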

In [12]:
# All operations are organized as a "pipeline" to keep the code tidy and to visualize progress:

# List of operations with names, functions, and simplified names for file export:
operations = [
    ('Mean', lambda x: np.mean(x) if x.count() >= THRESHOLD_WE else np.nan, 'mean'),
    ('Standard Deviation', lambda x: np.std(x) if x.count() >= THRESHOLD_WE else np.nan, 'std'),
    ('Coefficient of Variation', lambda x: np.std(x) / np.mean(x) if x.count() >= THRESHOLD_WE else np.nan, 'cv'),
    ('Minimum', lambda x: np.min(x) if x.count() >= THRESHOLD_WE else np.nan, 'min'),
    ('Maximum', lambda x: np.max(x) if x.count() >= THRESHOLD_WE else np.nan, 'max'),
    ('IQR', lambda x: calculate_iqr(x, THRESHOLD_WE), 'iqr')
]

# Dictionary to store the results of each operation
results = {}

# Iterate over operations to calculate metrics and export to CSV
for op_name, op_func, op_file in tqdm.tqdm(operations, desc='Calculating Metrics'):

    # Calculate the metric using the resample method
    results[op_name] = timeseries_EU.resample('W').agg(op_func)
    
    # Retrieve the result of the current operation for export
    current_result = results[op_name]

    # Export to CSV-file
    file_name = f"weekly_streamflow_{op_file}.csv"
    file_path = os.path.join(PATH_WE, file_name)
    current_result = current_result.sort_index(axis=1) # Here we sort the columns
    current_result.to_csv(file_path)

# Store each index in its own variable for later use
streamflow_we_ave = results['Mean']
streamflow_we_std = results['Standard Deviation']
streamflow_we_cv = results['Coefficient of Variation']
streamflow_we_min = results['Minimum']
streamflow_we_max = results['Maximum']
streamflow_we_iqr = results['IQR']


# End