# Hydro-meteorological signatures computation

Author: Thiago Nascimento (thiago.nascimento@eawag.ch)

This notebook is part of the EStreams publication and was used to compute the set of hydro-meteorological signatures provided with it.

* Note that this code enables not only the reproduction of the current database but also its extension to new catchments. 
* Additionally, the user should download the original raw data and place them in the folder of the same name before running this code. 
* The original third-party data used are not available in this repository due to redistribution and storage-space constraints.  

## Requirements
**Python:**

* Python>=3.6
* Jupyter
* geopandas=0.10.2
* hydroanalysis https://pypi.org/project/hydroanalysis/ (Last access: 30 December 2023)
* numpy
* os (part of the Python standard library)
* pandas=2.1.3
* scipy=1.9.0
* tqdm

Check the GitHub repository for an environment.yml (for conda environments) or requirements.txt (pip) file.

**Files:**

* data/streamflow/estreams_timeseries_streamflow.csv
* results/timeseries/meteorology/estreams_meteorology_precipitation.csv
* results/timeseries/meteorology/estreams_meteorology_temperature.csv
* results/timeseries/meteorology/estreams_meteorology_pet.csv
* data/streamflow/estreams_gauging_stations.csv
* data/shapefiles/estreams_catchments.shp

**Directory:**

* Clone the GitHub repository locally.
* Place any third-party data in their respective directories.
* ONLY update the "PATH" variable in the section "Configurations" with the relative path to your local EStreams directory. 
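
Before running the notebook, it can help to confirm that the inputs are where the code expects them. The snippet below is a minimal sketch of such a check; `find_missing_inputs` and `REQUIRED_FILES` are hypothetical names that are not part of the EStreams code, and only two of the input files are listed for brevity.

```python
from pathlib import Path

# Two of the inputs listed above; extend this list with the remaining files.
REQUIRED_FILES = [
    "data/streamflow/estreams_gauging_stations.csv",
    "data/shapefiles/estreams_catchments.shp",
]


def find_missing_inputs(root, relative_paths=REQUIRED_FILES):
    """Return the relative paths that do not exist under `root` (hypothetical helper)."""
    root = Path(root)
    return [p for p in relative_paths if not (root / p).is_file()]


# Usage: pass the same value you intend to assign to PATH.
missing = find_missing_inputs(".")
print("Missing input files:", missing)
```

An empty list means that every listed file was found under the given root.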


## References
* Addor, N., Newman, A. J., Mizukami, N., and Clark, M. P.: The CAMELS data set: catchment attributes and meteorology for large-sample studies, Hydrol. Earth Syst. Sci., 21, 5293-5313, https://doi.org/10.5194/hess-21-5293-2017, 2017.
* https://github.com/naddor/camels/blob/master/clim/clim_indices.R

## Observations
* Here we compute the hydro-climatic signatures that are widely discussed and computed in CAMELS-like publications and are available in the HydroAnalysis module, which is based on Addor et al. (2017).

# Import modules

In [None]:
import pandas as pd
import geopandas as gpd
import numpy as np
import tqdm as tqdm
import os
from utils.streamflowindices import calculate_hydro_year
from utils.general import count_num_measurements, find_first_non_nan_dates, find_last_non_nan_dates, calculate_areas_when_0, calculate_specific_discharge
import warnings
import hydroanalysis #Make sure to have this module locally installed

# Configurations

In [None]:
# Only editable variables:
# Relative path to your local directory
PATH = "../../.."
# Suppress all warnings
warnings.filterwarnings("ignore")

* #### Users should NOT change anything in the code below this point. 

In [None]:
# Non-editable variables:
PATH_OUTPUT = "results/staticattributes/"

# Set the directory:
os.chdir(PATH)

# Import data

## Daily discharge
Note that this time series has already passed a quality check. 

In [None]:
timeseries_discharge = pd.read_csv("data/streamflow/estreams_timeseries_streamflow.csv", index_col=0)
timeseries_discharge.index = pd.to_datetime(timeseries_discharge.index)
timeseries_discharge.index.name = "date"

## Precipitation

In [None]:
timeseries_precipitation = pd.read_csv("results/timeseries/meteorology/estreams_meteorology_precipitation.csv", index_col=0)
timeseries_precipitation.index = pd.to_datetime(timeseries_precipitation.index)
timeseries_precipitation.index.name = "date"
timeseries_precipitation

## PET

In [None]:
timeseries_pet = pd.read_csv("results/timeseries/meteorology/estreams_meteorology_pet.csv", index_col=0)
timeseries_pet.index = pd.to_datetime(timeseries_pet.index)
timeseries_pet.index.name = "date"
timeseries_pet

## Temperature

In [None]:
timeseries_temperature = pd.read_csv("results/timeseries/meteorology/estreams_meteorology_temperature.csv", index_col=0)
timeseries_temperature.index = pd.to_datetime(timeseries_temperature.index)
timeseries_temperature.index.name = "date"
timeseries_temperature

## Streamflow gauges network

In [None]:
network_estreams = pd.read_csv('data/streamflow/estreams_gauging_stations.csv', encoding='utf-8')
network_estreams.set_index("basin_id", inplace = True)
network_estreams

## Catchment boundaries

In [None]:
catchment_boundaries = gpd.read_file('data/shapefiles/estreams_catchments.shp')
catchment_boundaries.set_index("basin_id", inplace = True)
catchment_boundaries.head()

In [None]:
print("The total number of catchments to be processed is:", len(catchment_boundaries))

# Computation processing


## Computation of the specific discharge time series

In [None]:
# Adjust areas that might be zero: 
network_estreams = calculate_areas_when_0(network_estreams, catchment_boundaries)
network_estreams

In [None]:
# Specific discharge computation
timeseries_discharge = calculate_specific_discharge(network_estreams, timeseries_discharge)
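
The conversion performed by `calculate_specific_discharge` is not shown in this notebook. As a rough sketch of what such a conversion usually looks like, the example below assumes discharge in m³/s and catchment area in km², and returns specific discharge in mm/day. The function name `specific_discharge_mm_day`, the units and the toy values are assumptions for illustration only.

```python
import pandas as pd


def specific_discharge_mm_day(discharge_m3s, area_km2):
    """Convert discharge (m3/s) to specific discharge (mm/day), one column per gauge.

    Assumes `area_km2` is a Series indexed by the gauge identifiers used as columns.
    """
    # 86400 s/day * 1000 mm/m / 1e6 m2/km2 = 86.4
    return discharge_m3s.mul(86.4).div(area_km2, axis=1)


# Toy data: two gauges, three days, one missing value.
dates = pd.date_range("2000-01-01", periods=3, freq="D")
discharge = pd.DataFrame(
    {"G1": [10.0, 20.0, float("nan")], "G2": [1.0, 2.0, 3.0]}, index=dates
)
area = pd.Series({"G1": 864.0, "G2": 86.4})

q_mm = specific_discharge_mm_day(discharge, area)
print(q_mm)
```

Missing discharge values stay missing after the conversion, which is what the quality masks computed later rely on.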

## Filtering the data

In [None]:
# Keep only the gauges whose area flag is neither "888" nor "999":
network_estreams_filtered = network_estreams[network_estreams.area_flag < 888.0]
network_estreams_filtered

In [None]:
# Keep only the gauges with at least 1 year (365 days) of consecutive daily measurements:
network_estreams_filtered = network_estreams_filtered[network_estreams_filtered.num_continuous_days >= 365]
network_estreams_filtered
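
The column `num_continuous_days` is read from the gauging-stations file and is not computed in this notebook. One possible way to derive such a value from a daily series is sketched below, assuming it denotes the longest run of consecutive non-missing days; `longest_valid_run` is a hypothetical helper, not an EStreams function.

```python
import numpy as np
import pandas as pd


def longest_valid_run(series):
    """Length of the longest run of consecutive non-NaN values in a daily series."""
    valid = series.notna()
    # Every NaN starts a new group, so summing `valid` per group gives the run lengths.
    groups = (~valid).cumsum()
    return int(valid.groupby(groups).sum().max())


dates = pd.date_range("2000-01-01", periods=7, freq="D")
flow = pd.Series([1.0, np.nan, 2.0, 3.0, 4.0, np.nan, 5.0], index=dates)

run_length = longest_valid_run(flow)
print(run_length)
```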

## Preprocessing

### Filter the time series
* In this step we restrict the hydro-climatic time series to the filtered gauges.

In [None]:
# Specific discharge
timeseries_discharge = timeseries_discharge.loc[:, network_estreams_filtered.index]

# Precipitation
timeseries_precipitation = timeseries_precipitation.loc[:, network_estreams_filtered.index]

# Temperature
timeseries_temperature = timeseries_temperature.loc[:, network_estreams_filtered.index]

# Potential evapotranspiration
timeseries_pet = timeseries_pet.loc[:, network_estreams_filtered.index]

### Subset the streamflow time series
* Here we subset the streamflow time series to the same time period as the meteorological time series.

In [None]:
timeseries_discharge = timeseries_discharge.loc[timeseries_precipitation.index,:]

### Compute quality masks
* We need to compute a mask that assigns "1" to NaNs and "0" to good-quality data.

In [None]:
# Quality-mask for joint specific discharge and precipitation (Hydrological signatures):  
quality_discharge_precipitation = (pd.isna(timeseries_discharge) | pd.isna(timeseries_precipitation)).astype(int)

# Quality-mask for joint precipitation, pet and temperature (Climatic signatures):
quality_pet_precipitation_temperature = (pd.isna(timeseries_precipitation) | pd.isna(timeseries_pet) | pd.isna(timeseries_temperature)).astype(int)
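
To make the mask convention concrete, the toy example below applies the same expression to a single made-up gauge. A day is flagged "1" as soon as either variable is missing, so only days with both values available count as good quality. The gauge name and values are invented for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2001-01-01", periods=4, freq="D")
q = pd.DataFrame({"G1": [1.0, np.nan, 3.0, 4.0]}, index=dates)
p = pd.DataFrame({"G1": [0.0, 2.0, np.nan, 1.0]}, index=dates)

# 1 = at least one variable is missing, 0 = both variables are available
mask = (q.isna() | p.isna()).astype(int)

# Number of days that a signature computed from both variables would use
good_days = int((mask["G1"] == 0).sum())
print(mask)
print("Good-quality days:", good_days)
```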

### Calculate the hydrological years

In [None]:
hydro_year = calculate_hydro_year(date=timeseries_discharge.index, first_month=10)
hydro_year
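
The helper `calculate_hydro_year` lives in `utils.streamflowindices` and is not reproduced here. The sketch below shows one common way to label hydrological years that start in October, under the assumption that each hydrological year is named after the calendar year in which it ends; the actual helper may use a different labelling. `hydro_year_sketch` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def hydro_year_sketch(date, first_month=10):
    """Label each date with a hydrological year starting in `first_month`.

    Assumption: the hydrological year is named after the calendar year in which it ends.
    """
    year = np.asarray(date.year)
    month = np.asarray(date.month)
    if first_month == 1:
        # The hydrological year coincides with the calendar year.
        return year
    return np.where(month >= first_month, year + 1, year)


dates = pd.DatetimeIndex(["2000-09-30", "2000-10-01", "2001-09-30"])
labels = hydro_year_sketch(dates, first_month=10)
print(labels)
```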

### Adjust the precipitation seasonality function

The `calculate_p_seasonality` function in the hydroanalysis module contains a small error and therefore needs to be corrected.

- Here we use a small workaround: we corrected the function locally, replace it in the module at runtime, and apply it here.

- We have submitted a correction request on the GitHub page of the "hydroanalysis" author; once it is accepted, the updated function can simply be imported.
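
For reference, the function defined in the next cell fits two annual sine curves to the good-quality daily data and then combines their parameters into the signature. Written out from that code (the symbol names are our own choice), the fitted models are

$$
P(t) = \overline{P}\left[1 + \delta_P \sin\left(\frac{2\pi\,(t - s_P)}{365.25}\right)\right],
\qquad
T(t) = \overline{T} + \Delta_T \sin\left(\frac{2\pi\,(t - s_T)}{365.25}\right)
$$

and the signature $S$ returned as `p_seasonality` is

$$
S = \delta_P \,\operatorname{sgn}(\Delta_T)\,\cos\left(\frac{2\pi\,(s_P - s_T)}{365.25}\right)
$$

Here $t$ is the day of the year counted from zero, $\overline{P}$ and $\overline{T}$ are the means over the good-quality days, $\delta_P$ is constrained to $[-1, 1]$ and $s_P$ to $[0, 365.25]$, while the temperature parameters are fitted without bounds.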

In [None]:
# The code below is only a small update of the original function from the module "hydroanalysis"
# -----------------------------------------------------------------------------------------------
import numpy as np
import warnings
import inspect
from scipy import stats
from scipy.optimize import least_squares

def check_data(**kwargs):
    """
    This function checks if all the input arguments:
    - are 1D np.ndarray
    - have the same shape
    - there are data points with good quality code (0)
    """

    for k in kwargs:
        # Check if array
        if not isinstance(kwargs[k], np.ndarray):
            raise TypeError('{} is of type {}'.format(k, type(kwargs[k])))
        # Check if shape
        if len(kwargs[k].shape) != 1:
            raise ValueError(
                '{} must be 1D. Shape :  {}'.format(k, kwargs[k].shape))

    for k1 in kwargs:
        for k2 in kwargs:
            if k1 == k2:
                continue
            if kwargs[k1].shape != kwargs[k2].shape:
                raise ValueError('{} and {} have different shape: {}, {}'.format(k1,
                                                                                 k2,
                                                                                 kwargs[k1].shape,
                                                                                 kwargs[k2].shape))

    # Check if at least some data have good quality
    good_quality_data = True

    if 'quality' in kwargs:
        if sum(kwargs['quality'] == 0) < 1:
            good_quality_data = False
            warnings.warn('Skipped because of no data')

    return good_quality_data


def calculate_p_seasonality(precipitation, quality, date, temperature):
    """
    This function calculates the signature "p_seasonality".

    Parameters
    ----------
    precipitation : np.array
        Array of precipitation measurements. It is assumed that it represent
        daily data.
    quality : np.array
        Array containing the quality code for the precipitation measurements. It
        is assumed that it is concomitant to the precipitation time series. Data
        with good quality is "0", data with bad quality is "1"
    date : pandas.core.indexes.datetimes.DatetimeIndex
        Date series. It is assumed that it is concomitant to the precipitation
        time series
    temperature : np.array
        Array containing the temperature time series. It is assumed that it is
        concomitant to the precipitation time series.

    Returns
    -------
    float
        Value of the signature.
    """

    good_quality_data = check_data(precipitation=precipitation,
                                   quality=quality,
                                   temperature=temperature)

    if not good_quality_data:
        return None

    if len(precipitation) != len(date):
        raise ValueError('precipitation and date have different length: {} vs {}'.format(
            len(precipitation), len(date)))

    # Calculate the day of the year
    t_julian = date.dayofyear-1  # 1 Jan is zero

    # Create a DataFrame
    prec = pd.DataFrame(data=np.array([precipitation, quality]).transpose(),
                        index=date,
                        columns=['P', 'QC'])

    mean_month_prec = prec[prec.QC == 0].groupby(
        prec.index[prec.QC == 0].month).mean()

    # Get a first guess of the phase -> month with the most precipitation
    sp_first_guess = 90 - (mean_month_prec.idxmax()['P'] - 1)*30
    sp_first_guess = sp_first_guess + 360 if sp_first_guess < 0 else sp_first_guess

    # Fit the two sine functions
    def fit_p(pars, x, y):
        prec = y.mean() * (1 + pars[0]*np.sin(2*np.pi*(x-pars[1])/365.25))
        return y - prec

    def fit_t(pars, x, y):
        temp = y.mean() + pars[0]*np.sin(2*np.pi*(x-pars[1])/365.25)
        return y - temp

    prec_pars = least_squares(fun=fit_p,
                              x0=[0.4, sp_first_guess],
                              bounds=([-1, 0], [1, 365.25]),
                              args=(t_julian[quality == 0], precipitation[quality == 0]))

    temp_pars = least_squares(fun=fit_t,
                              x0=[5, 270],
                              args=(t_julian[quality == 0], temperature[quality == 0]))

    # Explicit the parameters
    delta_p = prec_pars.x[0]
    sp = prec_pars.x[1]
    delta_t = temp_pars.x[0]
    st = temp_pars.x[1]

    sig = delta_p * np.sign(delta_t) * np.cos(2 * np.pi * (sp - st) / 365.25)

    return(float(sig))


# -----------------------------------------------------------------------------------------------

# Finally, we replace the function with the corrected version: 
hydroanalysis.meteo_indexes.calculate_p_seasonality = calculate_p_seasonality

## Signatures computation

In [None]:
hydrometeo_signatures_df = pd.DataFrame(index = network_estreams_filtered.index, 
                                        columns = ["q_corr", "q_mean", "q_runoff_ratio", "q_elas_Sawicz", 
                                                   "q_elas_Sankarasubramanian", "slope_sawicz", "slope_yadav",
                                                   "slope_mcmillan", "slope_addor", "baseflow_index", "hfd_mean",
                                                   "hfd_std", "q_5", "q_95", "hq_freq", "hq_dur", "lq_freq", 
                                                   "lq_dur", "zero_q_freq", "p_mean", "pet_mean", "aridity", 
                                                   "p_seasonality", "frac_snow", "hp_freq",
                                                   "hp_dur", "hp_time", "lp_freq", "lp_dur",
                                                   "lp_time"
                                                  ])

In [None]:
# Streamflow signatures
for gauge in tqdm.tqdm(timeseries_discharge.columns):
        
    # Correlation between runoff and precipitation
    hydrometeo_signatures_df.loc[gauge, "q_corr"] = timeseries_discharge.loc[:, gauge].corr(timeseries_precipitation.loc[:, gauge])
    
    # Runoff mean (mm/day)
    hydrometeo_signatures_df.loc[gauge, "q_mean"] = hydroanalysis.streamflow_signatures.calculate_q_mean(timeseries_discharge.loc[:, gauge].values, quality_discharge_precipitation.loc[:, gauge].values)
    
    # Runoff ratio (-)
    hydrometeo_signatures_df.loc[gauge, "q_runoff_ratio"] = hydroanalysis.streamflow_signatures.calculate_runoff_ratio(streamflow = timeseries_discharge.loc[:, gauge].values,
                                                                                              quality  = quality_discharge_precipitation.loc[:, gauge].values,
                                                                                            precipitation = timeseries_precipitation.loc[:, gauge].values)
    # Streamflow elasticity (-)
    elas_gauge = hydroanalysis.streamflow_signatures.calculate_stream_elas(streamflow = timeseries_discharge.loc[:, gauge].values,
                                                                quality  = quality_discharge_precipitation.loc[:, gauge].values,
                                                                precipitation = timeseries_precipitation.loc[:, gauge].values,
                                                                hydro_year  = hydro_year)
    try:
        hydrometeo_signatures_df.loc[gauge, ["q_elas_Sawicz", 
                                         "q_elas_Sankarasubramanian"]] = elas_gauge["Sawicz"],elas_gauge["Sankarasubramanian"]
    except: 
        hydrometeo_signatures_df.loc[gauge, ["q_elas_Sawicz", 
                                         "q_elas_Sankarasubramanian"]] = np.nan, np.nan
    
    # Slope (-)
    slope_gauge = hydroanalysis.streamflow_signatures.calculate_slope_fdc(streamflow = timeseries_discharge.loc[:, gauge].values,                                                                  
                                                                          quality  = quality_discharge_precipitation.loc[:, gauge].values)
    try:
        hydrometeo_signatures_df.loc[gauge, ["slope_sawicz", "slope_yadav",
                                         "slope_mcmillan", "slope_addor"]] = slope_gauge["Sawicz"],slope_gauge["Yadav"],slope_gauge["McMillan"],slope_gauge["Addor"]
    except: 
        hydrometeo_signatures_df.loc[gauge, ["slope_sawicz", "slope_yadav",
                                         "slope_mcmillan", "slope_addor"]] = np.nan, np.nan, np.nan, np.nan
    try: 
        # Baseflow index (-)
        hydrometeo_signatures_df.loc[gauge, "baseflow_index"] = hydroanalysis.streamflow_signatures.calculate_baseflow_index(streamflow = timeseries_discharge.loc[:, gauge].values, 
                                                                                             quality = quality_discharge_precipitation.loc[:, gauge].values)
    except:
        # Baseflow index (-)
        hydrometeo_signatures_df.loc[gauge, "baseflow_index"] = np.nan
        
        
    # Half-flow duration (days)
    hfd_gauge = hydroanalysis.streamflow_signatures.calculate_hfd_mean(streamflow = timeseries_discharge.loc[:, gauge].values,
                                                                quality  = quality_discharge_precipitation.loc[:, gauge].values,
                                                                hydro_year = hydro_year)
    try:
        hydrometeo_signatures_df.loc[gauge, ["hfd_mean", 
                          "hfd_std"]] = hfd_gauge["hfd_mean"],hfd_gauge["hfd_std"]
    except:
        hydrometeo_signatures_df.loc[gauge, ["hfd_mean", 
                          "hfd_std"]] = np.nan, np.nan
        
    # Q5 (mm/day)
    hydrometeo_signatures_df.loc[gauge, "q_5"] = hydroanalysis.streamflow_signatures.calculate_q_5(streamflow = timeseries_discharge.loc[:, gauge].values, 
                                                                          quality = quality_discharge_precipitation.loc[:, gauge].values)
    # Q95 (mm/day)
    hydrometeo_signatures_df.loc[gauge, "q_95"] = hydroanalysis.streamflow_signatures.calculate_q_95(streamflow = timeseries_discharge.loc[:, gauge].values, 
                                                                          quality = quality_discharge_precipitation.loc[:, gauge].values)
    
    # High-flow frequency (days/year) and mean duration (days)
    hq_gauge = hydroanalysis.streamflow_signatures.calculate_high_q_freq_dur(streamflow = timeseries_discharge.loc[:, gauge].values,
                                                                              quality  = quality_discharge_precipitation.loc[:, gauge].values)
    
    try: 
        hydrometeo_signatures_df.loc[gauge, ["hq_freq", 
                                         "hq_dur"]] = hq_gauge["hq_freq"],hq_gauge["hq_dur"]
    except:
        hydrometeo_signatures_df.loc[gauge, ["hq_freq", 
                                         "hq_dur"]] = np.nan, np.nan

    # Low-flow frequency (days/year) and mean duration (days)
    lq_gauge = hydroanalysis.streamflow_signatures.calculate_low_q_freq_dur(streamflow = timeseries_discharge.loc[:, gauge].values,
                                                                              quality  = quality_discharge_precipitation.loc[:, gauge].values)
    
    try:
        hydrometeo_signatures_df.loc[gauge, ["lq_freq", 
                                         "lq_dur"]] = lq_gauge["lq_freq"],lq_gauge["lq_dur"]
    except:
        hydrometeo_signatures_df.loc[gauge, ["lq_freq", 
                                         "lq_dur"]] = np.nan, np.nan
    
    # Zero-flow frequency (-)
    hydrometeo_signatures_df.loc[gauge, "zero_q_freq"] = hydroanalysis.streamflow_signatures.calculate_zero_q_freq(streamflow = timeseries_discharge.loc[:, gauge].values, 
                                                                          quality = quality_discharge_precipitation.loc[:, gauge].values)

hydrometeo_signatures_df = hydrometeo_signatures_df.apply(pd.to_numeric, errors='coerce')
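
All signature functions above receive plain NumPy arrays plus a quality array in which "0" marks usable days. As an illustration of that pattern only, and not of the hydroanalysis implementation, the sketch below computes a runoff ratio as mean streamflow divided by mean precipitation over the good-quality days, following the definition in Addor et al. (2017). The function names and values are made up.

```python
import numpy as np


def mean_good(values, quality):
    """Mean over the days flagged as good quality (quality == 0)."""
    return float(np.mean(values[quality == 0]))


def runoff_ratio_sketch(streamflow, precipitation, quality):
    """Mean streamflow divided by mean precipitation, good-quality days only."""
    return mean_good(streamflow, quality) / mean_good(precipitation, quality)


q = np.array([1.0, np.nan, 2.0, 3.0])   # specific discharge (mm/day)
p = np.array([4.0, 5.0, np.nan, 6.0])   # precipitation (mm/day)
quality = (np.isnan(q) | np.isnan(p)).astype(int)

ratio = runoff_ratio_sketch(q, p, quality)
print(ratio)
```

Because the mask is joint, a day with missing precipitation is excluded from the streamflow mean as well, which keeps both means on the same set of days.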

In [None]:
# Meteorological signatures

for gauge in tqdm.tqdm(timeseries_discharge.columns):
        
    # P mean (mm/day)
    hydrometeo_signatures_df.loc[gauge, "p_mean"] = hydroanalysis.meteo_indexes.calculate_p_mean(precipitation = timeseries_precipitation.loc[:, gauge].values,
                                                                        quality  = quality_pet_precipitation_temperature.loc[:, gauge].values)
    
    # PET mean (mm/day)
    hydrometeo_signatures_df.loc[gauge, "pet_mean"] = hydroanalysis.meteo_indexes.calculate_pet_mean(pet = timeseries_pet.loc[:, gauge].values,
                                                                          quality  = quality_pet_precipitation_temperature.loc[:, gauge].values)
    
    # Aridity index (-)
    hydrometeo_signatures_df.loc[gauge, "aridity"] = hydroanalysis.meteo_indexes.calculate_aridity(precipitation = timeseries_precipitation.loc[:, gauge].values,
                                                                pet = timeseries_pet.loc[:, gauge].values,
                                                                quality  = quality_pet_precipitation_temperature.loc[:, gauge].values)
    # Precipitation seasonality (-)
    hydrometeo_signatures_df.loc[gauge, "p_seasonality"] = hydroanalysis.meteo_indexes.calculate_p_seasonality(precipitation = timeseries_precipitation.loc[:, gauge].values,
                                                                                quality  = quality_pet_precipitation_temperature.loc[:, gauge].values,
                                                                                date =timeseries_precipitation.loc[:, gauge].index,
                                                                                temperature = timeseries_temperature.loc[:, gauge].values)
    # Fraction of snow (-)
    hydrometeo_signatures_df.loc[gauge, "frac_snow"] = hydroanalysis.meteo_indexes.calculate_frac_snow(precipitation = timeseries_precipitation.loc[:, gauge].values,
                                                                              temperature = timeseries_temperature.loc[:, gauge].values,
                                                                              quality  = quality_pet_precipitation_temperature.loc[:, gauge].values,
                                                                              threshold=0.0)
    
    # High-precipitation frequency time
    high_prec_freq_time_gauge = hydroanalysis.meteo_indexes.calculate_high_prec_freq_time(precipitation = timeseries_precipitation.loc[:, gauge].values,
                                                                              quality  = quality_pet_precipitation_temperature.loc[:, gauge].values,
                                                                              date = timeseries_temperature.loc[:, gauge].index)
    try:
        hydrometeo_signatures_df.loc[gauge, ["hp_freq", 
                          "hp_dur", "hp_time"]] = high_prec_freq_time_gauge["hp_freq"],high_prec_freq_time_gauge["hp_dur"], high_prec_freq_time_gauge["hp_time"]
    except:
        hydrometeo_signatures_df.loc[gauge, ["hp_freq", 
                          "hp_dur", "hp_time"]] = np.nan, np.nan, np.nan
    
    # Low-precipitation frequency time
    low_prec_freq_time_gauge = hydroanalysis.meteo_indexes.calculate_low_prec_freq_time(precipitation = timeseries_precipitation.loc[:, gauge].values,
                                                                              quality  = quality_pet_precipitation_temperature.loc[:, gauge].values,
                                                                              date = timeseries_temperature.loc[:, gauge].index)
    try:
        hydrometeo_signatures_df.loc[gauge, ["lp_freq", 
                          "lp_dur", "lp_time"]] = low_prec_freq_time_gauge["lp_freq"],low_prec_freq_time_gauge["lp_dur"], low_prec_freq_time_gauge["lp_time"]
    except:
        hydrometeo_signatures_df.loc[gauge, ["lp_freq", 
                          "lp_dur", "lp_time"]] = np.nan, np.nan, np.nan
      
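Two of the meteorological signatures have particularly simple definitions in Addor et al. (2017): aridity is the ratio of mean PET to mean precipitation, and the snow fraction is the share of precipitation falling on days colder than a temperature threshold. The sketch below illustrates these definitions on made-up data. It is not the hydroanalysis code, and the strict `<` comparison with the threshold is an assumption.

```python
import numpy as np


def aridity_sketch(precipitation, pet, quality):
    """Mean PET divided by mean precipitation over good-quality days."""
    good = quality == 0
    return float(pet[good].mean() / precipitation[good].mean())


def frac_snow_sketch(precipitation, temperature, quality, threshold=0.0):
    """Share of precipitation falling on days colder than `threshold` (assumed strict <)."""
    good = quality == 0
    p, t = precipitation[good], temperature[good]
    return float(p[t < threshold].sum() / p.sum())


p = np.array([2.0, 4.0, 0.0, 6.0])      # precipitation (mm/day)
pet = np.array([1.0, 1.0, 2.0, 2.0])    # potential evapotranspiration (mm/day)
t = np.array([-3.0, 5.0, -1.0, 8.0])    # temperature (degrees Celsius)
quality = np.zeros(4, dtype=int)        # all days are good quality

aridity = aridity_sketch(p, pet, quality)
frac_snow = frac_snow_sketch(p, t, quality)
print(aridity, frac_snow)
```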

## Number of measurements used:

In [None]:
# Number of measurements:
hydrometeo_signatures_df[["num_days", "num_months", "num_months_complete", "num_years_hydro", "num_years_complete"]] = count_num_measurements(timeseries = timeseries_discharge)
hydrometeo_signatures_df.drop(["num_days", "num_months", "num_months_complete", "num_years_complete"], axis = 1, inplace = True)

hydrometeo_signatures_df["start_date_hydro"] = find_first_non_nan_dates(timeseries_discharge)
hydrometeo_signatures_df["end_date_hydro"] = find_last_non_nan_dates(timeseries_discharge)

In [None]:
# Number of measurements:
hydrometeo_signatures_df[["num_days", "num_months", "num_months_complete", "num_years_climatic", "num_years_complete"]] = count_num_measurements(timeseries = timeseries_pet)
hydrometeo_signatures_df.drop(["num_days", "num_months", "num_months_complete", "num_years_complete"], axis = 1, inplace = True)
hydrometeo_signatures_df["start_date_climatic"] = find_first_non_nan_dates(timeseries_pet)
hydrometeo_signatures_df["end_date_climatic"] = find_last_non_nan_dates(timeseries_pet)

hydrometeo_signatures_df

In [None]:
# Here we organize the data with all the catchments (not only the filtered)
signatures_df = pd.DataFrame(columns = hydrometeo_signatures_df.columns, index = network_estreams.index)
signatures_df.loc[hydrometeo_signatures_df.index, :] =  hydrometeo_signatures_df
signatures_df
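
The cell above expands the filtered results to the full gauge list by creating an empty frame and assigning into it. An equivalent result can be sketched with `DataFrame.reindex`, shown here on made-up gauges; this is an alternative for illustration, not the approach used in the notebook.

```python
import pandas as pd

all_ids = pd.Index(["G1", "G2", "G3"], name="basin_id")
filtered = pd.DataFrame(
    {"q_mean": [1.2, 0.8]},
    index=pd.Index(["G1", "G3"], name="basin_id"),
)

# Gauges missing from `filtered` are added as rows of NaN.
full = filtered.reindex(all_ids)
print(full)
```

One practical difference is that `reindex` keeps the float dtype of numeric columns, whereas a frame created without data starts with object-dtype columns.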

In [None]:
# Here we keep only the fields used in Addor et al. (2017), plus the record-length and date columns
signatures_df = signatures_df[["q_mean", "q_runoff_ratio", "q_elas_Sankarasubramanian", "slope_sawicz",
                               "baseflow_index", "hfd_mean", "hfd_std", "q_5", "q_95", "hq_freq", "hq_dur", "lq_freq", 
                               "lq_dur", "zero_q_freq", "p_mean", "pet_mean", "aridity", 
                               "p_seasonality", "frac_snow", "hp_freq",
                               "hp_dur", "hp_time", "lp_freq", "lp_dur", "lp_time",
                               "num_years_hydro", "start_date_hydro", "end_date_hydro",
                               "num_years_climatic", "start_date_climatic", "end_date_climatic"  
                                                  ]]
signatures_df

In [None]:
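# Round the float-valued signatures to 3 decimals. The two positional slices skip
# "hp_time" and "lp_time" as well as the record-length and date columns at the end.
signatures_df.iloc[:, 0:-10] = signatures_df.iloc[:, 0:-10].astype(float).round(3)
signatures_df.iloc[:, -9:-7] = signatures_df.iloc[:, -9:-7].astype(float).round(3)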
signatures_df
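
The positional slices above depend on the exact column order. A sketch of the same rounding step that selects columns by name instead is given below. The toy frame, including the season-like labels in `hp_time`, is invented for illustration and only mimics a frame with mixed numeric and non-numeric columns.

```python
import pandas as pd

df = pd.DataFrame({
    "q_mean": [1.23456, 0.98765],
    "hp_time": ["djf", "jja"],                          # label column, must not be rounded
    "lp_dur": [3.14159, 2.71828],
    "start_date_hydro": ["1990-10-01", "1985-10-01"],   # date column, must not be rounded
})

# Name the float-valued columns explicitly instead of relying on their position.
float_columns = ["q_mean", "lp_dur"]
df[float_columns] = df[float_columns].astype(float).round(3)
print(df)
```

Selecting by name keeps the step valid if columns are later added or reordered.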

# Data export

In [None]:
# Export the final dataset:
signatures_df.to_csv(PATH_OUTPUT+"estreams_hydrometeo_signatures.csv")

# End