# CH-IRP Streamflow isotopes dataset extraction

Author: Thiago Nascimento (thiago.nascimento@eawag.ch)

This notebook is used to retrieve and concatenate the stream water isotopes dataset obtained from CH-IRP.

The output is one file per catchment (similar to CAMELS-CH), with three columns:

* date
* delta_2h
* delta_18o

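As a rough illustration of the output layout listed above, the sketch below loads one per-catchment file back into pandas. The helper name `load_isotope_csv` and the sample values are hypothetical, and it assumes the `date` column is written in a format pandas can parse.

```python
import io

import pandas as pd


def load_isotope_csv(source):
    """Load one per-catchment file into a DataFrame indexed by date."""
    df = pd.read_csv(source, parse_dates=["date"], index_col="date")
    # Keep the documented column order: delta_2h, then delta_18o
    return df[["delta_2h", "delta_18o"]]


# Made-up values, for illustration only (not taken from CH-IRP)
sample = io.StringIO(
    "date,delta_2h,delta_18o\n"
    "2015-01-05,-80.1,-11.2\n"
    "2015-01-19,-79.4,-11.0\n"
)
isotopes = load_isotope_csv(sample)
print(isotopes.dtypes)
```

In practice `source` would be the path to a file written by this notebook, for example one named `camels_ch_chem_chirp_{gauge_id}.csv`.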
## Requirements
**Python:**

* Python>=3.6
* Jupyter
* geopandas=0.10.2
* numpy
* os
* pandas=2.1.3
* tqdm

Check the GitHub repository for an environment.yml (for conda environments) or requirements.txt (pip) file.

**Files:**

* {basin_id}.isoStrm


**Directory:**

* Clone the GitHub repository locally.
* Place any third-party data in their respective directories.
* ONLY update the "PATH" variable in the "Configurations" section with the relative path to your local EStreams directory.


## References
Staudinger, M., Seeger, S., Herbstritt, B., Stoelzle, M., Seibert, J., Stahl, K., and Weiler, M.: The CH-IRP data set: a decade of fortnightly data on δ2H and δ18O in streamflow and precipitation in Switzerland, Earth Syst. Sci. Data, 12, 3057–3066, https://doi.org/10.5194/essd-12-3057-2020, 2020.

## Observations
* None

# Import modules

In [None]:
import pandas as pd
import tqdm
import os
import glob
import warnings
from pathlib import Path

# Configurations

In [2]:
# Only editable variables:
# Relative path to your local directory
PATH = ".."

# Suppress all warnings
warnings.filterwarnings("ignore")

# Path to where the data are stored
path_isot = Path(r"C:\Users\nascimth\Documents\data\CAMELS_CH_Chem\data\CH_IRP\isotopes_streamflow\isotopes_streamflow\\")


* #### Users should NOT change anything in the code below this point.

In [3]:
# Non-editable variables:
PATH_OUTPUT = r"results\Dataset\isotopes\streamwater\ch_irp\\"

# Set the directory:
os.chdir(PATH)

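The output location above is built from a Windows-style string. As a hedged alternative, the sketch below assembles the same relative location with `pathlib` and creates the directory if it is missing. The function name `output_file` is hypothetical, and a temporary directory stands in for the project root.

```python
import tempfile
from pathlib import Path


def output_file(base_dir, gauge_id):
    """Return the CSV path for one gauge, creating the output folder if needed."""
    out_dir = Path(base_dir) / "results" / "Dataset" / "isotopes" / "streamwater" / "ch_irp"
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / f"camels_ch_chem_chirp_{gauge_id}.csv"


# A temporary directory stands in for the project root (assumption for the demo)
with tempfile.TemporaryDirectory() as tmp:
    target = output_file(tmp, 2009)
    created = target.parent.is_dir()
    print(target.name)
```

Building the path from components avoids hard-coding either `\` or `/`, so the same code should behave the same way on Windows and on POSIX systems.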
# Import data

- Network

In [4]:
# Network CAMELS_CH_Chem
network_camels_ch_chem = pd.read_csv(r"results\Dataset\gauges_metadata\camels_ch_chem_gauges_metadata.csv")
network_camels_ch_chem.set_index("gauge_id", inplace=True)
network_camels_ch_chem

| gauge_id | sensor_id | nawaf_id | nawat_id | isot_id | gauge_name | water_body_name | gauge_easting | gauge_northing | gauge_lon | gauge_lat | ... | gauge_northing_nawaf | area_nawaf | foen_nawaf_dist | gauge_name_nawat | gauge_easting_nawat | gauge_northing_nawat | area_nawat | foen_nawat_dist | q_nawat_corrector | remarks |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2009 | 2009.0 | 1837.0 | 1837.0 | NIO04 | Porte du Scex | Rhône | 557660 | 133280 | 6.89 | 46.35 | ... | 133280.0 | 5239.4 | 0.0 | Porte du Scex | 557660.0 | 133280.0 | 5239.402096 | 0.0 | 1.000000 | |
| 2011 | 2011.0 | | 4070.0 | | Sion | Rhône | 593770 | 118630 | 7.36 | 46.22 | ... | | | | Sion | 593277.0 | 118449.0 | 3372.417040 | 0.0 | 1.000005 | |
| 2016 | 2016.0 | 1833.0 | 1833.0 | NIO02 | Brugg | Aare | 657000 | 259360 | 8.19 | 47.48 | ... | 259360.0 | 11681.3 | 0.0 | Brugg | 657000.0 | 259360.0 | 11681.282882 | 0.0 | 0.999999 | |
| 2018 | 2018.0 | 1835.0 | 1339.0 | | Mellingen | Reuss | 662830 | 252580 | 8.27 | 47.42 | ... | 252580.0 | 3385.8 | 0.0 | Gebenstorf | 659450.0 | 258850.0 | 3420.503458 | 10.0 | 1.010250 | |
| 2019 | 2019.0 | | 1852.0 | NIO01 | Brienzwiler | Aare | 649930 | 177380 | 8.09 | 46.75 | ... | | | | Brienzerseeeinlauf | 646692.0 | 177000.0 | 555.808970 | 3.3 | 1.001097 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2617 | 2617.0 | | | | Müstair | Rom | 830800 | 168700 | 10.45 | 46.63 | ... | | | | | | | | | | |
| 2623 | 2623.0 | | | | Oberwald | Rhone | 669900 | 154075 | 8.35 | 46.53 | ... | | | | | | | | | | |
| 2634 | 2634.0 | 6169.0 | 1181.0 | | Emmen | Kleine Emme | 663700 | 213630 | 8.28 | 47.07 | ... | 213630.0 | 478.3 | 0.0 | Emmen-Littau | 663917.0 | 213356.0 | 478.277165 | 0.6 | 1.000188 | station was moved from Littau to Emmen in 2013... |
| 2635 | 2635.0 | | | | Einsiedeln, Gross | Grossbach | 700710 | 218125 | 8.77 | 47.11 | ... | | | | | | | | | | Station moved in 2012? |

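The metadata table is indexed by `gauge_id`, while the isotope files are identified by their file stem. A minimal sketch of cross-checking the two follows. The helper `split_by_membership` and the list of file stems are hypothetical, and only three gauge rows from the table above are reproduced.

```python
import pandas as pd

# Three rows copied from the metadata table above (gauge_name only)
network = pd.DataFrame(
    {"gauge_name": ["Porte du Scex", "Sion", "Brugg"]},
    index=pd.Index([2009, 2011, 2016], name="gauge_id"),
)

# Hypothetical file stems, as Path.stem would return them (strings)
file_stems = ["2009", "2016", "9999"]


def split_by_membership(stems, index):
    """Separate stems that match an integer gauge_id in the index from those that do not."""
    known, unknown = [], []
    for stem in stems:
        # A non-numeric stem cannot match an integer gauge_id
        if stem.isdigit() and int(stem) in index:
            known.append(int(stem))
        else:
            unknown.append(stem)
    return known, unknown


known, unknown = split_by_membership(file_stems, network.index)
print("in metadata:", known)
print("not in metadata:", unknown)
```

A check of this kind could flag isotope files whose identifier has no matching row in the gauge metadata before any output is written.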

In [6]:
def safe_readlines(path):
    """Try UTF-8, then Latin-1 if decoding fails"""
    try:
        with open(path, 'r', encoding='utf-8') as f:
            return f.readlines()
    except UnicodeDecodeError:
        with open(path, 'r', encoding='latin1') as f:
            return f.readlines()

for file in tqdm.tqdm(path_isot.glob("*.isoStrm")):
    gauge_id = file.stem

    try:
        lines = safe_readlines(file)
    except Exception as e:
        print(f"Couldn't read {file.name}: {e}")
        continue

    # Look for the line with dashes (---) and take the next line as the header
    data_start_idx = next(
        (i + 1 for i, line in enumerate(lines) if line.strip().startswith("---")),
        None
    )

    if data_start_idx is None or data_start_idx >= len(lines):
        print(f"No data header found in {file.name}. Skipping.")
        continue

    # sep=r"\s+" splits on any whitespace; it replaces delim_whitespace=True,
    # which is deprecated in newer pandas releases
    try:
        df = pd.read_csv(file, sep=r"\s+", skiprows=data_start_idx, encoding='utf-8')
    except UnicodeDecodeError:
        try:
            df = pd.read_csv(file, sep=r"\s+", skiprows=data_start_idx, encoding='latin1')
        except Exception as e:
            print(f"Error reading {file.name} with Latin-1: {e}")
            continue

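    df = df.loc[:, ~df.columns.duplicated()]  # Drop duplicated column names, if any
    # Assumes the source columns are ordered date, delta_18o, delta_2h
    df.columns = ["date", "delta_18o", "delta_2h"]
    # Reorder to the documented output layout
    df = df[["date", "delta_2h", "delta_18o"]]
    df.set_index("date", inplace=True)

    # Join the path components instead of mixing "\" and "/" separators
    df.to_csv(os.path.join(PATH_OUTPUT, f"camels_ch_chem_chirp_{gauge_id}.csv"), encoding='latin1')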

14it [00:00, 123.63it/s]

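The loop above reads each file twice, once to locate the dashed separator and once to parse the table. The sketch below is a variant, not the notebook's method: it decodes the text once and parses the table from memory. The function name `read_isostrm` and the sample file contents are hypothetical. The real `.isoStrm` header layout is only assumed to contain a line starting with `---` followed by a three-column, whitespace-separated table.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd


def read_isostrm(path):
    """Parse one isotope file; return None when no '---' separator is found."""
    path = Path(path)
    try:
        text = path.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        text = path.read_text(encoding="latin1")  # fallback, as in the loop above

    lines = text.splitlines()
    # The table header is taken to be the line right after the first '---' line
    start = next(
        (i + 1 for i, line in enumerate(lines) if line.strip().startswith("---")),
        None,
    )
    if start is None or start >= len(lines):
        return None

    df = pd.read_csv(io.StringIO("\n".join(lines[start:])), sep=r"\s+")
    if df.shape[1] != 3:
        raise ValueError(f"Expected 3 columns in {path.name}, found {df.shape[1]}")
    # Assumed source order: date, delta_18o, delta_2h
    df.columns = ["date", "delta_18o", "delta_2h"]
    return df[["date", "delta_2h", "delta_18o"]].set_index("date")


# Made-up file contents, for illustration only
sample = (
    "# station: hypothetical\n"
    "-----------------\n"
    "date d18O d2H\n"
    "2015-01-05 -11.2 -80.1\n"
    "2015-01-19 -11.0 -79.4\n"
)
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "0000.isoStrm"
    sample_path.write_text(sample, encoding="utf-8")
    parsed = read_isostrm(sample_path)
print(parsed)
```

Raising an error on an unexpected column count makes a malformed file fail with a message that names it. Without the check, the failure would surface later as a length mismatch when the columns are renamed.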

Observations
- There are 14 stations in total.
- Stations 2319, 2409, and 2491 were removed manually for now, leaving 11.

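The manual removal noted above could also be expressed in code, which keeps the exclusion visible and repeatable. The sketch below assumes the exclusion is applied to the input files by file stem. The helper `select_files` and the candidate file names are hypothetical, and only the three excluded IDs come from the observation.

```python
from pathlib import Path

# Gauge IDs named in the observation above
EXCLUDED_GAUGES = {"2319", "2409", "2491"}


def select_files(paths, excluded=EXCLUDED_GAUGES):
    """Return the paths whose stem (gauge ID) is not excluded, sorted by name."""
    return sorted(p for p in paths if Path(p).stem not in excluded)


# Hypothetical candidate files; in the notebook these would come from path_isot.glob("*.isoStrm")
candidates = [Path("2319.isoStrm"), Path("2009.isoStrm"), Path("2491.isoStrm")]
kept = select_files(candidates)
print([p.name for p in kept])
```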
# End