# New codes for the basins


Author: Thiago Nascimento (thiago.nascimento@eawag.ch)

This notebook is part of the EStreams publication and was used to rename the basin_id of the catchments within EStreams.

* It allows the entire database, including attributes and records, to be renamed easily.
* Its only purpose is to avoid re-running the full aggregation of the catchment attributes.
* If the procedure is run from the beginning, this script does not need to be run.

Note that this code enables not only the replication of the current database but also its extension to new catchment areas.
* Additionally, the user should download the original raw data and place it in the folder of the same name before running this code.
* The original third-party data are not made available in this repository due to redistribution and storage-space constraints.

## Requirements
**Python:**

* Python>=3.6
* Jupyter
* Geopandas=0.10.2
* Pandas
* tqdm

Check the GitHub repository for an environment.yml (for conda environments) or requirements.txt (pip) file.

**Files:**

* estreams_boundaries.shp
* estreams_network.xlsx
* estreams_timeseries.csv
* results/estreams_gauging_stations_old_new_codes.csv
* attributes

**Directory:**

* Clone the GitHub repository locally
* Update ONLY the "PATH" variable in the "Configurations" section, setting it to the relative path to your local EStreams directory.

## References


## License


# Import modules

In [3]:
import geopandas as gpd
import seaborn as sns
import matplotlib.pyplot as plt
import os
import glob
import pandas as pd
import numpy as np
from shapely.geometry import Point, Polygon
import matplotlib as mpl
import tqdm
import warnings

# Ignore all warnings
warnings.filterwarnings("ignore")

# Configurations

In [4]:
# Only editable variable:
# Relative path to your local directory
PATH = ".."

* #### Users should NOT change anything in the code below.

In [None]:
# Non-editable variables:
PATH_OUTPUT = "results/"
os.chdir(PATH)

# Import data
## Network (old and new) codes
* Here we have a list with the old and new codes of the stations

In [5]:
network_EU = pd.read_csv("results/estreams_gauging_stations_old_new_codes.csv", encoding='utf-8')
network_EU

Unnamed: 0,old_code,new_code
0,AT00001,AT000001
1,AT00002,AT000002
2,AT00003,AT000003
3,AT00004,AT000004
4,AT00005,AT000005
...,...,...
15042,UAGR017,UAGR0017
15043,UAGR018,UAGR0018
15044,UAGR019,UAGR0019
15045,UAGR020,UAGR0020


In [13]:
network_attributes = network_EU.copy()
network_attributes.set_index("old_code", inplace = True)

In [15]:
# Mapping from each old code to its new code:
new_code_mapping = network_attributes['new_code'].to_dict()
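
Because every later step relies on this dictionary, it may be worth confirming that the old-to-new table is one-to-one before using it. The sketch below is only an illustration: `check_code_mapping` and the toy table are hypothetical and not part of the EStreams code. It assumes a table with the same `old_code` and `new_code` columns as the file loaded above.

```python
import pandas as pd


def check_code_mapping(codes: pd.DataFrame) -> dict:
    """Return an old -> new dict after checking that the mapping is one-to-one.

    Hypothetical helper: raises ValueError if any old or new code is repeated.
    """
    dup_old = codes.loc[codes["old_code"].duplicated(keep=False), "old_code"].unique()
    dup_new = codes.loc[codes["new_code"].duplicated(keep=False), "new_code"].unique()
    if len(dup_old) or len(dup_new):
        raise ValueError(
            f"Duplicated old codes: {list(dup_old)}; duplicated new codes: {list(dup_new)}"
        )
    return codes.set_index("old_code")["new_code"].to_dict()


# Toy table in the same layout as the old/new codes file (values for illustration only)
toy_codes = pd.DataFrame(
    {
        "old_code": ["AT00001", "AT00002", "UAGR017"],
        "new_code": ["AT000001", "AT000002", "UAGR0017"],
    }
)
toy_mapping = check_code_mapping(toy_codes)
print(toy_mapping)
```

A repeated old code would otherwise be collapsed silently by `to_dict()`, and a repeated new code would give two basins the same identifier.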

# Processing
## Meteorology

In [9]:
# First we build a table with the old and new file names:
network_old_new_csv = pd.DataFrame()
network_old_new_csv['old_name'] = "estreams_meteorology_" + network_EU.old_code.astype(str) + ".csv"
network_old_new_csv['new_name'] = "estreams_meteorology_" + network_EU.new_code.astype(str) + ".csv"
network_old_new_csv

Unnamed: 0,old_name,new_name
0,estreams_meteorology_AT00001.csv,estreams_meteorology_AT000001.csv
1,estreams_meteorology_AT00002.csv,estreams_meteorology_AT000002.csv
2,estreams_meteorology_AT00003.csv,estreams_meteorology_AT000003.csv
3,estreams_meteorology_AT00004.csv,estreams_meteorology_AT000004.csv
4,estreams_meteorology_AT00005.csv,estreams_meteorology_AT000005.csv
...,...,...
15042,estreams_meteorology_UAGR017.csv,estreams_meteorology_UAGR0017.csv
15043,estreams_meteorology_UAGR018.csv,estreams_meteorology_UAGR0018.csv
15044,estreams_meteorology_UAGR019.csv,estreams_meteorology_UAGR0019.csv
15045,estreams_meteorology_UAGR020.csv,estreams_meteorology_UAGR0020.csv


In [10]:
name_mapping_df = network_old_new_csv[["old_name", "new_name"]]
name_mapping_df.set_index("old_name", inplace = True)
name_mapping_df

Unnamed: 0_level_0,new_name
old_name,Unnamed: 1_level_1
estreams_meteorology_AT00001.csv,estreams_meteorology_AT000001.csv
estreams_meteorology_AT00002.csv,estreams_meteorology_AT000002.csv
estreams_meteorology_AT00003.csv,estreams_meteorology_AT000003.csv
estreams_meteorology_AT00004.csv,estreams_meteorology_AT000004.csv
estreams_meteorology_AT00005.csv,estreams_meteorology_AT000005.csv
...,...
estreams_meteorology_UAGR017.csv,estreams_meteorology_UAGR0017.csv
estreams_meteorology_UAGR018.csv,estreams_meteorology_UAGR0018.csv
estreams_meteorology_UAGR019.csv,estreams_meteorology_UAGR0019.csv
estreams_meteorology_UAGR020.csv,estreams_meteorology_UAGR0020.csv


In [11]:
# Build a dictionary from each old file name (the index) to its new file name:
name_mapping = name_mapping_df['new_name'].to_dict()
name_mapping

{'estreams_meteorology_AT00001.csv': 'estreams_meteorology_AT000001.csv',
 'estreams_meteorology_AT00002.csv': 'estreams_meteorology_AT000002.csv',
 'estreams_meteorology_AT00003.csv': 'estreams_meteorology_AT000003.csv',
 'estreams_meteorology_AT00004.csv': 'estreams_meteorology_AT000004.csv',
 'estreams_meteorology_AT00005.csv': 'estreams_meteorology_AT000005.csv',
 'estreams_meteorology_AT00006.csv': 'estreams_meteorology_AT000006.csv',
 'estreams_meteorology_AT00007.csv': 'estreams_meteorology_AT000007.csv',
 'estreams_meteorology_AT00008.csv': 'estreams_meteorology_AT000008.csv',
 'estreams_meteorology_AT00009.csv': 'estreams_meteorology_AT000009.csv',
 'estreams_meteorology_AT00010.csv': 'estreams_meteorology_AT000010.csv',
 'estreams_meteorology_AT00011.csv': 'estreams_meteorology_AT000011.csv',
 'estreams_meteorology_AT00012.csv': 'estreams_meteorology_AT000012.csv',
 'estreams_meteorology_AT00013.csv': 'estreams_meteorology_AT000013.csv',
 'estreams_meteorology_AT00014.csv': '

In [None]:
# Folder containing CSV files
folder_path = '/Users/thiagomedeirosdonascimento/Library/CloudStorage/OneDrive-Personal/PhD/Eawag/Papers/Paper1_Database/Database/EStreams/meteorology'

# List all files in the folder
files = os.listdir(folder_path)

# Iterate through each file
for file in tqdm.tqdm(files):
    # Check if the file is a CSV file and its name is in the mapping
    if file.endswith('.csv') and file in name_mapping:
        # Get the new name from the mapping
        new_name = name_mapping[file]
        # Rename the file
        os.rename(os.path.join(folder_path, file), os.path.join(folder_path, new_name))
        #print(f"Renamed {file} to {new_name}")
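
Renaming files in place is hard to undo, so one option is to build the list of renames first and check it for collisions before touching the disk. The following is a minimal sketch under that assumption. `plan_renames` and the demo file names are hypothetical, and the demonstration runs in a temporary folder rather than in the EStreams meteorology folder.

```python
import os
import tempfile


def plan_renames(folder, mapping):
    """Return (old, new) pairs for files present in folder; raise on collisions.

    Hypothetical helper: it only inspects the folder and renames nothing itself.
    """
    existing = set(os.listdir(folder))
    plan = [(old, new) for old, new in mapping.items() if old in existing]
    for old, new in plan:
        if new in existing and new != old:
            raise FileExistsError(f"{new} already exists; renaming {old} would overwrite it")
    return plan


# Demonstration on a throw-away folder with toy file names
with tempfile.TemporaryDirectory() as demo_folder:
    for name in ["estreams_meteorology_AT00001.csv", "notes.txt"]:
        open(os.path.join(demo_folder, name), "w").close()

    demo_mapping = {
        "estreams_meteorology_AT00001.csv": "estreams_meteorology_AT000001.csv",
        # This file is not on disk, so it is left out of the plan:
        "estreams_meteorology_AT00002.csv": "estreams_meteorology_AT000002.csv",
    }
    demo_plan = plan_renames(demo_folder, demo_mapping)

    # Apply the plan only after it has been built without errors
    for old, new in demo_plan:
        os.rename(os.path.join(demo_folder, old), os.path.join(demo_folder, new))
    demo_result = sorted(os.listdir(demo_folder))

print(demo_plan)
print(demo_result)
```

Printing the plan before applying it acts as a dry run, which the loop above does not provide.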


## Static attributes

In [6]:
# List the static-attribute CSV files in the subdirectory:
filenames = glob.glob("/Users/thiagomedeirosdonascimento/Library/CloudStorage/OneDrive-Personal/PhD/Eawag/Papers/Paper1_Database/GitHub/EStreams/results/staticattributes/*.csv")
print("Number of files:", len(filenames))

Number of files: 9


In [23]:
for filename in tqdm.tqdm(filenames):
    attribute_csv = pd.read_csv(filename)
    attribute_csv.set_index("basin_id", inplace = True)

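    # Look up the new code of each basin (pandas aligns on the old basin_id index):
    attribute_csv["new_code"] = network_attributes.new_code
    # Use the new code as the index, keeping the index name "basin_id":
    attribute_csv.set_index("new_code", inplace = True)
    attribute_csv.index.name = "basin_id"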
    
    # Save the new data:
    attribute_csv.to_csv(filename)

100%|█████████████████████████████████████████████| 9/9 [00:02<00:00,  3.68it/s]
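
In the loop above, the new codes are assigned by index alignment, so a basin_id that is missing from the code table would receive a NaN index without any error. A hedged alternative is sketched below. `recode_basin_index`, the `area` column and the toy values are hypothetical, and the function fails loudly rather than writing a NaN identifier.

```python
import pandas as pd


def recode_basin_index(attributes: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    """Return a copy whose basin_id index holds the new codes.

    Hypothetical helper: raises KeyError if a basin id has no new code.
    """
    missing = [code for code in attributes.index if code not in mapping]
    if missing:
        raise KeyError(f"No new code for basin ids: {missing}")
    recoded = attributes.copy()
    recoded.index = recoded.index.map(mapping)
    recoded.index.name = "basin_id"
    return recoded


# Toy attribute table indexed by the old basin_id (values for illustration only)
toy_attributes = pd.DataFrame(
    {"area": [10.5, 20.0]},
    index=pd.Index(["AT00001", "UAGR017"], name="basin_id"),
)
toy_mapping = {"AT00001": "AT000001", "UAGR017": "UAGR0017"}
toy_recoded = recode_basin_index(toy_attributes, toy_mapping)
print(toy_recoded)
```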


## Temporal attributes

### Irrigation

In [36]:
path_irrigation = "/Users/thiagomedeirosdonascimento/Library/CloudStorage/OneDrive-Personal/PhD/Eawag/Papers/Paper1_Database/GitHub/EStreams/results/timeseries/irrigation/estreams_irrigation_yearly.csv"
irrigation_csv = pd.read_csv(path_irrigation,
                            index_col = 0)
irrigation_csv

Unnamed: 0_level_0,AT00001,AT00002,AT00003,AT00004,AT00005,AT00006,AT00007,AT00008,AT00009,AT00010,...,UAGR012,UAGR013,UAGR014,UAGR015,UAGR016,UAGR017,UAGR018,UAGR019,UAGR020,UAGR021
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1900,27.2727,0.0,0.0,0,0.0,0,0.0,0,0.0,0,...,41.888594,17.406213,56.662451,0.0,0.31488,0,0,0,0,0.0
1910,21.910601,0.0,0.0,0,0.0,0,0.0,0,0.0,0,...,41.888594,17.406213,56.662451,0.0,0.31488,0,0,0,0,0.0
1920,17.0792,0.0,0.0,0,0.0,0,0.0,0,0.0,0,...,43.905283,9.8091,66.362041,0.0,0.653846,0,0,0,0,0.0
1930,14.919301,0.0,0.0,0,0.0,0,0.0,0,0.0,0,...,97.974678,34.590928,148.478916,0.0,2.22018,0,0,0,0,0.0
1940,12.7115,0.0,0.0,0,0.0,0,0.0,0,0.0,0,...,149.956963,58.759922,228.39207,0.0,3.6856,0,0,0,0,0.0
1950,14.097,0.0,0.0,0,0.0,0,0.0,0,0.0,0,...,512.126719,212.681055,771.514844,0.0,14.4605,0,0,0,0,0.42575
1960,14.7215,0.0,0.0,0,0.0,0,0.0,0,0.0,0,...,928.150234,371.289531,1359.939688,0.0,25.35926,0,0,0,0,2.2677
1970,14.6325,0.0,0.0,0,0.0,0,0.0,0,0.0,0,...,2427.397656,1035.890313,3303.243438,0.0,55.750615,0,0,0,0,7.0462
1980,20.785708,0.0,0.0,0,0.0,0,0.0,0,0.0,0,...,5397.56625,2274.955313,7051.614375,3.59503,123.78043,0,0,0,0,14.342
1985,22.413608,0.0,0.0,0,0.0,0,0.0,0,0.0,0,...,6767.456875,2866.501875,8733.7925,19.46499,156.136289,0,0,0,0,17.084


In [37]:
irrigation_csv.rename(columns=new_code_mapping, inplace=True)
irrigation_csv

Unnamed: 0_level_0,AT000001,AT000002,AT000003,AT000004,AT000005,AT000006,AT000007,AT000008,AT000009,AT000010,...,UAGR0012,UAGR0013,UAGR0014,UAGR0015,UAGR0016,UAGR0017,UAGR0018,UAGR0019,UAGR0020,UAGR0021
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1900,27.2727,0.0,0.0,0,0.0,0,0.0,0,0.0,0,...,41.888594,17.406213,56.662451,0.0,0.31488,0,0,0,0,0.0
1910,21.910601,0.0,0.0,0,0.0,0,0.0,0,0.0,0,...,41.888594,17.406213,56.662451,0.0,0.31488,0,0,0,0,0.0
1920,17.0792,0.0,0.0,0,0.0,0,0.0,0,0.0,0,...,43.905283,9.8091,66.362041,0.0,0.653846,0,0,0,0,0.0
1930,14.919301,0.0,0.0,0,0.0,0,0.0,0,0.0,0,...,97.974678,34.590928,148.478916,0.0,2.22018,0,0,0,0,0.0
1940,12.7115,0.0,0.0,0,0.0,0,0.0,0,0.0,0,...,149.956963,58.759922,228.39207,0.0,3.6856,0,0,0,0,0.0
1950,14.097,0.0,0.0,0,0.0,0,0.0,0,0.0,0,...,512.126719,212.681055,771.514844,0.0,14.4605,0,0,0,0,0.42575
1960,14.7215,0.0,0.0,0,0.0,0,0.0,0,0.0,0,...,928.150234,371.289531,1359.939688,0.0,25.35926,0,0,0,0,2.2677
1970,14.6325,0.0,0.0,0,0.0,0,0.0,0,0.0,0,...,2427.397656,1035.890313,3303.243438,0.0,55.750615,0,0,0,0,7.0462
1980,20.785708,0.0,0.0,0,0.0,0,0.0,0,0.0,0,...,5397.56625,2274.955313,7051.614375,3.59503,123.78043,0,0,0,0,14.342
1985,22.413608,0.0,0.0,0,0.0,0,0.0,0,0.0,0,...,6767.456875,2866.501875,8733.7925,19.46499,156.136289,0,0,0,0,17.084


In [38]:
# Check whether any column label is NaN after the renaming:
if irrigation_csv.columns.isna().any():
    print("There are NaN values in the column labels.")
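
Note that `DataFrame.rename` leaves any column that is absent from the mapping unchanged, so an unmapped basin keeps its old code instead of becoming NaN. A complementary check, sketched here with a hypothetical helper `unrenamed_columns` and toy data, lists the columns that are not among the new codes.

```python
import pandas as pd


def unrenamed_columns(df: pd.DataFrame, mapping: dict) -> list:
    """Return the column labels that are not among the new codes in mapping."""
    new_codes = set(mapping.values())
    return [col for col in df.columns if col not in new_codes]


# Toy yearly table; "XX00001" is a made-up code with no entry in the mapping
toy_mapping = {"AT00001": "AT000001", "AT00002": "AT000002"}
toy_series = pd.DataFrame(
    {"AT00001": [27.3], "AT00002": [0.0], "XX00001": [1.0]},
    index=pd.Index([1900], name="date"),
)
toy_series = toy_series.rename(columns=toy_mapping)
leftover = unrenamed_columns(toy_series, toy_mapping)
print(leftover)
```

An empty list would indicate that every column now carries a new code.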

In [39]:
# Save the new data:
irrigation_csv.to_csv(path_irrigation)

### Snow cover

In [40]:
# List the snow-cover CSV files in the subdirectory:
filenames = glob.glob("/Users/thiagomedeirosdonascimento/Library/CloudStorage/OneDrive-Personal/PhD/Eawag/Papers/Paper1_Database/GitHub/EStreams/results/timeseries/snowcover/*.csv")
print("Number of files:", len(filenames))

Number of files: 2


In [42]:
for filename in tqdm.tqdm(filenames):
    attribute_csv = pd.read_csv(filename, index_col = 0)
    attribute_csv.index.name = "dates"
    attribute_csv.rename(columns=new_code_mapping, inplace=True)

    # Save the new data:
    attribute_csv.to_csv(filename)

100%|█████████████████████████████████████████████| 2/2 [00:03<00:00,  1.53s/it]


### Vegetation indices

In [6]:
# List the vegetation-index CSV files in the subdirectory:
filenames = glob.glob("/Users/thiagomedeirosdonascimento/Library/CloudStorage/OneDrive-Personal/PhD/Eawag/Papers/Paper1_Database/GitHub/EStreams/results/timeseries/vegetationindices/*.csv")
print("Number of files:", len(filenames))

Number of files: 4


In [16]:
for filename in tqdm.tqdm(filenames):
    attribute_csv = pd.read_csv(filename, index_col = 0)
    attribute_csv.index.name = ""
    attribute_csv.rename(columns=new_code_mapping, inplace=True)

    # Save the new data:
    attribute_csv.to_csv(filename)

100%|█████████████████████████████████████████████| 4/4 [00:05<00:00,  1.29s/it]


### Meteorology

In [17]:
# List the meteorology CSV files in the subdirectory:
filenames = glob.glob("/Users/thiagomedeirosdonascimento/Library/CloudStorage/OneDrive-Personal/PhD/Eawag/Papers/Paper1_Database/GitHub/EStreams/results/timeseries/meteorology/*.csv")
print("Number of files:", len(filenames))

Number of files: 3


In [18]:
for filename in tqdm.tqdm(filenames):
    attribute_csv = pd.read_csv(filename, index_col = 0)
    attribute_csv.index.name = ""
    attribute_csv.rename(columns=new_code_mapping, inplace=True)

    # Save the new data:
    attribute_csv.to_csv(filename)

100%|████████████████████████████████████████████| 3/3 [09:32<00:00, 190.86s/it]


### Streamflow indices
* Monthly

In [19]:
# List the monthly streamflow-index CSV files in the subdirectory:
filenames = glob.glob("/Users/thiagomedeirosdonascimento/Library/CloudStorage/OneDrive-Personal/PhD/Eawag/Papers/Paper1_Database/GitHub/EStreams/results/timeseries/streamflowindices/monthly/*.csv")
print("Number of files:", len(filenames))

Number of files: 8


In [20]:
for filename in tqdm.tqdm(filenames):
    attribute_csv = pd.read_csv(filename, index_col = 0)
    attribute_csv.index.name = ""
    attribute_csv.rename(columns=new_code_mapping, inplace=True)

    # Save the new data:
    attribute_csv.to_csv(filename)

100%|█████████████████████████████████████████████| 8/8 [01:17<00:00,  9.68s/it]


* Weekly

In [21]:
# List the weekly streamflow-index CSV files in the subdirectory:
filenames = glob.glob("/Users/thiagomedeirosdonascimento/Library/CloudStorage/OneDrive-Personal/PhD/Eawag/Papers/Paper1_Database/GitHub/EStreams/results/timeseries/streamflowindices/weekly/*.csv")
print("Number of files:", len(filenames))

Number of files: 6


In [22]:
for filename in tqdm.tqdm(filenames):
    attribute_csv = pd.read_csv(filename, index_col = 0)
    attribute_csv.index.name = ""
    attribute_csv.rename(columns=new_code_mapping, inplace=True)

    # Save the new data:
    attribute_csv.to_csv(filename)

100%|█████████████████████████████████████████████| 6/6 [04:08<00:00, 41.45s/it]


* Yearly

In [23]:
# List the yearly streamflow-index CSV files in the subdirectory:
filenames = glob.glob("/Users/thiagomedeirosdonascimento/Library/CloudStorage/OneDrive-Personal/PhD/Eawag/Papers/Paper1_Database/GitHub/EStreams/results/timeseries/streamflowindices/yearly/*.csv")
print("Number of files:", len(filenames))

Number of files: 23


In [24]:
for filename in tqdm.tqdm(filenames):
    attribute_csv = pd.read_csv(filename, index_col = 0)
    attribute_csv.index.name = ""
    attribute_csv.rename(columns=new_code_mapping, inplace=True)

    # Save the new data:
    attribute_csv.to_csv(filename)

100%|███████████████████████████████████████████| 23/23 [00:19<00:00,  1.17it/s]
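
The snow cover, vegetation, meteorology and streamflow cells above all repeat the same read, rename and save steps. As a sketch only, they could be folded into one function. `rename_columns_in_folder` is a hypothetical name, and the demonstration uses a temporary folder with a toy file rather than the EStreams results.

```python
import glob
import os
import tempfile

import pandas as pd


def rename_columns_in_folder(pattern, mapping, index_name=""):
    """Rename basin-code columns in every CSV matching pattern, in place.

    Hypothetical helper: returns the list of files that were rewritten.
    """
    paths = sorted(glob.glob(pattern))
    for path in paths:
        table = pd.read_csv(path, index_col=0)
        table.index.name = index_name
        table = table.rename(columns=mapping)
        table.to_csv(path)
    return paths


# Demonstration on a throw-away folder with one toy file
with tempfile.TemporaryDirectory() as demo_folder:
    demo_path = os.path.join(demo_folder, "toy_index.csv")
    pd.DataFrame({"AT00001": [1.0, 2.0]}, index=[2000, 2001]).to_csv(demo_path)

    processed = rename_columns_in_folder(
        os.path.join(demo_folder, "*.csv"), {"AT00001": "AT000001"}
    )
    demo_columns = list(pd.read_csv(demo_path, index_col=0).columns)

print(demo_columns)
```

With such a helper, each subsection would reduce to one call with its own glob pattern and, where needed, its own `index_name` (for example "dates" for the snow cover files).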
