# Streamflow Catalogue Demonstration: Records download


Author: Thiago Nascimento (thiago.nascimento@eawag.ch)

This notebook is part of the complementary material provided alongside the EStreams publication. It offers additional guidance on how to use the catalogue to download streamflow records directly from national providers. Although web-scraping code is not available for every provider, the scripts currently provided can serve as a starting point for future users to adapt to their own requirements.

* Note that this code enables not only the replication of the current database but also its extension to new catchments.
* Additionally, users should download the original raw data and place it in the folder of the same name before running this code.
* The original third-party data are not included in this repository for redistribution and storage-space reasons.

## Requirements
**Python:**

* Python>=3.6
* Jupyter
* geopandas=0.10.2
* pandas
* numpy
* tqdm
* requests
* os
* osgeo

Check the GitHub repository for an environment.yml (conda) or requirements.txt (pip) file.

**Files:**

* data/streamflow/estreams_gauging_stations.csv
* data/streamflow/estreams_streamflow_catalogue.csv

**Directory:**

* Clone the GitHub repository locally.
* Place any third-party data in their respective directories.
* Update ONLY the "PATH" variable in the "Configurations" section, setting it to the relative path of the EStreams directory.

## References
- Please check the Streamflow Catalogue for full information about how to reference the streamflow records. 

## Observations
- This notebook presents code to automate data retrieval for some countries, and specifies when an official API is already available or when all records can be downloaded at once.

## Disclaimer
- All scripts presented in this notebook are intended for demonstrative and guidance purposes.
- The web scraping demonstrated herein is conducted only on websites where it is not explicitly prohibited.
- We emphasize that we have no affiliation with the respective websites showcased for demonstration, and we disclaim any liability for any consequences arising from the misuse of the provided code.
- Users are advised to exercise caution and prudence when using any web-scraping code.
- It is incumbent upon users to adhere to the terms of service of websites from which they scrape data. 

# Import modules

In [1]:
import pandas as pd
import requests
import os
import tqdm
from utils.FR import download_data_FR
from utils.IT import download_data_ITIS, download_metadata_ITIS
from utils.IE import download_data_IEEPA
from utils.HR import get_metadata_HR, download_data_HR

# Configurations

In [2]:
# Only editable variables:
# Relative path to your local directory
PATH = "../../.."

# Non-editable variables:
PATH_OUTPUT = "results/staticattributes/"

# Set the option to display the full content of each cell
pd.set_option('display.max_colwidth', None)

# Set the directory:
os.chdir(PATH)

# Import data
## Network information

In [3]:
network_estreams = pd.read_csv('data/streamflow/estreams_gauging_stations.csv', encoding='utf-8')
network_estreams.set_index("basin_id", inplace = True)
network_estreams.head()

basin_id,gauge_id,gauge_name,gauge_country,gauge_provider,river,lon_snap,lat_snap,lon,lat,elevation,...,num_continuous_days,num_days_gaps,num_days_reliable,num_days_noflag,num_days_suspect,gauge_flag,duplicated_suspect,watershed_group,gauges_upstream,nested_catchments
AT000001,200014,Bangs,AT,AT_EHYD,Rhein,9.534835,47.273748,9.534835,47.273748,420,...,9497,0.0,0.0,9497.0,0.0,B,['CH000197'],1,16,"['AT000001', 'CH000010', 'CH000046', 'CH000048', 'CH000062', 'CH000105', 'CH000113', 'CH000125', 'CH000129', 'CH000135', 'CH000139', 'CH000173', 'CH000175', 'CH000192', 'CH000197']"
AT000002,200048,Schruns (Vonbunweg),AT,AT_EHYD,Litz,9.913677,47.080301,9.913677,47.080301,673,...,23103,0.0,0.0,23103.0,0.0,B,['CH000221'],1,1,['AT000002']
AT000003,231662,Loruens-Aeule,AT,AT_EHYD,Ill,9.847765,47.132821,9.847765,47.132821,579,...,13513,0.0,0.0,13513.0,0.0,B,['CH000215'],1,2,"['AT000002', 'AT000003', 'CH000221']"
AT000004,200592,Kloesterle (OEBB),AT,AT_EHYD,Alfenz,10.061843,47.128994,10.061843,47.128994,1014,...,8765,0.0,0.0,8765.0,0.0,B,['CH000227'],1,1,['AT000004']
AT000005,200097,Buers (Bruecke L82),AT,AT_EHYD,Alvier,9.802668,47.15077,9.802668,47.15077,564,...,10957,0.0,0.0,10957.0,0.0,B,['CH000214'],1,3,"['AT000005', 'CH000214']"


## Streamflow catalogue

In [4]:
catalogue_estreams = pd.read_csv('data/streamflow/estreams_streamflow_catalogue.csv', encoding='utf-8')
catalogue_estreams.set_index("provider_id", inplace = True)
catalogue_estreams.head(5)

provider_id,code_basins,provider_country,country_code,provider_name,license_redistribution,platform,num_stations,start_date,end_date,website,source_license,source_streamflow,source_gauges_infos,references,observations,download_method
AT_EHYD,AT,AUSTRIA,AT,Hydrographische Archivdaten Österreichs (eHYD),-,Website,582,1950-12-31,2021-12-31,https://ehyd.gv.at/,https://ehyd.gv.at/,https://ehyd.gv.at/,https://zenodo.org/record/5153305#.ZDUeaOZBwuU,"BML. Federal Ministry of Agriculture, Forestry, Regions and Water Management: WebGIS-Applikation eHYD, Wien, Austria, https://ehyd.gv.at (last access: 05 May 2023).",,Downloadable all at once
BA_FHMZ,BA,BOSNIA AND HERZEGOVINA,BA,Federalni hidrometeorološki zavod (FHMZ),-,Website,91,1987-01-01,2019-12-31,https://www.fhmzbih.gov.ba/latinica/index.php,-,https://www.fhmzbih.gov.ba/latinica/HIDRO/godisnjaci.php,https://www.fhmzbih.gov.ba/latinica/HIDRO/godisnjaci.php,"FHMZBIH. Federalni hidrometeorološki zavod: Početna: idrologija: hidrološki godišnjaci, Bosnia. https://www.fhmzbih.gov.ba/latinica/HIDRO/godisnjaci.php (last access: 29 June 2023).",,Code provided by EStreams
BE_SPW,BEWA,BELGIUM,BE,Service public de Wallonie (SPW),No-redistribution,Website,164,1968-01-01,2023-10-16,https://hydrometrie.wallonie.be/home.html,https://hydrometrie.wallonie.be/mentions-legales.html,https://hydrometrie.wallonie.be/home/observations/debit.html?mode=announcement,https://hydrometrie.wallonie.be/home/observations/debit.html?mode=announcement,"SPW. Service public de Wallonie: L’hydrométrie en Wallonie: Observations: Debit, Belgium. https://hydrometrie.wallonie.be/home/observations/debit.html?mode=announcement (last access: 07 Dec 2023).",,Downloadable all at once
BE_WATERINFO,BEVL,BELGIUM,BE,Vlaanderen waterinfo,Reproduction allowed,Website,66,1968-12-31,2023-10-10,https://www.vlaanderen.be,https://www.waterinfo.be/default.aspx?path=NL/Algemene_Info/Disclaimer,https://www.waterinfo.be/kaartencatalogus?KL=en,https://www.waterinfo.be/kaartencatalogus?KL=en,"VW. Vlaanderen waterinfo, Belgium. https://www.waterinfo.be/kaartencatalogus?KL=en (last access: 07 Dec 2023).",,Downloadable individually
BG_GRDC,BGGR,BULGARIA,BG,Global Runoff Data Center (GRDC),No-redistribution,Website,8,1978-01-01,1999-12-31,https://www.bafg.de/GRDC/EN,https://www.bafg.de/GRDC/EN/01_GRDC/12_plcy/data_policy.html?nn=2862854,https://www.bafg.de/GRDC/EN,https://www.bafg.de/GRDC/EN,"GRDC. Global Runoff Data Center: River discharge data. Federal Institute of Hydrology, 56068 Koblenz, Germany. https://www.bafg.de/GRDC (last access: 16 Feb 2024).",,Downloadable all at once


# Overview of the catalogue

## Different download methods
When EStreams was built, the records could be obtained through five download methods:
1. Records downloadable all at once directly from providers (no API or code needed)


In [5]:
catalogue_estreams[catalogue_estreams.download_method == 'Downloadable all at once'][["provider_country", "num_stations"]]

provider_id,provider_country,num_stations
AT_EHYD,AUSTRIA,582
BE_SPW,BELGIUM,164
BG_GRDC,BULGARIA,8
BY_GRDC,BELARUS,51
CH_CAMELS,SWITZERLAND,298
CY_GRDC,CYPRUS,14
CZ_GRDC,CZECHIA,29
DE_NW,GERMANY,270
DK_ODA,DENMARK,1000
EE_GRDC,ESTONIA,67


In [11]:
print("The total number of catchments is:", catalogue_estreams[catalogue_estreams.download_method == 'Downloadable all at once'][["num_stations"]].sum().values)

The total number of catchments is: [4456]


2. Code provided by EStreams

In [12]:
catalogue_estreams[catalogue_estreams.download_method == 'Code provided by EStreams'][["provider_country",  "num_stations"]]

provider_id,provider_country,num_stations
BA_FHMZ,BOSNIA AND HERZEGOVINA,91
FR_EAUFRANCE,FRANCE,4968
HR_DHZ,CROATIA,317
IE_EPA,IRELAND,184
IT_ISPRA,ITALY,294
IT_TOS,ITALY,78
PL_IMGW,POLAND,1287


In [13]:
print("The total number of catchments is:", catalogue_estreams[catalogue_estreams.download_method == 'Code provided by EStreams'][["num_stations"]].sum().values)

The total number of catchments is: [7219]


3. Records need to be downloaded manually and individually per gauge

In [14]:
catalogue_estreams[catalogue_estreams.download_method == 'Downloadable individually'][["provider_country",  "num_stations"]]

provider_id,provider_country,num_stations
BE_WATERINFO,BELGIUM,66
CZ_CHMU,CZECHIA,537
DE_BB,GERMANY,147
DE_BE,GERMANY,13
DE_BW,GERMANY,227
DE_BY,GERMANY,499
DE_HE,GERMANY,120
DE_NI,GERMANY,123
DE_SH,GERMANY,234
DE_SN,GERMANY,169


In [15]:
print("The total number of catchments is:", catalogue_estreams[catalogue_estreams.download_method == 'Downloadable individually'][["num_stations"]].sum().values)

The total number of catchments is: [4219]


4. Records must be requested via a request form

In [53]:
catalogue_estreams[catalogue_estreams.download_method == 'Download via request form'][["provider_country",  "num_stations"]]

provider_id,provider_country,num_stations
DE_TH,GERMANY,56
DE_BU,GERMANY,16
DE_RP,GERMANY,94
HU_CONTACTFORM,HUNGARY,47
IT_ABR_CONTACTFORM,ITALY,8
IT_LIG_CONTACTFORM,ITALY,24
IT_LOM_CONTACTFORM,ITALY,26
IT_VEN,ITALY,35
LU_CONTACTFORM,LUXEMBURG,19


In [18]:
print("The total number of catchments is:", catalogue_estreams[catalogue_estreams.download_method == 'Download via request form'][["num_stations"]].sum().values)

The total number of catchments is: [325]


5. API provided by the official provider

In [17]:
catalogue_estreams[catalogue_estreams.download_method == 'API by the official provider'][["provider_country",  "num_stations"]]

provider_id,provider_country,num_stations
GB_NRFA,GREAT BRITAIN,671
NI_NRFA,NORTHERN IRELAND,51
NO_NVE,NORWAY,189


In [19]:
print("The total number of catchments is:", catalogue_estreams[catalogue_estreams.download_method == 'API by the official provider'][["num_stations"]].sum().values)

The total number of catchments is: [911]
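The five totals above can also be obtained in a single step by grouping the catalogue by `download_method`. The sketch below assumes only the `download_method` and `num_stations` columns shown above; `stations_per_method` is a hypothetical helper, and the toy catalogue uses a few rows copied from the outputs above rather than the full catalogue.

```python
import pandas as pd


def stations_per_method(catalogue: pd.DataFrame) -> pd.Series:
    """Sum the number of stations for each download method, largest first."""
    return (
        catalogue.groupby("download_method")["num_stations"]
        .sum()
        .sort_values(ascending=False)
    )


# Illustrative subset of the catalogue (values taken from the outputs above)
toy_catalogue = pd.DataFrame(
    {
        "download_method": [
            "Downloadable all at once",
            "Downloadable all at once",
            "Code provided by EStreams",
            "API by the official provider",
            "API by the official provider",
        ],
        "num_stations": [582, 164, 4968, 671, 51],
    },
    index=pd.Index(
        ["AT_EHYD", "BE_SPW", "FR_EAUFRANCE", "GB_NRFA", "NI_NRFA"],
        name="provider_id",
    ),
)

print(stations_per_method(toy_catalogue))
```

With the notebook data, the same call would be `stations_per_method(catalogue_estreams)`.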


## Overview per country
- Below, we describe how to obtain the records for each country.

## Austria
- For AT, users can access the provider's website listed in the EStreams catalogue, where all streamflow records can be downloaded at once.

In [5]:
print(catalogue_estreams[catalogue_estreams.country_code =="AT"].source_streamflow)

provider_id
AT_EHYD    https://ehyd.gv.at/
Name: source_streamflow, dtype: object


## Bosnia and Herzegovina
- For BA, users can access the provider's website listed in the EStreams catalogue and download all the streamflow yearbooks as PDF files.
- The complementary script "estreams_records_organization" guides users step by step through the conversion and organization of these data.

In [6]:
print(catalogue_estreams[catalogue_estreams.country_code =="BA"].source_streamflow)

provider_id
BA_FHMZ    https://www.fhmzbih.gov.ba/latinica/HIDRO/godisnjaci.php
Name: source_streamflow, dtype: object


## Belgium
- For BE, users have two data providers. The data can be downloaded directly from the official websites.
- For BE_SPW, users can download the records in groups of up to 10 gauges. 
- For BE_WATERINFO, users should download data from each station manually and individually.

In [7]:
print(catalogue_estreams[catalogue_estreams.country_code =="BE"].source_streamflow)

provider_id
BE_SPW          https://hydrometrie.wallonie.be/home/observations/debit.html?mode=announcement
BE_WATERINFO                                   https://www.waterinfo.be/kaartencatalogus?KL=en
Name: source_streamflow, dtype: object


## Switzerland
- For CH, users have the option to download the data from CAMELS-CH, which provides all records in a concise and already organized way.

In [8]:
print(catalogue_estreams[catalogue_estreams.country_code =="CH"].source_streamflow)

provider_id
CH_CAMELS    https://zenodo.org/doi/10.5281/zenodo.7784632
Name: source_streamflow, dtype: object


## Czechia
- For CZ, users can download the data from the national provider's website (CZ_CHMU, individually and manually per gauge) and from GRDC.

In [9]:
print(catalogue_estreams[catalogue_estreams.country_code =="CZ"].source_streamflow)

provider_id
CZ_CHMU    https://isvs.chmi.cz/ords/f?p=11002:HOME:5026647009329:::::
CZ_GRDC                                    https://www.bafg.de/GRDC/EN
Name: source_streamflow, dtype: object


## Germany
- For DE, there are 13 data providers.
- For 10 of the providers (DE_BB, DE_BE, DE_BW, DE_BY, DE_HE, DE_NI, DE_NW, DE_SH, DE_SN and DE_ST), users can download the data directly from their respective websites.
- For 3 of the providers (DE_TH, DE_BU and DE_RP), users should request the data via a contact form.

In [10]:
print(catalogue_estreams[catalogue_estreams.country_code =="DE"].source_streamflow)

provider_id
DE_BB    https://apw.brandenburg.de/?th=owm_gkp/
DE_BE    https://wasserportal.berlin.de/start.php
DE_BW    https://udo.lubw.baden-wuerttemberg.de/public/processingChain?repositoryItemGlobalId=hydrologische_landespegel&conditionVal

- For DE_NW, users may find the full description of how to download the records of all stations at once:

In [61]:
print(catalogue_estreams.loc["DE_NW"].source_streamflow)
print(catalogue_estreams.loc["DE_NW"].observations)

https://www.elwasweb.nrw.de/elwas-web/data-objekt;jsessionid=DADDD7196B89E206917D18793294E375;jsessionid=F76CC7CC8ECFBA5F518ECD241AF0BA78?art=Pegel
To download all stations at once: 1. Check the panel “Oberflachengwasser”, and choose “Pegel” under “Menge”; 2. Then make sure your variable (“Hauptwerte”) is set to average discharge “MQ –Mittlerer Abfluss”. The you can just click “Suchen” search, and all stations will be visible; 3. Then tick the box at the top to select all stations; 4. Now you can download an excel with the metadata of all stations by choosing “Excel Export”. To download the timeseries click “Export Abflussdaten”.


## Denmark
- For DK, users can easily download the records and gauge metadata online.
- Users can download the complete records for all gauges at once.

In [5]:
print(catalogue_estreams[catalogue_estreams.country_code =="DK"].source_streamflow)

provider_id
DK_ODA    https://odaforalle.au.dk/login.aspx
Name: source_streamflow, dtype: object


- In "observations", users may find the full description of how to download the records and the catchment-area information.

In [6]:
print(catalogue_estreams[catalogue_estreams.country_code =="DK"].observations)

provider_id
DK_ODA    # For the streamflow records: 1. First you need to leave your e-mail address; 2. Click on “hent data”, which means get data. In the drop-down menu choose “Vandlob”. 3. Choose “Hydrometri” and then “dognvandforing” which means daily discharge. Then “ar” means year, and then you can choose an individual year or click on “valg alle” to choose every year. 4. To choose which station go to the tab of “observationsstednr” meaning observation location number. You can choose individual stations or the “valg alle”. #For catchment areas: 1. First you need to leave your e-mail address; 2. Click on "hent data" and choose ”Vandløb”; 3. Click on ”Oplandsbeskrivelse” and choose ”Oplandsareal” and then you can choose the stations in the long list and now you should get a file containing catchments areas. 
Name: observations, dtype: object


## Spain
- For ES, users can download all the records at once for each of its 10 regions directly from the website.
- Users can download only the AFLIQ.CSV file, which contains the daily streamflow records.

In [11]:
print(catalogue_estreams[catalogue_estreams.country_code =="ES"].source_streamflow)

provider_id
ES_CEDEX    https://ceh.cedex.es/anuarioaforos/demarcaciones.asp
Name: source_streamflow, dtype: object


## Finland
- For FI, users must first register and then follow the steps to download the daily data.
- As registration is required, implementing any unofficial API is strongly discouraged.
- The step-by-step procedure is relatively long, but users can find the detailed instructions in the field "observations".

In [12]:
print(catalogue_estreams[catalogue_estreams.country_code =="FI"].source_streamflow)

provider_id
FI_SYKE    https://wwwp2.ymparisto.fi/scripts/kirjaudu.asp
Name: source_streamflow, dtype: object


- In "observations", users may find the full description of how to download the records and the catchment-area information.

In [66]:
print(catalogue_estreams[catalogue_estreams.country_code =="FI"].observations)

provider_id
FI_SYKE    1. First make an account using the link: https://www.syke.fi/fi-FI/Avoin_tieto/Ymparistotietojarjestelmat/Rekisteroityminen; 2. Once logged in: click on the "Ympäristötiedon hallintajärjestelmä Hertta" to get to the database; 3. Translate to English by first clicking on "Asetukset" and clickon “Kieli” and choose “ Englanti” for English. “Suomi” means finish. Click ok “Hyvasky” and close the pop-up ‘Sulje”. ;  4. Then you click “Hydrology > Hydrological observations > Data search” ; 5. The site has a maximum amount of data you can download in one go. So you need to choose an area and then search up all the stations in that area, download and then move on to the next area. One tip is to choose "county" in the tab "Area"; 6. Now you should seelct the discharge. Click on "Observation station", select "Parameter", then "Virtaama", then click on "Accept"; 7. After selecting the "Area" and the "Observation station", click on "Places on list" on the right side of the scr

## France
- For FR, the official website does not offer a straightforward way to retrieve many records at once.
- Here, we make web-scraping code available for users.
- This code is for demonstration purposes, and we are not liable for any misuse of it by users.

In [6]:
print(catalogue_estreams[catalogue_estreams.country_code =="FR"].source_streamflow)

provider_id
FR_EAUFRANCE    https://www.hydro.eaufrance.fr
Name: source_streamflow, dtype: object


In [8]:
# Subset the network to France:
network_FR = network_estreams[network_estreams.gauge_country == "FR"]
network_FR.iloc[2:5, :]

basin_id,gauge_id,gauge_name,gauge_country,gauge_provider,river,lon_snap,lat_snap,lon,lat,elevation,...,num_days_gaps,num_continuous_days,num_days_reliable,num_days_noflag,num_days_suspect,gauge_flag,duplicated_suspect,watershed_group,gauges_upstream,nested_catchments
FR000003,A022065001,A0220650,FR,FR_EAUFRANCE,Le ruisseau le Liesbach à Blotzheim,7.508168,47.587489,7.508168,47.587489,274,...,0.0,856,530.0,0.0,326.0,C,,1,1,['FR000003']
FR000004,A023010001,A0220570,FR,FR_EAUFRANCE,[Le Liesbach] à Saint-Louis [Langenhaeuser],7.545294,47.603575,7.545294,47.603575,248,...,327.0,193,169.0,0.0,224.0,E,,1,3,"['FR000002', 'FR000005']"
FR000005,A023020001,A0230201,FR,FR_EAUFRANCE,Le ruisseau l'Augraben à Saint-Louis [Michelfelden],7.556488,47.59698,7.556488,47.59698,247,...,0.0,216,216.0,0.0,0.0,A,,1,2,['FR000002']


In [9]:
gauge_ids = ["A021005050", "A022020001", "A022065001"]

for gauge_id in tqdm.tqdm(gauge_ids):
    data = download_data_FR(gauge_id)

100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:45<00:00, 15.32s/it]
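Rather than hard-coding three IDs, the same loop can be run over every French gauge in `network_FR`. The sketch below wraps any single-gauge download function so that one failed station does not stop the whole run. `download_many` is a hypothetical helper (not part of `utils.FR`), and it assumes that a failed download raises an exception; if `download_data_FR` signals failure differently, the check needs to be adapted.

```python
from typing import Callable, Iterable, List, Tuple


def download_many(
    gauge_ids: Iterable[str],
    fetch: Callable[[str], object],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Call `fetch` for each gauge, continuing past failures.

    Returns the IDs that succeeded and (ID, error message) pairs for those that failed.
    """
    succeeded: List[str] = []
    failed: List[Tuple[str, str]] = []
    for gauge_id in gauge_ids:
        try:
            fetch(gauge_id)
            succeeded.append(gauge_id)
        except Exception as err:  # e.g. network or parsing errors
            failed.append((gauge_id, str(err)))
    return succeeded, failed


# Hypothetical stand-in for a real download function, used only for illustration
def fake_fetch(gauge_id: str) -> None:
    if gauge_id == "BAD":
        raise ValueError("station not found")


ok, failed = download_many(["A021005050", "BAD"], fake_fetch)
print(ok, failed)
```

With the notebook objects, a call could look like `ok, failed = download_many(tqdm.tqdm(network_FR.gauge_id.tolist()), download_data_FR)`; the `failed` list can then be retried later.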


## Great Britain
- For GB, there is already an API provided by the official national provider (NRFA).
- Users should therefore use this official API rather than developing their own scraping code.
- The link below gives full guidance on how to use the API.

In [30]:
print(catalogue_estreams[catalogue_estreams.country_code =="GB"].source_streamflow)

provider_id
GB_NRFA    https://nrfaapps.ceh.ac.uk/nrfa/nrfa-api.html
Name: source_streamflow, dtype: object
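A minimal sketch of how a request to the NRFA web service might be built and parsed. The base URL, the `time-series` endpoint, the parameters `format=json-object`, `data-type=gdf` (gauged daily flow) and `station`, and the `data-stream` key holding an alternating `[date, value, date, value, ...]` list reflect our reading of the NRFA API documentation and are assumptions to verify there; if flags are requested, the stream layout changes and the parser must be adapted. Station 39001 is only an example ID.

```python
from urllib.parse import urlencode

import pandas as pd

# Assumed base URL of the NRFA web service; verify against the official documentation.
NRFA_WS = "https://nrfaapps.ceh.ac.uk/nrfa/ws"


def nrfa_timeseries_url(station: int, data_type: str = "gdf") -> str:
    """Build a time-series request URL (gdf = gauged daily flow, assumed)."""
    query = urlencode({"format": "json-object", "data-type": data_type, "station": station})
    return f"{NRFA_WS}/time-series?{query}"


def parse_data_stream(data_stream: list) -> pd.Series:
    """Turn an alternating [date, value, date, value, ...] list into a Series."""
    dates = pd.to_datetime(data_stream[0::2])
    values = pd.to_numeric(pd.Series(data_stream[1::2]), errors="coerce")
    return pd.Series(values.to_numpy(), index=dates, name="discharge")


# Example usage (requires network access):
# import requests
# response = requests.get(nrfa_timeseries_url(39001), timeout=60)
# response.raise_for_status()
# flow = parse_data_stream(response.json()["data-stream"])
```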


## Greece
- For GR, users have three data providers. The data can be downloaded directly from the official websites.
- So far, users should download data from each station manually and individually.

In [31]:
print(catalogue_estreams[catalogue_estreams.country_code =="GR"].source_streamflow)

provider_id
GR_GRDC                                                                                          https://www.bafg.de/GRDC/EN
GR_HCMR      https://hydro-stations.hcmr.gr/%cf%80%ce%b1%cf%81%ce%bf%cf%87%ce%ae-%cf%80%ce%bf%cf%84%ce%b1%ce%bc%cf%8e%ce%bd/
GR_OPENHI                                                                                             https://openhi.net/en/
Name: source_streamflow, dtype: object


## Croatia
- For Croatia (HR), the website does not offer a straightforward way to retrieve many records at once.
- Nor does it offer a straightforward way to retrieve the gauge metadata.
- Hence, we made web-scraping code available for users.
- Users should check the number of stations available on the website. As of 01.05.2024, the number was 397.
- Users should run `download_data_HR` only after retrieving the metadata, since the function needs the range of years available for each station.
- This code is for demonstration purposes, and we are not liable for any misuse of it by users.

In [13]:
print(catalogue_estreams[catalogue_estreams.code_basins =="HR"].source_streamflow)

provider_id
HR_DHZ    https://hidro.dhz.hr/
Name: source_streamflow, dtype: object


In [14]:
metadata_HR = get_metadata_HR(num_stations=10)

100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:16<00:00,  1.66s/it]
100%|█████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 763.89it/s]


In [15]:
# Drop rows with NaN values in the specified column
metadata_HR = metadata_HR.dropna(subset=["years_streamflow"])
metadata_HR["num"] = metadata_HR.index+1
metadata_HR

,Ime,Šifra,Tip postaje,Vodotok,Sliv,Porječje,Početak rada,Kraj rada,Kota nule vodokaza (m n/m),Udaljenost od ušća (km),Udaljenost od izvora (km),Topografska površina sliva (km2),years_streamflow,num
4,MOMJAN,6157,Limnigrafska postaja,ARĐILA,JADRANSKI SLIV,Porječja sjevernog Jadrana,05. 09. 2001.,--,184478,3100,--,--,2005-2009\n2012-2021,5
5,ŠIPAK,7136,Automatska dojava,BAĆINSKA JEZERA,JADRANSKI SLIV,Porječja južnog Jadrana,01. 07. 1894.,--,236,--,--,--,1973-2022,6
6,KOSINJSKI BAKOVAC,8031,Vodokazna postaja,BAKOVAC,JADRANSKI SLIV,Porječja sjevernog Jadrana,21. 07. 1967.,--,494681,6300,5500,--,1981-2022,7
7,ŠPORČIĆ KLANAC,8056,Vodokazna postaja,BAKOVAC,JADRANSKI SLIV,Porječja sjevernog Jadrana,01. 09. 1957.,--,488426,5100,--,--,1980-2017\n2019-2020,8
8,KLJUČ,5143,Limnigrafska postaja,BEDNJA,CRNOMORSKI SLIV,Porječja Drave i Dunava,01. 01. 1986.,--,173090,--,--,415670,1987-2023,9
9,LEPOGLAVA,5140,Automatska dojava,BEDNJA,CRNOMORSKI SLIV,Porječja Drave i Dunava,01. 01. 1986.,--,219310,--,--,89800,1987-2023,10
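The `years_streamflow` column stores the available years as one or more ranges separated by line breaks (e.g. `2005-2009\n2012-2021`). If the individual years are needed, for instance to check coverage before running `download_data_HR`, a small helper such as the hypothetical `expand_year_ranges` below can expand them. This sketch assumes the format shown above (one `start-end` range or a single year per line) and is not part of `utils.HR`.

```python
import re
from typing import List


def expand_year_ranges(years_streamflow: str) -> List[int]:
    """Expand strings such as '2005-2009\\n2012-2021' into a list of years.

    Splits on real line breaks/whitespace as well as on a literal backslash-n.
    """
    years: List[int] = []
    for chunk in re.split(r"(?:\\n|\s)+", years_streamflow.strip()):
        if not chunk:
            continue
        start, _, end = chunk.partition("-")
        # A single year (no "-") is treated as a one-year range
        years.extend(range(int(start), int(end or start) + 1))
    return years


print(len(expand_year_ranges("2005-2009\n2012-2021")))  # 15 years

# With the notebook data:
# metadata_HR["years_list"] = metadata_HR["years_streamflow"].apply(expand_year_ranges)
```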


In [16]:
# Download the data (it may take a good while):
download_data_HR(network_HR = metadata_HR, PATH_EXP = "data/streamflow/raw_data/HR")

6it [12:02, 120.49s/it]


## Hungary
- For HU, users have two data providers. The data can be downloaded directly from GRDC, whereas the national provider requires a data request to be placed.

In [32]:
print(catalogue_estreams[catalogue_estreams.country_code =="HU"].source_streamflow)

provider_id
HU_CONTACTFORM    https://ovf.hu/kozerdeku/adatigenyles
HU_GRDC                     https://www.bafg.de/GRDC/EN
Name: source_streamflow, dtype: object


## Ireland
- For IE, users have two data providers. The data can be downloaded directly from OPW (so far, manually and individually).
- For EPA, we provide web-scraping code to help retrieve the data.
- The gauge metadata can easily be obtained manually.
- This code is for demonstration purposes, and we are not liable for any misuse of it by users.

In [33]:
print(catalogue_estreams[catalogue_estreams.country_code =="IE"].source_streamflow)

provider_id
IE_EPA    https://epawebapp.epa.ie/hydronet/#Flow
IE_OPW           https://waterlevel.ie/hydro-data
Name: source_streamflow, dtype: object


- Below is a demonstration of the code provided for IE_EPA:

In [11]:
# Select one gauge from the dataset:
gauge_id = network_estreams[network_estreams.gauge_provider == "IE_EPA"].gauge_id.iloc[0]
print(gauge_id)

18118


In [10]:
# Call the function and download the time series data (the downloaded series is stored in the default downloads folder)
download_data_IEEPA(gauge_id)

Successfully downloaded data for station 18118


## Iceland
- For IS, users have the option to download the data from LamaH-Ice, which provides all records in a concise and already organized way.

In [34]:
print(catalogue_estreams[catalogue_estreams.country_code =="IS"].source_streamflow)

provider_id
IS_LAMAHICE    http://www.hydroshare.org/resource/86117a5f36cc4b7c90a5d54e18161c91
Name: source_streamflow, dtype: object


## Italy
- For Italy, there are 14 data providers, each with its own way of retrieving the data.
- A request must be placed via a request form for: IT_ABR_CONTACTFORM, IT_LIG_CONTACTFORM, IT_LOM_CONTACTFORM and IT_VEN.
- A website that allows straightforward individual downloads is available for: IT_EMI, IT_GRDC, IT_LOM, IT_PIE, IT_SAR, IT_TOS, IT_TRE, IT_UMB and IT_VAL.
- For IT_ISPRA and IT_TOS, we made web-scraping code available for users to retrieve both the records and the gauge metadata.

In [55]:
print(catalogue_estreams[catalogue_estreams.country_code =="IT"].source_streamflow)

provider_id
IT_ABR_CONTACTFORM                                                       https://www.regione.abruzzo.it/content/annali-idrologici
IT_EMI                                                                                              https://simc.arpae.it/dext3r/
IT_GRDC                                                                                               https://www.bafg.de/GRDC/EN
IT_ISPRA                                         http://www.hiscentral.isprambiente.gov.it/hiscentral/hydromap.aspx?map=obsclient
IT_LIG_CONTACTFORM                                                               https://www.arpal.liguria.it/arpal/contatti.html
IT_LOM                                                                     https://idro.arpalombardia.it/manual/dati_storici.html
IT_LOM_CONTACTFORM                                                                   https://idro.arpalombardia.it/it/map/sidro/#
IT_PIE                https://www.arpa.piemonte.it/rischi_naturali/snippets_ar

- Below is a demonstration of the code provided for ISPRA.
- This code is for demonstration purposes only, and we are not liable for any misuse of it by users.

In [9]:
# Select one gauge from the dataset:
gauge_id = network_estreams[network_estreams.gauge_provider == "IT_ISPRA"].gauge_id.iloc[0]
print(gauge_id)

hsl-abr:5010


In [10]:
# Download the records; users should specify where the downloaded data will be stored:
download_data_ITIS(gauge_id, PATH_EXP = "data/streamflow/raw_data/IT/ISPRA")

In [7]:
# Download the station metadata:
download_metadata_ITIS(gauge_id, PATH_EXP = "data/streamflow/raw_data/IT/ISPRA")

- Below is a demonstration of how to automate the download for Tuscany (IT_TOS).
- This code is for demonstration purposes, and we are not liable for any misuse of it by users.

In [21]:
# First, store the list of gauges from Tuscany (IT_TOS)
gauges_toscana = network_estreams[network_estreams.gauge_provider == "IT_TOS"].gauge_id.tolist()

# In the loop, only the last part of the URL (CODE) needs to change:
URL = "https://www.sir.toscana.it/archivio/download.php?IDST=idro_p&IDS=CODE"

# Create a directory to save the downloaded files (if it does not already exist)
os.makedirs("data/streamflow/raw_data/IT/TOS", exist_ok=True)

# Loop over the list of gauges and download each one
for gauge in gauges_toscana:

    url = URL.replace("CODE", str(gauge))

    # Send an HTTP GET request to fetch the data; the timeout prevents a stalled request from blocking the loop
    response = requests.get(url, timeout=60)

    # Check if the request was successful
    if response.status_code == 200:
        # Create the file name where the downloaded data will be stored
        file_name = f"data/streamflow/raw_data/IT/TOS/{gauge}.csv"

        # Save the content to a file
        with open(file_name, 'wb') as file:
            file.write(response.content)

        # Print a success message with the file name
        print(f"Downloaded: {file_name}")
    else:
        # Print a failure message with the URL and status code
        print(f"Failed to download: {url} with status code {response.status_code}")

Downloaded: data/streamflow/raw_data/IT/TOS/TOS01004379.csv


## Luxembourg
- For LU, users should place an official request for the data.

In [36]:
print(catalogue_estreams[catalogue_estreams.country_code =="LU"].source_streamflow)

provider_id
LU_CONTACTFORM    https://map.geoportail.lu/theme/eau?version=3&zoom=11&X=711893&Y=6404363&lang=en&rotation=0&layers=655-749&opacities=1-1&bgLayer=topo_bw_jpeg&time=--&crosshair=false
Name: source_streamflow, dtype: object


## Northern Ireland
- For NI, the official API of the UK National River Flow Archive (NRFA) is also available.
- Users should therefore use this official API rather than developing their own scraping code.
- The link below gives full guidance on how to use the API.

In [37]:
print(catalogue_estreams[catalogue_estreams.country_code =="NI"].source_streamflow)

provider_id
NI_NRFA    https://nrfaapps.ceh.ac.uk/nrfa/nrfa-api.html
Name: source_streamflow, dtype: object
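As a starting point for the NRFA API, the sketch below builds a time-series request for one station and saves the raw response to disk. It is a minimal sketch under stated assumptions: the endpoint `.../nrfa/ws/time-series` and the parameter values `data-type=gdf` (gauged daily flow) and `format=json-object` are taken to follow the NRFA API documentation linked above and should be checked there; the helper names `build_nrfa_url` and `download_nrfa_series` are hypothetical. The standard-library `urllib` is used instead of `requests` only to keep the example dependency-free.

```python
import os
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed time-series endpoint of the NRFA web services; verify against
# https://nrfaapps.ceh.ac.uk/nrfa/nrfa-api.html before use.
NRFA_TS_URL = "https://nrfaapps.ceh.ac.uk/nrfa/ws/time-series"


def build_nrfa_url(station_id, data_type="gdf", fmt="json-object"):
    """Build the request URL for one station's time series.

    'gdf' (gauged daily flow) and 'json-object' are assumed parameter
    values; check them in the NRFA API documentation.
    """
    query = urlencode({"format": fmt, "data-type": data_type, "station": station_id})
    return f"{NRFA_TS_URL}?{query}"


def download_nrfa_series(station_id, out_dir="data/streamflow/raw_data/NI", timeout=60):
    """Download one station's series and save the raw response to disk."""
    os.makedirs(out_dir, exist_ok=True)
    # urlopen raises an HTTPError for non-successful status codes
    with urlopen(build_nrfa_url(station_id), timeout=timeout) as response:
        content = response.read()
    file_name = os.path.join(out_dir, f"{station_id}.json")
    with open(file_name, "wb") as file:
        file.write(content)
    return file_name
```

Calling `download_nrfa_series(...)` with an NRFA station ID (not necessarily identical to the EStreams `gauge_id`) inside a loop over the Northern Irish gauges would mirror the Toscana example above.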


## Netherlands
- For NL, records can be downloaded directly from the official website.
- So far, users should download data from each station manually and individually.

In [38]:
print(catalogue_estreams[catalogue_estreams.country_code =="NL"].source_streamflow)

provider_id
NL_RWS    https://waterinfo.rws.nl/#/publiek/waterafvoer
Name: source_streamflow, dtype: object


## Norway
- For NO, records can be downloaded directly from the official website or using an official API.
- Through the website, users must download the data from each station manually and individually.

In [6]:
print(catalogue_estreams[catalogue_estreams.country_code =="NO"].source_streamflow)

provider_id
NO_NVE    https://seriekart.nve.no
Name: source_streamflow, dtype: object


- To automate the download, users should refer to the API:

In [7]:
print(catalogue_estreams[catalogue_estreams.country_code =="NO"].observations)

provider_id
NO_NVE    The user can either use the provided website for download the data individually, or the official API (https://hydapi.nve.no/UserDocumentation/) to download data from many stations at once. 
Name: observations, dtype: object
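The observation above mentions that HydAPI can serve many stations at once. The following is a minimal sketch of such a request, not an official client. Several details are assumptions to be verified in the HydAPI user documentation: the `Observations` endpoint, the query names `StationId`, `Parameter`, `ResolutionTime` and `ReferenceTime`, parameter `1001` for discharge, resolution `1440` minutes for daily values, and sending the API key (issued by NVE) in an `X-API-Key` header. The helper names and the station ID `2.11.0` used for illustration are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed observations endpoint of NVE's HydAPI; check
# https://hydapi.nve.no/UserDocumentation/ for the current specification.
HYDAPI_OBS_URL = "https://hydapi.nve.no/api/v1/Observations"


def build_hydapi_request(station_ids, api_key, reference_time,
                         parameter="1001", resolution_time="1440"):
    """Build an authenticated HydAPI request for several stations at once.

    Assumptions: parameter 1001 = discharge, resolution 1440 minutes = daily,
    and the API key is sent in the 'X-API-Key' header.
    """
    query = urlencode({
        "StationId": ",".join(station_ids),  # several stations in one call
        "Parameter": parameter,
        "ResolutionTime": resolution_time,
        "ReferenceTime": reference_time,     # e.g. "2020-01-01/2020-12-31"
    })
    headers = {"X-API-Key": api_key, "Accept": "application/json"}
    return Request(f"{HYDAPI_OBS_URL}?{query}", headers=headers)


def fetch_hydapi(request, timeout=60):
    """Send the request and return the decoded JSON response."""
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_hydapi(build_hydapi_request(["2.11.0"], "YOUR_API_KEY", "2020-01-01/2020-12-31"))` would return the parsed JSON, which can then be saved or converted to a DataFrame.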


## Poland
- For PL, records can be downloaded directly from the official website.
- So far, daily data must be downloaded separately for each month of each year.
- Each download is a single ZIP-compressed CSV file per month and year, with all available gauges concatenated.

In [23]:
print(catalogue_estreams[catalogue_estreams.country_code =="PL"].source_streamflow)

provider_id
PL_IMGW    https://danepubliczne.imgw.pl/introduction
Name: source_streamflow, dtype: object


- Users can automate the download process by adapting the following URL:
    - Replace {YEAR} and {MONTH} with the desired year and month (the month as two digits, e.g. 03):
    - https://danepubliczne.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_hydrologiczne/dobowe/{YEAR}/codz_{YEAR}_{MONTH}.zip
    - For example, to download the data for March 2020, the URL becomes:
    - https://danepubliczne.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_hydrologiczne/dobowe/2020/codz_2020_03.zip
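The URL pattern above can be captured in a small helper that zero-pads the month and generates every monthly URL for a range of years. This is a sketch of the documented pattern only; the function names `imgw_daily_url` and `imgw_daily_urls` are hypothetical.

```python
# URL pattern of the monthly IMGW archives of daily hydrological data
IMGW_DAILY_URL = (
    "https://danepubliczne.imgw.pl/data/dane_pomiarowo_obserwacyjne/"
    "dane_hydrologiczne/dobowe/{year}/codz_{year}_{month:02d}.zip"
)


def imgw_daily_url(year, month):
    """Return the URL of the archive for one month (month is zero-padded)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be between 1 and 12, got {month}")
    return IMGW_DAILY_URL.format(year=year, month=month)


def imgw_daily_urls(start_year, end_year):
    """List the URLs for every month from January start_year to December end_year."""
    return [
        imgw_daily_url(year, month)
        for year in range(start_year, end_year + 1)
        for month in range(1, 13)
    ]
```

`imgw_daily_url(2020, 3)` reproduces the March 2020 example URL, and `imgw_daily_urls(1951, 2021)` yields the 852 URLs covered by the download loop below.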

- The following code demonstrates how to download the Polish data.
- This code is for demonstration purposes only, and we are not liable for any misuse of it by users.

In [20]:
# Define the URL pattern for fetching data
URL = "https://danepubliczne.imgw.pl/data/dane_pomiarowo_obserwacyjne/dane_hydrologiczne/dobowe/YEAR/codz_YEAR_MONTH.zip"

# Create a directory to save the downloaded files (in case it does not already exist)
os.makedirs("data/streamflow/raw_data/PL", exist_ok=True)

# Define the range of years to download data for
start_year = 1951
end_year = 2021

# Loop through each year in the specified range
for YEAR in range(start_year, end_year + 1):
    # Loop through each month from January (1) to December (12)
    for MONTH in range(1, 13):
        # Replace YEAR and MONTH in the URL with the correct values
        url = URL.replace("YEAR", str(YEAR)).replace("MONTH", f"{MONTH:02d}")
        
        # Send an HTTP GET request to the URL to fetch the data
        # (the timeout prevents the loop from hanging on an unresponsive server)
        response = requests.get(url, timeout=60)
        
        # Check if the request was successful
        if response.status_code == 200:
            # Create the file name where the downloaded data will be stored
            file_name = f"data/streamflow/raw_data/PL/codz_{YEAR}_{MONTH:02d}.zip"
            
            # Save the content to a file
            with open(file_name, 'wb') as file:
                file.write(response.content)
                
            # Print a success message with the file name
            print(f"Downloaded: {file_name}")
        else:
            # Print a failure message with the URL and status code
            print(f"Failed to download: {url} with status code {response.status_code}")

Downloaded: data/streamflow/raw_data/PL/codz_1951_01.zip
Downloaded: data/streamflow/raw_data/PL/codz_1951_02.zip
Downloaded: data/streamflow/raw_data/PL/codz_1951_03.zip
Downloaded: data/streamflow/raw_data/PL/codz_1951_04.zip
Downloaded: data/streamflow/raw_data/PL/codz_1951_05.zip
Downloaded: data/streamflow/raw_data/PL/codz_1951_06.zip
Downloaded: data/streamflow/raw_data/PL/codz_1951_07.zip
Downloaded: data/streamflow/raw_data/PL/codz_1951_08.zip
Downloaded: data/streamflow/raw_data/PL/codz_1951_09.zip
Downloaded: data/streamflow/raw_data/PL/codz_1951_10.zip
Downloaded: data/streamflow/raw_data/PL/codz_1951_11.zip
Downloaded: data/streamflow/raw_data/PL/codz_1951_12.zip
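Since each monthly file is a ZIP archive, the CSV files must be extracted before they can be read. The sketch below extracts every `codz_*.zip` in the download folder into a separate directory using the standard-library `zipfile` module. It assumes the archives contain the CSV files directly; the function name `extract_imgw_archives` and the `extracted` subfolder are hypothetical choices.

```python
import glob
import os
import zipfile


def extract_imgw_archives(zip_dir="data/streamflow/raw_data/PL",
                          out_dir="data/streamflow/raw_data/PL/extracted"):
    """Extract all monthly IMGW archives and return the extracted file names."""
    os.makedirs(out_dir, exist_ok=True)
    extracted = []
    # Sort so that months are processed in chronological order
    for zip_path in sorted(glob.glob(os.path.join(zip_dir, "codz_*.zip"))):
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(out_dir)
            extracted.extend(archive.namelist())
    return extracted
```

Because all gauges are concatenated in each monthly file, the extracted CSVs still need to be filtered by gauge after reading them (for example with pandas).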


## Portugal
- For PT, records can be downloaded directly from the official website.
- So far, users must download the data manually, selecting a limited group of stations per download.
- Further details on the procedure, including the current station limit (translated into English), are given below.

In [44]:
print(catalogue_estreams[catalogue_estreams.country_code =="PT"].observations)

provider_id
PT_SNIRH    One should click: "Dados de Base" > "Monitorizacao" > "Redes" > "Hidrometrica" > "Aplicar filtros". Then one should select the stations, then click in "Selecionar Estacoes" > "Validar Lista". Then one should select "Nivel Medio Diario" (= Mean daily streamflow). Then click in "Guardar Dados" and download the data in CSV-file. Notice that one can currently select up to 50 stations to download at a time. For stations metadata one can simply click in "Caracteristicas das Estacoes" in the lower right and download the CSV-file. 
Name: observations, dtype: object


In [42]:
print(catalogue_estreams[catalogue_estreams.country_code =="PT"].source_streamflow)

provider_id
PT_SNIRH    https://snirh.apambiente.pt/index.php?idMain=2&idItem=1
Name: source_streamflow, dtype: object


## Sweden
- For SE, records can be downloaded directly from the official website.
- So far, users should download data from each station individually and manually.

In [46]:
print(catalogue_estreams[catalogue_estreams.country_code =="SE"].source_streamflow)

provider_id
SE_SMHI    https://www.smhi.se/data/hydrologi/ladda-ner-hydrologiska-observationer#param=waterdischargeDaily,stations=core
Name: source_streamflow, dtype: object


## Slovenia
- For SI, records can be downloaded directly from the official website.
- So far, users should download data from each station individually and manually.

In [45]:
print(catalogue_estreams[catalogue_estreams.country_code =="SI"].source_streamflow)

provider_id
SI_ARSO    https://vode.arso.gov.si/hidarhiv/
Name: source_streamflow, dtype: object


## GRDC
- For GRDC, the website offers a way to download the data for many stations at once.
- Our database includes GRDC records from the following countries: Bulgaria (BG), Belarus (BY), Cyprus (CY), Czechia (CZ), Estonia (EE), Greece (GR), Italy (IT), Lithuania (LT), Latvia (LV), Moldova (MD), North Macedonia (MK), Romania (RO), Russia (RU), Slovakia (SK), Turkey (TR), Ukraine (UA).

In [47]:
print(catalogue_estreams[catalogue_estreams.country_code =="BY"].source_streamflow)

provider_id
BY_GRDC    https://www.bafg.de/GRDC/EN
Name: source_streamflow, dtype: object
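To check which countries rely on GRDC records, the catalogue can be filtered by provider. The sketch assumes the catalogue is indexed by `provider_id` (as in the printed outputs) and that GRDC providers follow the `<country>_GRDC` naming seen for `BY_GRDC`; the helper name `grdc_countries` is hypothetical.

```python
import pandas as pd


def grdc_countries(catalogue: pd.DataFrame) -> list:
    """Return the sorted country codes whose records come from the GRDC.

    Assumes the index holds provider IDs such as 'BY_GRDC'.
    """
    is_grdc = catalogue.index.str.endswith("_GRDC")
    return sorted(catalogue.loc[is_grdc, "country_code"].unique())
```

Applied as `grdc_countries(catalogue_estreams)`, the result can be compared with the country list above.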


# End