# Flashflood workflow - Preprocessing
The data provided for the flash flood visualization workflow has been preprocessed by CRAHI. This workflow describes the steps followed to compute the indicators that are later used for the risk assessment.

## Hazard assessment methodology <a id="methodology"></a>
High-intensity rainfall is ... (brief description).

The goal is to study the evolution and behaviour of annual maximum precipitation for different durations in order to provide the expected intensities for different return periods. (Explain fitting by citing some papers).

## Preparation work
In this notebook we will use some useful Python libraries:
* [cdsapi](https://cds.climate.copernicus.eu/api-how-to) - A library to request data from the datasets listed in the CDS catalogue.
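* [xarray](https://docs.xarray.dev/en/stable/) - A library for working with labelled multi-dimensional arrays, including reading and writing netCDF files.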
* [numpy](https://numpy.org/doc/stable/) - A powerful library for numerical computations in Python, widely used for array operations and mathematical functions.
* [scipy](https://docs.scipy.org/doc/scipy/) - Provides algorithms for optimization, statistics and many other classes of problems.
* [matplotlib](https://matplotlib.org/stable/) - A versatile plotting library in Python, commonly used for creating static, animated, and interactive visualizations.

### Load libraries

In [1]:
# Libraries to download data and manage files
import os
import cdsapi
import zipfile

# Libraries for numerical computations and array manipulation
import numpy as np
import xarray as xr

# Libraries to handle geospatial data
import pyproj

# Libraries to plot maps, charts and tables
import matplotlib.pyplot as plt
from matplotlib.colors import from_levels_and_colors
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from cartopy.util import add_cyclic_point
import plotly.express as px
import plotly.graph_objs as go

### Setting your directory

In [5]:
# Define the directory for the flashflood workflow preprocess
workflow_dir = '/usuaris/gonzalez/tmp/flashflood_workflow'

# Define directories for data and results within the previously defined workflow directory
data_dir = os.path.join(workflow_dir, 'data')
results_dir = os.path.join(workflow_dir, 'results')

# Create the workflow directory and its data and results subdirectories if they do not exist yet.
# exist_ok=True also covers the case where the workflow directory exists but a subdirectory is missing.
os.makedirs(data_dir, exist_ok=True)
os.makedirs(results_dir, exist_ok=True)


### Download data
The data used to compute the indices for this workflow is available in the [Climate Data Store](https://cds.climate.copernicus.eu/#!/home) (CDS) portal. It is possible to download the data using its API. Learn how to use it [here](https://cds.climate.copernicus.eu/api-how-to).

For this example, we will download 30 years of CORDEX precipitation data at a 3-hour temporal resolution for a particular GCM-RCM pair and a specific RCP scenario.

In [None]:
# Replace KEY with your own CDS API key (format: '<UID>:<API key>'). Never share or publish your key.
URL = "https://cds.climate.copernicus.eu/api/v2"
KEY = "<your-UID>:<your-API-key>"

# Define zip file's absolute path
zip_path = os.path.join(data_dir, 'cordex_pr_3h_2041_2070_rcp85.zip')

c = cdsapi.Client(url=URL, key=KEY)
c.retrieve(
        'projections-cordex-domains-single-levels',
        {
            'format': 'zip',
            'domain': 'europe',
            'experiment': 'rcp_8_5',
            'horizontal_resolution': '0_11_degree_x_0_11_degree',
            'temporal_resolution': '3_hours',
            'variable': 'mean_precipitation_flux',
            'gcm_model': 'ichec_ec_earth',
            'rcm_model': 'knmi_racmo22e',
            'ensemble_member': 'r1i1p1',
            'start_year': ['2041', '2042', '2043', '2044', '2045', '2046',
                           '2047', '2048', '2049', '2050', '2051', '2052',
                           '2053', '2054', '2055', '2056', '2057', '2058',
                           '2059', '2060', '2061', '2062', '2063', '2064',
                           '2065', '2066', '2067', '2068', '2069', '2070'],
            'end_year': ['2042', '2043', '2044', '2045', '2046',
                           '2047', '2048', '2049', '2050', '2051', '2052',
                           '2053', '2054', '2055', '2056', '2057', '2058',
                           '2059', '2060', '2061', '2062', '2063', '2064',
                           '2065', '2066', '2067', '2068', '2069', '2070', '2071'],
        },
        zip_path)

2024-01-08 11:27:33,116 INFO Welcome to the CDS
2024-01-08 11:27:33,118 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/projections-cordex-domains-single-levels
2024-01-08 11:27:33,371 INFO Request is queued
2024-01-08 11:27:34,419 INFO Request is running
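
The `start_year` and `end_year` lists in the request above follow a regular pattern (each file spans one year), so they could also be generated instead of typed by hand. The following is a minimal sketch; the variable names `first_year`, `last_year`, `start_years` and `end_years` are illustrative and not part of the original notebook.

```python
# First and last start year of the 30-year period requested above
first_year, last_year = 2041, 2070

# One entry per yearly file: the start year and the following year as end year
start_years = [str(year) for year in range(first_year, last_year + 1)]
end_years = [str(year + 1) for year in range(first_year, last_year + 1)]

print(len(start_years), start_years[0], start_years[-1])
print(len(end_years), end_years[0], end_years[-1])
```

These two lists could then be passed as the `'start_year'` and `'end_year'` values of the request dictionary.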


Extract the downloaded file to your data directory.

In [None]:
with zipfile.ZipFile(zip_path, 'r') as zObject:
    zObject.extractall(path=data_dir)

## Explore the data
The files downloaded from the CDS follow a filename structure that describes the exact dataset each one contains:

<p style="text-align: center;">variable_domain_gcm_rcp_ensemble_rcm_version_temporal resolution_start day-end day</p>

In this particular case, we would have the following file:
<p style="text-align: center;">pr_EUR-11_ICHEC-EC-EARTH_rcp85_r1i1p1_KNMI-RACMO22E_v1_3hr_2041010100-2042010100.nc</p>
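
As an illustration of this naming convention, the filename can be split into its components with plain string operations. This is only a sketch: the helper `parse_cordex_filename` and the dictionary keys are hypothetical names chosen here, and it assumes that fields are separated by underscores and that the period is written as `start-end`, as in the example file above.

```python
def parse_cordex_filename(filename):
    """Split a CORDEX filename into its descriptive fields (hypothetical helper)."""
    # Drop the extension if present
    stem = filename[:-3] if filename.endswith('.nc') else filename
    parts = stem.split('_')
    keys = ['variable', 'domain', 'gcm', 'rcp', 'ensemble', 'rcm',
            'version', 'temporal_resolution']
    if len(parts) != len(keys) + 1:
        raise ValueError(f'Unexpected filename structure: {filename}')
    info = dict(zip(keys, parts[:-1]))
    # The last field holds the start and end time stamps separated by a hyphen
    info['start'], info['end'] = parts[-1].split('-')
    return info


example = 'pr_EUR-11_ICHEC-EC-EARTH_rcp85_r1i1p1_KNMI-RACMO22E_v1_3hr_2041010100-2042010100.nc'
info = parse_cordex_filename(example)
for key, value in info.items():
    print(f'{key}: {value}')
```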

Load one of the files and explore the content and structure of the dataset. Notice the dimensions, coordinates, data variables, indexes and attributes, as well as the description of the spatial reference system in _rotated_pole_.

In [3]:
# Define the absolute path for a specific file inside the data directory
filename_precipitation_example = os.path.join(data_dir,
                        'pr_EUR-11_ICHEC-EC-EARTH_rcp85_r1i1p1_KNMI-RACMO22E_v1_3hr_2041010100-2042010100.nc')

# Open the netCDF file using xarray as a dataset
dataset_precipitation_example = xr.open_dataset(filename_precipitation_example, decode_coords = 'all')

# Display said dataset
dataset_precipitation_example

## Process the data
As explained in the [methodology](#methodology) section, the goal of this workflow is to study the changes in extreme rainfall events. To do so, the following steps are followed:
1. Extract the annual maximum rainfall for specific durations and save the new time series.
2. Fit a probability distribution to the 30-year series of annual maximum rainfall.
3. Compute the expected intensities for each duration and return period.

:::{attention}
Remember that this is a simplified version of how the provided files for the visualization workflow have been computed and assembled. For each step, both code and results are shown for a specific area in order to minimize the execution time.
:::

### Step 1. Temporal series of annual maximum rainfall
As the CORDEX precipitation data is available at a 3-hour time step, the annual maximum can be computed for 3h, 6h, 12h and any other duration that is a multiple of 3h.
The following functions from the ```xarray``` library are used:
* ```sel``` to select a subset of a dataset/dataarray by coordinate value.
* ```open_mfdataset``` to open multiple netCDF files as a single dataset.
* ```rolling``` to apply a moving window along a given dimension of a dataset/dataarray.
* ```resample``` to group the data by year along the time dimension.
* ```to_netcdf``` to save a dataset/dataarray as a netCDF file.
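
To see how a moving-window sum and a yearly resampling combine into an annual maximum, here is a small sketch on a synthetic 3-hourly series. The data is invented for illustration only (all zeros except two short rainfall events), and the variable names are not taken from the workflow files.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two non-leap years (2041 and 2042) of 3-hourly time steps: 2 * 365 * 8 values
time = pd.date_range('2041-01-01', periods=2 * 2920, freq='3h')

# Synthetic precipitation in mm per 3h step: one event per year, zero elsewhere
values = np.zeros(time.size)
values[100:102] = [5.0, 7.0]    # event in January 2041
values[3000:3002] = [4.0, 1.0]  # event in January 2042
pr = xr.DataArray(values, coords={'time': time}, dims='time', name='pr')

# 6h accumulation = sum over a moving window of two consecutive 3h steps
pr_6h = pr.rolling(time=2).sum()

# Annual maximum of the 6h accumulation ('YS' groups by year start)
pr_6h_annual_max = pr_6h.resample(time='YS').max()
print(pr_6h_annual_max.values)
```

In this toy series the 6h annual maxima are 12 mm for 2041 and 5 mm for 2042, each being the sum of the two consecutive wet steps of that year.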

:::{warning}
This could take some time due to the large dataset the script is handling.
:::

In [39]:
# Auxiliary function to slice each dataset to a particular region (CATALONIA).
def cut_to_region(ds):
    ds = ds.sel(rlat = slice(-9,-7), rlon = slice(-13,-11))
    return ds
    
# Open the 30 files as a single dataset, cutting each one to the region with the preprocess function.
#dataset_precipitation = xr.open_mfdataset(f'{data_dir}/*.nc', decode_coords='all', preprocess=cut_to_region)
datadir = '/var/dades/research1/gabarro/climaax/flashflood_workflow/data/test_notebook/'
dataset_prec = xr.open_mfdataset(f'{datadir}*.nc', decode_coords='all', preprocess=cut_to_region)

# Units of the pr variable are kg m-2 s-1 (equivalent to mm/s for water).
# Multiply by the length of the 3h time step in seconds to obtain mm per time step.
dataarray_prec = dataset_prec['pr']*3600*3
# Assign the new units to the variable
dataarray_prec.attrs['units'] = 'mm'

# For every duration (3h, 6h, 12h), compute the annual maximum and save as a new netCDF file.
for duration in [3, 6, 12]:
    # Length of the rolling window, in number of 3h time steps
    window = duration // 3
    # Create new DataArray with annual maxima
    dataarray_prec_aux = dataarray_prec.rolling(time=window).sum().resample(time = 'YS').max(keep_attrs=True)
    # Assign new 'duration' dimension to save this information and rename variable
    dataarray_prec_aux = dataarray_prec_aux.expand_dims(dim = {'duration': [duration]}, axis = 0).rename('pr_max')
    # Write reference system (CRS) into the DataArray
    # dataarray_prec_aux.rio.write_crs()
    # Define name of new netCDF file
    filename_aux = f'pr_annual_maximum_{duration}h_2041_2070.nc'
    # Save as a netCDF file (written to datadir here so that the next cell can read it back)
    dataarray_prec_aux.to_netcdf(os.path.join(datadir, filename_aux))
    


Let us have a look at one of the files just created. By picking a single grid cell, we can plot how the annual maximum precipitation for a 3h window evolves over the 30 years.

In [50]:
# Open netCDF with xarray (now we just have one file)
dataset_pr_max = xr.open_dataset(os.path.join(datadir, 'pr_annual_maximum_3h_2041_2070.nc'),
                                 decode_coords='all')

# Plot the time series from 2041 to 2070 (first duration, all years, first grid cell).
dataset_pr_max['pr_max'][0,:,0,0].plot()

[<matplotlib.lines.Line2D at 0x7efbe3a660d0>]

### Step 2. Probability distribution fitting of annual maxima series

In [None]:
# Code here for fitting.
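
The notebook does not yet state which distribution is fitted. As a placeholder, the sketch below fits a Gumbel distribution by maximum likelihood with ```scipy.stats.gumbel_r```. Both the choice of distribution and the synthetic 30-value series are assumptions made for illustration; in the workflow the series would come from the ```pr_max``` variable at one grid cell, and another distribution (for example ```scipy.stats.genextreme```) may be more appropriate.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for a 30-year series of annual maxima (mm) at one grid cell
rng = np.random.default_rng(seed=0)
annual_maxima = rng.gumbel(loc=30.0, scale=8.0, size=30)

# Maximum-likelihood fit of a Gumbel distribution (assumed choice, see text)
loc, scale = stats.gumbel_r.fit(annual_maxima)

# Kolmogorov-Smirnov test as a rough check of the fit. The p-value is optimistic
# because the parameters were estimated from the same sample.
ks_result = stats.kstest(annual_maxima, 'gumbel_r', args=(loc, scale))

print(f'location = {loc:.2f} mm, scale = {scale:.2f} mm, '
      f'KS p-value = {ks_result.pvalue:.2f}')
```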

### Step 3. Expected intensities for different return periods

In [1]:
# Code here for intensity value and confidence interval.
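
One possible way to obtain the expected values is sketched below. For a return period T, the return level is the quantile of the fitted distribution with non-exceedance probability 1 - 1/T, and a confidence interval can be approximated with a bootstrap. The Gumbel distribution, the synthetic series, the list of return periods, the number of bootstrap resamples and the helper `return_levels` are all assumptions for illustration, not the method used to produce the provided files.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for a 30-year series of annual maxima (mm) at one grid cell
rng = np.random.default_rng(seed=1)
annual_maxima = rng.gumbel(loc=30.0, scale=8.0, size=30)

# Return periods in years and their non-exceedance probabilities
return_periods = np.array([2, 5, 10, 25, 50, 100])
non_exceedance = 1.0 - 1.0 / return_periods


def return_levels(sample):
    """Fit a Gumbel distribution and return the quantiles for each return period."""
    loc, scale = stats.gumbel_r.fit(sample)
    return stats.gumbel_r.ppf(non_exceedance, loc=loc, scale=scale)


# Best estimate from the full series
levels = return_levels(annual_maxima)

# Non-parametric bootstrap: resample the years with replacement and refit
n_boot = 200
boot_levels = np.array([
    return_levels(rng.choice(annual_maxima, size=annual_maxima.size, replace=True))
    for _ in range(n_boot)
])
lower, upper = np.percentile(boot_levels, [2.5, 97.5], axis=0)

for period, level, low, high in zip(return_periods, levels, lower, upper):
    print(f'T = {period:3d} years: {level:6.1f} mm (95% interval {low:6.1f} to {high:6.1f} mm)')
```

The printed values are accumulated depths in mm for the chosen duration; dividing by the duration in hours would give a mean intensity in mm/h.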