# 0. Import packages

First, we import the necessary packages for this exercise.

In [2]:
import xarray as xr 
import matplotlib.pyplot as plt 
import cartopy.crs as ccrs
from datetime import datetime, timedelta
import numpy as np
import cartopy.feature as cfeature

# 1. Open simulation data

We have simulated this case many times, in many different configurations. All output data can be found in the `data` folder. The data is stored in netCDF files, which can easily be read as xarray Datasets. Each netCDF file has a structured name `<configuration>_<starttime>_<nhours>_<variable>_<tstart>_<tstop>_<tstep>_regridded.nc`, where the elements mean:
1. `<configuration>`: This refers to the set-up of the model. For this exercise, we will only use the data from the `baseline` simulation.
2. `<starttime>`: This refers to the start time of the simulation. By varying the start time, we can create an ensemble of different simulations of the same event (see below). The start time is written in the format of `<year><month><day><hour>`.
3. `<nhours>`: This refers to the number of hours that the simulation has run.
4. `<variable>`: This refers to the variable saved in this file. We use the standard abbreviations from the Climate and Forecast (CF) Metadata Conventions. The ones you will need in these exercises are:
    - *pr*: Precipitation
    - *tas*: Near-Surface Air Temperature
    - *tasmax*: Maximum Near-Surface Air Temperature
    - *tasmin*: Minimum Near-Surface Air Temperature
5. `<tstart>`: This refers to the first time step of the data in the file. This is written in the format of `<year>-<month>-<day>T<hour>`.
6. `<tstop>`: This refers to the last time step of the data in the file. This is written in the format of `<year>-<month>-<day>T<hour>`.
7. `<tstep>`: This refers to the time between subsequent time steps in the data, expressed in seconds. You will see that this is 3600 for all files (so hourly data).
8. `regridded`: This indicates that the data has been regridded from the original coordinate reference system (CRS) to a latitude-longitude reference system. This makes the data easier to work with.
9. `.nc`: This extension denotes a netCDF file.
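
The naming convention above can be unpacked programmatically. The following is a minimal sketch, not part of the exercise material: the helper `parse_filename` and the example filename are hypothetical and only illustrate the pattern, so check the actual names in the `data` folder before relying on it.

```python
from datetime import datetime


def parse_filename(filename):
    """Split a simulation output filename into its named elements."""
    suffix = "_regridded.nc"
    if not filename.endswith(suffix):
        raise ValueError(f"unexpected filename: {filename}")
    stem = filename[: -len(suffix)]
    # Split from the right so that <configuration> may itself contain underscores.
    configuration, starttime, nhours, variable, tstart, tstop, tstep = stem.rsplit("_", 6)
    return {
        "configuration": configuration,
        "starttime": datetime.strptime(starttime, "%Y%m%d%H"),
        "nhours": int(nhours),
        "variable": variable,
        "tstart": datetime.strptime(tstart, "%Y-%m-%dT%H"),
        "tstop": datetime.strptime(tstop, "%Y-%m-%dT%H"),
        "tstep": int(tstep),
    }


# Hypothetical filename, made up to illustrate the pattern.
example = "baseline_2021070100_384_pr_2021-07-01T01_2021-07-17T00_3600_regridded.nc"
info = parse_filename(example)
print(info["variable"], info["starttime"], info["tstep"])
```

Splitting from the right is a design choice: the last six elements have a fixed form, so everything left over is treated as the configuration name.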

Let's start by opening a file and loading in the data! Load in the precipitation data corresponding to the baseline simulation which started at midnight on 1 July 2021. You can use the xarray function `open_dataset` ([documentation](https://docs.xarray.dev/en/stable/generated/xarray.open_dataset.html)), or `open_mfdataset` with a wildcard pattern if you do not want to type the full file name. Save this dataset as `ds` and inspect the contents by printing the output or simply running a cell which ends with `ds`. The following questions/tasks will help you along:
- What are the dimensions of the data?
- First consider the spatial dimensions. What are the units? What is the resolution? Convert this to (kilo)metres. Use the fact that the radius of the Earth is 6371 km.
- Next, consider the attributes of the variable `pr`. What are the units? Usually, precipitation flux is expressed in mm/h. Convert these units (if necessary) to mm/h. Use the fact that water has a density of 1000 $\text{kg}/\text{m}^3$.
- Lastly, in the attributes of the variable `pr`, you can find `cell_methods = time: mean`. This signifies that the variable is not an instantaneous value at a certain moment in time, but the average value over a certain time interval. The bounds of this interval are given in the coordinate `time_bnds`. So, every value of the coordinate `time` has two corresponding values in the coordinate `time_bnds`: the start and end time of the interval over which the average is taken. Compare the values in `time_bnds` to those in `time`. For a `time` value at moment $x$, which interval corresponds to this? What is the advantage of providing the average precipitation flux, instead of the instantaneous value?

In [3]:
## Write your code here

## Solutions:
- Dimensions are (time, latitude, longitude).
- The units are degrees. The resolution is 0.07° for longitude and 0.05° for latitude. We can convert this to kilometres at a certain latitude $\phi$, with $R$ the radius of the Earth:
    - $\Delta x = \left( \frac{\Delta \text{lon}}{360\degree} \right)$ * (circumference of circle of latitude) = $\left(\frac{\Delta \text{lon}}{360\degree}\right) (2 \pi R \cos \phi)$
    - $\Delta y = \left( \frac{\Delta \text{lat}}{360\degree} \right)$ * (circumference of circle of longitude) = $\left(\frac{\Delta \text{lat}}{360\degree}\right) (2 \pi R)$

    We can see that while $\Delta x$ depends on the latitude, $\Delta y$ does not. Filling in the numbers and using an average latitude of $50.5\degree$ we find: $\Delta x = 4.95$ km and $\Delta y = 5.56$ km
- We convert $\text{kg} / (\text{m}^2 \cdot \text{s})$ to $\text{mm}/\text{h}$:
    $$1 \frac{\text{kg}}{\text{m}^2 \cdot \text{s}} = \frac{10^{-3} \text{m}^3}{\text{m}^2 \cdot \frac{1}{3600} \text{h}} = 3600 \frac{10^{-3} \text{m}^3}{\text{m}^2 \text{h}} = 3600 \frac{10^{-3} \text{m}}{ \text{h}} = 3600 \frac{\text{mm}}{ \text{h}}$$
    So a precipitation flux expressed in $\text{kg} / (\text{m}^2 \cdot \text{s})$ needs to be multiplied by 3600 to convert it to $\text{mm}/\text{h}$.
- The value at time $x$ corresponds to the average of the interval $[x - 1 h, x]$. Providing the hourly average precipitation flux enables us to exactly calculate the total amount of precipitation, which would not be possible if only the hourly instantaneous flux was provided.
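
To make the last point concrete, here is a small sketch with made-up numbers. The function `rate` is a hypothetical rain rate (a 30-minute shower of 12 mm/h), not taken from the simulation data. It compares the total obtained from hourly means with the total obtained from hourly instantaneous samples.

```python
def rate(t_min):
    """Hypothetical rain rate in mm/h at minute t_min: a shower from minute 45 to 75."""
    return 12.0 if 45 <= t_min < 75 else 0.0


minutes = range(180)  # three hours at one-minute resolution

# Reference accumulation: each minute contributes rate * (1/60) h.
true_total = sum(rate(t) / 60 for t in minutes)

# Hourly means, each valid for an interval [x - 1 h, x].
hourly_means = [
    sum(rate(t) for t in range(h * 60, (h + 1) * 60)) / 60 for h in range(3)
]
total_from_means = sum(mean * 1.0 for mean in hourly_means)  # mean flux * 1 h

# Hourly instantaneous samples taken at minutes 60, 120 and 180.
samples = [rate(t) for t in (60, 120, 180)]
total_from_samples = sum(sample * 1.0 for sample in samples)

print(f"reference total     : {true_total:.1f} mm")
print(f"from hourly means   : {total_from_means:.1f} mm")
print(f"from hourly samples : {total_from_samples:.1f} mm")
```

In this toy case the hourly means reproduce the reference total of 6 mm, while the instantaneous samples happen to catch the shower at its peak and give 12 mm. With a different timing the samples could just as well miss the shower entirely.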

In [4]:
ds = xr.open_mfdataset("../data/baseline_20210701*pr*", engine="netcdf4")
ds

| Data type | Array shape | Array bytes | Chunk shape | Chunk bytes | Dask graph |
|---|---|---|---|---|---|
| datetime64[ns] numpy.ndarray | (384, 2) | 6.00 kiB | (1, 2) | 16 B | 384 chunks in 2 graph layers |
| float32 numpy.ndarray | (384, 70, 75) | 7.69 MiB | (1, 70, 75) | 20.51 kiB | 384 chunks in 2 graph layers |


In [5]:
dlon = 0.07
dlat = 0.05
R = 6371
lat = 50.5

dx = (dlon / 360) * (2 * np.pi * R * np.cos(lat * np.pi / 180))
dy = (dlat / 360) * (2 * np.pi * R)

print(f"dx = {dx:.2f} km ; dy = {dy:.2f} km")

dx = 4.95 km ; dy = 5.56 km
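
Since $\Delta x$ depends on latitude while $\Delta y$ does not, it can be instructive to evaluate the same formulas at a few latitudes. The sketch below wraps them in a hypothetical helper `grid_spacing_km`; the latitudes 49° and 52° are chosen for illustration only and are not read from the data.

```python
import math

R = 6371.0   # Earth radius in km, as given in the exercise
dlon = 0.07  # longitude resolution in degrees
dlat = 0.05  # latitude resolution in degrees


def grid_spacing_km(lat_deg):
    """Return (dx, dy) in km for a grid cell at latitude lat_deg."""
    km_per_degree = 2 * math.pi * R / 360
    dx = dlon * km_per_degree * math.cos(math.radians(lat_deg))
    dy = dlat * km_per_degree
    return dx, dy


# Illustrative latitudes; only 50.5 is the average latitude used above.
for lat in (49.0, 50.5, 52.0):
    dx, dy = grid_spacing_km(lat)
    print(f"lat = {lat:4.1f}: dx = {dx:.2f} km, dy = {dy:.2f} km")
```

The east-west spacing shrinks towards the pole because the circles of latitude become smaller, whereas the north-south spacing stays constant.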


In [6]:
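def convert_to_mm_h(ds):
    """Convert precipitation flux from kg m-2 s-1 to mm/h (modifies ds in place)."""
    # Skip the conversion if it was already applied, so re-running the cell is safe.
    if "pr" in ds and ds.pr.attrs.get("units") != "mm/h":
        # 1 kg m-2 s-1 of water corresponds to 3600 mm/h.
        # Note: accessing .values loads the lazy dask array into memory.
        ds.pr.values = ds.pr.values * 3600
        ds.pr.attrs["units"] = "mm/h"

    return ds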

ds = convert_to_mm_h(ds)
ds

| Data type | Array shape | Array bytes | Chunk shape | Chunk bytes | Dask graph |
|---|---|---|---|---|---|
| datetime64[ns] numpy.ndarray | (384, 2) | 6.00 kiB | (1, 2) | 16 B | 384 chunks in 2 graph layers |


In [7]:
time = ds.time[0]
time_bnds = ds.time_bnds[0]

print(f"time : {time.values}")
print(f"time bounds : {time_bnds.values}")

time : 2021-07-01T01:00:00.000000000
time bounds : ['2021-07-01T01:00:00.000000000' '2021-07-01T01:00:00.000000000']


# 2. Plot the accumulated precipitation (simulation)

- Plot the accumulated precipitation between 8h on 13/07/2021 and 8h on 15/07/2021.
- Use the skeleton code below.
- Aspect ratio?

## Solutions: