## Water balance model

This notebook explains how to create a water balance model for a farm dam (a tiny lake) in Australia. We continue with the data from the previous session and complete the data set that will become the model input. The figure below is a schematic representation of our water balance model. There are three fluxes: precipitation, evaporation and infiltration. The infiltration flux represents the flow of water from the lake to the groundwater. Inflow and outflow through the dam inlet and outlet are zero for the selected time period (no surface water flux).

<p align="center">
<img src="water_balance.png" alt="drawing" width="600">
</p>

In [None]:
# import the required packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

We start by importing the incomplete data and inspecting the contents of the file with the DataFrame's `head` method, which returns the first five rows by default.

In [None]:
# read the data from excel using pandas
dfd = pd.read_excel(
    'data/water_balance_data_incomplete.xlsx',
    index_col=0,
    parse_dates=True,
)
dfd.head()
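
The keywords `index_col=0` and `parse_dates=True` turn the first column of the sheet into a datetime index, so rows can later be aligned and plotted by date. The sketch below is a minimal example. It assumes a small CSV string with made-up values in place of the Excel file, and uses `pd.read_csv` so that no file or Excel engine is needed. `read_csv` accepts the same two keywords.

```python
import io

import pandas as pd

# Toy CSV standing in for the Excel sheet (made-up values)
csv_text = """date,rain,evaporation
2020-01-01,0.0,6.2
2020-01-02,12.4,3.1
2020-01-03,0.0,5.8
"""

# index_col=0 uses the first column as index, parse_dates=True converts it to dates
df_toy = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(df_toy.index)    # a DatetimeIndex built from the 'date' column
print(df_toy.head(2))  # head(n) returns the first n rows (default n=5)
```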

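Last week's homework exercise was to calculate the vapour pressure deficit and add it as a column named 'vpd' to the DataFrame. The code cell below demonstrates how to do this (see the notebook from session 3 for the equation to calculate the vpd). With the temperature in °C and the relative humidity in %, the saturation vapour pressure $e_s$, the actual vapour pressure $e_a$ and therefore the vpd are in kPa.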

In [None]:
e_s = 0.6108 * np.exp(17.27 * dfd["temperature"] / (dfd["temperature"] + 237.3))
e_a = dfd["rh"] / 100 * e_s
dfd["vpd"] = e_s - e_a
dfd.head()
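
To check the equation on a single value, the same calculation can be wrapped in a small function. `vpd_kpa` is a hypothetical helper, not part of the notebook. It assumes the temperature is in °C and the relative humidity in %, as in the cell above. Because it uses NumPy operations, it works on scalars and on whole columns alike.

```python
import numpy as np

def vpd_kpa(temperature_c, rh_percent):
    """Vapour pressure deficit in kPa (hypothetical helper).

    Same equations as the cell above: temperature in degrees Celsius,
    relative humidity in percent.
    """
    e_s = 0.6108 * np.exp(17.27 * temperature_c / (temperature_c + 237.3))  # saturation vapour pressure
    e_a = rh_percent / 100 * e_s  # actual vapour pressure
    return e_s - e_a

# 20 degrees C at 50 % relative humidity: the deficit is half of e_s
print(vpd_kpa(20.0, 50.0))  # roughly 1.17 kPa
```

At 100 % relative humidity the deficit is zero, which is a quick sanity check on the column in `dfd`.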

By loading the coefficients of the polynomials that calculate the dam area and volume from the water level (saved previously in an ASCII text file), we can convert the measured water levels to water surface area and volume, and add these as new columns to the DataFrame.

In [None]:
p_func_V = np.poly1d(np.loadtxt("data/p_coef_V_linear.dat"))
p_func_A = np.poly1d(np.loadtxt("data/p_coef_A_linear.dat"))

dfd["volume"] = p_func_V(dfd["wl"])
dfd["area"] = p_func_A(dfd["wl"])
dfd.head()
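
`np.poly1d` expects the coefficients with the highest power first, which is also the order returned by `np.polyfit`. The following is a minimal sketch with made-up coefficients for a linear volume relation. It assumes the `.dat` file holds one coefficient per line; this layout is an assumption, not something stated in the notebook.

```python
import io

import numpy as np

# Made-up file content: one coefficient per line, highest power first
text = "1500.0\n-2000.0\n"
coef_V = np.loadtxt(io.StringIO(text))  # array([ 1500., -2000.])

p_func_V = np.poly1d(coef_V)  # represents V = 1500 * wl - 2000

wl = np.array([2.0, 2.5, 3.0])  # made-up water levels in m
print(p_func_V(wl))  # [1000. 1750. 2500.]
```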

We complete the data set by also loading the observed chloride concentrations and isotope delta values of water samples taken from the farm dam. Because the samples were not taken daily, the data set contains many missing values (NaN), but pandas handles these without problems.

In [None]:
dfobs = pd.read_excel(
    "data/cl&isotope_observations.xlsx",
    index_col=0,
    parse_dates=True,
)
dfobs

The data can be added to the `dfd` DataFrame with the `join` method. By default, `join` aligns the rows on the index and keeps all rows of the calling DataFrame (a left join). Because `dfd` and `dfobs` have the same index, none of the keyword arguments are needed. We use the `to_excel` method to save the combined DataFrame for later use.

In [None]:
df = dfd.join(other=dfobs)

df.to_excel("data/water_balance_data.xlsx")
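
The behaviour of the default left join can be seen on a toy example. This sketch uses made-up values and, unlike the notebook, gives the observations only the two sampling dates. The left join still keeps every daily row and fills the dates without a sample with NaN.

```python
import pandas as pd

days = pd.date_range("2020-01-01", periods=4, freq="D")

# Daily data and sparse observations (made-up values)
dfd_toy = pd.DataFrame({"wl": [2.0, 1.9, 1.8, 1.7]}, index=days)
dfobs_toy = pd.DataFrame({"Cl_sample": [20.0, 23.5]}, index=days[[0, 3]])

# join aligns on the index; how="left" (the default) keeps all rows of dfd_toy
df_toy = dfd_toy.join(other=dfobs_toy)
print(df_toy)
```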

### Precipitation

For the water and mass balance calculations, all water balance terms must be in m$^3$/d. We start by calculating the volumetric flow rate of rainfall, which is

$P = A \cdot P_{mm} / 1000$

in which $P$ is the amount of rainfall added to the farm dam in m$^3$/d, $P_{mm}$ is the recorded daily rainfall in mm and $A$ is the water surface area in m$^2$. Let's first plot the required data.

In [None]:
df[['rain', 'area']].plot(secondary_y='area', figsize=(8,2))

To get the volume of rainwater that lands on the water surface, we implement the formula above by replacing the symbols with the corresponding columns of `df`:

In [None]:
df['P'] = df['area'] * df['rain'] / 1000.
df.head()
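
As a quick check of the units: 12 mm of rain on a surface of 1500 m$^2$ gives $1500 \cdot 12 / 1000 = 18$ m$^3$/d. The sketch below repeats the calculation on a toy DataFrame with made-up areas and rainfall.

```python
import pandas as pd

# Made-up surface areas (m^2) and daily rainfall (mm)
toy = pd.DataFrame({"area": [1500.0, 1480.0], "rain": [12.0, 0.0]})

# Dividing by 1000 converts mm to m; multiplying by the area in m^2 gives m^3/d
toy["P"] = toy["area"] * toy["rain"] / 1000.
print(toy["P"].tolist())  # [18.0, 0.0]
```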

Now that we have the daily volumetric rainfall as a column in the DataFrame, we can easily plot it.

In [None]:
df["P"].plot(
    kind="bar",
    figsize=(10, 4),
    ylabel="m3/day",
    title="Rainfall",
)

### Evaporation
Now we calculate the evaporation term of the water balance. The evaporation values in the Excel file are daily measurements from a Class A evaporation pan. The conversion to a volumetric evaporation rate in m$^3$/d is

$E = A \cdot E_{mm} / 1.2 / 1000$

in which $E$ is the amount of evaporated water in m$^3$/d and $E_{mm}$ is the recorded daily pan evaporation in mm. The factor 1.2 in the equation is the assumed pan factor, which converts the pan evaporation to open-water evaporation.

In [None]:
pan_factor = 1.2
df['E'] = df['area'] * df['evaporation'] / pan_factor / 1000.
df['E'].head()
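
A worked number helps to see the effect of the pan factor. `open_water_evaporation` is a hypothetical helper that follows the notebook's convention of dividing the pan evaporation by the pan factor. The values are made up: 6 mm of pan evaporation over 1500 m$^2$.

```python
def open_water_evaporation(area_m2, pan_mm, pan_factor=1.2):
    """Volumetric open-water evaporation in m^3/d (hypothetical helper).

    Divides the Class A pan evaporation (mm/d) by the pan factor and
    converts mm to m before multiplying by the surface area (m^2).
    """
    return area_m2 * pan_mm / pan_factor / 1000.

print(open_water_evaporation(1500.0, 6.0))       # 7.5
print(open_water_evaporation(1500.0, 6.0, 1.0))  # 9.0
```

With a pan factor of 1.2, the open-water evaporation is about 17 % lower than the pan reading. The pan factor is therefore a natural parameter to vary later on.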

Let's plot the data.

In [None]:
df['E'].plot(
    kind='bar',
    figsize=(10,4), 
    ylabel='m3/day', 
    title='Evaporation',
);

### Infiltration

The infiltration was not measured directly. Therefore, we calculate it from the precipitation, the evaporation and the daily volume changes of the farm dam:

$I = P - E - dV$

in which $I$ is the infiltration rate in m$^3$/d and $dV$ is the change in water volume between two consecutive days in m$^3$/d. Note that $P$ and $E$ are both positive numbers, whereas $dV$ is negative when the water volume decreases from one day to the next. $I$ is positive when water flows from the farm dam to the groundwater.

***Exercise***: We can use the `diff` method to calculate the change in volume. Run the code cell below and inspect the differences between the resulting Series. Decide which one is consistent with the definition of $dV$ above and use it for the calculations.

In [None]:
dV0 = df['volume'].diff()                # V(t) - V(t-1)
dV1 = df['volume'].diff(periods=-1)      # V(t) - V(t+1)
dV2 = -df['volume'].diff(periods=-1)     # V(t+1) - V(t)
# Print the first three rows of each resulting Series
print(dV0.head(3), dV1.head(3), dV2.head(3), sep="\n\n")
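
The three variants differ in the sign of the change and in the day to which the change is assigned. A toy Series with made-up volumes makes this visible.

```python
import pandas as pd

days = pd.date_range("2020-01-01", periods=3, freq="D")
volume = pd.Series([100.0, 90.0, 85.0], index=days)  # made-up volumes in m^3

backward = volume.diff()                 # V(t) - V(t-1), NaN on the first day
forward = volume.diff(periods=-1)        # V(t) - V(t+1), NaN on the last day
forward_neg = -volume.diff(periods=-1)   # V(t+1) - V(t), NaN on the last day

print(backward.tolist())     # [nan, -10.0, -5.0]
print(forward.tolist())      # [10.0, 5.0, nan]
print(forward_neg.tolist())  # [-10.0, -5.0, nan]
```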

In [None]:
df["dV"] = 

In [None]:
df['I'] = df['P'] - df['E'] - df['dV']
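
The sign convention can be checked with a few made-up numbers. `infiltration` is a hypothetical helper that implements $I = P - E - dV$. A falling water level ($dV < 0$) increases the calculated infiltration. A rise that is larger than $P - E$ gives a negative $I$, which means water flows from the groundwater into the dam.

```python
def infiltration(P, E, dV):
    """Infiltration in m^3/d from the water balance I = P - E - dV (hypothetical helper).

    P and E are positive volumes in m^3/d; dV is negative when the dam loses
    water. A positive result means flow from the dam to the groundwater.
    """
    return P - E - dV

print(infiltration(18.0, 7.5, -10.0))  # 20.5
print(infiltration(18.0, 7.5, 15.0))   # -4.5
```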

Now that we have all the water balance components, we can plot the results.

In [None]:
df[['P', 'E', 'I', 'dV']].plot(
    figsize=(10, 4), 
    grid=True,
);

## Mass Balance

Using the code below, we calculate the chloride mass balance. We assume an initial chloride concentration of the farm dam water (`Cl_0` = 20 mg/l, set inside the function) and a chloride concentration of the rain (the `Cl_rain` argument).

We use a for-loop because the first time step differs from the following ones, and because the concentration in each time step depends on the concentration in the previous one. When a for-loop is required, it is highly recommended to convert the DataFrame columns to NumPy arrays, because this makes the calculations much faster (see session_04_code_optimisation.ipynb).
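
The sketch below shows the pattern on a made-up recurrence, a running sum in which each value depends on the previous one, just like the chloride mass. It loops once with element-wise `.iloc` access and once over an array obtained with `.to_numpy()`. The timings depend on the machine, so no numbers are given. For a plain running sum, `np.cumsum` would be the vectorised alternative.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df_toy = pd.DataFrame({"x": rng.random(2000)})  # made-up input series

def running_sum_pandas(df):
    # Element-wise access through the DataFrame: overhead on every .iloc call
    out = np.empty(len(df))
    out[0] = df["x"].iloc[0]
    for i in range(1, len(df)):
        out[i] = out[i - 1] + df["x"].iloc[i]
    return out

def running_sum_numpy(df):
    # Convert once, then loop over a plain NumPy array
    x = df["x"].to_numpy()
    out = np.empty(len(x))
    out[0] = x[0]
    for i in range(1, len(x)):
        out[i] = out[i - 1] + x[i]
    return out

t_pd = timeit.timeit(lambda: running_sum_pandas(df_toy), number=3)
t_np = timeit.timeit(lambda: running_sum_numpy(df_toy), number=3)
print(f"pandas loop: {t_pd:.4f} s, NumPy loop: {t_np:.4f} s")
```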

Note that evaporation does not appear in the chloride mass balance equations, because the evaporating water contains no chloride. Evaporation is still needed, however, to calculate the infiltration $I = P - E - dV$.
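
Even though evaporation is absent from the chloride equations, it still raises the concentration by reducing the volume. The sketch below shows one time step of the update used in the function below: $M_{t+1} = M_t + C_{rain} P_t - C_t I_t$ and $C_{t+1} = M_{t+1} / V_{t+1}$. `chloride_step` is a hypothetical helper, and the numbers are made up: no rain, 100 m$^3$ evaporated and a volume drop of exactly 100 m$^3$, so the water balance gives $I = 0$.

```python
def chloride_step(M_prev, C_prev, P_prev, I_prev, V_new, Cl_rain):
    """One time step of the chloride mass balance (hypothetical helper).

    M_new = M_prev + Cl_rain * P_prev - C_prev * I_prev   (g)
    C_new = M_new / V_new                                 (g/m^3)
    """
    M_new = M_prev + Cl_rain * P_prev - C_prev * I_prev
    return M_new, M_new / V_new

V0, C0 = 1000.0, 20.0  # made-up volume (m^3) and concentration (g/m^3)
P, E = 0.0, 100.0      # made-up: no rain, 100 m^3 evaporated
V1 = V0 + P - E        # assume the volume changes by P - E only
dV = V1 - V0
I = P - E - dV         # the water balance then gives I = 0

M1, C1 = chloride_step(V0 * C0, C0, P, I, V1, Cl_rain=5.0)
print(I, M1, round(C1, 2))  # 0.0 20000.0 22.22
```

The chloride mass stays at 20 000 g, while the concentration rises from 20 to about 22.2 g/m$^3$ because the same mass is now dissolved in 900 m$^3$.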

The model is written as a function, which will be useful later on when we calibrate the model and carry out a sensitivity analysis.

In [None]:
def get_conc_cl(data, pan_factor, Cl_rain):
    """Get the chloride concentration of the farm dam water over time.

    Parameters
    ----------
    data : DataFrame
        DataFrame containing the columns area, volume, rain and evaporation
    pan_factor : float
        pan evaporation factor
    Cl_rain : float
        chloride concentration of the rain in g/m^3 (= mg/l)

    Returns
    -------
    df : pandas DataFrame
        copy of `data` with the water balance terms (P, E, dV, I) and the
        calculated chloride concentration (conc_Cl) added as columns
    """
    Cl_0 = 20  # initial chloride concentration in g/m^3 = mg/l

    df = data.copy()  # create a local copy so the original DataFrame stays intact

    # Water balance terms in m^3/d
    df['P'] = df['area'] * df['rain'] / 1000.
    df['E'] = df['area'] * df['evaporation'] / pan_factor / 1000.
    df['dV'] = -df['volume'].diff(periods=-1)  # V(t+1) - V(t)
    df['I'] = df['P'] - df['E'] - df['dV']

    # Convert to NumPy arrays for a fast loop
    P = df["P"].to_numpy()
    I = df["I"].to_numpy()
    V = df["volume"].to_numpy()

    M_Cl_g = np.empty(len(df))   # chloride mass in the dam in g
    conc_Cl = np.empty(len(df))  # chloride concentration in g/m^3

    for i, (Vi, Pi, Ii) in enumerate(zip(V, P, I)):
        if i == 0:  # first day: start from the assumed initial concentration
            M_Cl_g[0] = Vi * Cl_0
            conc_Cl[0] = M_Cl_g[0] / Vi  # gives Cl_0, of course!
        else:
            # mass of the previous day plus rain input minus infiltration loss
            M_Cl_g[i] = M_Cl_g[i - 1] + dM_P - dM_I
            conc_Cl[i] = M_Cl_g[i] / Vi

        # Chloride fluxes of day i (g/d), used in the next iteration
        dM_P = Cl_rain * Pi      # added by rain
        dM_I = conc_Cl[i] * Ii   # removed by infiltration

    df["conc_Cl"] = conc_Cl

    return df

Let's read the data and run the function.

In [None]:
# read the data from excel using pandas
df = pd.read_excel(
    'data/water_balance_data.xlsx',
    index_col=0,
    parse_dates=True,
)

dfnew = get_conc_cl(data=df, Cl_rain=5, pan_factor=1.2)

Now we can plot the calculated chloride concentrations together with the measured ones. The result isn't too bad, although the calculated concentrations appear to drift away from the measurements over time. This will be dealt with in the next notebook.

In [None]:
fig, ax = plt.subplots(figsize=(8, 2))
ax.plot(dfnew["conc_Cl"], label="calculated")
ax.plot(dfnew["Cl_sample"], 'o', label="measured")
ax.set_ylabel("Cl (mg/l)")
ax.legend();