# Using Pandas to process logger data (continued)

This notebook is another demonstration of how to use Pandas to read and process data from pressure transducers. The workflow consists of the following steps:

1. Load the data
2. Check when the baro logger was submerged
3. Replace the faulty readings with those from a nearby logger
4. Subtract the atmospheric pressure to obtain the water column height above the logger
5. Determine the flow duration curve

First the required libraries must be imported.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

The `read_csv` command is similar to the one we used before, except that the decimal separator in this file is a comma and not a dot.

In [None]:
fpath = Path("data", "Douka Longo", "Douka_Longo_210818202752_X0709.CSV")
df_wl0 = pd.read_csv(
    fpath,
    skiprows=63,
    skipfooter=1,
    sep=';',
    decimal=',', # Decimal separator is required for this file
    index_col=0,
    parse_dates=True,
    encoding="ISO-8859-1",
    engine='python',
)
df_wl0
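
The effect of the `decimal` argument can be checked on a small in-memory sample. The sketch below uses a made-up two-row text (the column names and values are hypothetical and only mimic the layout of the logger file) and compares the result with and without `decimal=','`.

```python
import io

import pandas as pd

# Hypothetical sample that mimics the layout of the logger export
raw = (
    "Date/time;Pressure[cmH2O]\n"
    "2021-01-01 00:00:00;1023,5\n"
    "2021-01-01 01:00:00;1024,1\n"
)

# With decimal=',' the values are parsed as floating point numbers
with_decimal = pd.read_csv(
    io.StringIO(raw), sep=";", decimal=",", index_col=0, parse_dates=True
)

# Without it, the values are kept as text because of the comma
without_decimal = pd.read_csv(
    io.StringIO(raw), sep=";", index_col=0, parse_dates=True
)

print(with_decimal["Pressure[cmH2O]"].dtype)
print(without_decimal["Pressure[cmH2O]"].dtype)
```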

For consistency with the previous exercise, let's convert the pressures from centimeters to meters.

In [None]:
df_wl0["Pressure[mH2O]"] = df_wl0["Pressure[cmH2O]"] / 100
df_wl0

After the data were downloaded for the first time, the logger was restarted and set to record at 15-minute intervals. This means that we have to import these data and resample them to a 1-hour interval to be consistent with the previous set of readings.

In [None]:
fpath = Path("data", "Douka Longo", "Douka_Longo_230707094904_X0709.CSV")
df_wl1 = pd.read_csv(
    fpath,
    skiprows=63,
    skipfooter=1,
    sep=';',
    decimal=',', # Decimal separator is required for this file
    index_col=0,
    parse_dates=True,
    encoding="ISO-8859-1",
    engine='python',
)
df_wl1

We resample by taking the mean of the readings within each hour.

In [None]:
df_wl1_h = df_wl1.resample('1H').mean()
df_wl1_h
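
To see what the hourly mean does, here is a minimal sketch on synthetic 15-minute data (the values are made up). It uses the lowercase alias `"h"`, which recent Pandas versions prefer over `"H"`. Each hourly value is the mean of the four readings that start in that hour, and the result is labelled with the start of the hour.

```python
import numpy as np
import pandas as pd

# Eight synthetic readings at 15-minute intervals (two full hours)
index = pd.date_range("2023-01-01 00:00", periods=8, freq="15min")
s15 = pd.Series(np.arange(8, dtype=float), index=index)

# Hourly mean: readings 0..3 fall in the first hour, 4..7 in the second
hourly = s15.resample("h").mean()
print(hourly)
```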

The DataFrames can then be joined using the `concat` function.

In [None]:
df_wl_combi = pd.concat([df_wl0, df_wl1_h])
df_wl_combi["Pressure[mH2O]"].plot();

From the figure it is not immediately obvious that there is a gap in the time series, because the plot simply connects the last reading of the first file to the first reading of the second. Another way to create a single DataFrame from the two imported time series is to first create a DataFrame with an hourly `DatetimeIndex` (created with `date_range`) as the index and a single column containing only NaN values. The `fillna` method then conveniently replaces the NaN values with the data from `df_wl0` or `df_wl1_h` wherever the timestamps match. Timestamps without a reading remain NaN, so the gap in the time series becomes obvious in the plot.

In [None]:
date_index = pd.date_range(start=df_wl0.index[0], end=df_wl1.index[-1], freq='1H')
df_wl_combi_alt = pd.DataFrame(index=date_index, data={"Pressure[mH2O]": np.nan})
df_wl_combi_alt["Pressure[mH2O]"] = df_wl_combi_alt["Pressure[mH2O]"].fillna(df_wl0["Pressure[mH2O]"])
df_wl_combi_alt["Pressure[mH2O]"] = df_wl_combi_alt["Pressure[mH2O]"].fillna(df_wl1_h["Pressure[mH2O]"])
df_wl_combi_alt["Pressure[mH2O]"].plot()
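
The difference between the two approaches can be illustrated with two short synthetic series separated by a gap (all values are made up). With `concat`, the missing hours are simply absent, whereas the NaN-filled template keeps them as explicit NaN values. Reindexing the concatenated series onto the full range is shown as one possible alternative that gives the same number of gaps.

```python
import numpy as np
import pandas as pd

# Two synthetic hourly series with a three-hour gap in between
part0 = pd.Series(
    [1.0, 1.1, 1.2],
    index=pd.date_range("2021-01-01 00:00", periods=3, freq="h"),
)
part1 = pd.Series(
    [2.0, 2.1],
    index=pd.date_range("2021-01-01 06:00", periods=2, freq="h"),
)

# Approach 1: concat keeps only the timestamps that have data
joined = pd.concat([part0, part1])

# Approach 2: NaN template on a complete hourly index, filled from both parts
full_index = pd.date_range(start=part0.index[0], end=part1.index[-1], freq="h")
filled = pd.Series(np.nan, index=full_index)
filled = filled.fillna(part0).fillna(part1)

# Alternative: reindex the concatenated series onto the complete index
reindexed = joined.reindex(full_index)

print(len(joined), joined.isna().sum())
print(len(filled), filled.isna().sum())
```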

Let's load the data from the logger that periodically gets submerged and process the data in the same way as before.

In [None]:
# Load the first file
fpath = Path("data", "Douka Longo", "Douka_Longo_Baro_210818202752_BX056.CSV")
df_p0 = pd.read_csv(
    fpath,
    skiprows=51,
    skipfooter=1,
    sep=';',
    decimal=',', # Decimal separator is required for this file
    index_col=0,
    parse_dates=True,
    encoding="ISO-8859-1",
    engine='python',
)

# and the second
fpath = Path("data", "Douka Longo", "Douka_Longo_Baro_230707104009_BX056.CSV")
df_p1 = pd.read_csv(
    fpath,
    skiprows=51,
    skipfooter=1,
    sep=';',
    decimal=',', # Decimal separator is required for this file
    index_col=0,
    parse_dates=True,
    encoding="ISO-8859-1",
    engine='python',
)

# Resample to one-hour intervals
df_p1_h = df_p1.resample('1H').mean()

# Create a single time series
date_index = pd.date_range(start=df_p0.index[0], end=df_p1.index[-1], freq='1H')
df_p_combi_do = pd.DataFrame(index=date_index, data={"Pressure[mH2O]": np.nan})
df_p_combi_do["Pressure[mH2O]"] = df_p_combi_do["Pressure[mH2O]"].fillna(df_p0["Pressure[cmH2O]"].divide(100))
df_p_combi_do["Pressure[mH2O]"] = df_p_combi_do["Pressure[mH2O]"].fillna(df_p1_h["Pressure[mH2O]"])
df_p_combi_do["Pressure[mH2O]"].plot();

There are two more files with data from another baro logger located nearby, which was never flooded.

In [None]:
# Load the first file
fpath = Path("data", "Douka Longo", "Rabingha_Baro_210818202752_BX059.CSV")
df_p0_ra = pd.read_csv(
    fpath,
    skiprows=51,
    skipfooter=1,
    sep=';',
    decimal=',', # Decimal separator is required for this file
    index_col=0,
    parse_dates=True,
    encoding="ISO-8859-1",
    engine='python',
)

# and the second
fpath = Path("data", "Douka Longo", "Rabingha_Baro_230705125953_BX059.CSV")
df_p1_ra = pd.read_csv(
    fpath,
    skiprows=51,
    skipfooter=1,
    sep=';',
    decimal=',', # Decimal separator is required for this file
    index_col=0,
    parse_dates=True,
    encoding="ISO-8859-1",
    engine='python',
)

# Resample to one-hour intervals
df_p1_h_ra = df_p1_ra.resample('1H').mean()
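
# Create a single time series. The date range of the Douka Longo baro logger
# is reused so that both series share the same hourly index, which the
# element-wise comparison further below relies on.
date_index = pd.date_range(start=df_p0.index[0], end=df_p1.index[-1], freq='1H')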
df_p_combi_ra = pd.DataFrame(index=date_index, data={"Pressure[mH2O]": np.nan})
df_p_combi_ra["Pressure[mH2O]"] = df_p_combi_ra["Pressure[mH2O]"].fillna(df_p0_ra["Pressure[cmH2O]"].divide(100))
df_p_combi_ra["Pressure[mH2O]"] = df_p_combi_ra["Pressure[mH2O]"].fillna(df_p1_h_ra["Pressure[mH2O]"])
df_p_combi_ra["Pressure[mH2O]"].plot()

Let's plot the time series in a single plot, adding an offset to account for the systematic difference in readings.

In [None]:
fig, ax = plt.subplots()

dp = 0.121  # Systematic offset between the two baro loggers [mH2O]
ax.plot(df_p_combi_do["Pressure[mH2O]"])
ax.plot(df_p_combi_ra["Pressure[mH2O]"] + dp); 
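
The notebook does not state how the offset of 0.121 was obtained. One possible way to estimate such a systematic difference, shown here purely as an assumption on synthetic data, is to take the median of the difference between the two series. The median is barely affected by the comparatively short periods during which one logger was submerged.

```python
import numpy as np
import pandas as pd

# Synthetic reference barometer series (values are made up)
index = pd.date_range("2022-01-01", periods=200, freq="h")
p_ref = pd.Series(10.0 + 0.05 * np.sin(np.arange(200) / 10), index=index)

# Second logger: same signal plus a constant offset of 0.121 mH2O
p_site = p_ref + 0.121

# Simulate 20 hours of submergence by adding extra pressure
p_site.iloc[50:70] += 0.5

# The median difference is robust against the submerged readings
dp_est = (p_site - p_ref).median()
print(round(dp_est, 3))
```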


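The graph above makes it clear when the logger at Douka Longo was submerged. We can try to isolate these values by flagging the readings that exceed the offset-corrected Rabingha readings by more than 0.04 m, and then plot them.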

In [None]:
fig, ax = plt.subplots()

idx = df_p_combi_do["Pressure[mH2O]"] > (df_p_combi_ra["Pressure[mH2O]"] + dp + 0.04)
ax.plot(df_p_combi_do.loc[idx, "Pressure[mH2O]"])
ax.plot(df_p_combi_do.loc[~idx, "Pressure[mH2O]"])
# ax.set_xlim(pd.to_datetime("2022-07-01"), pd.to_datetime("2022-10-01"))
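
A boolean index like `idx` can also be used to list when each submerged episode started and ended. The following is a minimal sketch on a synthetic mask (the timestamps and flags are made up). It numbers consecutive runs of equal values and then takes the first and last timestamp of every run that is flagged.

```python
import pandas as pd

# Synthetic hourly mask: True where the logger is assumed to be submerged
index = pd.date_range("2022-08-01", periods=10, freq="h")
submerged = pd.Series(
    [False, True, True, False, False, True, True, True, False, False],
    index=index,
)

# Give every run of consecutive equal values its own number
run_id = submerged.astype(int).diff().ne(0).cumsum()

# First and last timestamp of each run in which the mask is True
times = submerged.index.to_series()
episodes = times[submerged].groupby(run_id[submerged]).agg(["first", "last"])
print(episodes)
```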

Let's set the pressure readings to NaN during the times when the logger was submerged. Plotting the time series and zooming in on a three-month period shows the data gaps.

In [None]:
df_p_combi_do["Pressure[mH2O]_ra"] = df_p_combi_do["Pressure[mH2O]"]
df_p_combi_do.loc[idx, "Pressure[mH2O]_ra"] = np.nan

fig, ax = plt.subplots()
ax.plot(df_p_combi_do["Pressure[mH2O]_ra"])
ax.set_xlim(pd.to_datetime("2022-07-01"), pd.to_datetime("2022-10-01"))

As before, the `fillna` method can be used to fill the gaps, this time using the readings from `df_p_combi_ra` plus the offset `dp`. The plot shows that the gaps have been filled.

In [None]:
df_p_combi_do["Pressure[mH2O]_ra"] = df_p_combi_do["Pressure[mH2O]_ra"].fillna(df_p_combi_ra["Pressure[mH2O]"].add(dp))

fig, ax = plt.subplots()
ax.plot(df_p_combi_do["Pressure[mH2O]_ra"])
ax.set_xlim(pd.to_datetime("2022-07-01"), pd.to_datetime("2022-10-01"));
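
The flag, blank and fill steps can be condensed into a few lines. The sketch below repeats them on a synthetic six-hour series (all values are made up; the offset and the 0.04 tolerance are taken from the cells above). It uses `Series.mask`, which sets the flagged values to NaN and is an alternative to assigning NaN with `loc`.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2022-08-01", periods=6, freq="h")
dp = 0.121  # offset between the loggers, as used in the notebook

# Synthetic reference readings and a site logger that is flooded for 2 hours
p_ref = pd.Series([10.00, 10.01, 10.02, 10.01, 10.00, 9.99], index=index)
p_site = p_ref + dp
p_site.iloc[2:4] += 0.30

# Flag readings that exceed the offset-corrected reference by more than 0.04
submerged = p_site > (p_ref + dp + 0.04)

# Blank the flagged readings, then fill them from the reference logger
p_clean = p_site.mask(submerged)
p_filled = p_clean.fillna(p_ref + dp)
print(p_filled)
```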

The final step is to calculate the water pressures by subtracting the atmospheric pressures from the total pressures.

In [None]:
df_wl_combi_alt["patm"] = df_p_combi_do["Pressure[mH2O]_ra"]
df_wl_combi_alt["wl_corr"] = df_wl_combi_alt["Pressure[mH2O]"] - df_wl_combi_alt["patm"]

fig, ax = plt.subplots()
ax.plot(df_wl_combi_alt["wl_corr"]);
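
Note that Pandas aligns the two series on their index before subtracting, so timestamps that occur in only one of them yield NaN rather than an error. A small sketch with made-up numbers illustrates this behaviour.

```python
import pandas as pd

# Total pressure and atmospheric pressure on partly overlapping hours
total = pd.Series(
    [11.5, 11.6, 11.7],
    index=pd.date_range("2022-01-01 00:00", periods=3, freq="h"),
)
patm = pd.Series(
    [10.1, 10.2, 10.3],
    index=pd.date_range("2022-01-01 01:00", periods=3, freq="h"),
)

# Values are matched by timestamp, not by position
water = total - patm
print(water)
```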

## Flow duration curves

In principle, these water level readings could be converted to a stream discharge using a rating curve (a relationship between the discharge and the water level). Because we don't have a rating curve, we treat the water levels as a proxy for the discharge and use their numerical values to demonstrate the principle of creating a flow duration curve. First, let's resample the water levels to daily values by simply taking the mean.

In [None]:
dfd = df_wl_combi_alt["wl_corr"].resample('1D').mean()

A flow duration curve displays the flow as a function of the exceedance, that is, the percentage of time that a given flow is equalled or exceeded. The flows are therefore plotted in order of size, from highest to lowest. Because `dfd` is a Pandas Series, we can use the `sort_values` method. Note that we have to drop the NaN values that remain in the time series.

In [None]:
dfd_sorted = dfd.sort_values(ascending=False)
dfd_sorted = dfd_sorted.dropna()

Once the data are sorted, the exceedance follows directly from the rank of each daily water level divided by the total number of readings, and the plot can be made.

In [None]:
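# Rank of each value (1 = highest) as a percentage of the number of values
exceedance = np.arange(1., len(dfd_sorted) + 1) / len(dfd_sorted) * 100

fig, ax = plt.subplots()
ax.plot(exceedance, dfd_sorted)
ax.set_xlabel("Exceedance [%]")
ax.set_ylabel("Water column height above the logger [m]")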
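
The sorting and ranking steps can be wrapped in a small helper. The function name `duration_curve` and the sample values below are hypothetical. The sketch follows the same rank-divided-by-count definition as above and also shows how a level for a given exceedance percentage could be read off by linear interpolation.

```python
import numpy as np
import pandas as pd


def duration_curve(values):
    """Return exceedance percentages and values sorted from high to low."""
    sorted_values = values.dropna().sort_values(ascending=False)
    n = len(sorted_values)
    exceedance = np.arange(1, n + 1) / n * 100
    return exceedance, sorted_values.to_numpy()


# Made-up daily water levels, including one missing value
levels = pd.Series([0.8, np.nan, 0.2, 0.5, 0.3])
exceedance, sorted_levels = duration_curve(levels)

# Level that is equalled or exceeded 50 % of the time
level_50 = np.interp(50, exceedance, sorted_levels)
print(exceedance, sorted_levels, level_50)
```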
