# Using Pandas to process logger data

This notebook demonstrates how to use Pandas to read and process data from pressure transducers. The tasks to be completed are:
1. Load the data
2. Fill any gaps
3. Shift the measured pressures because the logger was placed in a different position after downloading
4. Subtract the atmospheric pressure to obtain the water column height above the logger
5. Determine the seasonal trend in water level

First, the required libraries must be imported.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

The file to be loaded is in the native CSV format for this logger type. It can be read with `read_csv`, but finding the right keywords can be quite a puzzle, and there is no general recipe because each logger manufacturer has its own format. The first 63 lines are metadata, so we provide `skiprows=63`. The last line is a text string that we do not wish to import, hence `skipfooter=1`. We want the DataFrame index to be in datetime format based on the values in the first column, which is accomplished with `index_col=0` and `parse_dates=True`. Columns are separated by a semicolon rather than a comma, which is why we need `sep=';'`. We must also provide the character encoding, as is typical for files of this type. Which encoding to choose is usually a matter of trial and error (StackOverflow is your friend in this case). Finally, `skipfooter` is not supported by the default C parser, so Pandas falls back to the slower Python parser and issues a warning. To get rid of the warning, include `engine='python'`.

In [None]:
fpath = Path("data", "Rabingha", "Rabingha_Forage_230705130949_X1044.CSV")
df_wl = pd.read_csv(
    fpath,
    skiprows=63,
    skipfooter=1,
    index_col=0,
    parse_dates=True,
    sep=';',
    encoding="ISO-8859-1",
    # engine='python',
)
df_wl.head()
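
To see how these keywords work together without needing the actual logger file, here is a minimal sketch that parses a small in-memory file. The file contents, the column names and the number of metadata lines are made up for illustration and will differ from the real export.

```python
import io

import pandas as pd

# Hypothetical logger export: 3 metadata lines, a header row,
# 3 data rows and a closing text line (all invented for this example)
raw = (
    "Logger type: demo\n"
    "Location: demo well\n"
    "Serial number: 0000\n"
    "Date/time;Pressure[mH2O];Temperature[°C]\n"
    "2021-09-22 12:00:00;21.50;27.1\n"
    "2021-09-22 13:00:00;21.48;27.1\n"
    "2021-09-22 14:00:00;21.47;27.2\n"
    "END OF DATA FILE\n"
)

# Encode to bytes so that the encoding keyword is actually exercised
buf = io.BytesIO(raw.encode("ISO-8859-1"))

df_demo = pd.read_csv(
    buf,
    skiprows=3,            # skip the metadata lines
    skipfooter=1,          # skip the closing text line
    index_col=0,           # first column becomes the index
    parse_dates=True,      # convert the index to datetimes
    sep=";",
    encoding="ISO-8859-1",
    engine="python",       # skipfooter requires the Python parser
)
print(df_demo)
```

Checking `df_demo.index` afterwards is a quick way to confirm that the dates were parsed: it should be a `DatetimeIndex` rather than a plain index of strings.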

A plot gives a first impression of the loaded data. The moment at which the data were downloaded stands out clearly, and the jump in the data points afterwards shows that the logger was not placed back at the same depth in the well.

In [None]:
fig, ax = plt.subplots()
ax.plot(df_wl["Pressure[mH2O]"]);

To find out when the logger was outside the well, we select the readings with a pressure of less than 15 m of water.

In [None]:
idx0 = df_wl["Pressure[mH2O]"] < 15
df_wl.loc[idx0]

One way to get rid of the outliers on these two dates is to set the corresponding values to NaN (not a number).

In [None]:
df_wl.loc[idx0, :] = np.nan
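
The same pattern (build a boolean mask, then assign through `.loc`) can be tried on a small synthetic series. The values and the second column below are invented. Only the 15 m threshold is taken from the text above.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly readings; the third one was taken outside the well
idx = pd.date_range("2021-09-23 10:00", periods=5, freq="h")
df_demo = pd.DataFrame(
    {
        "Pressure[mH2O]": [21.5, 21.4, 10.3, 20.4, 20.5],
        "Temperature": [27.0, 27.0, 31.0, 27.1, 27.0],
    },
    index=idx,
)

# True for every row with a pressure below the threshold
mask = df_demo["Pressure[mH2O]"] < 15

# Blank out all columns of the flagged rows
df_demo.loc[mask, :] = np.nan

# Count the readings that are now missing
n_missing = int(df_demo["Pressure[mH2O]"].isna().sum())
print(n_missing)
```

Note that the mask is computed before the assignment, so it can still be used afterwards to look up the rows that were blanked out.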

Before filling the gaps, let's first correct for the shift in logger position. To get a better picture, we can plot the data from two weeks before to two weeks after the download.

In [None]:
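# Time of the first reading taken while the logger was out of the well
tout = df_wl.loc[idx0].index[0]
# Boolean mask selecting the readings before the download
idx1 = df_wl.index < tout

fig, ax = plt.subplots()
ax.plot(df_wl.loc[idx1, "Pressure[mH2O]"])
ax.plot(df_wl.loc[~idx1, "Pressure[mH2O]"], color='lightgray')

# Zoom in on two weeks before and after the download
dt = pd.Timedelta('14D')
ax.set_xlim(tout - dt, tout + dt)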

The shift is about one meter. It is corrected by adding this value to all pressure readings recorded after the logger was placed back in the well.

In [None]:
# Plot again
fig, ax = plt.subplots()
ax.plot(df_wl.loc[idx1, "Pressure[mH2O]"])
ax.plot(df_wl.loc[~idx1, "Pressure[mH2O]"], color='lightgray')

# Add the offset (in m, estimated from the plot) and display
dwl = 1.0
df_wl.loc[~idx1, "Pressure[mH2O]"] += dwl
ax.plot(df_wl.loc[~idx1, "Pressure[mH2O]"])

ax.set_xlim(tout - dt, tout + dt)
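
The offset used above was read from the graph. As a cross-check, it could also be estimated from the data by comparing the median pressure in a window before and after the download. The sketch below does this for synthetic data with a known shift of 1.0 m. The two-day window is an arbitrary choice, and the method assumes that the true water level did not change much around the download, which may not hold for the real well.

```python
import numpy as np
import pandas as pd

# Synthetic hourly pressures with a little noise (invented values)
idx = pd.date_range("2021-09-16", periods=14 * 24, freq="h")
rng = np.random.default_rng(0)
s = pd.Series(21.0 + 0.02 * rng.standard_normal(len(idx)), index=idx)

# Hypothetical download time, after which the readings are 1.0 m lower
t_out = pd.Timestamp("2021-09-23 12:00")
s.loc[s.index >= t_out] -= 1.0

# Compare the median in a window of whole days on either side,
# so that any daily pumping cycle would be sampled evenly
window = pd.Timedelta("2D")
before = s[(s.index >= t_out - window) & (s.index < t_out)]
after = s[(s.index > t_out) & (s.index <= t_out + window)]

offset = before.median() - after.median()
print(round(offset, 2))
```

The median is used rather than the mean so that a few stray readings near the download have little influence on the estimate.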

There are still two missing data values that need to be dealt with. The values on 23 September 2021 can be estimated using linear interpolation.

In [None]:
# Use interpolation to fill the gap at the first erroneous reading
print(df_wl.loc[idx0])
df_wl = df_wl.interpolate()
print(df_wl.loc[idx0])
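
By default, `interpolate` uses `method="linear"`, which treats the rows as equally spaced and ignores the index. For a logger with a fixed recording interval this makes no difference, but when the timestamps are irregular, `method="time"` may be the better choice. The small example below, with invented values, compares the two and also shows `limit_area="inside"`, which leaves a gap at the end of the series unfilled.

```python
import numpy as np
import pandas as pd

# Irregular timestamps: a three-hour step between the 2nd and 3rd reading
idx = pd.to_datetime(
    [
        "2021-09-23 10:00",
        "2021-09-23 11:00",
        "2021-09-23 14:00",
        "2021-09-23 15:00",
    ]
)
s = pd.Series([20.0, np.nan, 24.0, np.nan], index=idx)

# Default: rows are treated as equally spaced, gap gets the midpoint
by_position = s.interpolate()

# Time-weighted: 11:00 lies one quarter of the way from 10:00 to 14:00
by_time = s.interpolate(method="time")

# Only fill gaps that are enclosed by valid values on both sides
inside_only = s.interpolate(method="time", limit_area="inside")

print(pd.DataFrame({"linear": by_position, "time": by_time, "inside": inside_only}))
```

Without `limit_area="inside"`, the missing value at the end is padded with the last valid reading rather than truly interpolated, which is a reason to treat a gap at the end of a record separately.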


Because it is the last reading, the value on 5 July 2023 can simply be deleted:

In [None]:
df_wl = df_wl.iloc[:-1]

Now that the water pressures have been corrected, let's import the atmospheric pressure data.

In [None]:
fpath = Path("data", "Rabingha", "Rabingha_Baro_230705125953_BX059.CSV")
df_p = pd.read_csv(
    fpath,
    skiprows=51,
    skipfooter=1,
    sep=';',
    index_col=0,
    parse_dates=True,
    encoding="ISO-8859-1",
    engine='python',
)
df_p.head()

Draw a graph to get a first impression.

In [None]:
fig, ax = plt.subplots()
ax.plot(df_p["Pressure[mH2O]"]);

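By adding the atmospheric pressures to `df_wl`, it becomes easy to subtract the atmospheric pressure from the total pressure (water + atmospheric) to get the water column height above the logger. Pandas aligns the values on the datetime index, so this works as long as both loggers recorded at the same times.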

In [None]:

df_wl["patm"] = df_p["Pressure[mH2O]"]
df_wl["wl_corr"] = df_wl["Pressure[mH2O]"] - df_wl["patm"]

fig, ax = plt.subplots()
ax.plot(df_wl["wl_corr"])
# ax.set_xlim(tout - dt, tout + dt)
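
Assigning a column from another DataFrame matches values by index label. If the barometer had recorded at different times than the water-level logger, the direct assignment would produce NaN for every timestamp without a match. The sketch below illustrates this with invented data and a hypothetical ten-minute offset between the two loggers, and shows one possible way to interpolate the atmospheric pressure onto the water-level timestamps.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly water-level logger
idx_wl = pd.date_range("2021-09-23 00:00", periods=4, freq="h")
wl = pd.DataFrame({"Pressure[mH2O]": [21.0, 21.1, 21.2, 21.3]}, index=idx_wl)

# Hypothetical barometer that records ten minutes past the hour
idx_p = idx_wl + pd.Timedelta("10min")
baro = pd.Series([10.00, 10.06, 10.12, 10.18], index=idx_p)

# Direct assignment: no timestamp matches, so every value is NaN
wl["patm_direct"] = baro

# Interpolate the barometer readings onto the water-level timestamps:
# 1. combine both sets of timestamps, 2. interpolate in time,
# 3. keep only the water-level timestamps
union = baro.index.union(wl.index)
wl["patm"] = baro.reindex(union).interpolate(method="time").reindex(wl.index)

wl["wl_corr"] = wl["Pressure[mH2O]"] - wl["patm"]
print(wl)
```

The first water-level reading precedes the first barometer reading, so its atmospheric pressure stays NaN: interpolation does not fill values ahead of the first valid observation by default.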


To separate the seasonal fluctuation from the daily pumping we can get the daily maximum and minimum water levels.

In [None]:
df_wl_d_min = df_wl["wl_corr"].resample('1D').min()
df_wl_d_max = df_wl["wl_corr"].resample('1D').max()

fig, ax = plt.subplots()
ax.plot(df_wl_d_min)
ax.plot(df_wl_d_max);
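
As an alternative to calling `resample` twice, both statistics can be computed in one pass with `agg`, which returns a DataFrame with one column per statistic. The example below uses a synthetic series with a made-up pumping schedule (lower levels between 8:00 and 16:00), not the measured data.

```python
import numpy as np
import pandas as pd

# Two days of hypothetical hourly water levels
idx = pd.date_range("2022-01-01", periods=48, freq="h")
hours = idx.hour

# Level is 10.0 m while pumping (8:00-16:00) and 11.5 m otherwise
level = np.where((hours >= 8) & (hours < 16), 10.0, 11.5)

# Raise the second day by 0.2 m to mimic a slow seasonal rise
level[24:] += 0.2

s = pd.Series(level, index=idx, name="wl_corr")

# Daily minimum and maximum in a single call
daily = s.resample("1D").agg(["min", "max"])
daily["range"] = daily["max"] - daily["min"]
print(daily)
```

Keeping the minimum, maximum and range together in one DataFrame makes it easier to plot or export them side by side.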

The difference between the maximum and the minimum is the daily range in water level. Plotting it may give a first idea about whether or not there is less pumping during the wet season.

In [None]:

fig, (ax0, ax1) = plt.subplots(nrows=2)

# Plot the maximum to get an idea of the seasonal trend
ax0.plot(df_wl_d_max)
ax0.set_ylabel("Daily max. water level (m)")

# Determine the daily range and plot
df_wl_diff = df_wl_d_max - df_wl_d_min
ax1.plot(df_wl_diff)
ax1.set_ylabel("Daily water level range (m)")
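
The daily maxima can still be rather noisy. One simple way to bring out the seasonal trend is to smooth them with a centered rolling median. The sketch below applies this to a synthetic year of daily maxima with a known sinusoidal signal. The 30-day window and the `min_periods` value are arbitrary choices that would need tuning for real data, and this is only one of many possible smoothing approaches.

```python
import numpy as np
import pandas as pd

# One year of hypothetical daily maxima: seasonal sine wave plus noise
idx = pd.date_range("2021-10-01", periods=365, freq="D")
rng = np.random.default_rng(1)
seasonal = 11.0 + 1.5 * np.sin(2 * np.pi * np.arange(365) / 365)
daily_max = pd.Series(seasonal + 0.2 * rng.standard_normal(365), index=idx)

# Centered 30-day rolling median; min_periods allows shorter
# windows at the start and end of the record
trend = daily_max.rolling(window=30, center=True, min_periods=10).median()

# Mean absolute deviation from the known seasonal signal,
# before and after smoothing
err_raw = float((daily_max - seasonal).abs().mean())
err_trend = float((trend - seasonal).abs().mean())
print(err_raw, err_trend)
```

For the synthetic data, the smoothed series lies closer to the underlying seasonal signal than the raw daily maxima do. With measured data the true signal is unknown, so the window length has to be judged from the plot.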

For more sophisticated time series analysis options, see the <A href="https://pastas.readthedocs.io/en/master/">Pastas</A> package.