The datasets consist of meteorological measurements (precipitation and temperature) and hydrological measurements ((ground)water levels, flows and volumes). The four types of waterbodies have different target variables:
- aquifers, groundwater level(s);
- river, water level;
- lake, water level and flow;
- water springs, flow.

## Aquifers
Aquifers can be confined (lt2) or unconfined (also called free water table: cos, sal); for the other aquifers it is not clearly reported which measurements are confined or unconfined. Unconfined aquifers can be influenced by meteorology (precipitation and evaporation, for which temperature is a proxy), by other waterbodies (such as rivers and lakes, which can drain the aquifer or infiltrate into it, depending on the level in the aquifer relative to the other waterbody) and by extractions (irrigation, industrial water, drinking water etc.); they also interact with the confined aquifers below. Confined aquifers are less influenced by meteorology. The reaction of groundwater is dampened: precipitation from past months or even years can still influence the groundwater levels today. The groundwater levels of confined aquifers are even more dampened than those of unconfined aquifers (see also the plot below), but they clearly follow the same seasonality: low in summer (dry period), high in winter (wet(ter) period).
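
To make the idea of a dampened response concrete, here is a minimal sketch on synthetic data. It is not the method used in this notebook: the exponentially weighted mean is just a simple stand-in for the memory of an aquifer, and the precipitation series, the half-lives (30 and 180 days) and the name `dampened_response` are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily precipitation (an assumption, not the Kaggle data):
# random showers that are heavier in winter than in summer.
dates = pd.date_range("2012-01-01", "2019-12-31", freq="D")
rng = np.random.default_rng(0)
seasonal = 1 + np.cos(2 * np.pi * dates.dayofyear.to_numpy() / 365.25)
precip = pd.Series(rng.gamma(shape=0.5, scale=4.0, size=len(dates)) * seasonal, index=dates)


def dampened_response(series, halflife_days):
    """Exponentially weighted mean: the longer the half-life, the longer the memory."""
    return series.ewm(halflife=halflife_days, adjust=False).mean()


# Hypothetical half-lives: a short memory and a long memory
fast = dampened_response(precip, halflife_days=30)
slow = dampened_response(precip, halflife_days=180)

# Compare the variability after a burn-in period, so the start-up value no longer matters
recent = slice("2017-01-01", None)
print(precip.loc[recent].std(), fast.loc[recent].std(), slow.loc[recent].std())
```

The series with the longer half-life varies less but still carries the winter/summer cycle, which is the qualitative behaviour described above for confined versus unconfined aquifers.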

In [48]:
import pandas as pd
name = 'aquifer_auser'
df_aq = pd.read_csv(f'data/kaggle-original/{name}.csv', parse_dates=['Date'], dayfirst=True, index_col=['Date'])
df_aq.dropna(axis=0, how='all', inplace=True)
df_aq.columns = [column.lower() for column in df_aq.columns]


import plotly.express as px
import plotly.io as pio

pio.templates.default = "ggplot2"

px.line(df_aq[df_aq != 0], x=df_aq.index, y=['depth_to_groundwater_lt2', 'depth_to_groundwater_cos', 'depth_to_groundwater_sal'], 
        range_x=['2006-01-01', '2021-01-01'], title='target groundwater levels aquifer auser, confined (lt2) is more dampened than unconfined (cos, sal)', 
        labels=dict(value='depth to groundwater (m)', Date='date'))

In [49]:
from utils import histograms, corr_heatmap
histograms(df_aq, name)
corr_heatmap(df_aq, name=name)

## Rivers
Rivers react faster to (direct) meteorology (mostly precipitation) than aquifers do. The flow in rivers can be estimated with so-called rainfall-runoff models. The flow is related to precipitation and has fast and slow components: fast components come, for example, from precipitation falling on impervious areas such as roofs, while slow components come from groundwater, the so-called base flow. The base flow depends on the groundwater levels in relation to the river levels. The faster components are dominated by precipitation. The flow into a river can be seen as a series of buckets, each with a certain volume and a drainage rate that depends on that volume. The flow directly influences the water levels in the rivers; in addition, vegetation and the operation of weirs and other structures (which can change over time) influence the water levels.
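
The bucket picture can be sketched as a cascade of linear reservoirs. This is a minimal illustration, not a calibrated rainfall-runoff model: the function name `bucket_cascade`, the drainage fractions and the single synthetic rain pulse are assumptions made for this example.

```python
import numpy as np


def bucket_cascade(precip, k_values):
    """Route precipitation through linear buckets in series.

    Each bucket drains a fraction k of its stored volume per time step,
    and the outflow of one bucket is the inflow of the next.
    """
    storage = np.zeros(len(k_values))
    outflow = np.zeros(len(precip))
    for t, p in enumerate(precip):
        inflow = p
        for i, k in enumerate(k_values):
            storage[i] += inflow
            inflow = k * storage[i]  # drainage depends on the stored volume
            storage[i] -= inflow
        outflow[t] = inflow
    return outflow


# Synthetic input: one rain pulse of 10 units at time step 5
precip = np.zeros(60)
precip[5] = 10.0

fast = bucket_cascade(precip, k_values=[0.5])         # quick path, e.g. impervious areas
slow = bucket_cascade(precip, k_values=[0.05, 0.05])  # slow path, e.g. base flow

print(np.argmax(fast), fast.max())
print(np.argmax(slow), slow.max())
```

The fast bucket releases most of the pulse straight away, while the slow cascade produces a lower and later peak. A weighted sum of such paths is one simple way to combine the fast and slow components.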

In [50]:
name = 'river_arno'
df_r = pd.read_csv(f'data/kaggle-original/{name}.csv', parse_dates=['Date'], dayfirst=True, index_col=['Date'])
df_r.dropna(axis=0, how='all', inplace=True)
df_r.columns = [column.lower() for column in df_r.columns]

px.line(df_r[df_r != 0], x=df_r.index, y=['hydrometry_nave_di_rosano'], title='target hydrometry river arno @ nave di rosano',
        labels=dict(value='waterlevel (m)', Date='date'))

In [57]:
histograms(df_r, name)

corr_heatmap(df_r, name=name, size=700)

## Lake
Lakes are big buckets, mostly in direct contact with groundwater, so the level is related to the groundwater level and influenced by precipitation and evaporation. In this case there is also controlled extraction from the lake.
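
As a rough sketch of the bucket idea, a daily water balance for a lake could look like the code below. The function `lake_level_step`, the constant surface area and all numbers are hypothetical (they are not taken from the Lake Bilancino data), and exchange with groundwater is lumped into the inflow and outflow terms.

```python
def lake_level_step(level_m, precip_mm, evap_mm, inflow_m3s, outflow_m3s, area_m2):
    """Advance the lake level by one day with a simple water balance.

    Assumes a constant surface area (vertical banks), which is a simplification.
    """
    seconds_per_day = 86_400
    # Rain minus evaporation acts directly on the lake surface (mm -> m)
    vertical_m = (precip_mm - evap_mm) / 1000.0
    # Net inflow minus extraction, spread over the lake surface (m3/s -> m/day)
    lateral_m = (inflow_m3s - outflow_m3s) * seconds_per_day / area_m2
    return level_m + vertical_m + lateral_m


# Hypothetical day: 10 mm rain, 2 mm evaporation, 1 m3/s inflow, 3 m3/s extraction
new_level = lake_level_step(250.0, precip_mm=10, evap_mm=2,
                            inflow_m3s=1.0, outflow_m3s=3.0, area_m2=5_000_000)
print(round(new_level, 5))
```

With these numbers the extraction outweighs the rain, so the level drops by a few centimetres in one day.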

In [52]:
name = 'lake_bilancino'
df_l = pd.read_csv(f'data/kaggle-original/{name}.csv', parse_dates=['Date'], dayfirst=True, index_col=['Date'])
df_l.dropna(axis=0, how='all', inplace=True)
df_l.columns = [column.lower() for column in df_l.columns]

import plotly.graph_objects as go
from plotly.subplots import make_subplots


fig = make_subplots(specs=[[{"secondary_y": True}]])


# Mask zeros once (they become NaN and show up as gaps) and reuse the result for both traces
df_l_nz = df_l[df_l != 0]
fig.add_trace(go.Scatter(x=df_l_nz.index, y=df_l_nz['lake_level'], name="lake_level"))
fig.add_trace(go.Scatter(x=df_l_nz.index, y=df_l_nz['flow_rate'], name="flow_rate"), secondary_y=True)


fig.update_layout(title_text="target variables lake bilancino, level & flow rate", yaxis_range=[220,260], 
                  yaxis2_range=[0,100], xaxis_title='date', yaxis_title='waterlevel (m)', yaxis2_title='flow rate')
fig.show()

In [59]:
histograms(df_l, name, height=500)

corr_heatmap(df_l, name=name, size=500)

## Water spring
Water springs are points at which water flows out of the earth's surface. The flow largely depends on the water level of the spring's source, which in turn can be influenced by meteorology or other hydrological parameters (see for example the sections above ;)).

In [66]:
name = 'water_spring_amiata'
df_s = pd.read_csv(f'data/kaggle-original/{name}.csv', parse_dates=['Date'], dayfirst=True, index_col=['Date'])
df_s.dropna(axis=0, how='all', inplace=True)
df_s.columns = [column.lower() for column in df_s.columns]

px.line(df_s[df_s != 0], x=df_s.index, y=['flow_rate_bugnano', 'flow_rate_arbure', 'flow_rate_ermicciolo', 'flow_rate_galleria_alta'],
        title='target flow rates spring amiata @ bugnano, arbure, ermicciolo, galleria alta',
        labels=dict(value='flow rate', Date='date'), range_x=['2015-01-01', '2021-01-01'])

In [60]:
histograms(df_s, name, height=900)

corr_heatmap(df_s, name=name, size=700)