
# Example data analysis from participants

*Name, Surname, 26th of September, 2024*

In this notebook the stream gauge data are analysed, with the main goal of identifying the ten highest flows over the period of record.

Files: `Gauge 1.csv`, `Gauge 2.csv`

Tasks:

- Clean up the data, and select data using a quality code.
- Aggregate the 6-minute data to daily and annual series: daily values for the subsequent analysis, and annual maxima for Flood Frequency Analysis.
- Identify the top 10 high flow values for each gauge in the daily data, including corresponding dates and levels.
- For the dates of the top 10 high flows in Gauge 1, find the corresponding flow and level in the second dataset.
- Tabulate these results for the report.

### 0. Load Python packages

In [3]:
import os
import pandas as pd

### 1. Load data

In [4]:
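folder = "data"

# Read the 6-minute record of gauge 1: skip the 127-line file header, keep
# columns 0, 2 and 3 (timestamp, quality code, discharge) and index by timestamp
df = pd.read_csv(
    os.path.join(folder, "Gauge 1.csv"),
    skiprows=127,
    index_col=0,
    parse_dates=True,
    dayfirst=True,  # dates in the file are written day-first
    usecols=[0, 2, 3],
    names=["Date", "Quality", "Discharge"],
)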

### 2. Select data using one of the quality codes

In [None]:
df_selected = df[df.loc[:, "Quality"] == 1]
df_selected

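To choose a quality code, it can help to first check which codes occur in the record and how much data each one keeps. A minimal sketch on a small made-up DataFrame with the same `Quality` and `Discharge` columns as `df`; the values are illustrative only and say nothing about what each code means in these files:

```python
import pandas as pd

# Made-up readings for illustration; only the column names match df
demo = pd.DataFrame({
    "Quality": [1, 1, 2, 1, 3],
    "Discharge": [0.5, 0.6, 0.7, 0.8, 0.9],
})

# Number of records per quality code, ordered by code
counts = demo["Quality"].value_counts().sort_index()

# Fraction of records that would survive a Quality == 1 filter
share_kept = (demo["Quality"] == 1).mean()

print(counts)
print(f"Share kept with code 1: {share_kept:.0%}")
```

On the real data the same two lines would read `df["Quality"].value_counts().sort_index()` and `(df["Quality"] == 1).mean()`.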

### 3. Resample to daily and annual discharge

In [6]:
q_daily = df_selected.loc[:, "Discharge"].resample("D").sum(min_count=1)  # days without accepted records stay NaN, not 0
q_annual = q_daily.resample("YE").max()  # annual maxima of the daily values

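One detail of `resample(...).sum()` is worth knowing here: a day with no records (for example, one emptied by the quality filter) sums to `0.0` rather than `NaN`, which can look like a day of zero flow in plots and tables. Passing `min_count=1` keeps such days as `NaN`. A sketch on a made-up 6-minute series with one fully missing day:

```python
import pandas as pd

# Made-up 6-minute record: full days on 1 and 3 January 2024, nothing on 2 January
idx = pd.date_range("2024-01-01", periods=240, freq="6min").append(
    pd.date_range("2024-01-03", periods=240, freq="6min")
)
q = pd.Series(1.0, index=idx)

daily_sum = q.resample("D").sum()                 # 2 January becomes 0.0
daily_sum_nan = q.resample("D").sum(min_count=1)  # 2 January stays NaN

print(pd.DataFrame({"sum": daily_sum, "sum(min_count=1)": daily_sum_nan}))
```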
### 4. Plot the discharge

In [None]:
q_daily.plot()

### 5. Select top 10 high flows and dates

In [None]:
top10 = q_daily.sort_values(ascending=False).head(10)
top10

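The ten largest daily values can include several consecutive days of the same flood, so the list may describe fewer than ten separate events. If independent events are wanted, one common approach is to require a minimum spacing between selected peaks. The sketch below does this greedily; the function name `top_independent_peaks` and the 7-day default spacing are hypothetical choices, not a standard, and the spacing should be set from knowledge of the catchment:

```python
import pandas as pd


def top_independent_peaks(series: pd.Series, n: int = 10,
                          min_separation_days: int = 7) -> pd.Series:
    """Pick the n largest values whose dates are at least
    `min_separation_days` apart (a hypothetical independence rule)."""
    min_gap = pd.Timedelta(days=min_separation_days)
    selected = []
    # Walk through the values from largest to smallest
    for date, _value in series.dropna().sort_values(ascending=False).items():
        # Keep a peak only if it is far enough from every peak already kept
        if all(abs(date - kept) >= min_gap for kept in selected):
            selected.append(date)
        if len(selected) == n:
            break
    # Return the chosen peaks, largest first
    return series.loc[selected]
```

Applied to this notebook it would be called as `top_independent_peaks(q_daily, n=10, min_separation_days=7)`.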
### 6. For the dates of the top 10 high flows in Gauge 1, find the corresponding flow and level in the second dataset 

#### 6a. Load the data from gauge 2

In [None]:
gauge2 = pd.read_csv(
    os.path.join(folder, "Gauge 2.csv"),
    skiprows=160,
    index_col=0,
    parse_dates=True,
    dayfirst=True,
    usecols=[0, 2, 3],
    names=["Date", "Quality", "Discharge"],
)

gauge2.head()

### Exercise 1: write a one-line script to select and resample

Write a one-line script to only select the data with a quality label "1" and resample the selected data to daily sums.

In [None]:
# Select and resample (one possible one-line solution)
q2_daily = gauge2.loc[gauge2["Quality"] == 1, "Discharge"].resample("D").sum(min_count=1)
q2_daily


#### 6b. Plot the data

Plot both daily series together to see how little the two records overlap in time.

In [None]:
q_daily.plot()
q2_daily.plot()

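The overlap can also be checked numerically rather than by eye. A minimal sketch, assuming both inputs are daily series in which days without data are `NaN` (with a plain `.sum()` resample, gap days are `0.0` and would be counted as data); `overlap_summary` is a hypothetical helper name:

```python
import pandas as pd


def overlap_summary(a: pd.Series, b: pd.Series) -> dict:
    """Period covered by each daily series and the number of days both have data."""
    a_days = a.dropna().index
    b_days = b.dropna().index
    common = a_days.intersection(b_days)
    return {
        "a_start": a_days.min(),
        "a_end": a_days.max(),
        "b_start": b_days.min(),
        "b_end": b_days.max(),
        "common_days": len(common),
    }
```

In this notebook it would be called as `overlap_summary(q_daily, q2_daily)`.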

#### 6c. Find the gauge 2 values on the dates of the gauge 1 top 10 flood events

In [None]:
q2_daily.loc[top10.index]  # Select the top 10 days from gauge 2 (raises a KeyError if any date is missing)

#### 6d. Select only the dates for which gauge 2 has data

In [None]:
good_idx = [idx for idx in top10.index if idx in q2_daily.index]

top10_q2 = q2_daily.loc[good_idx]
top10_q2

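An alternative to the list comprehension is `Series.reindex`, which does not raise a `KeyError` for missing labels but returns `NaN` for them, keeping the order of the gauge 1 ranking. A sketch with made-up stand-ins for `top10` and `q2_daily`:

```python
import pandas as pd

# Made-up stand-ins for top10 (gauge 1) and q2_daily (gauge 2)
top10_demo = pd.Series(
    [50.0, 40.0, 30.0],
    index=pd.to_datetime(["2021-02-03", "2019-06-10", "2022-11-20"]),
)
q2_demo = pd.Series([5.0, 7.0], index=pd.to_datetime(["2021-02-03", "2022-11-20"]))

# reindex never raises for missing labels: absent dates become NaN
aligned = q2_demo.reindex(top10_demo.index)

# Keep only the dates that have a gauge 2 value
top10_q2_demo = aligned.dropna()
print(top10_q2_demo)
```

Unlike the list comprehension, `.dropna()` also drops dates that exist in `q2_daily` but hold `NaN`.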
### 7. Export the results to a table

In [None]:
# Create a new DataFrame to report the top 10 events

top10_df = pd.DataFrame({
    "Gauge 1": top10,
    "Gauge 2": top10_q2,
})


# Export to Excel (writing .xlsx requires an Excel writer such as openpyxl)
top10_df.to_excel("top10.xlsx")

top10_df

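For the report it may be clearer to order the table by the gauge 1 ranking and add an explicit rank column, because when a DataFrame is built from two Series the row order follows pandas' index alignment and may not follow the ranking. A sketch with a hypothetical helper `build_report_table`; writing `.xlsx` with `to_excel` needs an extra package such as `openpyxl`, while `to_csv` has no extra dependency:

```python
import pandas as pd


def build_report_table(top_gauge1: pd.Series, gauge2_values: pd.Series) -> pd.DataFrame:
    """Ranked table of the gauge 1 peaks with the matching gauge 2 values."""
    table = pd.DataFrame({
        "Gauge 1": top_gauge1,
        # Restrict gauge 2 to the gauge 1 dates; missing dates become NaN
        "Gauge 2": gauge2_values.reindex(top_gauge1.index),
    })
    table = table.sort_values("Gauge 1", ascending=False)
    table.insert(0, "Rank", list(range(1, len(table) + 1)))
    table.index.name = "Date"
    return table
```

In this notebook it could be used as `build_report_table(top10, top10_q2).to_csv("top10.csv")`.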

### Let's try GitHub Copilot

In [70]:
# Perform a flood frequency analysis using the Gumbel distribution on the annual maxima



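As one possible answer to the prompt above, a Gumbel (EV1) distribution can be fitted to the annual maxima by the method of moments, with scale $\alpha = \sqrt{6}\,s/\pi$ and location $u = \bar{x} - 0.5772\,\alpha$, where $\bar{x}$ and $s$ are the sample mean and standard deviation; the $T$-year flow is then $x_T = u - \alpha \ln[-\ln(1 - 1/T)]$. The sketch below uses only the Python standard library so that it works on any sequence of numbers, including `q_annual`. The function names are hypothetical, and the method of moments is only one fitting choice (`scipy.stats.gumbel_r.fit`, for example, uses maximum likelihood by default).

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_mom_fit(annual_maxima):
    """Method-of-moments Gumbel (EV1) fit: returns (location u, scale alpha)."""
    # Ignore missing years (NaN), e.g. years with no accepted data
    values = [float(v) for v in annual_maxima if not math.isnan(float(v))]
    if len(values) < 2:
        raise ValueError("at least two annual maxima are needed")
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation
    alpha = math.sqrt(6.0) * std / math.pi
    u = mean - EULER_GAMMA * alpha
    return u, alpha


def gumbel_return_levels(annual_maxima, return_periods=(2, 5, 10, 20, 50, 100)):
    """Flow with annual exceedance probability 1/T for each return period T (years)."""
    u, alpha = gumbel_mom_fit(annual_maxima)
    levels = {}
    for T in return_periods:
        if T <= 1:
            raise ValueError("return periods must be greater than 1 year")
        # Invert the CDF F(x) = exp(-exp(-(x - u) / alpha)) at F = 1 - 1/T
        levels[T] = u - alpha * math.log(-math.log(1.0 - 1.0 / T))
    return levels
```

For example, `gumbel_return_levels(q_annual)` would return a dictionary mapping each return period to a flow in the same units as `q_annual`. With a short record of annual maxima, the estimates for long return periods are highly uncertain.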