# Minor Flood Frequency
```{glue:figure} threshold_counts_days_fig
:scale: 50%
:align: right
```

In this notebook we will plot two indicators concerning flooding at the Hawaii tide gauges, after first taking a general look at the type of data we are able to plot. These indicators are based on a 'flooding' threshold, using [relative sea level](https://tidesandcurrents.noaa.gov/sltrends/faq.html). 

Download Files:
[Map](https://uhslc.soest.hawaii.edu/jfiedler/SeaLevelIndicators/output/Hawaii_Region_Output/SL_FloodFrequency_map.png) |
[Time Series Plot](https://uhslc.soest.hawaii.edu/jfiedler/SeaLevelIndicators/output/Hawaii_Region_Output/SL_FloodFrequency_threshold_counts_days.png) |
[Table](https://uhslc.soest.hawaii.edu/jfiedler/SeaLevelIndicators/output/Hawaii_Region_Output/flood_frequency_table.png)



## Setup

We first need to import the necessary libraries, establish our input/output directories, and set up some basic plotting rules. As with our other notebooks, we'll do this by running another notebook called "setup."

In [None]:
%run setup.ipynb


## Retrieve the Tide Station(s) Data Set(s)

We stored this previously in our data directory as "rsl_hawaii_noaa.nc".

In [None]:
# load the data
rsl = xr.open_dataset(data_dir / 'rsl_hawaii_noaa.nc')

and we'll save a few variables that will come up later for report generation.

### Set the Datum to MHHW

```{margin} A Note on Datums

The sea level variable in the netcdf file is sea level **relative to the station datum**. 

```

In [None]:
# convert sea level to MHHW
rsl['sea_level_MHHW'] = rsl['sea_level'] - rsl['MHHW']
rsl['sea_level_MHHW'].attrs['units'] = 'm'
rsl['sea_level_MHHW'].attrs['long_name'] = 'Sea Level, relative to MHHW'
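As a minimal sketch of the datum change above, the snippet below re-references a few readings from the station datum to MHHW and converts them to centimeters in one step. The helper name `to_mhhw_cm` and the sample values are hypothetical and only illustrate the arithmetic; the notebook itself keeps meters at this stage and converts to centimeters later.

```python
import numpy as np

def to_mhhw_cm(sea_level_m, mhhw_m):
    """Re-reference sea level from the station datum to MHHW and convert m to cm."""
    return (np.asarray(sea_level_m, dtype=float) - mhhw_m) * 100.0

# Hypothetical hourly readings (m above station datum) and a hypothetical MHHW (m)
example_levels = [1.50, 1.75, np.nan, 2.00]
example_mhhw = 1.70

# Missing readings stay NaN; the others become heights relative to MHHW in cm
relative_cm = to_mhhw_cm(example_levels, example_mhhw)
```

Negative values are below MHHW and positive values are above it, which is the convention the flood thresholds later in this notebook rely on.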

###  Assess Station Data Quality for the POR (1983-2024)

To do this, we'll plot all the sea level data to make sure our data looks correct, and then we'll truncate the data set to the time period of record (POR).

````{margin} Watch the units!
```{caution}
Note that the sea_level variable here is in meters (m)! If we want to plot things on a centimeter (cm) scale, we have to multiply by 100.
```
````

In [None]:
fig, ax = plt.subplots(sharex=True, figsize=(10, 10))
fig.autofmt_xdate()

rsl = rsl.sortby('lat')
# Initial offset
offset = 0
# The amount to offset each successive line
offset_increment = 150  # Adjust this value based on your data scale and visual preference

station_ids = rsl['station_id'].values
station_names = rsl['station_name'].values

for i, (station_id, station_name) in enumerate(zip(station_ids, station_names)):
    sea_level_data = 100 * rsl.sea_level_MHHW.sel(station_id=station_id).values  # cm
    ax.plot(rsl.time.values, sea_level_data + offset, label=station_name)
    ax.axhline(offset, color='black', linewidth=0.5, linestyle=':')
    ax.annotate(
        station_name,
        xy=(rsl.time.values[0], offset-20),
        xytext=(5, 0),
        textcoords='offset points',
        color='black',
        fontsize=10,
        ha='left',
        va='top'
    )
    offset += offset_increment

ax.set_ylabel(rsl['sea_level_MHHW'].long_name + ' (cm, offset by 150cm per station)')




#### Identify timespan for the flood frequency analysis

Now we'll define the analysis period, which starts at the beginning of the tidal datum analysis period (the epoch) and ends at the last time processed. The epoch information is given in the datums table.

In [None]:

# make POR_start equal to Jan 1 1983 in datetime format
POR_start = dt.datetime(1983, 1, 1)

# and for now, end time will be the end of 2024
POR_end = dt.datetime(2024, 12, 31)


hourly_data = rsl.sel(dict(time=slice(POR_start, POR_end)))
hourly_data = hourly_data.sortby('lat')

glue("startPORDateTime",POR_start.strftime('%Y-%m-%d'), display=False)
glue("endPORDateTime",POR_end.strftime('%Y-%m-%d'), display=False)



and plot the hourly time series

In [None]:
hourly_data['sea_level_MHHW'] = hourly_data['sea_level_MHHW'] * 100

hourly_data['sea_level_MHHW'].attrs['units'] = 'cm'
hourly_data['sea_level_MHHW'].attrs['long_name'] = 'Sea Level, relative to MHHW'

hourly_data['sea_level_MHHW'].plot.line(x='time',label=hourly_data.station_name.values)

#put the legend outside the plot
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))

# set xlimits
plt.xlim([POR_start, POR_end])

# set title 
titlestr = f'Tide Gauges ({POR_start.strftime("%Y")}-{POR_end.strftime("%Y")})'
plt.title(titlestr)

fig  = plt.gcf()

glue("TS_full_fig",fig,display=False)


```{glue:figure} TS_full_fig
:name: "fig-TS_full"

Full time series at the {glue:}`SL_Data_Wrangling.ipynb::station_group` tide gauges for the entire record from {glue:text}`startPORDateTime` to {glue:text}`endPORDateTime`. Note that the sea level is plotted in units of cm, relative to MHHW.
```

### Adjust the data from calendar year to storm year

The storm year runs from May through April. We'll keep this in our back pocket in case we need to switch the analysis from calendar years to storm years.

In [None]:
#IGNORING FOR NOW, DOES HAWAII DO STORM YEAR?

hourly_data['day'] = (('time'), hourly_data.time.dt.dayofyear.data)
hourly_data['month'] = (('time'), hourly_data.time.dt.month.data)    
hourly_data['year'] = (('time'), hourly_data.time.dt.year.data)

# adjust year to storm year, where the storm year starts on May 1st
# if the month is less than 5, subtract a year
hourly_data['year_storm'] = (('time'), hourly_data.year.data - (hourly_data.month.data < 5))

hourly_data['year_storm'] = hourly_data['year_storm'].astype(int)
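The storm-year rule used in the cell above can be sketched for a single timestamp with the standard library alone. The function name `storm_year` and its `start_month` parameter are illustrative assumptions, not part of the notebook.

```python
import datetime as dt

def storm_year(timestamp, start_month=5):
    """Return the storm year: months before start_month belong to the previous year."""
    return timestamp.year - (1 if timestamp.month < start_month else 0)

# The day before and the day of the May 1 boundary fall in different storm years
labels = [
    storm_year(dt.datetime(2020, 4, 30)),  # 2019
    storm_year(dt.datetime(2020, 5, 1)),   # 2020
]
```

With this convention, storm year 2019 covers May 2019 through April 2020.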


Save the data to the data directory:

In [None]:
# For now, use the calendar year for 'year_storm' (this overrides the May-April storm year computed above)
hourly_data['year_storm'] = hourly_data['time'].dt.year
hourly_data['year_storm'] = hourly_data['year_storm'].astype(int)

# save the data
hourly_data.to_netcdf(data_dir / 'SL_hourly_data.nc')

## Calculate and Plot Flood Frequency
To analyze flood frequency, we will find the maximum sea level for each day in our dataset, following {cite:t}`thompson_statistical_2019` and others. Then, we can group our data by year and month to visualize temporal patterns in daily still water level (SWL) exceedance.

```{glue:figure} histogram_fig
:name: "fig-histogram"
:figclass: margin

Histogram of daily maximum water levels at the {glue:}`SL_Data_Wrangling.ipynb::station_group` tide gauges for the entire record from {glue:text}`startPORDateTime` to {glue:text}`endPORDateTime`, relative to {glue:}`SL_Data_Wrangling.ipynb::datumname`. The dashed red line indicates the chosen NOS threshold of {glue:text}`threshold_nos` cm.
```

In [None]:
# Resample the hourly data to daily maximum sea level
SL_daily_max = hourly_data.resample(time='D').max()

# variables that should keep their time dimension after resampling
timevars = ['sea_level_MHHW','sea_level','flood_day','flood_hour','day','month','year','year_storm']

#remove time from vars that aren't timevars
for var in SL_daily_max.data_vars:
    if var not in timevars:
        SL_daily_max[var] = SL_daily_max[var].isel(time=0)
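To make the daily-maximum step concrete, here is a small NumPy-only sketch. It is not the `resample` call used above: it assumes a gap-free hourly series that starts at midnight, and the helper name `daily_max` is hypothetical.

```python
import numpy as np

def daily_max(hourly, hours_per_day=24):
    """Collapse an hourly series into daily maxima, ignoring NaNs within each day."""
    hourly = np.asarray(hourly, dtype=float)
    n_days = hourly.size // hours_per_day
    # Drop any trailing partial day, then view the series as (day, hour)
    by_day = hourly[: n_days * hours_per_day].reshape(n_days, hours_per_day)
    return np.nanmax(by_day, axis=1)

# Two synthetic days of hourly values with one missing reading on day two
hourly = np.arange(48, dtype=float)
hourly[30] = np.nan
daily = daily_max(hourly)
```

The time-aware `resample(time='D').max()` is the safer choice for real records, since it groups by timestamp rather than by position.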

In [None]:
# make a new figure that is 15 x 5
fig, ax = plt.subplots(sharex=True, figsize=(15, 5))
plt.plot(SL_daily_max.time.values, SL_daily_max.sea_level_MHHW.values,label=SL_daily_max.station_name.values)
plt.xlabel('Date (Calendar Year)')
plt.ylabel('Sea Level (cm)')
plt.title('Sea Level Daily Maximum Time Series')

# add legend outside plot
plt.legend(loc='upper left', bbox_to_anchor=(1, 1))


```{margin} What's in a threshold?
Different flood thresholds will produce different results, and different data sources use different thresholds. While certain locations may have unique characteristics that give rise to lower or higher thresholds, in practice applying a different threshold at each location on a regional or national scale can get a little finicky. The National Weather Service (NWS) impact thresholds are localized to the gauges themselves, based on historical observations, and are used to issue coastal flood advisories for local areas. The National Ocean Service (NOS) minor flood thresholds are more broadly applied and correspond to roughly 0.5 m above the local diurnal tide range on the US mainland. For Hawaii (and Midway) they are set at ~30 cm above MHHW.

The exact formula given in {cite:t}`sweet_patterns_2018` is 1.04*[Local GT tidal datum] + 0.50 m.
```
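The formula quoted in the margin note can be written as a one-line function. This is only a sketch of the arithmetic as stated there: the function name is hypothetical, the example GT value is made up, and the note does not say which datum the result is referenced to, so check the cited report before comparing the output with MHHW-relative levels.

```python
def nos_minor_threshold_m(gt_m):
    """Minor flood threshold (m) from the great diurnal range GT (m), as quoted in the note."""
    return 1.04 * gt_m + 0.50

# Hypothetical great diurnal range of 0.60 m
example_threshold_m = nos_minor_threshold_m(0.60)
```

Because the slope multiplies GT, stations with a larger tidal range get a proportionally higher threshold under this formula.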

## Define flood thresholds


The choice of water level exceedance threshold at a tide gauge (that is, the level above which minor flooding occurs) can significantly affect any calculated statistics of coastal flooding. Quite often, a still water level threshold may not directly correspond to what is seen in terms of impacts, such as flooded coastal streets or backed-up storm drains. This is because still water levels only really tell you _some_ of the story. In reality, compounding effects like rain, upstream flooding, and waves will change what qualifies as flooding. Lacking in situ calibration data on flooding, however, these still water levels serve as a useful proxy. Think of it as "setting the stage." In the following analysis we'll explore how to calculate 'flood days' and 'flood hours' based on the still water level recorded at a tide gauge.

### Percentile-Based Threshold
One technique of defining a flood threshold is through the use of percentiles. 
Here, we'll define it as the 95th percentile of the daily max water levels. Change at will!

In [None]:
# CHANGE THIS TO PLAY WITH PERCENTILE-BASED THRESHOLDS
percentile = 95
thresholds = np.nanpercentile(SL_daily_max['sea_level_MHHW'], percentile, axis=0)
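As a small illustration of how `np.nanpercentile` behaves along the time axis, the sketch below uses a made-up array with days as rows and stations as columns. The values and the 50th percentile are chosen only so the result is easy to verify by hand.

```python
import numpy as np

# Hypothetical daily maxima (cm): rows are days, columns are stations
daily_max_cm = np.array([
    [0.0, 10.0],
    [10.0, 20.0],
    [20.0, np.nan],   # a missing day at the second station
    [30.0, 30.0],
    [40.0, 40.0],
])

percentile = 50
# axis=0 reduces over days, leaving one threshold per station; NaNs are ignored
thresholds = np.nanpercentile(daily_max_cm, percentile, axis=0)
```

The missing day is skipped rather than propagated, so each station's threshold is computed from its own valid days only.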


Now we'll plot it up.

In [None]:
fig, axs = plt.subplots(len(SL_daily_max['station_name']), 1, sharex=True, figsize=(4, 4))

for i, ax in enumerate(axs):
    ax.hist(hourly_data['sea_level_MHHW'][i,:], bins=100, density=True, label='Sea Level Data')
    ax.axvline(thresholds[i], color='r', linestyle='dashed', linewidth=2, label=f'{percentile}th Percentile')

    ax.text(0.05, 0.8, hourly_data['station_name'][i].values, transform=ax.transAxes, fontsize=8, verticalalignment='top')
    ax.text(0.95,0.5, f'{thresholds[i]:.1f}', color='r', fontsize=8, transform=ax.transAxes, horizontalalignment='right')

    # remove y-axis label for all
    ax.set_yticklabels('')

axs[-1].set_xlabel('Sea Level (cm)')
plt.subplots_adjust(hspace=0)

axs[0].set_title(f'Sea level histogram with \n{percentile}th Percentile Thresholds')

# set xlim to -100,50
plt.xlim(-100,50)


# recompute the threshold for the chosen percentile at each gauge
thresholds = np.nanpercentile(SL_daily_max['sea_level_MHHW'], percentile, axis=0)
glue("threshold_percentile",percentile,display=False)


### Impact-based thresholds
Now we'll retrieve the impact-based thresholds from the NOAA CO-OPS API, which provides both the NOS and NWS thresholds:


In [None]:
url = 'https://api.tidesandcurrents.noaa.gov/mdapi/prod/webapi/'

threshold_nws = {}
threshold_nos = {}
for station in station_ids:
    thresholds_url = url + f'stations/{station}/floodlevels.json?units=metric'
    thresholdsNOAA = requests.get(thresholds_url).json()
    try:
        nws_minor = thresholdsNOAA['nws_minor']
        nos_minor = thresholdsNOAA['nos_minor']
        mhhw = rsl['MHHW'].sel(station_id=station).item()
        if nws_minor is not None:
            threshold_nws[station] = nws_minor - mhhw
        if nos_minor is not None:
            threshold_nos[station] = nos_minor - mhhw
    except KeyError:
        threshold_nws[station] = None  # or np.nan
        threshold_nos[station] = None  # or np.nan

# multiply by 100 to convert to cm
threshold_nws = {k: v * 100 for k, v in threshold_nws.items() if v is not None}
threshold_nos = {k: v * 100 for k, v in threshold_nos.items() if v is not None}
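The handling of missing thresholds can be tested offline with a stand-in payload. The sketch below uses the `nws_minor` and `nos_minor` keys from the cell above; the helper name, the payload values, and the MHHW value are hypothetical, and no network request is made.

```python
def minor_thresholds_cm(flood_levels, mhhw_m):
    """Return the available minor thresholds in cm above MHHW, skipping missing ones."""
    out = {}
    for key in ("nws_minor", "nos_minor"):
        value = flood_levels.get(key)  # absent keys and explicit nulls both give None
        if value is not None:
            out[key] = (value - mhhw_m) * 100.0
    return out

# Hypothetical payload: an NOS threshold is present, the NWS one is null
payload = {"nos_minor": 2.0, "nws_minor": None}
result = minor_thresholds_cm(payload, mhhw_m=1.7)
```

Stations without a given threshold simply have no entry in the result, which mirrors the filtering on `None` done at the end of the cell above.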

Note that for the Hawaiian Island region, the NOS minor threshold is 30.4 cm for all stations, whereas the NWS minor threshold varies. There is no NWS threshold for Midway.

In [None]:
# Make a pdf of the data with NOS minor threshold
fig, axs = plt.subplots(len(hourly_data['station_name']), 1, sharex=True)

for i, ax in enumerate(axs):
    ax.hist(hourly_data['sea_level_MHHW'][i,:], bins=100, density=True, label='Sea Level Data')
    ax.axvline(threshold_nos[station_ids[i]], color='r', linestyle='--', label='Threshold: {:.4f} cm'.format(threshold_nos[station_ids[i]]))

    ax.text(0.05, 0.8, hourly_data['station_name'][i].values, transform=ax.transAxes, fontsize=8, verticalalignment='top')

    # remove y-axis label for all
    ax.set_yticklabels('')

axs[-1].set_xlabel('Sea Level (cm)')
plt.subplots_adjust(hspace=0)


ax.set_xlabel('Sea Level (cm)')
fig.text(0.04, 0.5, 'Probability Density', va='center', rotation='vertical')
# make the title two lines
axs[0].set_title('Sea Level Histogram\nwith Defined Threshold')


# add label to dashed line
# get value of middle of y-axis for label placement
ymin, ymax = ax.get_ylim()
yrange = ymax - ymin
y_middle = ymin + yrange/2

axs[0].text(threshold_nos[station_ids[0]]+10, y_middle, '{:.1f} cm'.format(threshold_nos[station_ids[0]]), rotation=0, va='center', ha='left', color='r')
glue("histogram_fig", fig, display=False)

## Define 'flood days' and 'flood hours'

These are the days and hours in which the water levels surpass the given flood threshold. For the following analysis we'll stick with the 95th percentile values.

In [None]:
flood_day = (SL_daily_max.sea_level_MHHW.values > thresholds)

flood_hour = (hourly_data.sea_level_MHHW.values > thresholds[:,None]) # need to account for (stations,time) dimensions

# flip the array so that the first dimension is time
flood_hour = np.transpose(flood_hour)
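The two comparisons above rely on NumPy broadcasting, which differs depending on whether time or station is the leading axis. A minimal sketch with made-up numbers for two stations:

```python
import numpy as np

thresholds = np.array([10.0, 20.0])   # one threshold per station (cm)

# Hypothetical levels with shape (time, station); NaN marks a missing value
levels_time_station = np.array([
    [5.0, 25.0],
    [15.0, 15.0],
    [np.nan, 30.0],
])
levels_station_time = levels_time_station.T   # same data, shape (station, time)

# (time, station) > (station,) broadcasts the thresholds along the last axis
flood_a = levels_time_station > thresholds

# (station, time) needs a trailing axis on the thresholds, then a transpose back
flood_b = (levels_station_time > thresholds[:, None]).T
```

Both routes give the same boolean array, and comparisons against NaN evaluate to `False`, so missing values are never counted as floods.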

In [None]:
# Add the 'flood_day' variable to the daily_max dataset
SL_daily_max['flood_day'] = (('time', 'station_id'), flood_day)


# Assign metadata variables from hourly_data to SL_daily_max
meta_vars = ['lat', 'lon', 'station_name', 'station_country', 'MSL', 'MHHW']
for var in meta_vars:
    SL_daily_max[var] = hourly_data[var]

# put the threshold into the daily_max dataset
SL_daily_max['threshold'] = (('station_id'), thresholds)
SL_daily_max['threshold'].attrs['units'] = 'cm'
SL_daily_max['threshold'].attrs['long_name'] = f'{percentile}th Percentile Threshold'

SL_daily_max['flood_day'].attrs['units'] = 'days'
SL_daily_max['flood_day'].attrs['long_name'] = f'Flood Day above {percentile}th Percentile'

# save SL_daily_max to netcdf
SL_daily_max.to_netcdf(data_dir / 'SL_daily_max.nc')

In [None]:
# Filtering for flood days again now that 'flood_day' has been correctly added
flood_days_data = SL_daily_max.where(SL_daily_max.flood_day, drop=True)

# Initialize an empty DataFrame again for the loop
flood_days_per_year = pd.DataFrame()

for station_id in SL_daily_max.station_id.values:
    # Extracting flood days for each station_id
    flood_days_df = flood_days_data.sel(station_id=station_id).dropna(dim='time', how='all').to_dataframe().reset_index()

    # Extract year from the 'time' column and count flood days
    flood_days_df['year'] = flood_days_df['time'].dt.year
    flood_days_count = flood_days_df.groupby('year').size().reset_index(name=station_id)

    # Merge this count with the main DataFrame
    if flood_days_per_year.empty:
        flood_days_per_year = flood_days_count.set_index('year')
    else:
        flood_days_per_year = flood_days_per_year.join(flood_days_count.set_index('year'), how='outer')

# Replace missing values with 0
flood_days_per_year.fillna(0, inplace=True)
flood_days_per_year = xr.DataArray(flood_days_per_year.values, dims=('year', 'station_id'), coords={'year': flood_days_per_year.index, 'station_id': flood_days_per_year.columns})
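The per-year count can also be sketched without a DataFrame loop. The snippet below is an alternative illustration, not the method used above: it counts `True` flags per calendar year with `np.bincount`, and the helper name and sample data are hypothetical.

```python
import numpy as np

def count_per_year(years, flags, first_year, last_year):
    """Count True flags per calendar year, returning zeros for years without any."""
    years = np.asarray(years)
    flags = np.asarray(flags, dtype=bool)
    all_years = np.arange(first_year, last_year + 1)
    # Shift years so the first year maps to bin 0, then count the flagged entries
    counts = np.bincount(years[flags] - first_year, minlength=all_years.size)
    return all_years, counts

# Hypothetical daily records: the year of each day and whether it was a flood day
years = [2000, 2000, 2001, 2002, 2002, 2002]
flags = [True, True, False, True, False, True]
all_years, counts = count_per_year(years, flags, 2000, 2002)
```

One property worth noting is that years without any flagged day still appear with a count of zero, which keeps the year axis continuous for the trend fit later on.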

In [None]:
# Add the 'flood_hour' variable to the dataset
hourly_data['flood_hour'] = (('time', 'station_id'), flood_hour)
# Keep only the flood hours now that 'flood_hour' has been added
flood_hours_data = hourly_data.where(hourly_data.flood_hour, drop=True)
# Initialize an empty DataFrame for the loop
flood_hours_per_year = pd.DataFrame()
for station_id in hourly_data.station_id.values:
    # Extract the flood hours for each station_id
    flood_hours_df = flood_hours_data.sel(station_id=station_id).dropna(dim='time', how='all').to_dataframe().reset_index()

    # Extract year from the 'time' column and count flood hours
    flood_hours_df['year'] = flood_hours_df['time'].dt.year
    flood_hours_count = flood_hours_df.groupby('year').size().reset_index(name=station_id)

    # Merge this count with the main DataFrame
    if flood_hours_per_year.empty:
        flood_hours_per_year = flood_hours_count.set_index('year')
    else:
        flood_hours_per_year = flood_hours_per_year.join(flood_hours_count.set_index('year'), how='outer')
# Replace missing values with 0
flood_hours_per_year.fillna(0, inplace=True)
flood_hours_per_year = xr.DataArray(flood_hours_per_year.values, dims=('year', 'station_id'), coords={'year': flood_hours_per_year.index, 'station_id': flood_hours_per_year.columns})


Make a new xarray dataset with attributes from SL_daily_max

In [None]:
#make new xarray dataset with attributes from SL_daily_max
ds = xr.Dataset()

ds['flood_hours_per_year'] = (('year', 'station_id'), flood_hours_per_year.values)

ds['flood_days_per_year'] = (('year', 'station_id'), flood_days_per_year.values)

ds['flood_days_per_year'].attrs = {'long_name': 'Number of flood days per year', 'units': 'days'}
ds['flood_hours_per_year'].attrs = {'long_name': 'Number of flood hours per year', 'units': 'hours'}

# set year and station_id as coordinates
ds['year'] = flood_hours_per_year.year
ds['station_id'] = flood_hours_per_year.station_id

ds['year'].attrs = {'long_name': 'Year', 'units': 'calendar year'}
ds['station_id'].attrs = {'long_name': 'Station ID', 'units': '1'}


ds['lat'] = SL_daily_max['lat']
ds['lat'].attrs = {'long_name': 'Latitude', 'units': 'degrees_north'}

ds['lon'] = SL_daily_max['lon']
ds['lon'].attrs = {'long_name': 'Longitude', 'units': 'degrees_east'}

ds['station_name'] = SL_daily_max['station_name']
ds['station_name'].attrs = {'long_name': 'Station Name', 'units': '1'}

ds['station_country'] = SL_daily_max['station_country']
ds['station_country'].attrs = {'long_name': 'Station Country', 'units': '1'}


# Take a peek at the data:
ds



In [None]:
#find minimum value in flood_days_per_year
min_flood_days = flood_days_per_year.min().min()
max_flood_days = flood_days_per_year.max().max()


### Plot Flood Frequency Counts 

The flood frequency counts are defined as the number of time periods that exceed a given threshold within a year. This plot follows {cite:t}`center_for_operational_oceanographic_products_and_services_us_sea_2014`.

In [None]:
# Adjusting the heatmap palette to improve readability
adjusted_heatmap_palette = sns.color_palette("YlOrRd", as_cmap=True)

station_id = '1617760'
df = ds['flood_days_per_year'].sel(station_id=station_id).to_dataframe().reset_index()

norm = plt.Normalize(df['flood_days_per_year'].min(), df['flood_days_per_year'].max())
colors = [adjusted_heatmap_palette(norm(value)) for value in df['flood_days_per_year']]

threshold = round(thresholds[station_id==ds['station_id'].values].item(), 2)
glue("threshold",threshold,display=False)

# Plotting with the adjusted settings
fig, ax = plt.subplots()


ax = sns.barplot(
    x='year', 
    y='flood_days_per_year', 
    hue='year', 
    data=df,
    palette=colors,
    dodge=False,
    legend=False
)
ax.set_xticks(range(0, len(df), 5))  # Setting x-ticks to show every 5th year
year_ticks = df['year'][::5].astype(int)  # Selecting every 5th year for the x-axis
ax.set_xticklabels(year_ticks, rotation=45)

# Look up the station name for the annotation
station_name = ds['station_name'].sel(station_id=station_id).values.item()

ax.text(0.05, 0.9, station_name + '\nabove ' + str(threshold) + ' cm threshold (' + str(percentile) + 'th percentile)', ha='left', va='center', transform=ax.transAxes)

#save the figure
figname = 'SL_FloodFrequency_threshold_counts_DAYS_'+station_name+'.png'
fig.savefig(output_dir / figname, bbox_inches='tight')

In [None]:
# Adjusting the heatmap palette to improve readability
adjusted_heatmap_palette = sns.color_palette("YlOrRd", as_cmap=True)

df = ds['flood_hours_per_year'].sel(station_id=station_id).to_dataframe().reset_index()

norm = plt.Normalize(df['flood_hours_per_year'].min(), df['flood_hours_per_year'].max())
colors = [adjusted_heatmap_palette(norm(value)) for value in df['flood_hours_per_year']]


# Plotting with the adjusted settings
fig, ax = plt.subplots()


ax = sns.barplot(
    x='year', 
    y='flood_hours_per_year', 
    hue='year', 
    data=df,
    palette=colors,
    dodge=False,
    legend=False
)
ax.set_xticks(range(0, len(df), 5))  # Setting x-ticks to show every 5th year
year_ticks = df['year'][::5].astype(int)  # Selecting every 5th year for the x-axis
ax.set_xticklabels(year_ticks, rotation=45)

# Look up the station name for the annotation
station_name = ds['station_name'].sel(station_id=station_id).values.item()

ax.text(0.05, 0.9, station_name + '\nabove ' + str(threshold) + ' cm threshold (' + str(percentile) + 'th percentile)', ha='left', va='center', transform=ax.transAxes)

#save the figure
figname = 'SL_FloodFrequency_threshold_counts_HOURS_'+station_name+'.png'
fig.savefig(output_dir / figname, bbox_inches='tight')

In [None]:
# Adjusting the heatmap palette to improve readability
adjusted_heatmap_palette = sns.color_palette("YlOrRd", as_cmap=True)
norm = plt.Normalize(flood_days_per_year.min().min(), flood_days_per_year.max().max())
colors = [adjusted_heatmap_palette(norm(value)) for value in flood_days_per_year.values]

# do a pcolormesh plot
fig, ax = plt.subplots()
ax.pcolormesh(ds['year'], ds['station_name'], ds['flood_days_per_year'].T, cmap=adjusted_heatmap_palette, norm=norm)

# add a colorbar
cax = fig.add_axes([0.92, 0.15, 0.02, 0.7])  # adjust the position and size of the colorbar
sm = plt.cm.ScalarMappable(cmap=adjusted_heatmap_palette, norm=norm)
plt.colorbar(sm, cax=cax,label= 'Number of Flood Days')

ax.set_xlabel('Year')
ax.set_title('Number of Flood Days Per Year')             

glue("threshold_counts_days_fig", fig, display=False)

# save the figure
fig.savefig(output_dir / 'SL_FloodFrequency_threshold_counts_days.png', bbox_inches='tight')

```{glue:figure} threshold_counts_days_fig
:name: "fig-threshold_counts"

Flood frequency counts above the {glue:text}`threshold_percentile:.0f`th percentile threshold per year at {glue:}`SL_Data_Wrangling.ipynb::station_group` tide gauges from {glue:text}`startPORDateTime` to {glue:text}`endPORDateTime`. 
```

### Plot Flood Duration

This next plot shows the total number of hours per year in which the water level exceeds the threshold.
I have a few issues with calling this "duration," as it's just a count of hours above the threshold. These hours need not be continuous, which to me is what duration is all about. Anyway, we carry on.
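If event duration in the continuous sense were wanted, one option is to measure the length of each unbroken run of flagged hours. The sketch below is not part of this notebook's analysis; the function name and the sample flags are hypothetical, and it assumes an evenly spaced series with no gaps in time.

```python
from itertools import groupby

def event_durations(flood_flags):
    """Lengths (in time steps) of each consecutive run of True values."""
    return [sum(1 for _ in run) for is_flood, run in groupby(flood_flags) if is_flood]

# Hypothetical hourly flags: three separate events lasting 2, 1, and 3 hours
flags = [False, True, True, False, True, False, True, True, True]
durations = event_durations(flags)                 # [2, 1, 3]
mean_duration = sum(durations) / len(durations)    # 2.0
```

Here the total count of flood hours is 6, while the mean event duration is 2 hours, which is the distinction raised above.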

In [None]:
# Adjusting the heatmap palette to improve readability
adjusted_heatmap_palette = sns.color_palette("YlOrRd", as_cmap=True)
norm = plt.Normalize(flood_hours_per_year.min().min(), flood_hours_per_year.max().max())
colors = [adjusted_heatmap_palette(norm(value)) for value in flood_hours_per_year.values]

# do a pcolormesh plot
fig, ax = plt.subplots()
ax.pcolormesh(ds['year'], ds['station_name'], ds['flood_hours_per_year'].T, cmap=adjusted_heatmap_palette, norm=norm)

# add a colorbar
cax = fig.add_axes([0.92, 0.15, 0.02, 0.7])  # adjust the position and size of the colorbar
sm = plt.cm.ScalarMappable(cmap=adjusted_heatmap_palette, norm=norm)
plt.colorbar(sm, cax=cax,label= 'Number of Flood Hours')

ax.set_xlabel('Year')
ax.set_title('Number of Flood Hours Per Year')             
# ax.set_yticklabels(SL_daily_max['station_name'].values[0])

# save the figure
fig.savefig(output_dir / 'SL_FloodFrequency_threshold_counts_hours.png', bbox_inches='tight')



glue("duration_fig", fig, display=False)

```{glue:figure} duration_fig
:name: "fig-duration"

Number of flood hours above the {glue:text}`threshold_percentile:.0f`th percentile flood threshold per year for {glue:}`SL_Data_Wrangling.ipynb::station_group` region tide gauges from {glue:text}`startPORDateTime` to {glue:text}`endPORDateTime`. 
```

## Calculate the change over time

Next we'll calculate the change in flood days and flood hours over the POR at the tide station(s), for both frequency and duration.

The next code cell fits a linear trend to the flood days per year at each station.
The slope of the trend line is the change in flood days per year. The same process is repeated for flood hours per year.


In [None]:

def calculate_flood_trend(ds, timescale = 'days'):

    if timescale == 'days':
        dsvar = 'flood_days_per_year'
    elif timescale == 'hours':
        dsvar = 'flood_hours_per_year'
    else:
        raise ValueError("timescale must be 'days' or 'hours'")

    slopes = []
    intercepts = []
    rate_changes = []
    
    #trends is an empty array the length of ds['year'] and the number of records
    trends = np.empty((len(ds['year']), len(ds['station_id'])))

    for station_id in range(len(ds['station_id'])):
        slope, intercept, _, _, _ = stats.linregress(ds['year'].values, ds[dsvar].isel(station_id=station_id).values)
        trend = intercept + slope * ds['year']
        # Use slope as the indicator of change (units: days/year or hours/year)
        rate_change = slope
        slopes.append(slope)
        intercepts.append(intercept)
        rate_changes.append(rate_change)
        trends[:, station_id] = trend

    return slopes, intercepts, rate_changes, trends

slopes, intercepts, rate_changes, trends = calculate_flood_trend(ds, 'days')

#add to dataset
ds['slope_days'] = (('station_id'), slopes)
ds['intercept_days'] = (('station_id'), intercepts)
ds['rate_change_days'] = (('station_id'), np.squeeze(rate_changes))
ds['trend_days'] = (('year', 'station_id'), trends)

ds['slope_days'].attrs = {'long_name': 'Slope of the trend line', 'units': 'days/year'}
ds['intercept_days'].attrs = {'long_name': 'Intercept of the trend line', 'units': 'days'}
ds['rate_change_days'].attrs = {'long_name': 'Rate of change in flood days per year', 'units': 'days/year'}
ds['trend_days'].attrs = {'long_name': 'Trend line of flood days per year', 'units': 'days'}


slopes, intercepts, rate_changes, trends = calculate_flood_trend(ds, 'hours')

#add to dataset
ds['slope_hours'] = (('station_id'), slopes)
ds['intercept_hours'] = (('station_id'), intercepts)
ds['rate_change_hours'] = (('station_id'), np.squeeze(rate_changes))
ds['trend_hours'] = (('year', 'station_id'), trends)

ds['slope_hours'].attrs = {'long_name': 'Slope of the trend line', 'units': 'hours/year'}
ds['intercept_hours'].attrs = {'long_name': 'Intercept of the trend line', 'units': 'hours'}
ds['rate_change_hours'].attrs = {'long_name': 'Rate of change in flood hours per year', 'units': 'hours/year'}
ds['trend_hours'].attrs = {'long_name': 'Trend line of flood hours per year', 'units': 'hours'}
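To see what the slope and trend line represent, here is a small check on synthetic data. It uses `np.polyfit` with degree 1 as a stand-in for `stats.linregress`, and the yearly counts are invented so that the expected slope is known in advance.

```python
import numpy as np

years = np.arange(2000, 2010)
# Synthetic, exactly linear counts: 3 flood days in 2000, rising by 2 per year
flood_days = 2.0 * (years - 2000) + 3.0

# A degree-1 polynomial fit returns the slope first, then the intercept
slope, intercept = np.polyfit(years, flood_days, 1)

# Evaluate the fitted line at each year, as done for the trend arrays above
trend = intercept + slope * years
```

The recovered slope is in days per year, matching the units assigned to `rate_change_days` in the dataset.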


In [None]:
summary_stats_all = []

for station_idx in range(len(ds['station_id'])):
    station_name = ds['station_name'][station_idx].item()
    station_id = ds['station_id'][station_idx].item()
    # named station_stats so it does not shadow the `stats` module used for linregress
    station_stats = {
        'Station Name': station_name,
        'Station ID': station_id,
        'Threshold (cm)': round(thresholds[station_idx],1),
        'Total Flood Days': int(ds['flood_days_per_year'][:, station_idx].sum()),
        'Average Flood Days per Year': round(float(ds['flood_days_per_year'][:, station_idx].mean()), 1),
        'Max Flood Days in a Single Year': int(ds['flood_days_per_year'][:, station_idx].max()),
        'Year of Max Flood Days': int(ds['year'][ds['flood_days_per_year'][:, station_idx].argmax()].item()),
        'Total Flood Hours': int(ds['flood_hours_per_year'][:, station_idx].sum()),
        'Average Flood Hours per Year': round(float(ds['flood_hours_per_year'][:, station_idx].mean()), 1),
        'Max Flood Hours in a Single Year': int(ds['flood_hours_per_year'][:, station_idx].max()),
        'Year of Max Flood Hours': int(ds['year'][ds['flood_hours_per_year'][:, station_idx].argmax()].item()),
        'Change in Flood Days per Year': round(float(ds['rate_change_days'][station_idx]), 1),
        'Change in Flood Hours per Year': round(float(ds['rate_change_hours'][station_idx]), 1)
    }
    summary_stats_all.append(station_stats)

summary_stats_df_all = pd.DataFrame(summary_stats_all)
summary_stats_df_all

Make it pretty:

In [None]:
# Turn into a pretty table with great tables
from great_tables import GT, html, style, loc

PORstring = f'{POR_start.strftime("%Y")}-{POR_end.strftime("%Y")}'
floodDaysCols = [col for col in summary_stats_df_all.columns if 'Days' in col]
floodHoursCols = [col for col in summary_stats_df_all.columns if 'Hours' in col]

cols_label_dict = {
    col: html(col.replace("Flood Days", "").replace("Flood Hours", "").strip())
    for col in summary_stats_df_all.columns
}

cols_label_dict['Change in Flood Days per Year'] = html('Change <br> (days/yr)')
cols_label_dict['Change in Flood Hours per Year'] = html('Change <br> (hours/yr)')
cols_label_dict['Max Flood Days in a Single Year'] = html('Yearly <br>Max')
cols_label_dict['Max Flood Hours in a Single Year'] = html('Yearly <br>Max')
cols_label_dict['Average Flood Days per Year'] = html('Average <br>per Year')
cols_label_dict['Average Flood Hours per Year'] = html('Average <br>per Year')
cols_label_dict['Year of Max Flood Days'] = html('Year of <br>Max')
cols_label_dict['Year of Max Flood Hours'] = html('Year of <br>Max')
cols_label_dict['Station ID'] = html('ID')

# Create a Table object
table = (
    GT(summary_stats_df_all)
    .tab_options(table_font_size="13px")
    .cols_width(cases={"Station Name": "120px"})
    .cols_label(**cols_label_dict)
    .cols_align(align="left", columns = ['Threshold (cm)','Station ID'])
    .tab_spanner(
        label="Flood Days", columns=floodDaysCols)
    .tab_spanner(
        label="Flood Hours", columns=floodHoursCols)
    .tab_header(
        subtitle='Hawaiian Island Region',title='Minor Flood Statistics')        
    .tab_source_note(
         source_note = 'Data: NOAA CO-OPS Hourly Water Levels (' + PORstring + '), Threshold: ' + str(percentile) +'th percentile of daily max water level')
)

table

In [None]:
# Save the table as an HTML file
savepath = output_dir / 'minor_flood_statistics.html'
table.write_raw_html(savepath)
# Save the table as a png file
savepath = output_dir / 'minor_flood_statistics.png'
table.save(savepath, scale=2)  

### Plot time series of all stations

In [None]:
fig, axs = plt.subplots(2,1, sharex=True, figsize=(6, 6))

# Set the color for each station
colors = sns.color_palette('Dark2', n_colors=len(ds['station_id']))

# Plot the data for each station with the same color
for i, station_id in enumerate(ds['station_id']):
    axs[0].plot(ds['year'], ds['flood_days_per_year'].sel(station_id=station_id), label=None, color=colors[i], linestyle=':')
    axs[1].plot(ds['year'], ds['flood_hours_per_year'].sel(station_id=station_id), label=None, color=colors[i], linestyle=':')

    # Plot the trend lines
    axs[0].plot(ds['year'], ds['trend_days'].sel(station_id=station_id), label=None , color=colors[i])
    axs[1].plot(ds['year'], ds['trend_hours'].sel(station_id=station_id), label=ds['station_name'].sel(station_id=station_id).values, color=colors[i])

# Set the labels and tick labels
axs[0].set_ylabel('Flood Days')
# axs[0].set_xticklabels([])

axs[1].set_xlabel('Year')
axs[1].set_ylabel('Flood Hours')

# add a legend 
axs[1].legend(ncol=3, columnspacing=0.5, bbox_to_anchor=(0, 2.25), loc='upper left',fontsize=5)

#color the legend text the same as the lines and then remove the lines
for i, station_id in enumerate(ds['station_id']):
    axs[1].get_legend().get_texts()[i].set_color(colors[i])
    axs[1].get_legend().get_texts()[i].set_fontsize('small')
    # axs[1].get_legend().get_lines()[i].set_linewidth(0)

#remove box around legend
axs[1].get_legend().get_frame().set_linewidth(0.0)    


### Create a Simple Table
Now we'll generate a table with this information, which will be saved as a .csv in the output directory specified at the top of this notebook.

In [None]:
import pandas as pd
# make a dataframe with the rate change in flood days and hours per year, with given threshold
rate_change_df = pd.DataFrame({'rate_change_days': ds['rate_change_days'].values, 'rate_change_hours': ds['rate_change_hours'].values, 'threshold': thresholds.round(2)})
rate_change_df = rate_change_df.round(2)
# add the station name and country
rate_change_df['station'] = ds['station_name'].values
rate_change_df['country'] = ds['station_country'].values

# reorder the columns (this selection also drops 'country' from the saved table)
rate_change_df = rate_change_df[['station', 'threshold', 'rate_change_days', 'rate_change_hours']]

# Define your attributes
attributes = {
    'station': 'Station name',
    'threshold': 'Threshold in cm above MHHW',
    'rate_change_days': 'Rate of change in flood days per year',
    'rate_change_hours': 'Rate of change in flood hours per year'
}

# Open the file in write mode
with open(output_dir / 'SL_FloodFrequency_rate_change.csv', 'w') as f:
    # Write the attributes as comments
    for column, attribute in attributes.items():
        f.write(f'# {column}: {attribute}\n')

    # Write the DataFrame to the file
    rate_change_df.to_csv(f, index=False)

rate_change_df
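
Because the attribute descriptions are written as `#` comment lines ahead of the column header, a reader of the file has to skip or parse them. With pandas, `pd.read_csv(path, comment='#')` skips them. The sketch below uses only the standard library to recover both the descriptions and the rows. The file contents and the names `csv_text`, `attrs_read`, and `rows` are made up for illustration; they are not the notebook's real output.

```python
import csv
import io

# Hypothetical stand-in for SL_FloodFrequency_rate_change.csv (made-up values)
csv_text = """# station: Station name
# threshold: Threshold in cm above MHHW
station,threshold,rate_change_days,rate_change_hours
Station A,30.0,0.5,1.25
Station B,28.5,0.25,0.75
"""

attrs_read = {}   # column name -> description, parsed from the '#' lines
data_lines = []   # header row plus data rows

for line in io.StringIO(csv_text):
    if line.startswith("#"):
        # '# column: description' -> {'column': 'description'}
        column, _, description = line[1:].partition(":")
        attrs_read[column.strip()] = description.strip()
    else:
        data_lines.append(line)

# DictReader accepts any iterable of lines; all values come back as strings
rows = list(csv.DictReader(data_lines))
```

Note that the values in `rows` are strings, so convert them with `float()` before doing arithmetic. To read the real file, replace the `io.StringIO` object with an open file handle.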


And now we'll make that pretty:

In [None]:
# make a pretty png of the table with great_tables
from great_tables import GT, html
thresholdstr = str(percentile) + 'th Percentile'
# Create a Table object
table = (
    GT(rate_change_df[["station", "threshold", "rate_change_days", "rate_change_hours"]])
    .cols_label(
        station=html('Station'),
        threshold=html(f'Threshold<br>{thresholdstr}<br>(cm above MHHW)'),
        rate_change_days=html('Rate of Change<br>in Flood Days<br>(days/yr)'),
        rate_change_hours=html('Rate of Change<br>in Flood Hours<br>(hours/yr)')
    )
    .fmt_number(
        columns=["rate_change_days", "rate_change_hours"], decimals=1
    )
    .tab_header(
        title='Flood Frequency Analysis', subtitle='Hawaiian Island Region'
    )
    .tab_source_note(
        source_note=html(
            f"Data: NOAA CO-OPS Hourly Water Levels. Threshold is calculated with data from {POR_start.strftime('%Y-%m-%d')} to {POR_end.strftime('%Y-%m-%d')} (relative to MHHW)"
        )
    )
)

# save the table as a png file
output_path = output_dir / 'flood_frequency_table.png'
table.save(str(output_path), scale=2)

# display the table (only the last expression in a cell is rendered)
table

Finally, let's compare thresholds from the different sources:

In [None]:
# make a dataframe of thresholds from thresholds, threshold_nos, and threshold_nws
thresholds_dict = {station: round(threshold, 2) for station, threshold in zip(hourly_data['station_id'].values, thresholds)}

station_info = []
for station in hourly_data['station_id'].values:
    nos_val = threshold_nos.get(station, None)
    nws_val = threshold_nws.get(station, None)
    threshold_val = thresholds_dict.get(station, None)
    station_info.append({
        'station_id': station,
        'station_name': hourly_data['station_name'].sel(station_id=station).item(),
        'threshold_95th_percentile_cm': threshold_val,
        'threshold_NOS_minor_cm': round(nos_val, 2) if nos_val is not None else None,
        'threshold_NWS_minor_cm': round(nws_val, 2) if nws_val is not None else None
    })
thresholds_df = pd.DataFrame(station_info)

In [None]:
thresholds_df
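
To make the comparison easier to read, it can help to express each external threshold as an offset from the percentile-based one. The sketch below works on a list of dictionaries shaped like `station_info` above. The records, the helper `threshold_difference`, and the new key names are hypothetical, and it assumes all three thresholds are in cm relative to the same datum.

```python
# Hypothetical records shaped like the entries of station_info (made-up values)
example_info = [
    {"station_name": "Station A",
     "threshold_95th_percentile_cm": 30.0,
     "threshold_NOS_minor_cm": 52.0,
     "threshold_NWS_minor_cm": None},   # no NWS threshold for this station
    {"station_name": "Station B",
     "threshold_95th_percentile_cm": 28.5,
     "threshold_NOS_minor_cm": 50.0,
     "threshold_NWS_minor_cm": 60.0},
]


def threshold_difference(record, other_key,
                         base_key="threshold_95th_percentile_cm"):
    """Return other - base in cm, or None when either value is missing."""
    base = record.get(base_key)
    other = record.get(other_key)
    if base is None or other is None:
        return None
    return round(other - base, 2)


for rec in example_info:
    rec["NOS_minus_percentile_cm"] = threshold_difference(
        rec, "threshold_NOS_minor_cm")
    rec["NWS_minus_percentile_cm"] = threshold_difference(
        rec, "threshold_NWS_minor_cm")
```

A positive difference means the external threshold sits above the percentile-based one, so it would flag fewer flood days and hours. Missing thresholds stay `None` rather than being treated as zero.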

## Plot a Map
And here is our final plotting code.

In [None]:
xlims = [-185, -152]
ylims = [14, 40]

crs = ccrs.PlateCarree()
fig, axs = plt.subplots(1, 2, figsize=(10, 6), subplot_kw={'projection': crs})

rsl['lon_west'] = -(360 - rsl['lon'])

# Colormap setup
cmap_days = plt.get_cmap('YlOrRd')
cmap_hours = plt.get_cmap('YlOrRd')
norm_days = plt.Normalize(ds['slope_days'].min().values, ds['slope_days'].max().values)
norm_hours = plt.Normalize(ds['slope_hours'].min().values, ds['slope_hours'].max().values)

for i, ax in enumerate(axs):
    ax.set_xlim(xlims)
    ax.set_ylim(ylims)
    ax.coastlines()
    ax.add_feature(cfeature.LAND, color='lightgrey')
    ax.text(0.95, 0.95, f'({chr(97 + i)})',
            horizontalalignment='right', verticalalignment='top', transform=ax.transAxes,
            fontsize=16)
    # ax.add_feature(cfeature.OCEAN, color='lightblue')
    gl = ax.gridlines(draw_labels=False, linestyle=':', color='black',
                      alpha=0.2, xlocs=ax.get_xticks(), ylocs=ax.get_yticks())
    gl.top_labels = False
    gl.right_labels = False
    if ax == axs[1]:
        gl.left_labels = False

# Plot colored dots for flood days slope
sc_days = axs[0].scatter(
    rsl['lon_west'], rsl['lat'],
    c=ds['slope_days'].values, alpha=0.7,
    cmap=cmap_days, norm=norm_days, s=120, edgecolor='black', transform=crs, zorder=3
)
sc_hours = axs[1].scatter(
    rsl['lon_west'], rsl['lat'],
    c=ds['slope_hours'].values, alpha=0.7,
    cmap=cmap_hours, norm=norm_hours, s=120, edgecolor='black', transform=crs, zorder=3
)

axs[0].set_title('Change in Flood Days')
axs[1].set_title('Change in Flood Hours')

# Add colorbars with custom axes to match figure height and add padding between them
axs_bottom = axs[0].get_position().y0
axs0_x0 = axs[0].get_position().x0
axs0_width = axs[0].get_position().width * 0.9  # slightly shrink width
axs1_x0 = axs[1].get_position().x0
axs1_width = axs[1].get_position().width * 0.9  # slightly shrink width

padding = 0.1  # horizontal gap between colorbars

cbar_ax_days = fig.add_axes([axs0_x0+padding, 0.6, axs0_width-padding, 0.02])  # [left, bottom, width, height]
cbar_ax_hours = fig.add_axes([axs1_x0 + padding, 0.6, axs1_width-padding, 0.02])


cb_days = fig.colorbar(sc_days, cax=cbar_ax_days, label='(days/year)', orientation='horizontal')
cb_hours = fig.colorbar(sc_hours, cax=cbar_ax_hours, label='(hours/year)', orientation='horizontal')

cb_days.ax.xaxis.set_ticks_position('top')
cb_days.ax.xaxis.set_label_position('top')

cb_hours.ax.xaxis.set_ticks_position('top')
cb_hours.ax.xaxis.set_label_position('top')

glue("mag_fig", fig, display=False)
output_file_path = output_dir / 'SL_FloodFrequency_map.png'
fig.savefig(output_file_path, dpi=300, bbox_inches='tight')


```{glue:figure} mag_fig
:name: "mag_fig"

Map of the rate of change in average flood (a) days and (b) hours per year above the {glue:text}`threshold_percentile:.0f`th percentile threshold of water levels at {glue:}`SL_Data_Wrangling.ipynb::station_group` tide gauges from {glue:text}`startPORDateTime` to {glue:text}`endPORDateTime`.
```
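
A note on the `lon_west` line in the map cell: `-(360 - lon)` is the same as `lon - 360`, which shifts every longitude by one full turn rather than wrapping it into the range -180 to 180. The sketch below contrasts the two conventions. It assumes the stored longitudes are in degrees east on a 0 to 360 scale; the function names and sample longitudes are illustrative only.

```python
def shift_to_west(lon_east):
    """Shift a 0-360 degree-east longitude by one full turn (lon - 360)."""
    return lon_east - 360.0


def wrap_to_180(lon_east):
    """Wrap a longitude into the conventional [-180, 180) range."""
    return (lon_east + 180.0) % 360.0 - 180.0


# Illustrative longitudes in degrees east: one on each side of the antimeridian
sample_lons = [202.0, 177.0]

shifted = [shift_to_west(lon) for lon in sample_lons]   # [-158.0, -183.0]
wrapped = [wrap_to_180(lon) for lon in sample_lons]     # [-158.0, 177.0]
```

With the shift, a point just west of the antimeridian lands at -183, inside the map's x-limits of -185 to -152 and next to its neighbors. With wrapping it would land at 177, on the opposite edge of a -180 to 180 axis.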

---

## Citations

```{bibliography}
:style: plain
:filter: docname in docnames
```