# Summarize extreme datasets

This notebook creates summaries of the extreme variables derived from the CORDEX data: a spreadsheet of summarized values, plus some simple summary graphics.

### Eras

The eras of interest for summarizing these extremes will be:

* 2011-2040
* 2041-2070
* 2071-2100
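Each era label is later split into a start year and an end year and used as a `year` slice in xarray. Label-based slicing with `.sel(year=slice(start, end))` includes both endpoints, so each era covers 30 annual values. The following is a minimal sketch of that parsing step; the helper name `parse_era` is hypothetical and does not appear in the notebook.

```python
from typing import Tuple


def parse_era(era: str) -> Tuple[int, int]:
    """Split an era label such as "2011-2040" into inclusive (start, end) years."""
    start, end = era.split("-")
    return int(start), int(end)


eras = ["2011-2040", "2041-2070", "2071-2100"]
for era in eras:
    start, end = parse_era(era)
    # both endpoints are included when used as an xarray label slice
    print(era, start, end, end - start + 1)
```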




In [1]:
import os
from pathlib import Path


# outputs go to the final_products directory (or $OUTPUT_DIR if set), as they will be used for reporting;
#  the input extremes dataset is read from the same directory
out_dir = Path(os.getenv("OUTPUT_DIR") or "/workspace/Shared/Tech_Projects/TBEC_CMIP5_Processing/final_products/")
extremes_fp = out_dir.joinpath("annual_extremes.nc")
extr_summary_fp = out_dir.joinpath("extremes_extractions.xlsx")

Open a connection to the extremes dataset:

In [2]:
import xarray as xr


ds = xr.open_dataset(extremes_fp)
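To exercise the summary code without the real file, a small synthetic dataset with the same layout can stand in for `ds`. This is a minimal sketch, assuming `annual_extremes.nc` stores each extreme variable on `(model, scenario, year, lat, lon)` dimensions with 1-D `lat`/`lon` coordinates, which is what the `.sel()` calls later in the notebook imply. The model names, year range, grid spacing, and values are all invented, and the `fake_*` names are hypothetical.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for annual_extremes.nc; dimension names are inferred
# from the selections made later in the notebook, everything else is made up
fake_models = ["model_a", "model_b"]
fake_scenarios = ["rcp45", "rcp85"]
fake_years = np.arange(2006, 2101)
fake_lats = np.arange(50.0, 72.0, 2.0)     # 11 latitudes
fake_lons = np.arange(-170.0, -130.0, 2.0)  # 20 longitudes

dims = ("model", "scenario", "year", "lat", "lon")
coords = {
    "model": fake_models,
    "scenario": fake_scenarios,
    "year": fake_years,
    "lat": fake_lats,
    "lon": fake_lons,
}
shape = tuple(len(coords[d]) for d in dims)

rng = np.random.default_rng(0)
ds_fake = xr.Dataset(
    # one random, non-negative field per extreme variable used in this notebook
    {name: (dims, rng.gamma(2.0, 10.0, size=shape)) for name in ["rx1say", "hsd", "hd", "cd"]},
    coords=coords,
)
print(ds_fake)
```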

## Excel spreadsheet

Create an Excel spreadsheet of tables of summarized extreme variables, with one worksheet per study location. Each table will have the following headers:

* `model`
* `scenario`
* `era` (time period)
* `varname` (extreme variable)
* `min`, `mean`, `max` (aggregates of the annual values over the era, rounded to one decimal place)
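The summary loop in this notebook writes `min`, `mean`, and `max` as separate columns. If a strictly tidy layout with a single aggregate column (`aggr`) is preferred, `pandas.DataFrame.melt` can reshape each location's table. This is a sketch on two invented rows shaped like the loop's output; the `value` column name is an assumption.

```python
import pandas as pd

# Hypothetical rows shaped like those built in the summary loop
wide = pd.DataFrame([
    {"model": "model_a", "scenario": "rcp45", "era": "2011-2040", "varname": "hd",
     "min": 1.0, "mean": 4.2, "max": 9.0},
    {"model": "model_a", "scenario": "rcp85", "era": "2011-2040", "varname": "hd",
     "min": 2.0, "mean": 5.1, "max": 11.0},
])

# one row per (model, scenario, era, varname, aggr) combination
tidy = wide.melt(
    id_vars=["model", "scenario", "era", "varname"],
    value_vars=["min", "mean", "max"],
    var_name="aggr",
    value_name="value",
)
print(tidy)
```

Keeping the wide layout makes each sheet easier to scan by eye; the long layout is easier to filter and plot programmatically.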

Create some iterables for extracting the various summaries:

In [3]:
eras = [
    # "1981-2010",
    "2011-2040",
    "2041-2070",
    "2071-2100",
]
# only need to iterate over the two future scenarios, since
#  the "historical" era (1981-2010) would actually contain 5 years
#  taken from either of the future scenarios
scenarios = ["rcp45", "rcp85"]

# summary variables
# aggr_var_lu = {
#     "min": np.min,
#     "mean": np.mean,
#     "max": np.max,
# }

varnames = ["rx1say", "hsd", "hd", "cd"]

# (lat, lon) WGS84 coordinates for each study location
locations = {
    "Kaktovik": (70.1, -143.6),
    "Stevens Village": (66.1, -149.1),
    "Igiugik Village": (59.3, -155.9),
    "Levelock": (59.1, -156.9),
    # "Nelson Lagoon": (55.9, -161.2),
    "Eyak": (60.5, -145.6),
    "Ketchikan": (55.6, -136.6),
    # "Unalaska": (53.9, -166.5),
    "Aleutians": (57.838, -159.995),
}
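Each location is matched to a grid cell with `.sel(lat=..., lon=..., method="nearest")`, which picks the closest coordinate value independently along each dimension. On a regular lat/lon grid that is the nearest cell in coordinate space, not by great-circle distance. The sketch below uses a toy 1-degree grid (an assumption; the real grid spacing is not stated here) to show which cell Kaktovik snaps to. It also assumes longitudes run from -180 to 180, matching the negative values in `locations`.

```python
import numpy as np
import xarray as xr

# Toy 1-degree grid; spacing and longitude convention are assumptions
lats = np.arange(55.0, 72.0, 1.0)
lons = np.arange(-170.0, -130.0, 1.0)
grid = xr.DataArray(
    np.arange(lats.size * lons.size, dtype=float).reshape(lats.size, lons.size),
    coords={"lat": lats, "lon": lons},
    dims=("lat", "lon"),
)

lat, lon = 70.1, -143.6  # Kaktovik, from the locations dict
point = grid.sel(lat=lat, lon=lon, method="nearest")
print(float(point.lat), float(point.lon))  # 70.0 -144.0
```

If a dataset stored longitudes as 0 to 360 instead, the query longitude would need converting first (for example `lon % 360`), otherwise every location would snap to the grid edge.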

Create an Excel writer object:

In [4]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.simplefilter(action='ignore', category=RuntimeWarning)
import pandas as pd


writer = pd.ExcelWriter(extr_summary_fp, engine="xlsxwriter")

Iterate! Iterate! Loop over every combination of location, era, model, scenario, and variable and populate the Excel sheets, inefficiently but straightforwardly:
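A possibly faster alternative, not the approach this notebook uses, is to select every location at once with xarray's vectorized (pointwise) indexing and then reduce over `year` for each era. This sketch assumes the same `(model, scenario, year, lat, lon)` layout as above and runs on invented data for one variable; `demo_locations` is a hypothetical two-location subset. xarray's `min`/`mean`/`max` skip NaN by default for float data, matching the `np.nan*` functions in the loop.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in data with the assumed (model, scenario, year, lat, lon) layout
rng = np.random.default_rng(42)
da = xr.DataArray(
    rng.gamma(2.0, 5.0, size=(2, 2, 90, 9, 21)),
    coords={
        "model": ["model_a", "model_b"],
        "scenario": ["rcp45", "rcp85"],
        "year": np.arange(2011, 2101),
        "lat": np.arange(54.0, 72.0, 2.0),
        "lon": np.arange(-170.0, -128.0, 2.0),
    },
    dims=("model", "scenario", "year", "lat", "lon"),
    name="hd",
)

demo_locations = {"Kaktovik": (70.1, -143.6), "Ketchikan": (55.6, -136.6)}
names = list(demo_locations)
# indexers sharing the "location" dim select one grid cell per location (pointwise)
lat_idx = xr.DataArray([demo_locations[n][0] for n in names], dims="location", coords={"location": names})
lon_idx = xr.DataArray([demo_locations[n][1] for n in names], dims="location", coords={"location": names})
points = da.sel(lat=lat_idx, lon=lon_idx, method="nearest")

frames = []
for era in ["2011-2040", "2041-2070", "2071-2100"]:
    start, end = (int(y) for y in era.split("-"))
    sub = points.sel(year=slice(start, end))
    # reduce over years only; model, scenario, and location dims are kept
    stats = xr.Dataset({
        "min": sub.min("year"),
        "mean": sub.mean("year"),
        "max": sub.max("year"),
    }).round(1)
    df = stats.to_dataframe().reset_index()
    df["era"] = era
    df["varname"] = da.name
    frames.append(df)

summary = pd.concat(frames, ignore_index=True)
print(summary.head())
```

Splitting `summary` by `location` would then give one table per worksheet.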

In [5]:
import numpy as np


dfs = []
for location in locations:
    lat, lon = locations[location]
    
    df_rows = []
    
    for era in eras:
        start_year, end_year = era.split("-")
        for model in ds.model.values:
            for scenario in scenarios:
                for varname in varnames:
                    da = ds[varname].sel(
                        model=model,
                        scenario=scenario,
                        year=slice(int(start_year), int(end_year))
                    ).sel(
                        lat=lat,
                        lon=lon,
                        method="nearest"
                    )
                    # for aggr_var in aggr_var_lu:

                    df_rows.append({
                        "model": model,
                        "scenario": scenario,
                        "era": era,
                        "varname": varname,
                        "min": np.nanmin(da.values).round(1),
                        "mean": np.nanmean(da.values).round(1),
                        "max": np.nanmax(da.values).round(1),
                    })
    
    # create a DataFrame and write it to this location's sheet in the Excel file
    df = pd.DataFrame(df_rows)
    dfs.append(df)
    df.to_excel(writer, sheet_name=location, index=False)
    print(f"{location} done")

Kaktovik done
Stevens Village done
Igiugik Village done
Levelock done
Eyak done
Ketchikan done
Aleutians done
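The loop aggregates with `np.nanmin`, `np.nanmean`, and `np.nanmax`, so missing years are skipped rather than turning the whole summary into NaN. When a selection contains no valid values at all, these functions return NaN and emit a `RuntimeWarning`, presumably the reason that warning category is silenced earlier in the notebook. A small illustration on invented values:

```python
import warnings

import numpy as np

values = np.array([3.0, np.nan, 5.0, 10.0])
# nan-aware reductions ignore the missing entry
print(np.nanmin(values), np.nanmean(values), np.nanmax(values))  # 3.0 6.0 10.0
print(np.mean(values))  # nan

# an all-missing selection yields NaN plus a RuntimeWarning
all_missing = np.full(5, np.nan)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = np.nanmean(all_missing)
print(np.isnan(result), caught[0].category.__name__)  # True RuntimeWarning
```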


Save the Excel spreadsheet:

In [6]:
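# close() writes the workbook to disk; ExcelWriter.save() was deprecated in pandas 1.5 and removed in 2.0
writer.close()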