# Extract point locations

This notebook extracts and summarizes the extreme variables derived from the CORDEX data at specified point locations. The extractions are stored in two Excel workbooks (`.xlsx`): one for the three summary eras (listed below) and one for all available decades. In each workbook, every worksheet holds a tidy table of aggregations for one location.

#### Summary eras

The eras of interest for summarizing these extremes will be:

* 2011-2040
* 2041-2070
* 2071-2100

#### Summary decades

We will also summarize the data over all available decades, from 1980-1989 to 2090-2099.
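
Both eras and decades are written as inclusive `"start-end"` labels, which the extraction loops split into integer year slices. The minimal sketch below uses a hypothetical helper, `period_years`, to show that each era covers 30 years and that the decade labels run from 1980-1989 to 2090-2099 (12 decades). It is an illustration only, not part of the notebook:

```python
# Hypothetical helper (not part of the notebook): expand an inclusive
# "start-end" period label into the list of years it covers.
def period_years(period):
    start_year, end_year = (int(y) for y in period.split("-"))
    # xarray label-based slicing includes both endpoints, hence the +1
    return list(range(start_year, end_year + 1))

eras = ["2011-2040", "2041-2070", "2071-2100"]
# same labels the notebook builds with np.arange(1980, 2091, 10)
decades = [f"{year}-{year + 9}" for year in range(1980, 2091, 10)]

for era in eras:
    print(era, len(period_years(era)))  # 30 years each
print(decades[0], decades[-1], len(decades))  # 1980-1989 2090-2099 12
```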

#### Excel spreadsheet

This notebook creates Excel workbooks of tidy tables of summarized extreme variables, with one worksheet per study location. Excel is chosen over CSV or another format because the tables may be shared with collaborators. Each tidy table has the following headers:

* `model`
* `scenario`
* `era` (summary time period; `decade` in the decadal workbook)
* `varname` (extreme variable)
* `min`, `mean`, `max` (aggregates over the period, rounded to one decimal place)
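
The extraction loops store the three aggregates as separate `min`, `mean`, and `max` columns rather than in a single aggregate column. If a strictly long layout with one `aggr` column is preferred, pandas `DataFrame.melt` can reshape a sheet. The sketch below uses a made-up two-row table; the model name and values are placeholders, not extracted results:

```python
import pandas as pd

# Placeholder rows in the wide layout the notebook writes
# (values are illustrative only, not real extractions)
wide_df = pd.DataFrame({
    "model": ["model_a", "model_a"],
    "scenario": ["rcp45", "rcp85"],
    "era": ["2011-2040", "2011-2040"],
    "varname": ["hd", "hd"],
    "min": [1.0, 2.0],
    "mean": [3.5, 4.5],
    "max": [7.0, 9.0],
})

# Reshape to one row per (model, scenario, era, varname, aggregate)
long_df = wide_df.melt(
    id_vars=["model", "scenario", "era", "varname"],
    value_vars=["min", "mean", "max"],
    var_name="aggr",
    value_name="value",
)
print(long_df)
```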

## Run the extraction

Run the cell below first to set up the paths and the environment:

In [13]:
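# config provides the file paths used below (extremes_fp, extr_era_summary_fp, extr_decade_summary_fp)
from config import *
import numpy as np
import pandas as pd
import xarray as xr
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.simplefilter(action='ignore', category=RuntimeWarning)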

Read in the extremes dataset, which is used for both the era and decadal summaries:

In [6]:
ds = xr.load_dataset(extremes_fp)

### Era summaries

Extract the data and summarize over the eras. Define some iterables for extracting the various summaries:

In [16]:
eras = [
    # "1981-2010",
    "2011-2040",
    "2041-2070",
    "2071-2100",
]

# only the two future scenarios are needed here: every era starts after
#  the historical run ends in 2005 (a "historical" 1981-2010 era would
#  also include 2006-2010 from one of the future scenarios)
scenarios = ["rcp45", "rcp85"]

# summary variables
# aggr_var_lu = {
#     "min": np.min,
#     "mean": np.mean,
#     "max": np.max,
# }

varnames = ["rx1day", "hsd", "hd", "cd"]

# dict of WGS84 coords for each of the locations
locations = {
    "Kaktovik": (70.1, -143.6),
    "Stevens Village": (66.1, -149.1),
    "Igiugik Village": (59.3, -155.9),
    "Levelock": (59.1, -156.9),
    # "Nelson Lagoon": (55.9, -161.2),
    "Eyak": (60.5, -145.6),
    "Ketchikan": (55.6, -136.6),
    # "Unalaska": (53.9, -166.5),
    "Aleutians": (57.838, -159.995),
}
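
The coordinates above are matched to the model grid in the extraction loops with `.sel(lat=lat, lon=lon, method="nearest")`, which picks the closest grid point along `lat` and along `lon` separately. A rough NumPy equivalent is sketched below, assuming regular 1-D `lat`/`lon` coordinates as that `.sel` call implies; the 0.5-degree grid and the helper `nearest_index` are hypothetical. If the dataset stored longitudes on a 0 to 360 scale, the negative longitudes above would need converting before this kind of lookup.

```python
import numpy as np

# Hypothetical regular 0.5-degree grid (not the actual CORDEX grid)
lats = np.arange(50.0, 75.0, 0.5)
lons = np.arange(-170.0, -125.0, 0.5)

def nearest_index(coords, value):
    """Index of the coordinate closest to value (first one on ties)."""
    return int(np.argmin(np.abs(coords - value)))

# Kaktovik, from the locations dict
lat, lon = 70.1, -143.6
i, j = nearest_index(lats, lat), nearest_index(lons, lon)
print(lats[i], lons[j])  # 70.0 -143.5
```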

Create an Excel writer object for the era summaries:

In [4]:
writer = pd.ExcelWriter(extr_era_summary_fp, engine="openpyxl")

Iterate! Iterate! Loop over every combination of location, era, model, scenario, and variable and populate the Excel workbook, inefficiently but straightforwardly:

In [5]:
dfs = []
for location in locations:
    lat, lon = locations[location]
    
    df_rows = []
    
    for era in eras:
        start_year, end_year = era.split("-")
        for model in ds.model.values:
            for scenario in scenarios:
                for varname in varnames:
                    da = ds[varname].sel(
                        model=model,
                        scenario=scenario,
                        year=slice(int(start_year), int(end_year))
                    ).sel(
                        lat=lat,
                        lon=lon,
                        method="nearest"
                    )
                    df_rows.append({
                        "model": model,
                        "scenario": scenario,
                        "era": era,
                        "varname": varname,
                        "min": np.nanmin(da.values).round(1),
                        "mean": np.nanmean(da.values).round(1),
                        "max": np.nanmax(da.values).round(1),
                    })
    
    # create a dataframe and write it to this location's sheet in the Excel file
    df = pd.DataFrame(df_rows)
    dfs.append(df)
    df.to_excel(writer, sheet_name=location, index=False)
    print(f"{location} done")

Kaktovik done
Stevens Village done
Igiugik Village done
Levelock done
Eyak done
Ketchikan done
Aleutians done
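
The `min`, `mean`, and `max` in each row come from `np.nanmin`, `np.nanmean`, and `np.nanmax`, rounded to one decimal place. The NaN-aware versions matter whenever a selection contains missing years, because a single NaN makes the plain reductions return NaN. A small illustration on made-up values (not extracted data):

```python
import numpy as np

# Made-up yearly values for one location, with one missing year
values = np.array([12.0, 15.5, np.nan, 9.3, 20.0])

# Plain reductions propagate the NaN
print(np.min(values), np.mean(values))  # nan nan

# The NaN-aware reductions skip it, as in the extraction loops
summary = {
    "min": np.nanmin(values).round(1),
    "mean": np.nanmean(values).round(1),
    "max": np.nanmax(values).round(1),
}
print(summary)
```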


Save and close the Excel workbook:

In [6]:
# close() writes the workbook to disk (ExcelWriter.save() was removed in pandas 2.0)
writer.close()

### Decadal summaries

Now do the same as above, but for the decades. Define the decades:

In [19]:
decades = [f"{year}-{year + 9}" for year in np.arange(1980, 2091, 10)]

Open an Excel writer object for the decadal summaries:

In [79]:
decade_writer = pd.ExcelWriter(extr_decade_summary_fp, engine="openpyxl")

Define helper functions for subsetting the dataset and summarizing a subset into a table row:

In [41]:
def subset_data(ds, varname, model, scenario, year_sl, lat, lon):
    """Subset an xarray dataset"""
    da = ds[varname].sel(
        model=model,
        scenario=scenario,
        year=year_sl
    ).sel(
        lat=lat,
        lon=lon,
        method="nearest"
    )
    return da
    
def summarize_to_row(da, model, scenario, decade, varname):
    """Summarize a data array and return a dict in format for
    appending as pandas dataframe row to summary table
    """
    row_di = {
        "model": model,
        "scenario": scenario,
        "decade": decade,
        "varname": varname,
        "min": np.nanmin(da.values).round(1),
        "mean": np.nanmean(da.values).round(1),
        "max": np.nanmax(da.values).round(1),
    }
    return row_di

We have a bit of a wrinkle here: the historical decades are stored separately from the future scenarios. Two decades (1980-1989 and 1990-1999) fall entirely within the "historical" scenario (`"hist"`), and the 2000-2009 decade straddles the boundary between the historical run (through 2005) and the future scenarios (from 2006). The fully historical decades are summarized under `"hist"`, and the future decades under each future scenario. For the straddling decade, we combine 2000-2005 from the historical run with 2006-2009 from each future scenario and summarize each combination separately, so the two scenario summaries for this decade share the same six historical years.
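
The year split for the 2000-2009 decade can be checked with plain Python. Assuming, as the `hist_sl` and `future_sl` slices in the loop indicate, that the historical run ends in 2005 and the scenarios begin in 2006, the two pieces cover each year of the decade exactly once. The series, their values, and the helper `select` are hypothetical placeholders:

```python
# Toy yearly series keyed by year (placeholder values, not model output)
hist = {year: float(year - 1990) for year in range(1980, 2006)}   # historical run, 1980-2005
rcp85 = {year: float(year - 1995) for year in range(2006, 2101)}  # one future scenario, 2006-2100

def select(series, start, end):
    """Inclusive year selection, like .sel(year=slice(start, end))."""
    return {y: v for y, v in series.items() if start <= y <= end}

# Selecting the whole decade from the historical run alone falls short
print(len(select(hist, 2000, 2009)))  # 6

# 2000-2005 from the historical run plus 2006-2009 from the scenario
mixed = {**select(hist, 2000, 2005), **select(rcp85, 2006, 2009)}
print(sorted(mixed) == list(range(2000, 2010)))  # True
```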

In [69]:
dfs = []
for location in locations:
    lat, lon = locations[location]
    
    df_rows = []
    
    for decade in decades:
        start_year, end_year = decade.split("-")
        year_sl = slice(int(start_year), int(end_year))
        for varname in varnames:
            if decade in ["1980-1989", "1990-1999"]:
                # this will be the historical scenario
                scenario = "hist"
                for model in ds.model.values:
                    da = subset_data(ds, varname, model, scenario, year_sl, lat, lon)
                    df_rows.append(summarize_to_row(da, model, scenario, decade, varname))

            elif decade == "2000-2009":
                # mixed decade: 2000-2005 from the historical run plus
                #  2006-2009 from each future scenario, joined along year
                hist_sl = slice(2000, 2005)
                future_sl = slice(2006, 2009)
                for model in ds.model.values:
                    for scenario in scenarios:
                        hist_da = subset_data(ds, varname, model, "hist", hist_sl, lat, lon)
                        future_da = subset_data(ds, varname, model, scenario, future_sl, lat, lon)
                        da = xr.concat([hist_da, future_da], dim="year")
                        df_rows.append(summarize_to_row(da, model, scenario, decade, varname))

            else:
                # future scenarios
                for model in ds.model.values:
                    for scenario in scenarios:
                        da = subset_data(ds, varname, model, scenario, year_sl, lat, lon)
                        df_rows.append(summarize_to_row(da, model, scenario, decade, varname))

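    # create a dataframe and write it to this location's sheet in the Excel file
    df = pd.DataFrame(df_rows)
    dfs.append(df)
    df.to_excel(decade_writer, sheet_name=location, index=False)

    # display the min/mean/max columns (E:G) with one decimal place;
    #  with the openpyxl engine, each sheet is an openpyxl worksheet
    worksheet = decade_writer.sheets[location]
    for row in worksheet.iter_rows(min_row=2, min_col=5, max_col=7):
        for cell in row:
            cell.number_format = "0.0"
    print(f"{location} done")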

Kaktovik done
Stevens Village done
Igiugik Village done
Levelock done
Eyak done
Ketchikan done
Aleutians done


In [70]:
# write the decadal workbook to disk (ExcelWriter.save() was removed in pandas 2.0)
decade_writer.close()
