# Extract point locations

This notebook extracts and summarizes the indices derived from the CORDEX data at the point locations specified in `config.py`. We store the extractions in CSV files for ease of use. Each CSV contains a tidy table of aggregations for one location, summarized either over the three summary eras listed below or over all available decades of the CORDEX data.

Each tidy table / CSV will have the following columns:

* `model`
* `scenario`
* `era` / `decade`
* `index` (index variable)
* `min` (minimum value over era/decade)
* `mean` (mean value over era/decade)
* `max` (maximum value over era/decade)
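
As a rough illustration of one row of such a table, the sketch below summarizes a made-up annual series with NaN-aware reducers. The model, scenario, and index names are placeholders, not values from the CORDEX dataset.

```python
import numpy as np
import pandas as pd

# Toy annual values for one model/scenario/index at one location (made up).
values = np.array([12.34, np.nan, 15.81, 9.97])

row = {
    "model": "model_a",        # placeholder model name
    "scenario": "scenario_x",  # placeholder scenario name
    "era": "2011-2040",
    "index": "example_index",  # placeholder index variable name
    # NaN-aware reducers skip missing years instead of returning NaN
    "min": np.nanmin(values).round(1),
    "mean": np.nanmean(values).round(1),
    "max": np.nanmax(values).round(1),
}

# One dict per row; a list of such dicts becomes the tidy table.
tidy = pd.DataFrame([row])
print(tidy)
```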

#### Summary eras

The eras of interest for summarizing these extremes will be:

* 2011-2040
* 2041-2070
* 2071-2100
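
Each era label doubles as a year range. A minimal sketch of turning a label into a `slice` is shown below. The helper name `era_to_slice` is hypothetical and not part of the notebook. Label-based slicing in xarray's `.sel` includes both endpoints, so each era covers 30 years.

```python
def era_to_slice(era: str) -> slice:
    """Turn a label such as "2011-2040" into slice(2011, 2040)."""
    start_year, end_year = (int(part) for part in era.split("-"))
    return slice(start_year, end_year)


eras = ["2011-2040", "2041-2070", "2071-2100"]
for era in eras:
    sl = era_to_slice(era)
    # both endpoints count under label-based selection
    n_years = sl.stop - sl.start + 1
    print(era, sl, n_years)
```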

#### Summary decades

We will also summarize the data over all available decades, from 1980-1989 to 2090-2099.

#### Excel spreadsheet

For ease of sharing these extractions with collaborators, this notebook also creates an Excel spreadsheet (`.xlsx` format) from the same tidy tables of summarized indices that are saved to `.csv` files, with one worksheet per study location. We populate the Excel file at the same time we create the `.csv` files.
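
The pattern of writing each table to both a CSV and a worksheet might look like the sketch below. The tables, file names, and temporary directory are placeholders, and the `openpyxl` package is assumed to be installed. Using `pd.ExcelWriter` as a context manager is one option for making sure the workbook is written and closed.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Placeholder tables keyed by location name; real tables come from the extraction.
tables = {
    "Kaktovik": pd.DataFrame({"index": ["example_index"], "mean": [1.0]}),
    "Levelock": pd.DataFrame({"index": ["example_index"], "mean": [2.0]}),
}

out_dir = Path(tempfile.mkdtemp())
xlsx_fp = out_dir / "summaries.xlsx"  # hypothetical file name

# The context manager writes and closes the workbook on exit.
with pd.ExcelWriter(xlsx_fp, engine="openpyxl") as writer:
    for location, df in tables.items():
        df.to_csv(out_dir / f"summaries_{location}.csv", index=False)
        df.to_excel(writer, sheet_name=location, index=False)

# sheet_name=None reads every worksheet into a dict keyed by sheet name
sheets = pd.read_excel(xlsx_fp, sheet_name=None)
print(list(sheets))
```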

## Run the extraction

Run the cell below first to set up the paths and the environment:

In [1]:
from config import *
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.simplefilter(action='ignore', category=RuntimeWarning)
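
The contents of `config.py` are not shown in this notebook. The sketch below is a hypothetical stand-in that only illustrates the shape of the names the later cells rely on (`locations`, `scenarios`, `idx_varname_lu`, and the output paths). Every value in it is a placeholder. Because `xr`, `np`, and `pd` are used below without being imported here, the real `config.py` presumably imports them as well.

```python
# Hypothetical stand-in for config.py; all values below are placeholders.
from pathlib import Path

base_dir = Path("/tmp/cordex_indices_demo")  # assumed base directory
indices_fp = base_dir.joinpath("indices.nc")
idx_era_summary_dir = base_dir.joinpath("era_summaries")
idx_era_summary_fp = idx_era_summary_dir.joinpath("era_summaries.xlsx")

# future scenario names (placeholders)
scenarios = ["scenario_x", "scenario_y"]

# location name -> (lat, lon), matching the unpacking `lat, lon = locations[location]`
locations = {"Site A": (64.0, -147.0), "Site B": (58.0, -135.0)}

# source variable -> list of index variable names derived from it (placeholders)
idx_varname_lu = {"var_a": ["index_1", "index_2"], "var_b": ["index_3"]}

# flatten the lookup into a single list of index names
index_list = [name for names in idx_varname_lu.values() for name in names]
print(index_list)
```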

Open a connection to the indices dataset, which will be used for both the era and decadal summaries:

In [6]:
ds = xr.open_dataset(indices_fp)

### Era summaries

Extract and summarize the indices over the future eras. Define the eras to summarize over:

In [11]:
eras = [
    "2011-2040",
    "2041-2070",
    "2071-2100",
]

# flatten the lists of index variable names from config into a single list
index_list = [name for names in idx_varname_lu.values() for name in names]

Create an Excel writer object:

In [3]:
writer = pd.ExcelWriter(idx_era_summary_fp, engine="openpyxl")

Iterate! Iterate! Loop over all combinations and populate the Excel sheet, inefficiently but straightforwardly:

In [12]:
dfs = []
for location in locations:
    lat, lon = locations[location]
    
    df_rows = []
    
    for era in eras:
        start_year, end_year = era.split("-")
        for model in ds.model.values:
            for scenario in scenarios:
                for index in index_list:
                    da = ds[index].sel(
                        model=model,
                        scenario=scenario,
                        year=slice(int(start_year), int(end_year))
                    ).sel(
                        lat=lat,
                        lon=lon,
                        method="nearest"
                    )
                    # for aggr_var in aggr_var_lu:

                    df_rows.append({
                        "model": model,
                        "scenario": scenario,
                        "era": era,
                        "index": index,
                        "min": np.nanmin(da.values).round(1),
                        "mean": np.nanmean(da.values).round(1),
                        "max": np.nanmax(da.values).round(1),
                    })
    
    # create dataframe, then write it to a CSV and to a sheet in the Excel file
    df = pd.DataFrame(df_rows)
    dfs.append(df)
    out_fp = idx_era_summary_dir.joinpath(f"era_summaries_{location}.csv")
    df.to_csv(out_fp, index=False)
    df.to_excel(writer, sheet_name=location, index=False)
    print(f"{location} done")

Kaktovik done
Stevens Village done
Igiugik Village done
Levelock done
Eyak done
Ketchikan done
Aleutians done
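
The nested loops above are easy to follow. If speed ever matters, one alternative (not what this notebook does) is to put the point-extracted values into a long table and let pandas compute the statistics per group. The sketch below uses made-up values and placeholder names.

```python
import numpy as np
import pandas as pd

# Toy long-format extraction for one location: one row per year (values made up).
rng = np.random.default_rng(0)
years = np.arange(2011, 2041)
long_df = pd.DataFrame({
    "model": "model_a",
    "scenario": np.repeat(["scenario_x", "scenario_y"], len(years)),
    "year": np.tile(years, 2),
    "index": "example_index",
    "value": rng.normal(10, 2, size=2 * len(years)),
})

summary = (
    long_df.groupby(["model", "scenario", "index"])["value"]
    .agg(["min", "mean", "max"])  # pandas reducers skip NaN by default
    .round(1)
    .reset_index()
)
# add the era label so the columns match the tidy table layout
summary.insert(2, "era", "2011-2040")
print(summary)
```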


Save the Excel spreadsheet:

In [13]:
writer.close()  # writes the workbook; ExcelWriter.save() was removed in pandas 2.0

### Decadal summaries

Now do the same as above, but for the decades. Define the decades:

In [14]:
decades = [f"{year}-{year + 9}" for year in np.arange(1980, 2091, 10)]

Open Excel writer object:

In [15]:
decade_writer = pd.ExcelWriter(idx_decade_summary_fp, engine="openpyxl")

Define functions for summarizing the data from the point-extracted values:

In [16]:
def subset_data(ds, varname, model, scenario, year_sl, lat, lon):
    """Subset an xarray dataset"""
    da = ds[varname].sel(
        model=model,
        scenario=scenario,
        year=year_sl
    ).sel(
        lat=lat,
        lon=lon,
        method="nearest"
    )
    return da
    
def summarize_to_row(da, model, scenario, decade, index):
    """Summarize a data array and return a dict in format for
    appending as pandas dataframe row to summary table
    """
    row_di = {
        "model": model,
        "scenario": scenario,
        "decade": decade,
        "index": index,
        "min": np.nanmin(da.values).round(1),
        "mean": np.nanmean(da.values).round(1),
        "max": np.nanmax(da.values).round(1),
    }
    return row_di
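
`subset_data` chains two `.sel` calls: an exact, label-based selection (including the year slice), followed by a nearest-neighbor lookup on the grid. To our knowledge, xarray does not accept `method="nearest"` together with a slice indexer, which would explain the split. The toy grid below, with made-up values, sketches how the nearest grid cell is chosen.

```python
import numpy as np
import xarray as xr

# Toy grid with made-up values; dimension names mirror those used above.
da = xr.DataArray(
    np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3),
    dims=("year", "lat", "lon"),
    coords={
        "year": [2011, 2012],
        "lat": [60.0, 65.0, 70.0],
        "lon": [-150.0, -145.0, -140.0],
    },
)

# Label-based selection first (a slice), then nearest-neighbor on the grid.
point = da.sel(year=slice(2011, 2012)).sel(lat=64.2, lon=-146.1, method="nearest")

print(float(point.lat), float(point.lon))  # coordinates of the chosen grid cell
print(point.values)                        # one value per year
```

Note that the nearest-neighbor lookup returns the closest grid cell however far away it is, so a location outside the grid still gets a value.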

We have a bit of a wrinkle here, because the historical data are stored separately from the future scenarios: two decades (1980-1989 and 1990-1999) fall entirely within the "historical" scenario, and the 2000-2009 decade straddles the boundary between historical and future. We summarize the fully historical decades under the historical scenario alone, just as we summarize the fully future decades under each future scenario. For the mixed decade, we summarize each future scenario separately, even though both summaries are based on data that share about 5 years of historical values.
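
A minimal sketch of the mixed-decade handling, using made-up series: the historical years (2000-2005) and the future years (2006-2009) are joined along `year` before summarizing. The helper `toy_series` is hypothetical and exists only for this illustration.

```python
import numpy as np
import xarray as xr


def toy_series(start, end, offset):
    """Made-up annual series covering start..end inclusive."""
    years = np.arange(start, end + 1)
    values = (years - 2000 + offset).astype(float)
    return xr.DataArray(values, dims="year", coords={"year": years})


hist_da = toy_series(2000, 2005, offset=0.0)     # stands in for the "hist" scenario
future_da = toy_series(2006, 2009, offset=10.0)  # stands in for one future scenario

# join the two pieces into a single ten-year series
decade_da = xr.concat([hist_da, future_da], dim="year")

print(decade_da.year.values)
print(np.nanmin(decade_da.values), np.nanmax(decade_da.values))
```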

In [18]:
dfs = []
for location in locations:
    lat, lon = locations[location]
    
    df_rows = []
    
    for decade in decades:
        start_year, end_year = decade.split("-")
        year_sl = slice(int(start_year), int(end_year))
        for index in index_list:
            if decade in ["1980-1989", "1990-1999"]:
                # this will be the historical scenario
                scenario = "hist"
                for model in ds.model.values:
                    da = subset_data(ds, index, model, scenario, year_sl, lat, lon)
                    df_rows.append(summarize_to_row(da, model, scenario, decade, index))

            elif decade == "2000-2009":
                # mixed decade: subset the historical and future years
                # separately, then concatenate the data arrays along year
                hist_sl = slice(2000, 2005)
                future_sl = slice(2006, 2009)
                for model in ds.model.values:
                    for scenario in scenarios:
                        hist_da = subset_data(ds, index, model, "hist", hist_sl, lat, lon)
                        future_da = subset_data(ds, index, model, scenario, future_sl, lat, lon)
                        da = xr.concat([hist_da, future_da], dim="year")
                        df_rows.append(summarize_to_row(da, model, scenario, decade, index))

            else:
                # future scenarios
                for model in ds.model.values:
                    for scenario in scenarios:
                        da = subset_data(ds, index, model, scenario, year_sl, lat, lon)
                        df_rows.append(summarize_to_row(da, model, scenario, decade, index))

    # create dataframe, then write it to a CSV and to a sheet in the Excel file
    df = pd.DataFrame(df_rows)
    dfs.append(df)
    out_fp = idx_decade_summary_dir.joinpath(f"decade_summaries_{location}.csv")
    df.to_csv(out_fp, index=False)
    df.to_excel(decade_writer, sheet_name=location, index=False)
    print(f"{location} done")

Kaktovik done
Stevens Village done
Igiugik Village done
Levelock done
Eyak done
Ketchikan done
Aleutians done


Save the spreadsheet and close the connection to `ds`:

In [19]:
decade_writer.close()  # writes the workbook; ExcelWriter.save() was removed in pandas 2.0
ds.close()
