# AK spruce beetle outbreak risk pipeline

This notebook constitutes the pipeline for producing a dataset of projected climate-driven risk of spruce beetle outbreak for forested areas of Alaska for the 21st century. See the [README](README.md) for more information.

### Outputs

The main product of this pipeline is a 6-D datacube of one categorical variable, climate-driven spruce beetle outbreak risk. The dimensions are:  

* Era (time period)
* Model
* Scenario
* Snowpack level
* Y
* X

##### Format / structure

This will be realized in typical SNAP / ARDAC fashion: a set of GeoTIFFs, one for each unique combination of coordinate values of the first four dimensions. Each GeoTIFF contains risk values for the entire spatial domain and is named according to its coordinate combination.
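
To make the naming scheme concrete, here is a minimal sketch that enumerates one filename per unique combination of the four non-spatial coordinates. The filename pattern and the model and scenario lists are hypothetical placeholders, not the final SNAP / ARDAC convention. The eras match those listed under Temporal extent, and the snow levels match the `low` / `med` / `high` values used in the risk code.

```python
from itertools import product

# eras and snow levels as used in this notebook; models and scenarios are an
# illustrative (hypothetical) subset
eras = ["2010-2039", "2040-2069", "2070-2099"]
models = ["GFDL-ESM2M", "HadGEM2-ES"]
scenarios = ["rcp85"]
snow_levels = ["low", "med", "high"]


def geotiff_names(eras, models, scenarios, snow_levels):
    """Return one (hypothetical) GeoTIFF filename per coordinate combination."""
    return [
        f"beetle_risk_{era}_{model}_{scenario}_{snow}_snow.tif"
        for era, model, scenario, snow in product(eras, models, scenarios, snow_levels)
    ]


names = geotiff_names(eras, models, scenarios, snow_levels)
print(len(names))  # 3 eras * 2 models * 1 scenario * 3 snow levels = 18
print(names[0])
```

Using `itertools.product` keeps the file set complete by construction: every combination appears exactly once, so a missing file points to a failed job rather than to a gap in the enumeration.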

##### Spatial extent

The expected spatial extent of the final dataset is the extent of the forest layer that the final risk data will be masked to. This will come from a version of the binary USFS "Alaska Forest/Non-forest Map" raster (found [here](https://data.fs.usda.gov/geodata/rastergateway/biomass/alaska_forest_nonforest.php)) in SNAP holdings that has been reprojected to EPSG:3338, found at `/workspace/Shared/Tech_Projects/beetles/project_data/ak_forest_mask.tif`.
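
For reference, the masking step could look like the sketch below. It assumes the risk array and the forest mask have already been brought onto the same EPSG:3338 grid. If their grids differ, as is likely given the 12 km NCAR inputs, regridding would be needed first and is not handled here. The NaN nodata value is a hypothetical choice. For integer-coded categorical risk, an integer `nodata` would avoid upcasting to float.

```python
import numpy as np


def mask_to_forest(risk, forest_mask, nodata=np.nan):
    """Keep risk values only where the binary forest mask equals 1.

    Assumes `forest_mask` is already on the same grid as the last two
    dims of `risk` (same shape and alignment).
    """
    forest_mask = np.asarray(forest_mask)
    if risk.shape[-2:] != forest_mask.shape:
        raise ValueError("risk and forest mask are not on the same grid")
    # the 2-D mask broadcasts over any leading dims (e.g. snow, year)
    return np.where(forest_mask == 1, risk, nodata)


risk = np.arange(12, dtype=float).reshape(2, 2, 3)  # e.g. (snow, y, x)
forest = np.array([[1, 0, 1], [0, 1, 1]])
masked = mask_to_forest(risk, forest)
print(masked[0])
```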

##### Temporal extent

The risk values will be computed for three 30-year eras of the 21st century:  
* 2010-2039
* 2040-2069
* 2070-2099
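
Each year's risk depends on the risk components from the two preceding years, as noted in the yearly risk steps later in this notebook. Each era therefore needs components starting two years before it begins. The sketch below makes that bookkeeping explicit. The helper names are hypothetical, and it does not assume how yearly values are later summarized into an era value.

```python
ERAS = ["2010-2039", "2040-2069", "2070-2099"]


def parse_era(era):
    """Split an era label like '2010-2039' into inclusive (start, end) years."""
    start, end = (int(year) for year in era.split("-"))
    return start, end


def component_years(era, lookback=2):
    """Years of risk components needed to compute yearly risk for every year of `era`."""
    start, end = parse_era(era)
    # the first risk year needs components from `lookback` years earlier
    return list(range(start - lookback, end + 1))


def era_for_year(year, eras=ERAS):
    """Return the era label containing `year`, or None if it falls outside all eras."""
    for era in eras:
        start, end = parse_era(era)
        if start <= year <= end:
            return era
    return None


for era in ERAS:
    years = component_years(era)
    print(era, years[0], years[-1], len(years))
```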

### Base data

The base / input data used for computing the climate-driven risk of beetle outbreaks is the "[21st Century Hydrologic Projections for Alaska and Hawaii](https://www.earthsystemgrid.org/dataset/ucar.ral.hydro.predictions.html)" dataset produced by NCAR, specifically the "Alaska Near Surface Meteorology Daily Averages" child dataset. This dataset is available on SNAP infra at `/Data/Base_Data/Climate/AK_NCAR_12km/met`.

## Pipeline steps

0. Setup - Set up path variables, slurm variables, directories, initial conditions, etc. Execute the setup code cell before any other step.
1. Process yearly risk - compute the yearly risk components and, from them, the yearly risk (the risk model)

### 0 - Setup

Set up path variables, slurm variables, directories, initial conditions, etc.:

In [7]:
import os
from pathlib import Path
import compute_yearly_risk as main
import slurm


ncar_dir = Path(os.getenv("AK_NCAR_DIR"))
base_dir = Path(os.getenv("BASE_DIR"))
output_dir = Path(os.getenv("OUTPUT_DIR"))
scratch_dir = Path(os.getenv("SCRATCH_DIR"))
# init script for conda on compute nodes
conda_init_script = Path(os.getenv("CONDA_INIT"))
project_dir = Path(os.getenv("PROJECT_DIR"))
# the bin directory of the active conda environment is prepended to PATH,
# so the parent of the first PATH entry is the environment prefix
path_str = os.getenv("PATH")
# used to activate the same anaconda-project env on the compute nodes
ap_env = Path(path_str.split(":")[0]).parent

# path to the input meteorological dataset
# (on SNAP infra: /Data/Base_Data/Climate/AK_NCAR_12km/met)
met_dir = ncar_dir.joinpath("met")

# filename template for the NCAR met files; the placeholders are filled in
# downstream (see compute_yearly_risk)
tmp_fn = "{}_{}_BCSD_met_{}.nc4"

# path to directory where risk components datasets will be written
risk_comp_dir = scratch_dir.joinpath("risk_components")
risk_comp_dir.mkdir(exist_ok=True)

# path to directory where yearly risk datasets will be written
yearly_risk_dir = scratch_dir.joinpath("yearly_risk")
yearly_risk_dir.mkdir(exist_ok=True)

# path to directory where slurm scripts (jobs and outputs) will be written
slurm_dir = scratch_dir.joinpath("slurm")
slurm_dir.mkdir(exist_ok=True, parents=True)

# script run by each slurm job to compute yearly risk
risk_script = project_dir.joinpath("compute_yearly_risk.py")

# slurm job settings
slurm_email = "kmredilla@alaska.edu"
partition = "main"
ncpus = 32
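
If one of the environment variables above is unset, `Path(os.getenv(...))` fails with a generic `TypeError` about `NoneType`. The sketch below shows a hypothetical helper that fails with the variable's name instead. It also shows the `ap_env` logic as a function, using `os.pathsep` rather than a hard-coded `":"`. As an alternative assumption, `conda activate` also exports `CONDA_PREFIX`, which could be read directly instead of parsing `PATH`.

```python
import os
from pathlib import Path


def env_path(name):
    """Return environment variable `name` as a Path, with a clear error if it is unset."""
    value = os.getenv(name)
    if not value:
        # Path(None) would otherwise raise a less informative TypeError
        raise KeyError(f"environment variable {name!r} is not set")
    return Path(value)


def env_prefix_from_path(path_str):
    """Same logic as `ap_env`: parent of the first PATH entry, assumed to be <env>/bin."""
    return Path(path_str.split(os.pathsep)[0]).parent


# hypothetical usage mirroring the setup cell:
# scratch_dir = env_path("SCRATCH_DIR")
# ap_env = env_prefix_from_path(os.environ["PATH"])
```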

## Process Daymet yearly risk components dataset

Create a dataset of the yearly risk component values as a useful precursor to the synthesized overall risk value.

In [1]:
dm_tmp_fn = "{}_met_{}.nc"
dm_era = "1980-2017"
dm_model = "daymet"
dm_ncpus = 32

In [3]:
daymet_risk_comp_ds = main.process_risk_components(
    met_dir, dm_tmp_fn, dm_era, dm_model, dm_ncpus
)

In [4]:
daymet_comp_fp = scratch_dir.joinpath("yearly_risk_components_daymet.nc")

daymet_risk_comp_ds.to_netcdf(daymet_comp_fp)

## Process Daymet yearly risk dataset

Derive the yearly risk dataset from the components dataset.

We are working with 1980-2017, and computing risk for a given year requires the risk components from the two preceding years, so the first year with a valid risk value is 1982.

In [5]:
import numpy as np
import xarray as xr


const_args = {"model": "daymet", "scenario": None}
snow_values = ["low", "med", "high"]

yearly_risk_arrs = []

with xr.open_dataset(daymet_comp_fp) as comp_ds:
    for year in range(1982, 2018):
        u_t2 = comp_ds["summer_survival"].sel(year=(year - 2), **const_args).values
        u_t1 = comp_ds["summer_survival"].sel(year=(year - 1), **const_args).values
        # "not univoltine"
        un_t2 = np.round(1 - u_t2, 2)
        x2_t2 = comp_ds["fall_survival"].sel(year=(year - 2), **const_args).values
        x2_t1 = comp_ds["fall_survival"].sel(year=(year - 1), **const_args).values

        year_snow_risk = []
        for snow in snow_values:
            x3_t2 = comp_ds["winter_survival"].sel(year=(year - 2), snow=snow, **const_args).values
            x3_t1 = comp_ds["winter_survival"].sel(year=(year - 1), snow=snow, **const_args).values

            # original equation, for reference:
            # (un_t2 * sv_p * x2_t2 * x2_t1 * x3_t2 * x3_t1) + ((u_t2 * p * x2_t2 * x3_t2) * (u_t1 * p * x2_t1 * x3_t1)) + (u_t2 * p * x2_t2 * x2_t1 * x3_t2 * x3_t1)
            # main.compute_risk evaluates an algebraically simplified form of it
            year_snow_risk.append(main.compute_risk(u_t1, u_t2, un_t2, x2_t1, x2_t2, x3_t1, x3_t2))

        yearly_risk_arrs.append(np.array(year_snow_risk))
    
yearly_risk_arr = np.swapaxes(np.array(yearly_risk_arrs), 0, 1)
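
The year loop above can also be written with array slicing, because every target year draws on the same components lagged by one and two years. Below is a minimal NumPy sketch, assuming each component is an array with year as its first axis. It evaluates the "original equation" quoted in the loop's comment directly, with `sv_p` and `p` passed in as parameters. Their values are not given in this notebook, so the ones used here are arbitrary placeholders. The sketch does not call `main.compute_risk`, whose implementation is not shown here. With synthetic 1980-2017 inputs, it yields risk for 1982-2017.

```python
import numpy as np


def lag_pairs(comp, years):
    """Align each target year with its t-1 and t-2 components.

    `comp` has year as its first axis; `years` lists those consecutive years.
    Row i of the outputs corresponds to target year years[i + 2].
    """
    years = np.asarray(years)
    return years[2:], comp[1:-1], comp[:-2]


def original_risk(u_t1, u_t2, x2_t1, x2_t2, x3_t1, x3_t2, sv_p, p):
    """The 'original equation' from the loop comment; sv_p and p come from the caller."""
    un_t2 = np.round(1 - u_t2, 2)  # "not univoltine", rounded as in the loop
    return (
        un_t2 * sv_p * x2_t2 * x2_t1 * x3_t2 * x3_t1
        + (u_t2 * p * x2_t2 * x3_t2) * (u_t1 * p * x2_t1 * x3_t1)
        + u_t2 * p * x2_t2 * x2_t1 * x3_t2 * x3_t1
    )


# synthetic components on a tiny 4 x 5 grid for 1980-2017 (one snow level)
rng = np.random.default_rng(0)
years = np.arange(1980, 2018)
u, x2, x3 = (rng.random((years.size, 4, 5)) for _ in range(3))

target_years, u_t1, u_t2 = lag_pairs(u, years)
_, x2_t1, x2_t2 = lag_pairs(x2, years)
_, x3_t1, x3_t2 = lag_pairs(x3, years)
# arbitrary placeholder parameter values
risk = original_risk(u_t1, u_t2, x2_t1, x2_t2, x3_t1, x3_t2, sv_p=0.5, p=0.5)
print(target_years[0], risk.shape)  # 1982 (36, 4, 5)
```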

In [6]:
import xarray as xr


daymet_risk_ds = xr.Dataset(
    # need to expand dims to add an extra for each of model, scenario
    data_vars={"risk": (["snow", "year", "y", "x"], yearly_risk_arr)},
    coords={
        "year": (["year"], np.arange(1982, 2018)),
        "longitude": (["y", "x"], comp_ds["longitude"].values),
        "latitude": (["y", "x"], comp_ds["latitude"].values),
        "snow": (["snow"], snow_values),
    },
    attrs=dict(description="Climate-based beetle risk",),
)
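
The comment in the cell above notes that model and scenario dimensions still need to be added. One way to do this is `DataArray.expand_dims`, which inserts new length-1 dimensions labelled by coordinate values. The sketch below uses a small synthetic array in place of `yearly_risk_arr`. The `"none"` scenario label for Daymet is a hypothetical placeholder.

```python
import numpy as np
import xarray as xr

# small synthetic stand-in for yearly_risk_arr, dims (snow, year, y, x)
risk_arr = np.random.default_rng(0).random((3, 4, 2, 2))
risk = xr.DataArray(
    risk_arr,
    dims=["snow", "year", "y", "x"],
    coords={"snow": ["low", "med", "high"], "year": np.arange(1982, 1986)},
    name="risk",
)

# insert length-1 model and scenario dims at the front, labelled by coordinates;
# "none" is a placeholder scenario label for the Daymet data
risk = risk.expand_dims(model=["daymet"], scenario=["none"])
print(risk.dims)
```

Once every dataset carries the same labelled dims, Daymet and CMIP5 results could be stacked along them, for example with `xr.concat`.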

In [11]:
daymet_risk_fp = "/workspace/Shared/Tech_Projects/beetles/final_products/yearly_risk_daymet.nc"

daymet_risk_ds.to_netcdf(daymet_risk_fp)

## Process CMIP5 yearly risk components dataset

Create a dataset of the yearly risk component values as a useful precursor to the synthesized overall risk value for a single CMIP5 model.

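Do this for a far-future era under RCP 8.5 (`rcp85`). The era 2068-2099 includes the two years of components needed before the first risk year, 2070. Two configuration cells follow (HadGEM2-ES and GFDL-ESM2M), and whichever is run last determines the model used; the outputs below were produced with GFDL-ESM2M.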

In [2]:
era = "2068-2099"
model = "HadGEM2-ES"
scenario = "rcp85"

In [9]:
era = "2068-2099"
model = "GFDL-ESM2M"
scenario = "rcp85"

In [3]:
risk_comp_ds = main.process_risk_components(
    met_dir, tmp_fn, era, model, ncpus, scenario
)

In [4]:
comp_fp = scratch_dir.joinpath(f"yearly_risk_components_{model}_{scenario}.nc")

risk_comp_ds.to_netcdf(comp_fp)

## Process CMIP5 yearly risk dataset

Derive the yearly risk dataset from the components dataset.

In [5]:
import numpy as np
import xarray as xr


const_args = {"model": model, "scenario": scenario}
snow_values = ["low", "med", "high"]

yearly_risk_arrs = []

with xr.open_dataset(comp_fp) as comp_ds:
    for year in range(2070, 2100):
        u_t2 = comp_ds["summer_survival"].sel(year=(year - 2), **const_args).values
        u_t1 = comp_ds["summer_survival"].sel(year=(year - 1), **const_args).values
        # "not univoltine"
        un_t2 = np.round(1 - u_t2, 2)
        x2_t2 = comp_ds["fall_survival"].sel(year=(year - 2), **const_args).values
        x2_t1 = comp_ds["fall_survival"].sel(year=(year - 1), **const_args).values

        year_snow_risk = []
        for snow in snow_values:
            x3_t2 = comp_ds["winter_survival"].sel(year=(year - 2), snow=snow, **const_args).values
            x3_t1 = comp_ds["winter_survival"].sel(year=(year - 1), snow=snow, **const_args).values

            # same risk equation as for Daymet, evaluated by main.compute_risk
            year_snow_risk.append(main.compute_risk(u_t1, u_t2, un_t2, x2_t1, x2_t2, x3_t1, x3_t2))

        yearly_risk_arrs.append(np.array(year_snow_risk))
    
yearly_risk_arr = np.swapaxes(np.array(yearly_risk_arrs), 0, 1)

In [6]:
import xarray as xr


risk_ds = xr.Dataset(
    # need to expand dims to add an extra for each of model, scenario
    data_vars={"risk": (["snow", "year", "y", "x"], yearly_risk_arr)},
    coords={
        "year": (["year"], np.arange(2070, 2100)),
        "longitude": (["y", "x"], comp_ds["longitude"].values),
        "latitude": (["y", "x"], comp_ds["latitude"].values),
        "snow": (["snow"], snow_values),
    },
    attrs=dict(description="Climate-based beetle risk",),
)

In [7]:
risk_fp = f"/workspace/Shared/Tech_Projects/beetles/final_products/yearly_risk_{model}_{scenario}.nc"

risk_ds.to_netcdf(risk_fp)

## 1 - Process yearly risk

This section creates the yearly risk dataset - a collection of risk values for each year across the grid. This dataset is not expected to be the final product, but it could be a useful intermediate product.

### 1.1 - Process yearly risk components

The yearly "risk components" dataset may be useful on its own, at least for validation. It holds the three variables derived from each year's climate data (summer, fall, and winter survival, with winter survival computed for each snowpack level) that are needed to calculate the risk for any given year.

In [13]:
import importlib

importlib.reload(slurm)


In [7]:
!rm -f {slurm_dir}/*

In [9]:
# all projections will have years 2010-2099
# need to start with 2008 as yearly risk calculation
#   requires risk components from two years prior
era = "2008-2099"

In [14]:

model = "GFDL-ESM2M"
scenario = "rcp85"
sbatch_fp, sbatch_out_fp = slurm.get_yearly_fps(slurm_dir, model, era, scenario)
risk_comp_fp = risk_comp_dir.joinpath(f"risk_components_{model}_{scenario}_{era}.nc")
yearly_risk_fp = yearly_risk_dir.joinpath(f"yearly_risk_{model}_{scenario}_{era}.nc")



In [10]:
import luts


sbatch_fps = []
for model in luts.models:
    for scenario in luts.scenarios:
        sbatch_fp, sbatch_out_fp = slurm.get_yearly_fps(slurm_dir, model, era, scenario)
        risk_comp_fp = risk_comp_dir.joinpath(f"risk_components_{model}_{scenario}_{era}.nc")
        yearly_risk_fp = yearly_risk_dir.joinpath(f"yearly_risk_{model}_{scenario}_{era}.nc")
        
        kwargs = {
            "slurm_email": slurm_email,
            "partition": partition,
            "conda_init_script": conda_init_script,
            "ap_env": ap_env,
            "sbatch_fp": sbatch_fp,
            "sbatch_out_fp": sbatch_out_fp,
            "risk_script": risk_script,
            "met_dir": met_dir,
            "tmp_fn": tmp_fn,
            "risk_comp_fp": risk_comp_fp,
            "yearly_risk_fp": yearly_risk_fp,
            "era": era,
            "model": model,
            "scenario": scenario,
        }

        slurm.write_sbatch_yearly_risk(**kwargs)
        sbatch_fps.append(sbatch_fp)

Remove existing slurm output files if desired:

In [12]:
# remove existing output files if desired
_ = [fp.unlink() for fp in slurm_dir.glob("*.out")]

Submit the sbatch jobs:

In [14]:
job_ids = [slurm.submit_sbatch(fp) for fp in sbatch_fps]

## Process CMIP5 yearly risk dataset (all models, scenarios, and eras)

Create a slurm sbatch script for each model, scenario, and era combination. Each job occupies a single node, reads the input data in parallel, and writes a yearly risk DataArray to a temporary file in the scratch directory. Define a function to create this script:

In [2]:
def write_sbatch_yearly_risk(
    slurm_email,
    partition,
    conda_init_script,
    conda_env_name,
    sbatch_fp,
    sbatch_out_fp,
    compute_yearly_risk,
    met_dir,
    tmp_fn,
    era,
    model,
    scenario,
    ncpus,
    risk_fp,
):
    """Write an sbatch script that runs compute_yearly_risk.py for one
    model / scenario / era combination on a single node.

    conda_env_name may be an environment name or a prefix path;
    compute_yearly_risk is the path to the risk script.
    """
    sbatch_head = (
        "#!/bin/sh\n"
        "#SBATCH --nodes=1\n"
        f"#SBATCH --cpus-per-task={ncpus}\n"
        "#SBATCH --mail-type=FAIL\n"
        f"#SBATCH --mail-user={slurm_email}\n"
        f"#SBATCH -p {partition}\n"
        f"#SBATCH --output {sbatch_out_fp}\n"
        # print start time
        "echo Start slurm && date\n"
        # prepare shell for using activate - Chinook requirement
        f"source {conda_init_script}\n"
        f"conda activate {conda_env_name}\n"
    )

    pycommands = (
        "\n"
        f"python {compute_yearly_risk} "
        f"--met_dir {met_dir} "
        f"--tmp_fn {tmp_fn} "
        f"--era {era} "
        f"--model {model} "
        f"--scenario {scenario} "
        f"--ncpus {ncpus} "
        f"--risk_fp {risk_fp}\n\n"
    )

    with open(sbatch_fp, "w") as f:
        f.write(sbatch_head + pycommands)
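
Because the script header is assembled from string fragments, a quick check on the written file can catch directives placed too late. `sbatch` stops processing `#SBATCH` directives at the first non-comment, non-blank line, so a directive placed after a command is silently ignored. The helper below is a hypothetical sketch. It could be run on the text of `sbatch_fp` after calling `write_sbatch_yearly_risk`.

```python
def honored_sbatch_directives(script_text):
    """Return the #SBATCH lines sbatch will read, raising if any come too late."""
    directives = []
    seen_command = False
    for line in script_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines do not end directive processing
        if stripped.startswith("#SBATCH"):
            if seen_command:
                raise ValueError(f"ignored directive after first command: {stripped}")
            directives.append(stripped)
        elif not stripped.startswith("#"):
            # first real command: later #SBATCH lines would be ignored
            seen_command = True
    return directives


example = (
    "#!/bin/sh\n"
    "#SBATCH --nodes=1\n"
    "#SBATCH --cpus-per-task=32\n"
    "#SBATCH -p main\n"
    "echo Start slurm && date\n"
)
print(honored_sbatch_directives(example))
```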


Build sbatch files:

In [4]:
import luts


sbatch_fps = []
risk_fps = []
for model in luts.models:
    for scenario in luts.scenarios:
        for era in luts.eras:
            if era in ["2040-2099"]:
                continue
            sbatch_fp = slurm_dir.joinpath(
                f"yearly_risk_{model}_{scenario}_{era}.slurm"
            )
            sbatch_out_fp = str(sbatch_fp).replace(".slurm", "_%j.out")
            # temporary filepath for yearly data array
            risk_fp = scratch_dir.joinpath(f"{model}_{scenario}_{era}.nc")
            write_sbatch_yearly_risk(
                slurm_email,
                partition,
                conda_init_script,
                # conda env to activate, given by its prefix path
                ap_env,
                sbatch_fp,
                sbatch_out_fp,
                # path to compute_yearly_risk.py
                risk_script,
                met_dir,
                tmp_fn,
                era,
                model,
                scenario,
                ncpus,
                risk_fp,
            )
            sbatch_fps.append(sbatch_fp)
            risk_fps.append(risk_fp)

In [39]:
# remove existing output files if desired
_ = [fp.unlink() for fp in slurm_dir.glob("*.out")]

In [5]:
import subprocess


def submit_sbatch(sbatch_fp):
    """Submit a script to slurm via sbatch
    
    Args:
        sbatch_fp (pathlib.PosixPath): path to .slurm script to submit
        
    Returns:
        job id for submitted job
    """
    out = subprocess.check_output(["sbatch", str(sbatch_fp)])
    job_id = out.decode().replace("\n", "").split(" ")[-1]

    return job_id
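
`submit_sbatch` relies on the default `Submitted batch job <id>` message. `sbatch` also has a `--parsable` option that prints only the job id, followed by `;<cluster>` when a cluster name is present. The sketch below shows a hypothetical parser that accepts either form. Switching to `--parsable` is an optional design choice, not something this pipeline currently does.

```python
def parse_job_id(sbatch_output):
    """Extract the job id from sbatch's stdout.

    Handles the default "Submitted batch job <id>" message as well as the
    "<id>" or "<id>;<cluster>" form printed when sbatch runs with --parsable.
    """
    text = sbatch_output.strip()
    if text.startswith("Submitted batch job"):
        return text.split()[-1]
    return text.split(";")[0]


# hypothetical usage:
# out = subprocess.check_output(["sbatch", "--parsable", str(sbatch_fp)], text=True)
# job_id = parse_job_id(out)
```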

Submit the sbatch jobs:

In [6]:
job_ids = [submit_sbatch(fp) for fp in sbatch_fps]
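
The temporary files can only be combined after every job has finished. Below is a minimal polling sketch that asks `squeue` (`-h` for no header, `-o %i` for job ids only, `-j` for a comma-separated job list) which submitted jobs are still queued or running. The function names are hypothetical. The sketch does not distinguish completed from failed jobs; the sbatch scripts already request email on failure. As a simplification, it also treats an `squeue` error or empty output as "no jobs left". The query and sleep functions are parameters so the loop can be exercised without a cluster.

```python
import subprocess
import time


def queued_job_ids(job_ids):
    """Return the subset of `job_ids` that squeue still lists (pending or running)."""
    result = subprocess.run(
        ["squeue", "-h", "-o", "%i", "-j", ",".join(job_ids)],
        capture_output=True,
        text=True,
        check=False,
    )
    # simplification: an error or empty output is treated as "none left"
    return set(result.stdout.split())


def wait_for_jobs(job_ids, query=queued_job_ids, poll_seconds=60, sleep=time.sleep):
    """Block until none of `job_ids` are in the queue, polling every `poll_seconds`."""
    remaining = set(job_ids)
    while remaining:
        remaining &= set(query(sorted(remaining)))
        if remaining:
            sleep(poll_seconds)


# hypothetical usage: wait_for_jobs(job_ids)
```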

Once all jobs have finished, read in the temporary DataArrays and combine them:

In [10]:
import xarray as xr


risk_da = xr.combine_by_coords([xr.open_dataarray(fp) for fp in risk_fps])

Save to a single file on Poseidon:

In [12]:
out_fp = "/workspace/Shared/Tech_Projects/beetles/final_products/yearly_risk.nc"

risk_da.to_netcdf(out_fp)