# AK spruce beetle outbreak risk pipeline

This notebook constitutes the pipeline for producing a dataset of projected climate-driven risk of spruce beetle outbreak for forested areas of Alaska for the 21st century. See the [README](README.md) for more information.

### Outputs

The main product of this pipeline is a 6-D datacube of one categorical variable: climate-driven spruce beetle outbreak risk. The dimensions are:

* Era (time period)
* Model
* Scenario
* Snowpack level
* Y
* X

##### Format / structure

This will be realized in typical SNAP / ARDAC fashion: a set of GeoTIFFs, each containing risk values over the entire spatial domain (Y, X) for a single combination of coordinates along the first four dimensions (era, model, scenario, and snowpack level), and named according to that combination.
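As a rough illustration of this naming scheme, the sketch below enumerates the coordinate combinations and builds one filename per combination, following the `risk_class_{era}_{model}_{scenario}_{snow}.tif` pattern used in step 2. The model, scenario, and snowpack lists are assumptions taken from the output file listing at the end of this notebook; the pipeline itself reads its full lists from `luts`.

```python
from itertools import product

# Assumed coordinate values; the pipeline's real lists live in `luts`
ERAS = ["2010-2039", "2040-2069", "2070-2099"]
MODELS = ["CCSM4", "GFDL-ESM2M"]
SCENARIOS = ["rcp45", "rcp85"]
SNOW_LEVELS = ["low", "med"]


def risk_class_fn(era, model, scenario, snow):
    """Build the GeoTIFF filename for one (era, model, scenario, snow) combination."""
    return f"risk_class_{era}_{model}_{scenario}_{snow}.tif"


# one file per unique combination of the first four dimensions;
# each file holds the full Y/X grid for that combination
filenames = [
    risk_class_fn(*combo) for combo in product(ERAS, MODELS, SCENARIOS, SNOW_LEVELS)
]
```

With these assumed lists, that gives 3 eras x 2 models x 2 scenarios x 2 snow levels = 24 files.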

##### Spatial extent

The expected spatial extent of the final dataset is the extent of the forest layer that the final risk data will be masked to. This will come from a version of the binary USFS "Alaska Forest/Non-forest Map" raster (found [here](https://data.fs.usda.gov/geodata/rastergateway/biomass/alaska_forest_nonforest.php)) in SNAP holdings that has been reprojected to EPSG:3338, found at `/workspace/Shared/Tech_Projects/beetles/project_data/ak_forest_mask.tif`.

##### Temporal extent

The risk values will be computed for three 30-year eras of the 21st century:  
* 2010-2039
* 2040-2069
* 2070-2099
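Yearly risk values are summarized into these eras in step 2. A minimal, hypothetical helper for assigning a year to its era (bounds inclusive on both ends) could look like this; the pipeline's own era handling lives in `utils` and may differ.

```python
ERAS = ["2010-2039", "2040-2069", "2070-2099"]


def year_to_era(year, eras=ERAS):
    """Return the era string containing `year`, or None if it falls outside all eras."""
    for era in eras:
        start, end = (int(part) for part in era.split("-"))
        if start <= year <= end:  # both bounds are inclusive
            return era
    return None
```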

### Base data

The base / input data used for computing the climate-driven risk of beetle outbreaks is the "[21st Century Hydrologic Projections for Alaska and Hawaii](https://www.earthsystemgrid.org/dataset/ucar.ral.hydro.predictions.html)" dataset produced by NCAR, specifically the "Alaska Near Surface Meteorology Daily Averages" child dataset. This dataset is available on SNAP infra at `/Data/Base_Data/Climate/AK_NCAR_12km/met`.

## Pipeline steps

0. Setup - Set up path variables, slurm variables, directories, initial conditions, etc.
1. Process yearly risk and risk components
2. Process the final risk class dataset

## 0 - Setup

Sets up path variables, slurm variables, directories, initial conditions, imports, etc. Execute this cell before any other step:

```python
from config import *
```

## 1 - Process yearly risk and risk components

This section creates the yearly risk dataset - a collection of risk values for each year across the grid. This dataset is not expected to be the final product, but it could be a useful intermediate product.

The yearly risk values are calculated from three yearly "risk components". Saving these components as a separate dataset may be useful in its own right, at least for validation. This step uses slurm to run the `compute_yearly_risk.py` script on every model/scenario combination.

We will process all future years for the expected final summary time periods (2008-2099), as well as all years available in the Daymet dataset (1980-2017).

```python
kwargs = {
    "slurm_email": slurm_email,
    "partition": partition,
    "conda_init_script": conda_init_script,
    "ap_env": ap_env,
    "risk_script": risk_script,
    "met_dir": met_dir,
    # template filename for NCAR met data
    "tmp_fn": "{}_{}_BCSD_met_{}.nc4",
}

sbatch_fps = []
for model in luts.models:
    for scenario in luts.scenarios:
        sbatch_fp, sbatch_out_fp = slurm.get_yearly_fps(slurm_dir, model, luts.full_future_era, scenario)
        risk_comp_fp = risk_comp_dir.joinpath(f"risk_components_{model}_{scenario}_{luts.full_future_era}.nc")
        yearly_risk_fp = yearly_risk_dir.joinpath(f"yearly_risk_{model}_{scenario}_{luts.full_future_era}.nc")

        kwargs.update({
            "sbatch_fp": sbatch_fp,
            "sbatch_out_fp": sbatch_out_fp,
            "risk_comp_fp": risk_comp_fp,
            "yearly_risk_fp": yearly_risk_fp,
            "era": luts.full_future_era,
            "model": model,
            "scenario": scenario,
        })

        slurm.write_sbatch_yearly_risk(**kwargs)
        sbatch_fps.append(sbatch_fp)
```
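`slurm.write_sbatch_yearly_risk` is defined in the project's `slurm` module and is not shown here. As a hedged sketch of what such a helper typically produces, the function below assembles an sbatch script from the same kinds of inputs. The `#SBATCH` directives are standard slurm options, but `make_sbatch_text` is hypothetical and the command-line flags passed to `compute_yearly_risk.py` are placeholders, not the script's real interface.

```python
def make_sbatch_text(job_name, partition, email, out_fp, conda_init_script, env, risk_script, script_args):
    """Assemble the text of a minimal sbatch script (hypothetical layout)."""
    # hypothetical flag names; check compute_yearly_risk.py for its real interface
    arg_str = " ".join(f"--{key} {value}" for key, value in script_args.items())
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        "#SBATCH --nodes=1",
        "#SBATCH --mail-type=FAIL",
        f"#SBATCH --mail-user={email}",
        f"#SBATCH --output={out_fp}",
        # make `conda activate` available in the batch shell, then activate the env
        f"source {conda_init_script}",
        f"conda activate {env}",
        f"python {risk_script} {arg_str}",
    ]
    return "\n".join(lines) + "\n"
```

Writing the script is then a matter of `Path(sbatch_fp).write_text(make_sbatch_text(...))`.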

The Daymet dataset also needs to be processed, over a different range of years than the projected data. Create an sbatch job for it, too:

```python
model = "daymet"
era = "1980-2017"
sbatch_fp, sbatch_out_fp = slurm.get_yearly_fps(slurm_dir, model, era)
risk_comp_fp = risk_comp_dir.joinpath(f"risk_components_{model}_{era}.nc")
yearly_risk_fp = yearly_risk_dir.joinpath(f"yearly_risk_{model}_{era}.nc")

kwargs.update({
    "sbatch_fp": sbatch_fp,
    "sbatch_out_fp": sbatch_out_fp,
    "met_dir": met_dir,
    "tmp_fn": "daymet_met_{}.nc",
    "risk_comp_fp": risk_comp_fp,
    "yearly_risk_fp": yearly_risk_fp,
    "era": era,
    "model": model,
    "scenario": None,
})

slurm.write_sbatch_yearly_risk(**kwargs)
sbatch_fps.append(sbatch_fp)
```
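To check which input files this job will look for, here is a small sketch. It assumes the `{}` in `daymet_met_{}.nc` is filled with the year (as in `daymet_met_1980.nc`, opened in step 2.1.1) and that the files sit in a `daymet` subdirectory of `met_dir`; `expected_daymet_fps` and `missing_fps` are hypothetical helpers, not part of the pipeline.

```python
from pathlib import Path


def expected_daymet_fps(met_dir, era, tmp_fn="daymet_met_{}.nc"):
    """List Daymet file paths for every year in an inclusive 'YYYY-YYYY' era (hypothetical helper)."""
    start, end = (int(part) for part in era.split("-"))
    return [Path(met_dir).joinpath("daymet", tmp_fn.format(year)) for year in range(start, end + 1)]


def missing_fps(fps):
    """Return the paths that do not exist on disk."""
    return [fp for fp in fps if not fp.exists()]
```

For example, `missing_fps(expected_daymet_fps(met_dir, "1980-2017"))` should be empty before the job is submitted.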

Remove existing slurm output files if desired:

```python
# remove existing output files if desired
for fp in slurm_dir.glob("*.out"):
    fp.unlink()
```

Submit the sbatch jobs:

```python
job_ids = [slurm.submit_sbatch(fp) for fp in sbatch_fps]
```
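`slurm.submit_sbatch` is likewise defined elsewhere. A minimal sketch, assuming it wraps the `sbatch` CLI: on success, `sbatch` prints a line like `Submitted batch job 12345`, from which the job ID can be parsed (`sbatch --parsable`, which prints just the ID, is an alternative). `submit_sbatch_sketch` and `parse_job_id` are hypothetical names.

```python
import re
import subprocess


def parse_job_id(sbatch_stdout):
    """Extract the integer job ID from sbatch's 'Submitted batch job N' message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    if match is None:
        raise ValueError(f"Unexpected sbatch output: {sbatch_stdout!r}")
    return int(match.group(1))


def submit_sbatch_sketch(sbatch_fp):
    """Submit an sbatch script and return its job ID (hypothetical stand-in for slurm.submit_sbatch)."""
    result = subprocess.run(["sbatch", str(sbatch_fp)], capture_output=True, text=True, check=True)
    return parse_job_id(result.stdout)
```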

## 2 - Process the final risk class dataset

Process the yearly risk data into risk classes for the three future eras. Since this step is quick, we run it directly in the notebook instead of submitting it through slurm. It involves two steps:

1. Preparing files for masking to forested area of Alaska
2. Classifying risk and saving masked dataset

### 2.1 - Prepare files for masking risk class dataset

We want the final risk class dataset to be masked to the forested areas of Alaska, so there is some prep work that needs to happen first:

1. Georeference the NCAR grid and save it as a template for regridding the forest mask
2. Re-grid the forest mask (~250m resolution) to match the NCAR template

Completing those steps will give a forest mask on the same grid as the NCAR data, which can then be used directly to mask the final risk class dataset.

#### 2.1.1 - Georeference NCAR grid

The NCAR data files contain only latitude and longitude grids giving the center point of each pixel; they carry no other spatial reference information. The same is therefore true of our new risk components and yearly risk datasets.

To mask our grid to the forested area of Alaska, we need a forest mask raster on the same grid as our new datasets, so we first create a georeferenced GeoTIFF of the NCAR grid to use as a template.

Read in one of the time slices of an NCAR file to get the grid (i.e. just the 2-D array of data), derive the projection info using some info provided by NCAR about this dataset, and create a GeoTIFF template file:

```python
# open an NCAR file to get some info from
with xr.open_dataset(met_dir.joinpath("daymet/daymet_met_1980.nc")) as ds:
    # need grid shape below
    ny, nx = ds.longitude.shape
    # first time slice of tmin, flipped so the first row is the northernmost
    ncar_arr = np.flipud(ds["tmin"].values[0])

# values provided by NCAR (via email correspondence)
wrf_proj_str = PolarStereographic(**{"TRUELAT1": 64, "STAND_LON": -150}).proj4()
wrf_proj = Proj(wrf_proj_str)
wgs_proj = Proj(proj="latlong", datum="WGS84")
transformer = Transformer.from_proj(wgs_proj, wrf_proj)
# projected coordinates of (lon -150, lat 64), treated as the domain center
e, n = transformer.transform(-150, 64)
# grid spacing (m)
dx, dy = 12000, 12000
# center of the lower-left grid cell
x0 = -(nx - 1) / 2.0 * dx + e
y0 = -(ny - 1) / 2.0 * dy + n
# 1-D cell-center coordinates along each axis
x = np.arange(nx) * dx + x0
y = np.arange(ny) * dy + y0

# these coordinates will be used here and for spatially
#  referencing all resulting risk class data files;
#  y is flipped to run north to south, matching ncar_arr
ncar_coords = {
    "y": (["y"], np.flip(y)),
    "x": (["x"], x),
}

da = xr.DataArray(
    data=ncar_arr,
    dims=["y", "x"],
    coords=ncar_coords,
)
da.attrs["_FillValue"] = np.nan

# assign the WRF CRS, reproject to Alaska Albers (EPSG:3338), and save the template
temp_ncar_fp = scratch_dir.joinpath("ncar_template_3338.tif")
da.rio.write_crs(wrf_proj_str).rio.reproject("EPSG:3338").rio.to_raster(temp_ncar_fp)
```
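The grid arithmetic above treats the projected point (-150, 64) as the domain center and lays out cell centers 12 km apart on either side of it. Here is a small, dependency-free sketch of the same calculation on a toy grid with a made-up center, plus the outer upper-left corner that a GeoTIFF transform uses (cell centers sit half a cell in from the pixel edges). In essence, rioxarray derives the transform from the 1-D center coordinates this way. `cell_centers` and `upper_left_corner` are illustrative names only.

```python
def cell_centers(n, spacing, center):
    """Cell-center coordinates for n cells, symmetric about `center` (same formula as x0 + i*dx above)."""
    origin = -(n - 1) / 2.0 * spacing + center
    return [origin + i * spacing for i in range(n)]


def upper_left_corner(x_centers, y_centers, dx, dy):
    """Outer upper-left corner: leftmost center minus half a cell, topmost center plus half a cell."""
    return (min(x_centers) - dx / 2.0, max(y_centers) + dy / 2.0)


# toy example: a 5 x 3 grid of 12 km cells centered on (e, n) = (0, 0)
dx = dy = 12000
x = cell_centers(5, dx, 0.0)
y = cell_centers(3, dy, 0.0)
```

Here `x` is `[-24000.0, -12000.0, 0.0, 12000.0, 24000.0]` and the upper-left corner is `(-30000.0, 18000.0)`.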

#### 2.1.2 - Regrid the forest mask

Now regrid the forest mask to match the new NCAR template. 

Since the NCAR data has a larger extent than the forest mask, we will clip (crop) the template file to the extent of the forest mask before regridding the mask.

Create a shapefile to clip to:

```python
cut_fp = scratch_dir.joinpath("clip_ncar.shp")
cut_fp.unlink(missing_ok=True)
_ = subprocess.call(["gdaltindex", cut_fp, forest_fp])
```

Then clip the template:

```python
temp_ncar_clip_fp = scratch_dir.joinpath("ncar_template_clipped_3338.tif")
temp_ncar_clip_fp.unlink(missing_ok=True)
_ = subprocess.call(
    [
        "gdalwarp",
        "-cutline",
        cut_fp,
        "-crop_to_cutline",
        "-q",
        "-overwrite",
        temp_ncar_fp,
        temp_ncar_clip_fp,
    ]
)
```
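Because `gdaltindex` records each raster's footprint as a polygon of its extent, cropping to that cutline amounts, roughly, to cutting the NCAR template down to the forest mask's bounding box (`-cutline` also sets pixels outside the polygon to nodata, which matters at most for edge cells). A hedged sketch of that window arithmetic on a north-up grid, with made-up bounds; `crop_window` is a hypothetical helper and assumes the box lies within the grid:

```python
import math


def crop_window(x_min_grid, y_max_grid, res, bbox):
    """Row/column slices of a north-up grid covering `bbox` = (xmin, ymin, xmax, ymax).

    `x_min_grid` / `y_max_grid` are the grid's outer upper-left corner and `res` is
    the cell size. Any cell that overlaps the box is kept.
    """
    xmin, ymin, xmax, ymax = bbox
    col_start = math.floor((xmin - x_min_grid) / res)
    col_stop = math.ceil((xmax - x_min_grid) / res)
    # rows count downward from the top edge
    row_start = math.floor((y_max_grid - ymax) / res)
    row_stop = math.ceil((y_max_grid - ymin) / res)
    return slice(row_start, row_stop), slice(col_start, col_stop)
```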

Then get the new metadata from the clipped NCAR file:

```python
with rio.open(temp_ncar_clip_fp) as src:
    temp_meta = src.meta
```

Update the data type and nodata value to match those of the existing forest mask:

```python
temp_meta.update({"dtype": "uint8", "nodata": 0})
```

Write a blank array with this metadata to a new GeoTIFF that will serve as a target grid for the original forest mask:

```python
temp_arr = np.zeros((1, temp_meta["height"], temp_meta["width"]), dtype="uint8")

ncar_forest_fp = scratch_dir.joinpath("ak_forest_mask_ncar_3338.tif")
ncar_forest_fp.unlink(missing_ok=True)
with rio.open(ncar_forest_fp, "w", **temp_meta) as dst:
    dst.write(temp_arr)
```

Now regrid the original forest mask by calling `gdalwarp` on it, with this new target GeoTIFF as the output file. Because the output file already exists, `gdalwarp` warps the forest mask onto its grid, replacing the blank data (all 0's) and effectively regridding the mask:

```python
_ = subprocess.call(["gdalwarp", "-q", forest_fp, ncar_forest_fp])
```
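Because no `-r` option is given, `gdalwarp` uses its default nearest-neighbour resampling here, so each ~12 km output cell essentially takes the value of the single ~250 m forest-mask pixel that its center falls in, rather than, for example, the fraction of forest within the cell. A toy, pure-Python sketch of that behaviour for two north-up grids sharing an upper-left corner (sizes and values are made up; `nearest_neighbor_regrid` is illustrative only):

```python
def nearest_neighbor_regrid(src, src_res, dst_shape, dst_res):
    """Nearest-neighbour resample of 2-D list `src`; both grids north-up with the same upper-left corner."""
    n_rows, n_cols = dst_shape
    out = []
    for r in range(n_rows):
        # distance of this destination row's center below the shared top edge
        src_r = int(((r + 0.5) * dst_res) // src_res)
        row = []
        for c in range(n_cols):
            # distance of this destination column's center right of the shared left edge
            src_c = int(((c + 0.5) * dst_res) // src_res)
            row.append(src[src_r][src_c])
        out.append(row)
    return out


# 6 x 6 source at "250 m" resampled to 2 x 2 at "750 m"
src = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
dst = nearest_neighbor_regrid(src, 250, (2, 2), 750)
```

The forest row in the lower-right block is dropped because that output cell's center samples source row 4, giving `[[1, 0], [0, 0]]`.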

### 2.2 - Classify risk and mask

Using the regridded forest mask and the clipping shapefile created above, we will classify risk, then clip and mask all output GeoTIFFs.
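The classification itself is handled by `utils.run_classify_clip_mask`, which is not shown here. The masking part can be sketched as a cell-by-cell operation, which is why the forest mask first had to be put on the NCAR grid. This sketch assumes forest cells are `1` in the regridded mask and non-forest cells are `0` (the nodata value set above); the output nodata value `NODATA = 0` and the name `mask_to_forest` are hypothetical.

```python
NODATA = 0  # hypothetical nodata value for the risk class output


def mask_to_forest(risk_class, forest_mask, nodata=NODATA):
    """Keep risk classes only where the (same-grid) forest mask is 1; elsewhere write `nodata`."""
    if len(risk_class) != len(forest_mask) or any(
        len(a) != len(b) for a, b in zip(risk_class, forest_mask)
    ):
        raise ValueError("risk_class and forest_mask must be on the same grid")
    return [
        [value if forest == 1 else nodata for value, forest in zip(risk_row, forest_row)]
        for risk_row, forest_row in zip(risk_class, forest_mask)
    ]
```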

```python
importlib.reload(utils)
```

Remember to wait until the above slurm jobs have finished so we have all of the yearly risk data files to work with. You can check if any are still running or queued with this function:

```python
slurm.jobs_running(job_ids)
```

False
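`slurm.jobs_running` is defined in the project's `slurm` module. One possible way to implement such a check, sketched here as an assumption rather than the module's actual code, is to list the current user's queued and running job IDs with `squeue -h -u <user> -o %i` and test whether any submitted ID is still among them. The parsing step is separated out so it can be tried without a cluster; both function names are hypothetical.

```python
import getpass
import subprocess


def any_still_queued(job_ids, squeue_output):
    """True if any of `job_ids` appears in squeue's one-ID-per-line listing."""
    queued = {line.strip() for line in squeue_output.splitlines() if line.strip()}
    return any(str(job_id) in queued for job_id in job_ids)


def jobs_running_sketch(job_ids):
    """Hypothetical stand-in for slurm.jobs_running: query squeue for the current user's jobs."""
    result = subprocess.run(
        ["squeue", "-h", "-u", getpass.getuser(), "-o", "%i"],
        capture_output=True,
        text=True,
        check=True,
    )
    return any_still_queued(job_ids, result.stdout)
```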

When the above function returns `False`, or when you have otherwise verified that the yearly risk dataset is complete, iterate over the models / scenarios / future eras / snow levels, and classify, clip, and mask:

```python
# iterate over every model / scenario / future era / snow level combination
args = product(luts.models, luts.scenarios, luts.eras, ["low", "med"])

kwargs = {
    "ncar_coords": ncar_coords,
    "wrf_proj_str": wrf_proj_str,
    "cut_fp": cut_fp,
    "ncar_forest_fp": ncar_forest_fp,
}

for model, scenario, era, snow in args:
    yearly_risk_fp = yearly_risk_dir.joinpath(f"yearly_risk_{model}_{scenario}_{luts.full_future_era}.nc")
    risk_class_fp = risk_class_dir.joinpath(f"risk_class_{era}_{model}_{scenario}_{snow}.tif")
    # update only the arguments that change between iterations
    kwargs.update({
        "yearly_risk_fp": yearly_risk_fp,
        "era": era,
        "snow": snow,
        "risk_class_fp": risk_class_fp,
    })
    utils.run_classify_clip_mask(**kwargs)
```

Also do the same for the historical era of the Daymet-based yearly risk:

```python
daymet_era = "1982-2017"
yearly_risk_fp = yearly_risk_dir.joinpath("yearly_risk_daymet_1980-2017.nc")
for snow in ["low", "med"]:
    risk_class_fp = risk_class_dir.joinpath(f"risk_class_{daymet_era}_daymet_hist_{snow}.tif")
    kwargs.update({
        "yearly_risk_fp": yearly_risk_fp,
        # use the Daymet era, not the `era` left over from the loop above
        "era": daymet_era,
        "snow": snow,
        "risk_class_fp": risk_class_fp,
    })
    utils.run_classify_clip_mask(**kwargs)
```

And copy these files to `$OUTPUT_DIR` for safekeeping:

```python
for fp in risk_class_dir.glob("*.tif"):
    shutil.copy(fp, out_risk_dir.joinpath(fp.name))
```
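To double-check the copies, one option (shown for illustration, not part of the original pipeline) is to compare a checksum of each source file with its copy using only the standard library. `sha256sum` and `copy_and_verify` are hypothetical helpers.

```python
import hashlib
import shutil
from pathlib import Path


def sha256sum(fp, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(fp, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src_dir, dst_dir, pattern="*.tif"):
    """Copy files matching `pattern` from src_dir to dst_dir; return names whose checksums differ."""
    dst_dir = Path(dst_dir)
    mismatched = []
    for src_fp in sorted(Path(src_dir).glob(pattern)):
        dst_fp = dst_dir.joinpath(src_fp.name)
        shutil.copy(src_fp, dst_fp)
        if sha256sum(src_fp) != sha256sum(dst_fp):
            mismatched.append(src_fp.name)
    return mismatched
```

For example, `copy_and_verify(risk_class_dir, out_risk_dir)` returns an empty list when every copy matches its source.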

## Pipeline end!

That's it! Beetle risk secured:

```python
print(subprocess.check_output(["ls", "-l", out_risk_dir]).decode("utf-8"))
```

total 2600
-rw-r--r--. 1 kmredilla snap_users 45069 Sep 16 18:14 risk_class_1982-2017_daymet_hist_low.tif
-rw-r--r--. 1 kmredilla snap_users 45069 Sep 16 18:14 risk_class_1982-2017_daymet_hist_med.tif
-rw-r--r--. 1 kmredilla snap_users 45069 Sep 16 18:14 risk_class_2010-2039_CCSM4_rcp45_low.tif
-rw-r--r--. 1 kmredilla snap_users 45069 Sep 16 18:14 risk_class_2010-2039_CCSM4_rcp45_med.tif
-rw-r--r--. 1 kmredilla snap_users 45069 Sep 16 18:14 risk_class_2010-2039_CCSM4_rcp85_low.tif
-rw-r--r--. 1 kmredilla snap_users 45069 Sep 16 18:14 risk_class_2010-2039_CCSM4_rcp85_med.tif
-rw-r--r--. 1 kmredilla snap_users 45069 Sep 16 18:14 risk_class_2010-2039_GFDL-ESM2M_rcp45_low.tif
-rw-r--r--. 1 kmredilla snap_users 45069 Sep 16 18:14 risk_class_2010-2039_GFDL-ESM2M_rcp45_med.tif
-rw-r--r--. 1 kmredilla snap_users 45069 Sep 16 18:14 risk_class_2010-2039_GFDL-ESM2M_rcp85_low.tif
-rw-r--r--. 1 kmredilla snap_users 45069 Sep 16 18:14 risk_class_2010-2039_GFDL-ESM2M_rcp85_med.tif
-rw-r--r--. 1 kmred