# Prepping Degree Days dataset for hosting

This notebook captures the information from the degree days dataset that is needed to generate a standard metadata / GeoNetwork entry, and zips the files for distribution. The dataset consists of four variables: freezing index, thawing index, degree days below zero, and heating degree days.

In [72]:
import os
import subprocess
from pathlib import Path

import numpy as np
import rasterio as rio
from rasterio.crs import CRS  # used by get_wgs84_extent below
from rasterio.warp import transform_bounds

archive_path = Path(os.getenv("ARCHIVE_DIR") or "/workspace/Shared/Tech_Projects/Arctic_EDS/project_data/rasdaman_datasets/")

### Spatial info: extent in WGS84, resolution, dimensions

Get the spatial extent of this dataset in WGS84, and use this as an opportunity to confirm that all files share the same extent.

Use a [function](https://github.com/ua-snap/snap-geo/blob/e65e2d9aee0a1a0ea8c3432b3d01807476316206/antimeridian_raster_bbox.ipynb) to get the WGS84 extent when the west side crosses the antimeridian (dateline). It is adapted here to also return the resolution and the spatial dimension sizes.

In [57]:
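def get_wgs84_extent(gtiff_fp):
    """Return WGS84 (W, S, E, N) bounds, pixel sizes, and raster shape for a GeoTIFF."""
    with rio.open(gtiff_fp) as src:
        src_crs = src.crs
        src_bounds = src.bounds
        # affine transform: element 0 is the pixel width, element 4 is the (negative) pixel height
        x_res = src.transform[0]
        y_res = -src.transform[4]
        width, height = src.width, src.height
    # WGS84 with the prime meridian moved to 180 so bounds that cross the antimeridian stay contiguous
    dst_crs = CRS.from_wkt(
        CRS.from_epsg(4326).to_wkt().replace('PRIMEM["Greenwich",0', 'PRIMEM["Greenwich",180')
    )
    bounds = transform_bounds(src_crs, dst_crs, *src_bounds)
    # shift longitudes so they are referenced to Greenwich again (modulo 360);
    # a west bound below -180 lies across the antimeridian
    new_bounds = np.round((bounds[0] - 180, bounds[1], bounds[2] - 180, bounds[3]), 4)

    return new_bounds, x_res, y_res, width, height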

In [58]:
%%time
dd_types = ["thawing_index", "heating_degree_days", "freezing_index", "degree_days_below_zero"]
all_bounds = []
x_sizes, y_sizes = [], []
widths, heights = [], []
for dd_name in dd_types:
    fps = list(archive_path.joinpath(dd_name).glob("*.tif"))
    out = [get_wgs84_extent(fp) for fp in fps]
    all_bounds.extend([o[0] for o in out])
    x_sizes.extend([o[1] for o in out])
    y_sizes.extend([o[2] for o in out])
    widths.extend([o[3] for o in out])
    heights.extend([o[4] for o in out])

CPU times: user 8.13 s, sys: 671 ms, total: 8.8 s
Wall time: 9.64 s


If the below cell executes without error, then all files have the same extent:

In [59]:
assert np.all([all_bounds[0] == bnds for bnds in all_bounds])

View the bounds for inclusion in metadata:

In [60]:
print("WSEN bounds:", all_bounds[0])

WSEN bounds: [-202.4147   49.1771 -117.4109   71.3851]
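
The west bound above is below -180 because the extent crosses the antimeridian. If a metadata form expects longitudes in the conventional [-180, 180) range, the values can be wrapped as in the minimal sketch below. The helper name `wrap_longitude` is hypothetical and not part of the original notebook, and whether the catalog wants wrapped values is an assumption.

```python
def wrap_longitude(lon):
    """Wrap a longitude in degrees into the conventional [-180, 180) range."""
    return ((lon + 180.0) % 360.0) - 180.0


# WSEN bounds as printed above
west, south, east, north = -202.4147, 49.1771, -117.4109, 71.3851

west_wrapped = round(wrap_longitude(west), 4)
east_wrapped = round(wrap_longitude(east), 4)
# after wrapping, west > east signals that the box crosses the antimeridian
crosses_antimeridian = west_wrapped > east_wrapped
print(west_wrapped, east_wrapped, crosses_antimeridian)
```

Here the wrapped west bound is about 157.5853 (degrees east) while the east bound stays at -117.4109, so any tool consuming the wrapped values needs to understand boxes where west is greater than east.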


Likewise, confirm that all files have the same X and Y pixel sizes (the resolution, in units of the source CRS), and print them:

In [61]:
assert np.all([x_sizes[0] == res for res in x_sizes])
assert np.all([y_sizes[0] == res for res in y_sizes])
print("X resolution:", x_sizes[0])
print("Y resolution:", y_sizes[0])

X resolution: 18485.55530202021
Y resolution: 18493.020445282968


Confirm that all files have the same shape, and print the dimension sizes:

In [63]:
assert np.all([widths[0] == w for w in widths])
assert np.all([heights[0] == h for h in heights])
print("X dimension size:", widths[0])
print("Y dimension size:", heights[0])

X dimension size: 198
Y dimension size: 106
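
The assertions above report only that something differs, not which file is responsible. A hedged sketch of a helper that returns the offending positions follows; `find_mismatches` is a hypothetical name, and the sample values are invented for illustration, with one deliberately wrong.

```python
def find_mismatches(values):
    """Return (index, value) pairs for entries that differ from the first entry."""
    reference = values[0]
    return [(i, v) for i, v in enumerate(values) if v != reference]


# illustrative values only: the third width is deliberately wrong
sample_widths = [198, 198, 200, 198]
mismatches = find_mismatches(sample_widths)
print(mismatches)  # [(2, 200)]
```

The returned indices line up with the order in which files were read, so they can be mapped back to file paths if those are collected alongside the values. For the bounds arrays, convert each one to a tuple first (for example `tuple(b)`) so that `!=` yields a single boolean.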


### Temporal extent

Confirm that there is a file for every year from 1980-2100, as expected based on the [processing notebook](./dd_preprocessing.ipynb), for all degree day types. ERA-Interim covers 1980-2008 and the two GCMs cover 2006-2100:

In [34]:
gcms = ["GFDL-CM3", "NCAR-CCSM4"]
gcm_years = np.arange(2006, 2101)  # 2006-2100
era_years = np.arange(1980, 2009)  # 1980-2008
files_exist = []
for dd_name in dd_types:
    # ERA-Interim files do not depend on the GCM, so check them once per variable
    files_exist.extend(
        [
            archive_path.joinpath(dd_name, f"{dd_name}_ERA-Interim_{year}.tif").exists()
            for year in era_years
        ]
    )
    for gcm in gcms:
        files_exist.extend(
            [
                archive_path.joinpath(dd_name, f"{dd_name}_{gcm}_{year}.tif").exists()
                for year in gcm_years
            ]
        )

If this cell executes without error, then all years and models are accounted for:

In [37]:
assert np.all(files_exist)
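
If the assertion fails, it helps to know which files are absent. The sketch below builds the expected relative paths using the same naming pattern as the check above and lists the ones that are missing. The function names `expected_filenames` and `missing_files` are hypothetical and not part of the original notebook.

```python
from pathlib import Path


def expected_filenames(dd_types, gcms, gcm_years, era_years):
    """Build the relative paths expected in the archive, one per variable, model, and year."""
    names = []
    for dd_name in dd_types:
        for year in era_years:
            names.append(Path(dd_name) / f"{dd_name}_ERA-Interim_{year}.tif")
        for gcm in gcms:
            for year in gcm_years:
                names.append(Path(dd_name) / f"{dd_name}_{gcm}_{year}.tif")
    return names


def missing_files(archive_dir, relative_paths):
    """Return the expected relative paths that do not exist under archive_dir."""
    archive_dir = Path(archive_dir)
    return [rp for rp in relative_paths if not (archive_dir / rp).exists()]


expected = expected_filenames(
    ["thawing_index", "heating_degree_days", "freezing_index", "degree_days_below_zero"],
    ["GFDL-CM3", "NCAR-CCSM4"],
    range(2006, 2101),
    range(1980, 2009),
)
# 4 variables x (29 ERA-Interim years + 2 GCMs x 95 years) = 876
print(len(expected))
```

Calling `missing_files(archive_path, expected)` would then return an empty list when the archive is complete, or the specific paths to investigate otherwise.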

## Zip files for distribution

Here we will zip the files for distribution. We will zip them by degree day variable.

In [79]:
zip_dir = archive_path.parent.joinpath("dd_zips")
zip_dir.mkdir(exist_ok=True)

for dd_name in dd_types:
    # pass the arguments as a list so the paths are not re-parsed by a shell
    output = subprocess.check_output(
        ["bash", "./zipit.sh", str(archive_path), dd_name, str(zip_dir)]
    )

Check that a `.zip` was created for each variable:

In [80]:
# did we zip 'em all?
zips = list(zip_dir.glob("*.zip"))
zips

[PosixPath('/workspace/Shared/Tech_Projects/Arctic_EDS/project_data/dd_zips/thawing_index.zip'),
 PosixPath('/workspace/Shared/Tech_Projects/Arctic_EDS/project_data/dd_zips/heating_degree_days.zip'),
 PosixPath('/workspace/Shared/Tech_Projects/Arctic_EDS/project_data/dd_zips/freezing_index.zip'),
 PosixPath('/workspace/Shared/Tech_Projects/Arctic_EDS/project_data/dd_zips/degree_days_below_zero.zip')]
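
Listing the archives confirms that they exist, but not that they are complete. A minimal sketch of a content check with the standard-library `zipfile` module follows. The helper names `count_tifs_in_zip` and `check_zip` are hypothetical, and since the contents of `zipit.sh` are not shown here, the assumption that each archive holds only that variable's GeoTIFFs is unverified.

```python
import zipfile


def count_tifs_in_zip(zip_fp):
    """Count archive members whose names end in .tif."""
    with zipfile.ZipFile(zip_fp) as zf:
        return sum(1 for name in zf.namelist() if name.lower().endswith(".tif"))


def check_zip(zip_fp, expected_count):
    """Return True if the archive passes its CRC check and holds the expected number of GeoTIFFs."""
    with zipfile.ZipFile(zip_fp) as zf:
        first_bad = zf.testzip()  # None when every member's CRC is intact
    return first_bad is None and count_tifs_in_zip(zip_fp) == expected_count
```

With 29 ERA-Interim years and 95 years for each of the two GCMs, each variable should have 219 files, so under the assumption above `all(check_zip(fp, 219) for fp in zips)` would be the check to run.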

Looks like it. Make a directory on Poseidon and copy these files into it (`/workspace/CKAN` is not available on the compute node):

```
cp /workspace/Shared/Tech_Projects/Arctic_EDS/project_data/dd_zips/*.zip /workspace/CKAN/CKAN_Data/Base/AK_WRF/Arctic_EDS_degree_days
```