# Reprojecting to EPSG:3338

The purpose of this notebook is to reproject the output GeoTIFFs to EPSG:3338. This step is split out into its own notebook because the choice of resampling technique can significantly affect how variables such as air temperature or snow melt represent variation in elevation, particularly where elevation variance within and between grid cells is high.

Typically when resampling we use the nearest neighbor method. It is computationally cheap and is the default in most of the libraries and programs that we use. Each output cell takes the value of the input cell whose center is closest to the output cell's center. Nearest neighbor can be a conservative choice because the resampled dataset contains no values that are absent from the source dataset, which matters when you are dealing with measurements. But most of our data are not measurements, so are we tied to nearest neighbor?

Bilinear or cubic resampling instead uses the values of several neighboring input cells and computes a distance-weighted combination of them, with weights based on the distance between each input cell center and the center of the output cell (bilinear draws on the nearest 2x2 cells, cubic on the nearest 4x4). This means that the output raster might better capture the temperature variation associated with changes in elevation within and around each input grid cell. However, this kind of resampling smooths the data and can introduce interpolation artifacts, including values that do not occur in the source dataset.
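
To make the difference concrete, here is a minimal pure-Python sketch of the two ideas on a toy 2x2 grid. The grid values, the sample point, and the helper names `nearest` and `bilinear` are made up for illustration. This is not how GDAL or rasterio implement resampling internally, just the arithmetic behind the descriptions above.

```python
def nearest(grid, x, y):
    """Return the value of the cell whose center is closest to (x, y).

    Toy convention: grid[row][col] has its center at (x=col, y=row).
    """
    return grid[round(y)][round(x)]


def bilinear(grid, x, y):
    """Distance-weighted average of the four cell centers surrounding (x, y).

    Assumes non-negative coordinates that fall inside the grid.
    """
    x0, y0 = int(x), int(y)  # upper-left neighbor
    x1 = min(x0 + 1, len(grid[0]) - 1)
    y1 = min(y0 + 1, len(grid) - 1)
    fx, fy = x - x0, y - y0  # fractional offsets from the upper-left center
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


# Hypothetical temperatures, for illustration only
temps = [[0.0, 10.0],
         [20.0, 30.0]]

point = (0.25, 0.25)  # an output cell center lying between input cell centers
nn = nearest(temps, *point)
bl = bilinear(temps, *point)
print(nn, bl)
```

Nearest neighbor returns a value that already exists in the grid, while bilinear returns a blend of all four neighbors that appears nowhere in the source.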

Consider some experimentation with `gdalwarp`. There are some interesting differences in the variable values for locations like Juneau, where there are drastic changes in elevation over short distances.

### Juneau (12 km resolution) CCSM4 RCP 8.5: 2080-2090 Average May total snow melt (mm)
 - Nearest Neighbor: 0 mm
 - Median: 0 mm
 - Mean: 190 mm
 - Bilinear: 193 mm
 - Cubic: 229 mm
 
Another point to consider is that summarizing data by HUC-12s, rather than discrete points, is one way to buffer against the impact of the elevation assigned by the source model to a specific grid cell.
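
As a rough illustration of why a zonal summary is less sensitive to a single grid cell, here is a toy sketch. The values and the mask are invented for the example (they are not model output), and `zonal_mean` is a hypothetical helper, not part of our pipeline.

```python
# Hypothetical snow melt values (mm) on a 3x3 grid, for illustration only
snow_melt = [
    [0.0, 180.0, 240.0],
    [150.0, 0.0, 260.0],
    [210.0, 230.0, 250.0],
]

# Hypothetical mask marking which cells fall inside one polygon (e.g. a HUC-12)
zone_mask = [
    [False, True, True],
    [True, True, True],
    [False, False, False],
]


def zonal_mean(values, mask):
    """Average the values of all cells flagged True in the mask."""
    selected = [
        v
        for value_row, mask_row in zip(values, mask)
        for v, inside in zip(value_row, mask_row)
        if inside
    ]
    return sum(selected) / len(selected)


point_value = snow_melt[1][1]  # what a point query at the center cell would return
zone_value = zonal_mean(snow_melt, zone_mask)
print(point_value, zone_value)
```

The point query inherits whatever value its one cell happens to hold, while the zonal mean spreads that influence across every cell in the polygon.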

However, unless we hear otherwise from the downscaling group, or perhaps Jeremy L., the move is to stick with nearest neighbor.

Right now we have files with names like `runoff_mm_inmcm4_rcp85_aug_total_2060-2070_mean.tif`. Let's reproject these to EPSG:3338 and append `3338` to the end of the filename.
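
One possible way to build the new filename is with `pathlib`. The helper name `with_epsg_suffix` and the underscore separator are assumptions for this sketch, not an established naming convention.

```python
from pathlib import Path


def with_epsg_suffix(path, epsg=3338):
    """Insert `_<epsg>` between the file stem and its extension."""
    path = Path(path)
    return path.with_name(f"{path.stem}_{epsg}{path.suffix}")


src = Path("runoff_mm_inmcm4_rcp85_aug_total_2060-2070_mean.tif")
dst = with_epsg_suffix(src)
print(dst.name)
```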

The current `gdalinfo` output for one of these files looks like this:

```sh
Driver: GTiff/GeoTIFF
Files: glacier_melt_mm_CSIRO-Mk3-6-0_rcp45_jun_total_2040-2050_mean.tif
Size is 299, 209
Coordinate System is:
PROJCS["unknown",
    GEOGCS["unknown",
        DATUM["unknown",
            SPHEROID["unknown",6370000,0]],
        PRIMEM["Greenwich",0],
        UNIT["degree",0.0174532925199433]],
    PROJECTION["Polar_Stereographic"],
    PARAMETER["latitude_of_origin",64],
    PARAMETER["central_meridian",-150],
    PARAMETER["scale_factor",1],
    PARAMETER["false_easting",0],
    PARAMETER["false_northing",0],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]]]
Origin = (-1794000.000000000000000,-1538424.205046423245221)
Pixel Size = (12000.000000000000000,-12000.000000000000000)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (-1794000.000,-1538424.205) (160d36'51.70"E, 67d53'18.86"N)
Lower Left  (-1794000.000,-4046424.205) (173d54'37.13"W, 49d47'59.23"N)
Upper Right ( 1794000.000,-1538424.205) (100d36'51.70"W, 67d53'18.86"N)
Lower Right ( 1794000.000,-4046424.205) (126d 5'22.87"W, 49d47'59.23"N)
Center      (       0.000,-2792424.205) (150d 0' 0.00"W, 64d 0' 0.00"N)
Band 1 Block=299x6 Type=Float32, ColorInterp=Gray
  NoData Value=-9999
```
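
As a quick sanity check, the corner coordinates in this output follow from the origin, pixel size, and raster size. The sketch below just redoes that arithmetic with the numbers copied from the `gdalinfo` output above; the variable names are ours.

```python
# Values copied from the gdalinfo output above
origin_x, origin_y = -1794000.0, -1538424.205046423245221
pixel_w, pixel_h = 12000.0, -12000.0  # negative height: rows run north to south
width, height = 299, 209  # columns, rows

left = origin_x
top = origin_y
right = origin_x + width * pixel_w
bottom = origin_y + height * pixel_h

print(left, bottom, right, top)
```

The computed `right` and `bottom` values line up with the Lower Right corner reported by `gdalinfo`.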

This GDAL incantation will reproject all the GeoTIFFs in a folder:

```sh
mkdir -p wf_geotiffs_3338
find . -maxdepth 1 -name "*.tif" -exec gdalwarp -t_srs EPSG:3338 -r near -tap -tr 12000 12000 -multi -wo NUM_THREADS=32 {} ./wf_geotiffs_3338/{} \;
```
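
The `-tap` (target aligned pixels) flag, combined with `-tr 12000 12000`, snaps the output extent outward so that its edges land on multiples of the resolution. A minimal sketch of that snapping arithmetic follows. The input extent is hypothetical and `snap_extent` is an illustrative helper, not a GDAL function.

```python
import math


def snap_extent(xmin, ymin, xmax, ymax, res):
    """Expand an extent outward so every edge is a multiple of `res`."""
    return (
        math.floor(xmin / res) * res,
        math.floor(ymin / res) * res,
        math.ceil(xmax / res) * res,
        math.ceil(ymax / res) * res,
    )


# Hypothetical unaligned extent in projected meters, for illustration only
extent = (-1001234.5, 400321.0, 1500999.0, 2399000.0)
snapped = snap_extent(*extent, res=12000)
print(snapped)
```

Because minimums round down and maximums round up, the snapped extent always contains the original one.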

In [None]:
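from pathlib import Path
import os

# OUTPUT_DIR must be set in the environment; os.environ raises a clear KeyError if it is not
OUTPUT_DIR = Path(os.environ["OUTPUT_DIR"])
paths = sorted(OUTPUT_DIR.glob("*.tif"))
# Sanity check on the number of GeoTIFFs we expect to reproject
assert len(paths) == 14400, f"expected 14400 GeoTIFFs, found {len(paths)}"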

In [None]:
import dask
import dask.distributed as dd
import rasterio
from rasterio.warp import calculate_default_transform, reproject, Resampling

In [None]:
# Create a Dask client
client = dd.Client()


In [None]:


# Define the function to reproject a single raster file
def reproject_raster(file):
    # Make sure the output folder exists (safe to call from parallel workers)
    out_dir = OUTPUT_DIR / "cog"
    out_dir.mkdir(parents=True, exist_ok=True)

    # Open the input raster
    with rasterio.open(file) as src:
        # Define the output CRS and transform
        # Note: unlike the gdalwarp call above, this lets rasterio choose the output resolution
        dst_crs = 'EPSG:3338'
        transform, width, height = calculate_default_transform(src.crs, dst_crs, src.width, src.height, *src.bounds)

        # Define the output file name
        out_file = out_dir / file.name

        # Define the output raster profile
        out_profile = src.profile.copy()
        out_profile.update({
            'crs': dst_crs,
            'transform': transform,
            'width': width,
            'height': height,
            'compress': 'deflate',
            'predictor': 2,
            'tiled': True,
            'blockxsize': 512,
            'blockysize': 512
        })

        # Create the output raster
        with rasterio.open(out_file, 'w', **out_profile) as dst:
            # Reproject the input raster to the output CRS
            reproject(
                source=rasterio.band(src, 1),
                destination=rasterio.band(dst, 1),
                src_transform=src.transform,
                src_crs=src.crs,
                dst_transform=transform,
                dst_crs=dst_crs,
                resampling=Resampling.nearest,  # Explicit: we are sticking with nearest neighbor
                num_threads=1  # Set to 1 to avoid issues with Dask
            )



In [None]:

# Use Dask to parallelize the processing of the input raster files
dask.compute(*[dask.delayed(reproject_raster)(f) for f in paths])
