# Reprojecting to EPSG 3338

The purpose of this notebook is to reproject the output GeoTIFFs to EPSG:3338. This step lives in its own notebook because the choice of resampling technique could significantly affect how variables such as air temperature or snow melt represent variation in elevation, particularly where elevation variance within and between grid cells is high. Reprojection choices could also affect data availability for coastal regions.

There are two axes of variation to consider here: resampling, and grid alignment.

## Resampling
Reprojection always involves resampling because the grid is changing. Typically we use the nearest neighbor method: it is computationally cheap and it is the default for most of the libraries and programs that we use. For example, in nearest neighbor resampling the temperature value assigned to each output cell will be the same as the temperature value at the center of the corresponding (nearest) input cell. Nearest neighbor is a conservative choice because the resampled dataset will contain no values that do not exist in the source dataset, which is important if you are dealing with measurements or wish to retain total fidelity to the values in the source dataset. But most of our data are not measurements, so are we tied to nearest neighbor?

Bilinear or cubic resampling would compute some kind of weighted value based on the distance between each input cell and the center of the output cell. This means that the output raster might better capture the temperature variation associated with changes in elevation within and around each input grid cell. However that kind of resampling can introduce smoothing and interpolation artifacts.
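
To make the difference concrete, here is a minimal pure-Python sketch of the two ideas on a made-up 2 x 2 grid. The `nearest` and `bilinear` helpers and the temperature values are illustrative assumptions, not what GDAL or rasterio run internally.

```python
import math

# A tiny 2 x 2 "source grid" of temperatures (made-up values).
# grid[row][col] is treated as the value at the cell center (col, row).
grid = [
    [10.0, 20.0],
    [30.0, 40.0],
]


def nearest(grid, x, y):
    """Return the value of the source cell whose center is closest to (x, y)."""
    col = min(max(math.floor(x + 0.5), 0), len(grid[0]) - 1)
    row = min(max(math.floor(y + 0.5), 0), len(grid) - 1)
    return grid[row][col]


def bilinear(grid, x, y):
    """Distance-weighted average of the four source cell centers around (x, y)."""
    x0, y0 = math.floor(x), math.floor(y)
    x1 = min(x0 + 1, len(grid[0]) - 1)
    y1 = min(y0 + 1, len(grid) - 1)
    fx, fy = x - x0, y - y0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


# An output cell center a quarter of the way between source cell centers.
x, y = 0.25, 0.25
print(nearest(grid, x, y))   # 10.0, a value that exists in the source
print(bilinear(grid, x, y))  # 17.5, a blended value that does not
```

The nearest neighbor result is always one of the source values, while the bilinear result is a new value between them, which is the smoothing trade-off described above.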

## Grid Alignment
We can align output data to a "standard grid" - meaning that the extents will be some integer multiple of the specified output resolution.
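
As a rough sketch of what "aligned" means, the snippet below snaps a set of bounds outward to the nearest multiples of the resolution. The `snap_bounds` helper and the input bounds are hypothetical and only illustrate the arithmetic; GDAL's own implementation may differ in detail.

```python
import math


def snap_bounds(west, south, east, north, res):
    """Expand bounds outward so every edge is an integer multiple of res."""
    return (
        math.floor(west / res) * res,
        math.floor(south / res) * res,
        math.ceil(east / res) * res,
        math.ceil(north / res) * res,
    )


# Hypothetical "messy" bounds and resolution (not taken from our data).
raw = (-5300.5, 1200.25, 40100.0, 38999.9)
res = 1000

west, south, east, north = snap_bounds(*raw, res)
width = round((east - west) / res)
height = round((north - south) / res)
print((west, south, east, north), width, height)
# (-6000, 1000, 41000, 39000) 47 38
```

Because the snapped extent always contains the original extent, alignment can add a row or column of pixels but should not drop any.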

## Implementations
There are also several ways to implement these methods. We can use GDAL from the command line, in a shell script, or spawned from Python. We can also use rasterio and dask!

Let's take a look at the `gdalinfo` output for the data that we have created so far.

In [1]:
from config import OUTPUT_DIR, aux_dir, reprojected_dir
import subprocess
import os
import rasterio as rio
import numpy as np
import pandas as pd
import dask
import dask.distributed as dd
from rasterio.windows import Window
from pyproj import Transformer
from rasterio.warp import calculate_default_transform, reproject, Affine, Resampling, aligned_target
from rasterio.transform import array_bounds
from rasterio.crs import CRS
from pathlib import Path

In [2]:
# just grab a single test raster
input_raster = OUTPUT_DIR / "runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif"

In [3]:
# call gdalinfo and capture the output
gdalinfo_output = subprocess.check_output(['gdalinfo', input_raster])
print(gdalinfo_output.decode())

Driver: GTiff/GeoTIFF
Files: /atlas_scratch/cparr4/AK_NCAR_12km_decadal_means_of_monthly_summaries/runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif
Size is 299, 209
Coordinate System is:
PROJCRS["unknown",
    BASEGEOGCRS["unknown",
        DATUM["unknown",
            ELLIPSOID["unknown",6370000,0,
                LENGTHUNIT["metre",1,
                    ID["EPSG",9001]]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]]],
    CONVERSION["Polar Stereographic (variant B)",
        METHOD["Polar Stereographic (variant B)",
            ID["EPSG",9829]],
        PARAMETER["Latitude of standard parallel",64,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8832]],
        PARAMETER["Longitude of origin",-150,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8833]],
        PARAMETER["False easting",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]]

The most basic reprojection would look like this:

```sh
gdalwarp -t_srs EPSG:3338 -multi -wo NUM_THREADS=32 input_raster output_raster
```

We shouldn't need to specify nodata values here - those will be read and assigned automatically from the source dataset. 
Let's spawn this command as a subprocess from the notebook here, just like we did with `gdalinfo`.

In [4]:
output_raster = aux_dir / f"epsg3338_no_opts_{input_raster.name}"

command = f"gdalwarp -t_srs EPSG:3338 -multi -wo NUM_THREADS=32 -overwrite {input_raster} {output_raster}"
process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

stdout, stderr = process.communicate()
print(stdout.decode())
print(stderr.decode())
gdalinfo_output = subprocess.check_output(['gdalinfo', output_raster])
print(gdalinfo_output.decode())

Creating output file that is 318P x 224L.
Processing /atlas_scratch/cparr4/AK_NCAR_12km_decadal_means_of_monthly_summaries/runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif [1/1] : 0Using internal nodata values (e.g. -9999) for image /atlas_scratch/cparr4/AK_NCAR_12km_decadal_means_of_monthly_summaries/runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif.
Copying nodata values from source /atlas_scratch/cparr4/AK_NCAR_12km_decadal_means_of_monthly_summaries/runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif to destination /atlas_scratch/cparr4/AK_NCAR_12km_decadal_means_of_monthly_summaries/auxiliary_content/epsg3338_no_opts_runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif.
...10...20...30...40...50...60...70...80...90...100 - done.


Driver: GTiff/GeoTIFF
Files: /atlas_scratch/cparr4/AK_NCAR_12km_decadal_means_of_monthly_summaries/auxiliary_content/epsg3338_no_opts_runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif
Size is 318, 224
Coordinate S

OK, so the thing to notice here is that the corner coordinates, origin, and pixel sizes are all messy floats. And we've gone from an array that is 299 x 209 to an array that is 318 x 224. What happens if we now specify that the target resolution should be exactly 12000 m x 12000 m?

In [5]:
output_raster = aux_dir / f"epsg3338_12km_{input_raster.name}"

command = f"gdalwarp -t_srs EPSG:3338 -tr 12000 12000 -multi -wo NUM_THREADS=32 -overwrite {input_raster} {output_raster}"
process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

stdout, stderr = process.communicate()
print(stdout.decode())
print(stderr.decode())
gdalinfo_output = subprocess.check_output(['gdalinfo', output_raster])
print(gdalinfo_output.decode())

Creating output file that is 316P x 223L.
Processing /atlas_scratch/cparr4/AK_NCAR_12km_decadal_means_of_monthly_summaries/runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif [1/1] : 0Using internal nodata values (e.g. -9999) for image /atlas_scratch/cparr4/AK_NCAR_12km_decadal_means_of_monthly_summaries/runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif.
Copying nodata values from source /atlas_scratch/cparr4/AK_NCAR_12km_decadal_means_of_monthly_summaries/runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif to destination /atlas_scratch/cparr4/AK_NCAR_12km_decadal_means_of_monthly_summaries/auxiliary_content/epsg3338_12km_runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif.
...10...20...30...40...50...60...70...80...90...100 - done.


Driver: GTiff/GeoTIFF
Files: /atlas_scratch/cparr4/AK_NCAR_12km_decadal_means_of_monthly_summaries/auxiliary_content/epsg3338_12km_runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif
Size is 316, 223
Coordinate System 

As expected, the pixel size is now truly 12 km in each dimension, but now the array size is 316 x 223! Next, let's experiment with aligning the 12 km pixels to the "standard" grid, meaning that the extent coordinates should all be integer multiples of the pixel size. This is done with the GDAL `-tap` option (which must be used in conjunction with `-tr`).

In [6]:
%%time
output_raster = aux_dir / f"epsg3338_12km_tap_nearest_{input_raster.name}"

command = f"gdalwarp -t_srs EPSG:3338 -tap -tr 12000 12000 -multi -wo NUM_THREADS=32 -overwrite {input_raster} {output_raster}"
process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

stdout, stderr = process.communicate()
print(stdout.decode())
print(stderr.decode())
gdalinfo_output = subprocess.check_output(['gdalinfo', output_raster])
print(gdalinfo_output.decode())

Creating output file that is 317P x 224L.
Processing /atlas_scratch/cparr4/AK_NCAR_12km_decadal_means_of_monthly_summaries/runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif [1/1] : 0Using internal nodata values (e.g. -9999) for image /atlas_scratch/cparr4/AK_NCAR_12km_decadal_means_of_monthly_summaries/runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif.
Copying nodata values from source /atlas_scratch/cparr4/AK_NCAR_12km_decadal_means_of_monthly_summaries/runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif to destination /atlas_scratch/cparr4/AK_NCAR_12km_decadal_means_of_monthly_summaries/auxiliary_content/epsg3338_12km_tap_nearest_runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif.
...10...20...30...40...50...60...70...80...90...100 - done.


Driver: GTiff/GeoTIFF
Files: /atlas_scratch/cparr4/AK_NCAR_12km_decadal_means_of_monthly_summaries/auxiliary_content/epsg3338_12km_tap_nearest_runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif
Size is 317

Now that is a rather clean `gdalinfo` output, and the array size is 317 x 224. It does seem that the options `-multi -wo NUM_THREADS=32` speed up the processing slightly. This is the exact equivalent of executing `gdalwarp` from the command line like this:

```sh
gdalwarp -t_srs EPSG:3338 -tap -tr 12000 12000 -multi -wo NUM_THREADS=32 input_raster output_raster
```
or the one-line incantation to do this for all GeoTIFFs in a folder:

```sh
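# {} expands to the full input path, so take its basename for the output file
find source_dir -name "*.tif" -exec sh -c 'gdalwarp -t_srs EPSG:3338 -tap -tr 12000 12000 -multi -wo NUM_THREADS=32 "$1" "output_dir/$(basename "$1")"' _ {} \;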
```

The above implementation won't tweak the filenames, but will just write the data using the input filename to a different directory.


#### Resampling

Now, all of these implementations use the same resampling method: nearest neighbor. GDAL provides quite a few alternatives - most relevant for our work are likely the following options:

`bilinear`: bilinear resampling.

`cubic`: cubic resampling.

`cubicspline`: cubic spline resampling.

Let's run a few of these with our most recent version of the gdalwarp command:

In [7]:
resampling_methods = ["bilinear", "cubic", "cubicspline"]

for method in resampling_methods:
    
    output_raster = aux_dir / f"epsg3338_12km_tap_{method}_{input_raster.name}"
    
    command = f"gdalwarp -t_srs EPSG:3338 -tap -tr 12000 12000 -multi -wo NUM_THREADS=32 -overwrite -r {method} {input_raster} {output_raster}"
    process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    stdout, stderr = process.communicate()
    print(stdout.decode())
    print(stderr.decode())

Creating output file that is 317P x 224L.
Processing /atlas_scratch/cparr4/AK_NCAR_12km_decadal_means_of_monthly_summaries/runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif [1/1] : 0Using internal nodata values (e.g. -9999) for image /atlas_scratch/cparr4/AK_NCAR_12km_decadal_means_of_monthly_summaries/runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif.
Copying nodata values from source /atlas_scratch/cparr4/AK_NCAR_12km_decadal_means_of_monthly_summaries/runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif to destination /atlas_scratch/cparr4/AK_NCAR_12km_decadal_means_of_monthly_summaries/auxiliary_content/epsg3338_12km_tap_bilinear_runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif.
...10...20...30...40...50...60...70...80...90...100 - done.


Creating output file that is 317P x 224L.
Processing /atlas_scratch/cparr4/AK_NCAR_12km_decadal_means_of_monthly_summaries/runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif [1/1] : 0Using internal nodata

In [8]:
df = pd.read_csv("https://raw.githubusercontent.com/ua-snap/geospatial-vector-veracity/main/vector_data/point/alaska_point_locations.csv").drop(["region", "country"], axis=1)

In [9]:
df.set_index("id", inplace=True)

In [10]:
# function to extract values from a geotiff
def extract_points(fp, lat_lons):
    with rio.open(fp) as src:
        # get CRS
        tiff_crs = src.crs

        # reproject point lat lon
        transformer = Transformer.from_crs("epsg:4326", tiff_crs)
        
        extracted_values = []
        for loc in lat_lons:
            lat, lon = loc
            x, y = transformer.transform(lat, lon)

            # get row / column
            r, c = src.index(x, y)

            # create a window object by specifying the starting place 
            #  of the array and the size of the window
            # note order is column offset, row offset, width, height
            window = Window(c, r, 1, 1)

            # read the data
            arr = src.read(1, window=window)
            extracted_values.append(arr[0])
    return extracted_values, fp.name

In [11]:
paths = list(aux_dir.glob("*.tif"))

for tiff in paths:
    # apply extract_points to each row of the dataframe
    df[tiff.name.split("runoff")[0]] = df.apply(lambda row: extract_points(tiff, [(row['latitude'], row['longitude'])])[0][0], axis=1)

In [12]:
df_filtered = df[df.apply(lambda x: np.any(np.array(x) == [-9999.0]), axis=1)]
df_filtered.tail(10)


| id | name | alt_name | latitude | longitude | km_distance_to_ocean | dask_rio_ | epsg3338_no_opts_ | epsg3338_12km_ | epsg3338_12km_tap_nearest_ | epsg3338_12km_tap_bilinear_ | epsg3338_12km_tap_cubic_ | epsg3338_12km_tap_cubicspline_ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AK486 | Thoms Place State Marine Park | | 56.1732 | -132.1328 | 0.0 | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] |
| AK396 | Thorne Bay | | 55.7274 | -132.471 | 0.6 | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] |
| AK400 | Tokeen | | 55.938 | -133.324 | 0.3 | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] |
| AK408 | Tyonek | Qaggeyshlat | 61.0681 | -151.137 | 0.0 | [11.0] | [9.0] | [-9999.0] | [11.0] | [10.521813] | [10.521813] | [10.53451] |
| AK413 | Umnak | | 53.267 | -168.217 | 1.3 | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] |
| AK415 | Unalaska | Iluulux̂ | 53.8733 | -166.533 | 0.4 | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] |
| AK425 | Ward Cove | | 55.408 | -131.724 | 0.0 | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] |
| AK427 | Whale Pass | | 56.1 | -133.167 | 1.7 | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] |
| AK433 | Woody Island | | 57.78 | -152.355 | 0.4 | [24.0] | [24.0] | [-9999.0] | [24.0] | [24.20971] | [24.20971] | [24.52939] |
| AK434 | Wrangell | Shtax’héen | 56.4708 | -132.377 | 0.6 | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] | [-9999.0] |


The results of the extractions show that some community locations don't have data. For some of these cases, the lack of data is independent of the reprojection or resampling technique! Places like Ketchikan are probably at least 100 miles from the edge of the dataset, so that is expected. But for other locations, whether there is data coverage depends on how the reprojection and resampling are executed! Look at Tyonek:

Tyonek
 - Vanilla reprojection, no options provided to gdalwarp: 9.0 mm
 - Forced to 12 km (-tr 12000 12000): -9999.0 (no data)
 - Forced to 12 km and aligned (-tap -tr 12000 12000): 11.0 mm
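
One plausible reason a single location can flip between data and no data is that moving the grid origin moves the center of the output cell that contains the point, and with nearest neighbor resampling that center decides which source cell gets sampled. The sketch below illustrates only the geometry: the point and the shifted origin are hypothetical, while the aligned origin and the 12 km resolution match the `-tap` grid.

```python
import math


def containing_cell_center(x, y, origin_x, origin_y, res):
    """Center of the north-up grid cell that contains the point (x, y)."""
    col = math.floor((x - origin_x) / res)
    row = math.floor((origin_y - y) / res)
    return origin_x + (col + 0.5) * res, origin_y - (row + 0.5) * res


res = 12000
# Hypothetical point in projected meters (illustrative only).
x, y = 130500.0, 1245000.0

# Origin of the aligned (-tap) grid.
aligned = containing_cell_center(x, y, -1776000, 2880000, res)
# The same grid with a hypothetical unaligned origin, shifted by 7 km and 5 km.
shifted = containing_cell_center(x, y, -1776000 + 7000, 2880000 - 5000, res)

print(aligned)  # (126000.0, 1242000.0)
print(shifted)  # (133000.0, 1249000.0)
```

The point has not moved, but the cell center that represents it has shifted by several kilometers, which near a coastline may be enough to land on a no data source cell.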

Unless we hear strong arguments otherwise, we'll proceed to reproject each GeoTIFF to EPSG:3338 at 12 km resolution, using the `-tap` option if possible. I'm selecting this choice because it seems to retain data for coastal locations where possible and retains fidelity to the source dataset. An added bonus is that grid boundaries are clean integers.

Now, it would be interesting to know if we can replicate that specific execution of gdalwarp with rasterio plus dask. To replicate what GDAL does we need an output raster with these parameters:

```
Size is 317, 224
Coordinate System is: PROJCRS["NAD83 / Alaska Albers",
Origin = (-1776000.000000000000000,2880000.000000000000000)
Pixel Size = (12000.000000000000000,-12000.000000000000000)
Corner Coordinates:
Upper Left  (-1776000.000, 2880000.000) (156d29'47.20"E, 69d38'34.37"N)
Lower Left  (-1776000.000,  192000.000) (178d24'40.55"W, 48d44'42.21"N)
Upper Right ( 2028000.000, 2880000.000) (100d 5'38.99"W, 68d 2'14.50"N)
Lower Right ( 2028000.000,  192000.000) (126d29'15.98"W, 47d52'12.31"N)
Center      (  126000.000, 1536000.000) (151d26'26.76"W, 63d46' 4.80"N)
```
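
As a quick sanity check on those numbers, the array size and the affine coefficients can be derived from the corner coordinates alone. This is a small arithmetic sketch; `pixel_to_xy` is a hypothetical helper, and the coefficient order is the `(a, b, c, d, e, f)` order used by rasterio's `Affine`.

```python
# Values copied from the gdalinfo report above.
origin_x, origin_y = -1776000.0, 2880000.0          # upper-left corner
lower_right_x, lower_right_y = 2028000.0, 192000.0  # lower-right corner
res = 12000.0

width = round((lower_right_x - origin_x) / res)
height = round((origin_y - lower_right_y) / res)

# x = a * col + b * row + c,  y = d * col + e * row + f
affine = (res, 0.0, origin_x, 0.0, -res, origin_y)


def pixel_to_xy(col, row, coeffs):
    """Map a column/row offset to the projected coordinates of that pixel corner."""
    a, b, c, d, e, f = coeffs
    return a * col + b * row + c, d * col + e * row + f


print(width, height)                       # 317 224
print(pixel_to_xy(width, height, affine))  # (2028000.0, 192000.0)
```

Stepping `width` columns and `height` rows from the origin lands exactly on the lower-right corner, so the origin, pixel size, and array size are mutually consistent.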

In [13]:
dst_crs = CRS.from_epsg(3338)
tr = 12000
# we know we want these output dimensions based on the results from `gdalwarp -tap -tr 12000 12000`
t_width = 317
t_height = 224

In [14]:
help(aligned_target)

Help on function aligned_target in module rasterio.warp:

aligned_target(transform, width, height, resolution)
    Aligns target to specified resolution
    
    Parameters
    ----------
    transform : Affine
        Input affine transformation matrix
    width, height: int
        Input dimensions
    resolution: tuple (x resolution, y resolution) or float
        Target resolution, in units of target coordinate reference
        system.
    
    Returns
    -------
    transform: Affine
        Output affine transformation matrix
    width, height: int
        Output dimensions



In [15]:
help(array_bounds)

Help on function array_bounds in module rasterio.transform:

array_bounds(height, width, transform)
    Return the bounds of an array given height, width, and a transform.
    
    Return the `west, south, east, north` bounds of an array given
    its height, width, and an affine transform.



In [16]:
def reproject_raster(file, target_directory, name_prefix):
    with rio.open(file) as src:
        
        # compute the new affine transformation, width and height
        warp_transform, width, height = rio.warp.calculate_default_transform(src.crs, dst_crs, src.width, src.height, *src.bounds, resolution=(tr, tr))
        tap_transform, tap_width, tap_height = aligned_target(warp_transform, t_width - 1, t_height - 1, tr) # the -1 might just be an indexing thing
        # but without the offset, the output height and width are too large (by 1) when compared to what is created by gdalwarp -tap
    
        # define the output raster profile
        out_profile = src.profile.copy()
        out_profile.update({
            "crs": dst_crs,
            "transform": tap_transform,
            "width": t_width,
            "height": t_height,
            "bounds": array_bounds(tap_height, tap_width, tap_transform)
         })

        # create the new raster file
        out_file = target_directory / f"{name_prefix}{file.name}"
        with rio.open(out_file, 'w', **out_profile) as dst:
            # reproject the input raster data
            rio.warp.reproject(
                source=src.read(1),
                destination=rio.band(dst, 1),
                src_transform=src.transform,
                src_crs=src.crs,
                dst_transform=tap_transform,
                dst_crs=dst_crs,
                resampling=Resampling.nearest # this is the default, just being explicit here for easy change or experimentation later
            )

In [17]:
reproject_raster(input_raster, aux_dir, "dask_rio_")

In [18]:
gdalinfo_output = subprocess.check_output(['gdalinfo', f"{aux_dir}/dask_rio_{input_raster.name}"])
print(gdalinfo_output.decode())

Driver: GTiff/GeoTIFF
Files: /atlas_scratch/cparr4/AK_NCAR_12km_decadal_means_of_monthly_summaries/auxiliary_content/dask_rio_runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif
Size is 317, 224
Coordinate System is:
PROJCRS["NAD83 / Alaska Albers",
    BASEGEOGCRS["NAD83",
        DATUM["North American Datum 1983",
            ELLIPSOID["GRS 1980",6378137,298.257222101,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4269]],
    CONVERSION["Alaska Albers (meters)",
        METHOD["Albers Equal Area",
            ID["EPSG",9822]],
        PARAMETER["Latitude of false origin",50,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",-154,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",55,
            ANGLEUNIT["degree",0.017453

The metadata looks like a match! We should also check the actual data too.
The file we just created (`dask_rio_runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif`) should be the same as our GDAL mockup: `epsg3338_12km_tap_nearest_runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif`

In [19]:
with rio.open(aux_dir / "epsg3338_12km_tap_nearest_runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif") as ref_src:
    ref_arr = ref_src.read(1)
with rio.open(aux_dir / "dask_rio_runoff_mm_CSIRO-Mk3-6-0_rcp85_aug_total_2050-2060_mean.tif") as test_src:
    test_arr = test_src.read(1)

np.all(ref_arr == test_arr)

True

Nice. Let's now use dask to try to execute our reproject_raster function that mimics `gdalwarp -tap -tr 12000 12000...` for every single raster in our dataset. How many GeoTIFFs should we have?

Water Flux: 10 models * 2 scenarios * 15 decades * 12 months * 4 variables

Water State: 10 models * 2 scenarios * 15 decades * 12 months * 5 variables

Met: 9 models * 2 scenarios * 15 decades * 12 months * 3 variables

Recall that the met group only has 9 models at the moment because we skipped HadGEM2-ES, whose data wasn't complete. See the EDA notebook for details.

In [20]:
n_wf_tiffs = 10 * 2 * 15 * 12 * 4
n_ws_tiffs = 10 * 2 * 15 * 12 * 5
n_met_tiffs = 9 * 2 * 15 * 12 * 3
tiff_count = n_met_tiffs + n_ws_tiffs + n_wf_tiffs
tiff_count

42120

In [21]:
paths = list(OUTPUT_DIR.glob("*.tif"))
assert len(paths) == tiff_count

All GeoTIFFs are present and ready to be reprojected!

In [22]:
# create a Dask client, forward port 8787 (default) and watch the sparks fly
client = dd.Client()

In [23]:
client

[Dask client summary: `distributed.LocalCluster` using processes, status running, with 8 workers (4 threads and 31.46 GiB memory each) for a total of 32 threads and 251.72 GiB memory. Dashboard: http://127.0.0.1:8787/status]


In [24]:
# use Dask to parallelize the processing of the input raster files
# because they are going to a new directory we don't need a file prefix
dask.compute(*[dask.delayed(reproject_raster)(f, reprojected_dir, "") for f in paths])

[Output truncated: a tuple of `None` values, one per input raster, because `reproject_raster` does not return anything.]

In [25]:
# did we reproject 'em all?
reprojected_paths = list(reprojected_dir.glob("*.tif"))
assert len(reprojected_paths) == tiff_count
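
If that assertion ever fails, a count alone won't say which files are missing. A minimal sketch of a follow-up check is below; `missing_outputs` is a hypothetical helper, and it assumes outputs keep the input filenames, as they do here because we passed an empty prefix. The demo uses temporary directories, not our data.

```python
import tempfile
from pathlib import Path


def missing_outputs(source_dir, output_dir, pattern="*.tif"):
    """Names of files in source_dir with no same-named file in output_dir."""
    source_names = {p.name for p in Path(source_dir).glob(pattern)}
    output_names = {p.name for p in Path(output_dir).glob(pattern)}
    return sorted(source_names - output_names)


# Demo with throwaway directories and empty placeholder files.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    for name in ("a.tif", "b.tif"):
        (Path(src) / name).touch()
    (Path(dst) / "a.tif").touch()
    print(missing_outputs(src, dst))  # ['b.tif']
```

In this notebook the call would be `missing_outputs(OUTPUT_DIR, reprojected_dir)`, and an empty list would mean every input has a reprojected counterpart.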

In [26]:
client.close()

Great, we'll now check some of these GeoTIFFs out in more detail in the QC notebook.