# Fetching NLCD Data
We have prepared shapefiles of the USGS quarter quadrangles that have good coverage of forest stand delineations and for which we want to gather additional data. For each tile, we'll fetch land cover classifications from the Multiresolution Land Characteristics Consortium's web service, simplify the classification into {forest, field, water, developed}, and write our outputs as GeoTiffs to Google Drive.
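
As a minimal sketch of how the simplification into forest, field, water, and developed classes could work, the snippet below remaps NLCD codes through a lookup array. The grouping in `SIMPLE_CLASSES`, the class integers, and the `simplify_nlcd` helper are illustrative assumptions, not definitions taken from this notebook.

```python
import numpy as np

# Hypothetical coarse class values (assumed for illustration only).
WATER, DEVELOPED, FOREST, FIELD = 1, 2, 3, 4

# Hypothetical grouping of NLCD land cover codes into the four coarse classes.
SIMPLE_CLASSES = {
    11: WATER, 12: WATER,
    21: DEVELOPED, 22: DEVELOPED, 23: DEVELOPED, 24: DEVELOPED,
    41: FOREST, 42: FOREST, 43: FOREST, 90: FOREST,
    31: FIELD, 52: FIELD, 71: FIELD, 81: FIELD, 82: FIELD, 95: FIELD,
}


def simplify_nlcd(nlcd):
    """Map an array of NLCD codes to coarse classes; unmapped codes become 0."""
    lookup = np.zeros(256, dtype=np.uint8)
    for code, simple in SIMPLE_CLASSES.items():
        lookup[code] = simple
    # Fancy indexing applies the lookup to every pixel at once.
    return lookup[np.asarray(nlcd, dtype=np.uint8)]


example = np.array([[11, 42], [23, 81]])
simplified = simplify_nlcd(example)
```

Using a lookup array keeps the remapping vectorized, and any code missing from the table falls back to 0, which matches the nodata value used when writing the GeoTiffs.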

# Mount Google Drive 
This lets us access the files showing tile locations and save the rasters we will generate from the land cover data.

In [1]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [2]:
! pip install geopandas rasterio -q


In [3]:
import io
import numpy as np
import geopandas as gpd
import os
import rasterio
import requests

from functools import partial
from imageio import imread
from matplotlib import pyplot as plt
from multiprocessing.pool import ThreadPool
from rasterio import transform

In [4]:
SHP_DIR = '/content/drive/Shared drives/stand_mapping/data/interim'

WA11_SHP = 'washington_utm11n_training_quads_epsg6340.shp'
WA10_SHP = 'washington_utm10n_training_quads_epsg6339.shp'
OR10_SHP = 'oregon_utm10n_training_quads_epsg6339.shp'
OR11_SHP = 'oregon_utm11n_training_quads_epsg6340.shp'

or10_gdf = gpd.read_file(os.path.join(SHP_DIR, OR10_SHP))
or11_gdf = gpd.read_file(os.path.join(SHP_DIR, OR11_SHP))
wa10_gdf = gpd.read_file(os.path.join(SHP_DIR, WA10_SHP))
wa11_gdf = gpd.read_file(os.path.join(SHP_DIR, WA11_SHP))

The following functions will do the work to retrieve the NLCD raster.

In [5]:
def nlcd_from_mrlc(bbox, res, layer, inSR=4326, nlcd=True, **kwargs):
    """
    Retrieves National Land Cover Data (NLCD) Layers from the Multiresolution
    Land Characteristics Consortium's web service.

    Parameters
    ----------
    bbox : list-like
      list of bounding box coordinates (minx, miny, maxx, maxy)
    res : numeric
      spatial resolution to use for returned raster (grid cell size)
    layer : str
      title of layer to retrieve (e.g., 'NLCD_2001_Land_Cover_L48')
    inSR : int
      spatial reference for bounding box, such as an EPSG code (e.g., 4326)
    nlcd : bool
      if True, will re-map the values returned to the NLCD land cover codes

    Returns
    -------
    img : numpy array
      map image as array
    """
    width = int(abs(bbox[2] - bbox[0]) // res)
    height = int(abs(bbox[3] - bbox[1]) // res)
    BASE_URL = ''.join([
        'https://www.mrlc.gov/geoserver/mrlc_display/wms?',
        'service=WMS&request=GetMap',
    ])

    params = dict(bbox=','.join([str(x) for x in bbox]),
                  crs=f'epsg:{inSR}',
                  width=width,
                  height=height,
                  format='image/tiff',
                  layers=layer)
    # pass any extra keyword arguments straight through as query parameters
    params.update(kwargs)

    r = requests.get(BASE_URL, params=params)
    img = imread(io.BytesIO(r.content), format='tiff')

    if nlcd:
        MAPPING = {
            1: 11,  # open water
            2: 12,  # perennial ice/snow
            3: 21,  # developed, open space
            4: 22,  # developed, low intensity
            5: 23,  # developed, medium intensity
            6: 24,  # developed, high intensity
            7: 31,  # barren land (rock/sand/clay)
            8: 32,  # unconsolidated shore
            9: 41,  # deciduous forest
            10: 42,  # evergreen forest
            11: 43,  # mixed forest
            12: 51,  # dwarf scrub (AK only)
            13: 52,  # shrub/scrub
            14: 71,  # grasslands/herbaceous
            15: 72,  # sedge/herbaceous (AK only)
            16: 73,  # lichens (AK only)
            17: 74,  # moss (AK only)
            18: 81,  # pasture/hay
            19: 82,  # cultivated crops
            20: 90,  # woody wetlands
            21: 95,  # emergent herbaceous wetlands
        }

        k = np.array(list(MAPPING.keys()))
        v = np.array(list(MAPPING.values()))

        mapping_ar = np.zeros(k.max() + 1, dtype=v.dtype)
        mapping_ar[k] = v
        img = mapping_ar[img]

    return img

def quad_fetch(fetcher, bbox, num_threads=4, qq=False, *args, **kwargs):
    """Breaks user-provided bounding box into quadrants and retrieves data
    using `fetcher` for each quadrant in parallel using a ThreadPool.

    Parameters
    ----------
    fetcher : callable
      data-fetching function, expected to return an array-like object
    bbox : 4-tuple or list
      coordinates of x_min, y_min, x_max, and y_max for bounding box of tile
    num_threads : int
      number of threads to use for parallel executing of data requests
    qq : bool
      whether or not to execute request for quarter quads, which executes this
      function recursively for each quadrant
    *args
      additional positional arguments that will be passed to `fetcher`
    **kwargs
      additional keyword arguments that will be passed to `fetcher`

    Returns
    -------
    quad_img : array
      image returned with quads stitched together into a single array

    """
    bboxes = split_quad(bbox)

    if qq:
        nw = quad_fetch(fetcher, bbox=bboxes[0], *args, **kwargs)
        ne = quad_fetch(fetcher, bbox=bboxes[1], *args, **kwargs)
        sw = quad_fetch(fetcher, bbox=bboxes[2], *args, **kwargs)
        se = quad_fetch(fetcher, bbox=bboxes[3], *args, **kwargs)

    else:
        get_quads = partial(fetcher, *args, **kwargs)
        with ThreadPool(num_threads) as p:
            quads = p.map(get_quads, bboxes)
            nw, ne, sw, se = quads

    quad_img = np.vstack([np.hstack([nw, ne]), np.hstack([sw, se])])

    return quad_img

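def split_quad(bbox):
    """Splits a bounding box (xmin, ymin, xmax, ymax) into four equal
    quadrants, returned in the order NW, NE, SW, SE.
    """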
    xmin, ymin, xmax, ymax = bbox
    nw_bbox = [xmin, (ymin + ymax) / 2, (xmin + xmax) / 2, ymax]
    ne_bbox = [(xmin + xmax) / 2, (ymin + ymax) / 2, xmax, ymax]
    sw_bbox = [xmin, ymin, (xmin + xmax) / 2, (ymin + ymax) / 2]
    se_bbox = [(xmin + xmax) / 2, ymin, xmax, (ymin + ymax) / 2]

    return [nw_bbox, ne_bbox, sw_bbox, se_bbox]
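
To make the quadrant ordering and stitching concrete, here is a small offline sketch. `split_bbox` is a hypothetical stand-in that mirrors `split_quad`, and `fake_fetcher` is an assumed placeholder that returns a constant array instead of calling the web service.

```python
import numpy as np


def split_bbox(bbox):
    """Hypothetical stand-in mirroring split_quad: NW, NE, SW, SE boxes."""
    xmin, ymin, xmax, ymax = bbox
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    return [
        [xmin, ymid, xmid, ymax],  # NW
        [xmid, ymid, xmax, ymax],  # NE
        [xmin, ymin, xmid, ymid],  # SW
        [xmid, ymin, xmax, ymid],  # SE
    ]


def fake_fetcher(bbox, res=1):
    """Offline placeholder: array sized like the bbox, filled with a value
    derived from the lower-left corner so each quadrant is identifiable."""
    width = int(abs(bbox[2] - bbox[0]) // res)
    height = int(abs(bbox[3] - bbox[1]) // res)
    return np.full((height, width), 10 * bbox[0] + bbox[1])


nw, ne, sw, se = [fake_fetcher(b) for b in split_bbox([0, 0, 4, 4])]

# Same stitching as quad_fetch: north row on top, west column on the left.
stitched = np.vstack([np.hstack([nw, ne]), np.hstack([sw, se])])
```

Row 0 of the stitched array is the northern edge of the tile, which matches the top-down row order that raster formats expect.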

In [6]:
LOOKUP_LAYER_BY_YEAR = {2009: 'NLCD_2008_Land_Cover_L48',
                        2011: 'NLCD_2011_Land_Cover_L48', 
                        2012: 'NLCD_2011_Land_Cover_L48', 
                        2013: 'NLCD_2013_Land_Cover_L48',
                        2014: 'NLCD_2013_Land_Cover_L48', 
                        2015: 'NLCD_2013_Land_Cover_L48', 
                        2016: 'NLCD_2016_Land_Cover_L48',
                        2017: 'NLCD_2016_Land_Cover_L48',
}
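
The table above appears to pair each imagery year with the most recent NLCD release at or before it. Assuming that is the intended rule, a hypothetical helper such as `layer_for_year` could derive the layer name for years that are not listed. The release list below is taken from the values in the table; the helper itself is not part of the original notebook.

```python
from bisect import bisect_right

# Release years that appear as values in LOOKUP_LAYER_BY_YEAR.
NLCD_RELEASES = [2008, 2011, 2013, 2016]


def layer_for_year(year, releases=NLCD_RELEASES):
    """Return the layer name of the latest release at or before `year`.

    Hypothetical helper; assumes `releases` is sorted in ascending order.
    """
    pos = bisect_right(releases, year)
    if pos == 0:
        raise ValueError(f'no NLCD release at or before {year}')
    return f'NLCD_{releases[pos - 1]}_Land_Cover_L48'
```

For every year in the table this returns the same layer name as the dictionary lookup, and it raises a clear error for years that predate the earliest listed release.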

# Download Data for Training Tiles
The shapefiles of USGS quarter quads live in `SHP_DIR`, defined above.

The function below loops through a GeoDataFrame, fetches the NLCD data for each tile, and writes it to disk as a GeoTiff.

In [7]:
def fetch_nlcd(gdf, state, year, overwrite=False):
    epsg = gdf.crs.to_epsg()
    print('Fetching NLCD data for {:,d} tiles'.format(len(gdf)))
    
    PROFILE = {
        'driver': 'GTiff',
        'interleave': 'band',
        'tiled': True,
        'blockxsize': 256,
        'blockysize': 256,
        'compress': 'lzw',
        'nodata': 0,
        'dtype': rasterio.uint8,
        'count': 1,
        }

    ## loop through all the geometries in the geodataframe and fetch NLCD data

    for idx, row in gdf.iterrows():
        xmin, ymin, xmax, ymax = row['geometry'].bounds
        xmin, ymin = np.floor((xmin, ymin))
        xmax, ymax = np.ceil((xmax, ymax))

        # integer pixel counts (the request below uses res=1)
        width, height = int(xmax - xmin), int(ymax - ymin)
        trf = transform.from_bounds(xmin, ymin, xmax, ymax, width, height)

        ## don't bother fetching data if we already have processed this tile
        if state.lower() == 'or':
            state_name = 'oregon'
        elif state.lower() == 'wa':
            state_name = 'washington'
        outdir = f'/content/drive/Shared drives/stand_mapping/data/interim/training_tiles/{state_name}/nlcd/{year}'
        outname = f'{row.CELL_ID}_nlcd_{year}.tif'
        outfile = os.path.join(outdir, outname)        
        if os.path.exists(outfile) and not overwrite:
            if idx % 100 == 0:
                print()
            if idx % 10 == 0:
                print(idx, end='')
            else:
                print('.', end='')
            continue
        
        layer = LOOKUP_LAYER_BY_YEAR[year] # nlcd layer to fetch
        try:
            nlcd = quad_fetch(nlcd_from_mrlc, 
                            bbox=[xmin, ymin, xmax, ymax], 
                            res=1, inSR=epsg, noData=0, layer=layer)
        except Exception:  # mark failed tiles with an 'x' but still allow Ctrl-C
            print('x', end='')
            continue
        
        ## write the data to disk
        PROFILE.update(width=width, height=height)

        with rasterio.open(outfile, 'w', 
                           **PROFILE, crs=epsg, transform=trf) as dst:
            dst.write(nlcd.astype(rasterio.uint8), 1)
            dst.set_band_description(1, 'NLCD retrieved from MRLCC')
        
        ## report progress
        if idx % 100 == 0:
            print()
        if idx % 10 == 0:
            print(idx, end='')
        else:
            print('.', end='')
    print()
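
One thing to watch for: the tile is split into quadrants before each request, and each quadrant's pixel count is computed with floor division. If a snapped tile has an odd number of pixels in either direction, the two halves would each lose half a pixel and the stitched array would come out one pixel short of the width or height stored in the profile. The sketch below shows one possible workaround, padding the bounds to an even number of cells. `snap_bounds_even` is a hypothetical helper and the coordinates are made up for illustration.

```python
import numpy as np


def snap_bounds_even(bounds, res=1):
    """Snap bounds outward to the grid, then pad the max edges so that each
    dimension has an even number of cells (hypothetical helper)."""
    xmin, ymin, xmax, ymax = bounds
    xmin, ymin = np.floor(xmin / res) * res, np.floor(ymin / res) * res
    xmax, ymax = np.ceil(xmax / res) * res, np.ceil(ymax / res) * res
    width = int(round((xmax - xmin) / res))
    height = int(round((ymax - ymin) / res))
    if width % 2:  # odd width: extend one cell to the east
        xmax += res
        width += 1
    if height % 2:  # odd height: extend one cell to the north
        ymax += res
        height += 1
    return (float(xmin), float(ymin), float(xmax), float(ymax)), width, height


# Made-up bounds that snap to a 7 x 8 pixel tile at res=1.
snapped, width, height = snap_bounds_even(
    (500000.4, 5000000.2, 500006.9, 5000008.0))

# Total width of two stitched quadrants when each half uses floor division.
unpadded_total = 2 * int((7 / 2) // 1)      # 7-pixel tile, before padding
padded_total = 2 * int((width / 2) // 1)    # after padding to an even width
```

With the padded bounds, the stitched quadrants add up to the same width and height that would be passed to the raster profile and affine transform.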

## Fetch NLCD layer for each tile in each year

In [9]:
GDF = or11_gdf
STATE = 'OR'
YEARS = [2009, 2011, 2012, 2014, 2016]

for year in YEARS:
    print(year)
    fetch_nlcd(GDF, STATE, year)

2009
Fetching NLCD data for 524 tiles

0.........10.........20.........30.........40.........50.........60.........70.........80.........90.........
100.........110.........120.........130.........140.........150.........160.........170.........180.........190.........
200.........210.........220.........230.........240.........250.........260.........270.........280.........290.........
300.........310.........320.........330.........340.........350.........360.........370.........380.........390.........
400.........410.........420.........430.........440.........450.........460.........470.........480.........490.........
500.........510.........520...
2011
Fetching NLCD data for 524 tiles

0.........10.........20.........30.........40.........50.........60.........70.........80.........90.........
100.........110.........120.........130.........140.........150.........160.........170.........180.........190.........
200.........210.........220.........230.........240.........250...

In [10]:
GDF = or10_gdf
STATE = 'OR'
YEARS = [2009, 2011, 2012, 2014, 2016]

for year in YEARS:
    print(year)
    fetch_nlcd(GDF, STATE, year)

2009
Fetching NLCD data for 607 tiles

0.........10.........20.........30.........40.........50.........60.........70.........80.........90.........
100.........110.........120.........130.........140.........150.........160.........170.........180.........190.........
200.........210.........220.........230.........240.........250.........260.........270.........280.........290.........
300.........310.........320.........330.........340.........350.........360.........370.........380.........390.........
400.........410.........420.........430.........440.........450.........460.........470.........480.........490.........
500.........510.........520.........530.........540.........550.........560.........570.........580.........590.......x.
600......
2011
Fetching NLCD data for 607 tiles

0.........10.........20.........30.........40.........50.........60.........70.........80.........90.........
100.........110.........120.........130.........140.........150.........160.........170

In [11]:
GDF = wa11_gdf
STATE = 'WA'
YEARS = [2009, 2011, 2013, 2015, 2017]

for year in YEARS:
    print(year)
    fetch_nlcd(GDF, STATE, year)

2009
Fetching NLCD data for 82 tiles

0.........10.........20.........30.........40.........50.........60.........70.........80.
2011
Fetching NLCD data for 82 tiles

0.........10.........20.........30.........40.........50.........60.........70.........80.
2013
Fetching NLCD data for 82 tiles

0.........10.........20.........30.........40.........50.........60.........70.........80.
2015
Fetching NLCD data for 82 tiles

0.........10.........20.........30.........40.........50.........60.........70.........80.
2017
Fetching NLCD data for 82 tiles

0.........10.........20.........30.........40.........50.........60.........70.........80.


In [12]:
GDF = wa10_gdf
STATE = 'WA'
YEARS = [2009, 2011, 2013, 2015, 2017]

for year in YEARS:
    print(year)
    fetch_nlcd(GDF, STATE, year)

2009
Fetching NLCD data for 277 tiles

0.........10.........20.........30.........40.........50.........60.........70.........80.........90.........
100.........110.........120.........130.........140.........150.........160.........170.........180.........190.........
200.........210.........220.........230.........240.........250.........260.........270......
2011
Fetching NLCD data for 277 tiles

0.........10.........20.........30.........40.........50.........60.........70.........80.........90.........
100.........110.........120.........130.........140.........150.........160.........170.........180.........190.........
200.........210.........220.........230.........240.........250.........260.........270......
2013
Fetching NLCD data for 277 tiles

0.........10.........20.........30.........40.........50.........60.........70.........80.........90.........
100.........110.........120.........130.........140.........150.........160.........170.........180.........190.........
20