# Fetching Hydrology Data
We have prepared shapefiles of the USGS quarter quadrangles that have good coverage of forest stand delineations. For each of these tiles we want to fetch additional data; this notebook retrieves the hydrology layers.

# Mount Google Drive
Mounting Drive gives us access to the files showing tile locations and a place to save the hydrology layers we will fetch.

In [1]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [2]:
! sudo apt-get install -y libspatialindex-dev

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  libspatialindex-c4v5 libspatialindex4v5
The following NEW packages will be installed:
  libspatialindex-c4v5 libspatialindex-dev libspatialindex4v5
0 upgraded, 3 newly installed, 0 to remove and 21 not upgraded.
Need to get 555 kB of archives.
After this operation, 3,308 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libspatialindex4v5 amd64 1.8.5-5 [219 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libspatialindex-c4v5 amd64 1.8.5-5 [51.7 kB]
Get:3 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libspatialindex-dev amd64 1.8.5-5 [285 kB]
Fetched 555 kB in 1s (702 kB/s)

In [3]:
! pip install geopandas rtree -q



In [4]:
import numpy as np
import geopandas as gpd
import os
import requests
from shapely.geometry import box, Polygon

The following function retrieves hydrography features from the NHDPlus High Resolution web service hosted by The National Map.

In [5]:
def nhd_from_tnm(nhd_layer,
                 bbox,
                 inSR=4326,
                 **kwargs):
    """Returns features from the National Hydrography Dataset Plus High
    Resolution web service from The National Map.

    Available layers are:

    =========  ======================
    NHD Layer  Description
    =========  ======================
    0          NHDPlusSink
    1          NHDPoint
    2          NetworkNHDFlowline
    3          NonNetworkNHDFlowline
    4          FlowDirection
    5          NHDPlusWall
    6          NHDPlusBurnLineEvent
    7          NHDLine
    8          NHDArea
    9          NHDWaterbody
    10         NHDPlusCatchment
    11         WBDHU12
    =========  ======================

    Parameters
    ----------
    nhd_layer : int
       a value from 0-11 indicating the feature layer to retrieve.
    bbox : list-like
      list of bounding box coordinates (minx, miny, maxx, maxy).
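    inSR : int
      spatial reference for bounding box, such as an EPSG code (e.g., 4326);
      also used as the output spatial reference (outSR)
    **kwargs
      additional query parameters, which override the defaults set below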

    Returns
    -------
    clip_gdf : GeoDataFrame
      features in vector format, clipped to bbox
    """
    BASE_URL = ''.join([
        'https://hydro.nationalmap.gov/arcgis/rest/services/NHDPlus_HR/',
        'MapServer/',
        str(nhd_layer), '/query?'
    ])

    params = dict(where=None,
                  text=None,
                  objectIds=None,
                  time=None,
                  geometry=','.join([str(x) for x in bbox]),
                  geometryType='esriGeometryEnvelope',
                  inSR=inSR,
                  spatialRel='esriSpatialRelIntersects',
                  relationParam=None,
                  outFields='*',
                  returnGeometry='true',
                  returnTrueCurves='false',
                  maxAllowableOffset=None,
                  geometryPrecision=None,
                  outSR=inSR,
                  having=None,
                  returnIdsOnly='false',
                  returnCountOnly='false',
                  orderByFields=None,
                  groupByFieldsForStatistics=None,
                  outStatistics=None,
                  returnZ='false',
                  returnM='false',
                  gdbVersion=None,
                  historicMoment=None,
                  returnDistinctValues='false',
                  resultOffset=None,
                  resultRecordCount=None,
                  queryByDistance=None,
                  returnExtentOnly='false',
                  datumTransformation=None,
                  parameterValues=None,
                  rangeValues=None,
                  quantizationParameters=None,
                  featureEncoding='esriDefault',
                  f='geojson')
    # let any user-supplied query parameters override the defaults
    params.update(kwargs)

    r = requests.get(BASE_URL, params=params)
    r.raise_for_status()  # fail early on an HTTP error
    jsn = r.json()
    if len(jsn['features']) == 0:
        clip_gdf = gpd.GeoDataFrame(geometry=[Polygon()], crs=inSR)
    else:
        try:
            gdf = gpd.GeoDataFrame.from_features(jsn, crs=inSR)

        # this API seems to return M and Z values even if not requested;
        # catch the resulting error and keep only the first two coordinates
        # (x and y). Note that this slicing assumes LineString-style nesting,
        # i.e., a flat list of positions.
        except AssertionError:
            for f in jsn['features']:
                f['geometry'].update({
                    'coordinates': [c[0:2] for c in f['geometry']['coordinates']]
                })
            gdf = gpd.GeoDataFrame.from_features(jsn, crs=inSR)

        clip_gdf = gpd.clip(gdf, box(*bbox))
        if len(clip_gdf) == 0:
            clip_gdf = gpd.GeoDataFrame(geometry=[Polygon()], crs=inSR)

    return clip_gdf
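
The `except AssertionError` branch above trims coordinates with a single list comprehension, which matches the nesting of a LineString but not of a Polygon or a Multi* geometry (waterbodies are polygons). A minimal sketch of a more general approach follows. `drop_zm` is a hypothetical helper, not part of the original notebook, and it assumes GeoJSON-style nested coordinate lists.

```python
def drop_zm(coords):
    """Recursively keep only x and y from GeoJSON-style coordinates.

    Hypothetical helper (not in the original notebook). A position is a
    list of numbers; anything else is treated as a list of nested items.
    """
    if len(coords) > 0 and isinstance(coords[0], (int, float)):
        # we reached a single position: keep x and y, drop Z and M
        return list(coords[:2])
    # otherwise descend one level (line, ring, polygon, multi-part)
    return [drop_zm(c) for c in coords]


# LineString coordinates carrying Z and M values
line = [[0.0, 0.0, 5.0, 1.0], [1.0, 1.0, 6.0, 2.0]]
# Polygon coordinates (one ring) carrying Z values
poly = [[[0, 0, 9], [1, 0, 9], [1, 1, 9], [0, 0, 9]]]

print(drop_zm(line))  # [[0.0, 0.0], [1.0, 1.0]]
print(drop_zm(poly))  # [[[0, 0], [1, 0], [1, 1], [0, 0]]]
```

Inside the `except` branch, this could be applied as `f['geometry']['coordinates'] = drop_zm(f['geometry']['coordinates'])`, though whether the service returns Z and M values for polygon layers has not been checked here.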

# Download Data for Training Tiles

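The following function loops through the tiles in a GeoDataFrame, fetches flowlines and waterbodies for each tile's bounding box, and writes them to disk as GeoJSON. Tiles whose output files already exist are skipped unless `overwrite=True`.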

In [6]:
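def fetch_hydro(gdf, state, overwrite=False):
    """Fetches NHD flowlines and waterbodies for each tile in a GeoDataFrame
    and writes them to disk as GeoJSON.

    Parameters
    ----------
    gdf : GeoDataFrame
      tiles to fetch data for, with a CELL_ID column and a defined CRS
    state : str
      state name, used to build the output folder (e.g., 'washington')
    overwrite : bool
      if False, skips tiles whose output files already exist
    """
    epsg = gdf.crs.to_epsg()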
    print('Fetching hydro data for {:,d} tiles'.format(len(gdf)))

    ## loop through all the geometries in the geodataframe

    for idx, row in gdf.iterrows():
        xmin, ymin, xmax, ymax = row['geometry'].bounds
        xmin, ymin = np.floor((xmin, ymin))
        xmax, ymax = np.ceil((xmax, ymax))

        bbox = [xmin, ymin, xmax, ymax]

        ## don't bother fetching data if we have already processed this tile
        OUTROOT = '/content/drive/Shared drives/stand_mapping/data/interim/training_tiles'
        outfolder = f'{state.lower()}/hydro'
        outdir = os.path.join(OUTROOT, outfolder)
        os.makedirs(outdir, exist_ok=True)  # make sure the output folder exists

        flow_outname = f'{row.CELL_ID}_flowlines.geojson'
        waterbody_outname = f'{row.CELL_ID}_waterbodies.geojson'

        flow_outfile = os.path.join(outdir, flow_outname)     
        waterbody_outfile = os.path.join(outdir, waterbody_outname)  

        if (os.path.exists(flow_outfile) and os.path.exists(waterbody_outfile)) and not overwrite:
            if idx % 100 == 0:
                print()
            if idx % 10 == 0:
                print(idx, end='')
            else:
                print('.', end='')
            continue
        
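        ## layer numbers refer to the table in the nhd_from_tnm docstring.
        ## NOTE: that table lists 4 as FlowDirection and 2 as
        ## NetworkNHDFlowline, so confirm that 4 is the intended flowline layer
        flow = nhd_from_tnm(4, bbox, epsg)
        waterbody = nhd_from_tnm(9, bbox, epsg)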

        flow.to_file(flow_outfile, driver='GeoJSON')
        waterbody.to_file(waterbody_outfile, driver='GeoJSON')

        ## report progress
        if idx % 100 == 0:
            print()
        if idx % 10 == 0:
            print(idx, end='')
        else:
            print('.', end='')
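
The bounding-box and file-naming steps in `fetch_hydro` can be tried without touching the network or Drive. A minimal sketch, assuming tile bounds in a projected CRS; `snap_bounds` and `tile_outfiles` are hypothetical names introduced here, and the bounds, root folder, and cell ID in the usage lines are made up for illustration.

```python
import math
import os


def snap_bounds(bounds):
    """Expand (xmin, ymin, xmax, ymax) outward to whole units.

    Uses math.floor/ceil, which return ints, whereas the np.floor/np.ceil
    calls in the notebook return floats.
    """
    xmin, ymin, xmax, ymax = bounds
    return [math.floor(xmin), math.floor(ymin),
            math.ceil(xmax), math.ceil(ymax)]


def tile_outfiles(outroot, state, cell_id):
    """Build the flowline and waterbody GeoJSON paths for one tile."""
    outdir = os.path.join(outroot, state.lower(), 'hydro')
    return (os.path.join(outdir, f'{cell_id}_flowlines.geojson'),
            os.path.join(outdir, f'{cell_id}_waterbodies.geojson'))


# made-up tile bounds, for illustration only
bbox = snap_bounds((500000.4, 5100000.7, 506000.2, 5113000.1))
print(bbox)  # [500000, 5100000, 506001, 5113001]

# made-up root folder and cell ID
flow_path, waterbody_path = tile_outfiles('training_tiles', 'Washington', 123)
print(flow_path)
print(waterbody_path)
```

Snapping outward means the request never covers less than the tile, at the cost of fetching up to one extra unit on each side.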

## Fetch Hydro Layers for Each Tile

In [7]:
SHP_DIR = '/content/drive/Shared drives/stand_mapping/data/interim'

WA11_SHP = 'washington_utm11n_training_quads_epsg6340.shp'
WA10_SHP = 'washington_utm10n_training_quads_epsg6339.shp'
OR10_SHP = 'oregon_utm10n_training_quads_epsg6339.shp'
OR11_SHP = 'oregon_utm11n_training_quads_epsg6340.shp'

or10_gdf = gpd.read_file(os.path.join(SHP_DIR, OR10_SHP))
or11_gdf = gpd.read_file(os.path.join(SHP_DIR, OR11_SHP))
wa10_gdf = gpd.read_file(os.path.join(SHP_DIR, WA10_SHP))
wa11_gdf = gpd.read_file(os.path.join(SHP_DIR, WA11_SHP))

In [8]:
GDF = wa11_gdf
STATE = 'washington'

fetch_hydro(GDF, STATE)

Fetching hydro data for 82 tiles

0.........10.........20.........30.........40.........50.........60.........70.........80.

In [9]:
GDF = wa10_gdf
STATE = 'washington'

fetch_hydro(GDF, STATE)

Fetching hydro data for 277 tiles

0.........10.........20.........30.........40.........50.........60.........70.........80.........90.........
100.........110.........120.........130.........140.........150.........160.........170.........180.........190.........
200.........210.........220.........230.........240.........250.........260.........270......

In [10]:
GDF = or10_gdf
STATE = 'oregon'

fetch_hydro(GDF, STATE)

Fetching hydro data for 607 tiles

0.........10.........20.........30.........40.........50.........60.........70.........80.........90.........
100.........110.........120.........130.........140.........150.........160.........170.........180.........190.........
200.........210.........220.........230.........240.........250.........260.........270.........280.........290.........
300.........310.........320.........330.........340.........350.........360.........370.........380.........390.........
400.........410.........420.........430.........440.........450.........460.........470.........480.........490.........
500.........510.........520.........530.........540.........550.........560.........570.........580.........590.........
600......

In [11]:
GDF = or11_gdf
STATE = 'oregon'

fetch_hydro(GDF, STATE)

Fetching hydro data for 524 tiles

0.........10.........20.........30.........40.........50.........60.........70.........80.........90.........
100.........110.........120.........130.........140.........150.........160.........170.........180.........190.........
200.........210.........220.........230.........240.........250.........260.........270.........280.........290.........
300.........310.........320.........330.........340.........350.........360.........370.........380.........390.........
400.........410.........420.........430.........440.........450.........460.........470.........480.........490.........
500.........510.........520...
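
Because `fetch_hydro` skips tiles whose files already exist, a quick audit of an output folder shows whether a run finished or needs to be repeated. A minimal sketch, assuming the `{CELL_ID}_flowlines.geojson` and `{CELL_ID}_waterbodies.geojson` naming used above; `missing_hydro_tiles` is a hypothetical helper, and the temporary directory stands in for the Drive folder.

```python
import os
import tempfile


def missing_hydro_tiles(cell_ids, outdir):
    """Return the cell IDs that lack a flowline or a waterbody file."""
    missing = []
    for cell_id in cell_ids:
        names = (f'{cell_id}_flowlines.geojson',
                 f'{cell_id}_waterbodies.geojson')
        if not all(os.path.exists(os.path.join(outdir, n)) for n in names):
            missing.append(cell_id)
    return missing


# demonstrate with a throwaway directory: tile 1 is complete,
# tile 2 has only flowlines, and tile 3 has nothing
with tempfile.TemporaryDirectory() as tmp:
    for name in ('1_flowlines.geojson', '1_waterbodies.geojson',
                 '2_flowlines.geojson'):
        open(os.path.join(tmp, name), 'w').close()
    result = missing_hydro_tiles([1, 2, 3], tmp)

print(result)  # [2, 3]
```

With the notebook's data, the equivalent call would pass a tile GeoDataFrame's `CELL_ID` column and the matching `hydro` output folder; re-running `fetch_hydro` then fills in only the tiles reported as missing.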