#### Download Geospatial Fabric from ScienceBase and Crosswalk

This notebook downloads the NHM (National Hydrologic Model) geospatial fabric from ScienceBase through the ScienceBase API, which fetches the files directly from AWS S3 buckets rather than through point-and-click downloads in the web interface. It follows the usage instructions in the `sciencebasepy` [documentation](https://github.com/DOI-USGS/sciencebasepy/blob/master/README.md) on GitHub.

Run this notebook only if you need your own copy of the dataset. Otherwise, use the existing data in the directory defined below.

The NHM geospatial fabric files are used to crosswalk stream segment IDs and watershed IDs from the project geospatial data (`Segments_subset.shp` and `HRU_subset.shp`) to attributes in the NHM geospatial fabric, specifically the GNIS name attributes in the POI (point of interest) layer. This associates common names with the modeled stream segments and watersheds.

In [64]:
import sciencebasepy
import os
from pathlib import Path
import fiona
import geopandas as gpd
import pandas as pd

#dir = Path("/import/beegfs/CMIP6/jdpaul3/hydroviz_data")
dir = Path("/Users/joshpaul/secasc_hydroviz/hydroviz_data")

Establish a session and get public items. No need to log in!

In [15]:
sb = sciencebasepy.SbSession()

# This is the NHM geospatial fabric item from here: https://www.sciencebase.gov/catalog/item/5362b683e4b0c409c6289bf6
item = '5362b683e4b0c409c6289bf6'

Get the item JSON and check the files within.

In [16]:
item_json = sb.get_item(item)
for file in item_json['files']:
    print(file['name'])

GeospatialFabricFeatures_01.zip
GeospatialFabricFeatures_02.zip
GeospatialFabricFeatures_03.zip
GeospatialFabricFeatures_04.zip
GeospatialFabricFeatures_05.zip
GeospatialFabricFeatures_06.zip
GeospatialFabricFeatures_07.zip
GeospatialFabricFeatures_08.zip
GeospatialFabricFeatures_09.zip
GeospatialFabricFeatures_10L.zip
GeospatialFabricFeatures_10U.zip
GeospatialFabricFeatures_11.zip
GeospatialFabricFeatures_12.zip
GeospatialFabricFeatures_13.zip
GeospatialFabricFeatures_14.zip
GeospatialFabricFeatures_15.zip
GeospatialFabricFeatures_16.zip
GeospatialFabricFeatures_17.zip
GeospatialFabricFeatures_18.zip
GeospatialFabricFeatures_20.zip
GeospatialFabricFeatures_21.zip
GeospatialFabric_National.gdb.zip


We want all of CONUS, so we download `GeospatialFabric_National.gdb.zip`. The session's `download_file()` method fetches that file by URL and saves it to the project `gis` subdirectory; the following cell then unzips it in place.

In [18]:
for file in item_json['files']:
    if file['name'] == 'GeospatialFabric_National.gdb.zip':
        sb.download_file(file['url'], os.path.join(dir, "gis", file['name']))

downloading https://www.sciencebase.gov/catalog/file/get/5362b683e4b0c409c6289bf6?f=__disk__41%2F2e%2Ff6%2F412ef640321f29c011095d1209103bfb3688d021 to /Users/joshpaul/secasc_hydroviz/hydroviz_data/gis/GeospatialFabric_National.gdb.zip
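The loop above silently does nothing if the file name is missing from the item. As a minimal sketch, the lookup could be pulled into a small helper that fails loudly instead. The helper name `find_file_entry` and the placeholder URLs are hypothetical; only the `files`, `name`, and `url` keys come from the item JSON used in this notebook.

```python
def find_file_entry(item_json, name):
    """Return the entry in item_json['files'] whose 'name' matches, or raise."""
    files = item_json.get("files", [])
    matches = [f for f in files if f.get("name") == name]
    if not matches:
        available = [f.get("name") for f in files]
        raise LookupError(f"{name!r} not found in item files: {available}")
    return matches[0]


# Fake item JSON shaped like the fields used in this notebook.
# The URLs are placeholders, not real ScienceBase links.
demo_item = {
    "files": [
        {"name": "GeospatialFabricFeatures_01.zip", "url": "https://example.org/01"},
        {"name": "GeospatialFabric_National.gdb.zip", "url": "https://example.org/national"},
    ]
}

entry = find_file_entry(demo_item, "GeospatialFabric_National.gdb.zip")
print(entry["url"])
```

In the notebook, `find_file_entry(item_json, 'GeospatialFabric_National.gdb.zip')['url']` could then be passed to `sb.download_file()` in place of the loop.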


In [26]:
zippath = list(dir.glob('**/GeospatialFabric_National.gdb.zip'))[0]
output_dir = zippath.parent
zippath_str = str(zippath)
!unzip {zippath_str} -d {output_dir}

Archive:  /Users/joshpaul/secasc_hydroviz/hydroviz_data/gis/GeospatialFabric_National.gdb.zip
  inflating: /Users/joshpaul/secasc_hydroviz/hydroviz_data/gis/GeospatialFabric_National.gdb/a00000001.freelist  
  inflating: /Users/joshpaul/secasc_hydroviz/hydroviz_data/gis/GeospatialFabric_National.gdb/a00000001.gdbindexes  
  inflating: /Users/joshpaul/secasc_hydroviz/hydroviz_data/gis/GeospatialFabric_National.gdb/a00000001.gdbtable  
  inflating: /Users/joshpaul/secasc_hydroviz/hydroviz_data/gis/GeospatialFabric_National.gdb/a00000001.gdbtablx  
  inflating: /Users/joshpaul/secasc_hydroviz/hydroviz_data/gis/GeospatialFabric_National.gdb/a00000001.TablesByName.atx  
  inflating: /Users/joshpaul/secasc_hydroviz/hydroviz_data/gis/GeospatialFabric_National.gdb/a00000002.gdbtable  
  inflating: /Users/joshpaul/secasc_hydroviz/hydroviz_data/gis/GeospatialFabric_National.gdb/a00000002.gdbtablx  
  inflating: /Users/joshpaul/secasc_hydroviz/hydroviz_data/gis/GeospatialFabric_National.gdb/a0000
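The `!unzip` shell call depends on an `unzip` binary being available and on paths without spaces. A portable alternative, sketched here with the standard-library `zipfile` module, extracts the archive from Python directly. The helper name `extract_zip` and the tiny demo archive are hypothetical stand-ins, not the real geodatabase.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_zip(zip_path, output_dir):
    """Extract every member of zip_path into output_dir; return the member names."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(output_dir)
        return zf.namelist()


# Self-contained demo: build a tiny archive in a temp directory, then extract it.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    demo_zip = tmp / "demo.gdb.zip"
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr("demo.gdb/a00000001.gdbtable", b"placeholder")

    names = extract_zip(demo_zip, tmp)
    extracted_ok = (tmp / "demo.gdb" / "a00000001.gdbtable").exists()

print(names, extracted_ok)
```

In the notebook, the equivalent call would be `extract_zip(zippath, output_dir)`.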

Get the geodatabase path, and check out the layers. We want to read in the national identifier layers and the POI layer.

In [33]:
gdb_path = list(output_dir.glob('**/*.gdb'))[0]
fiona.listlayers(gdb_path)

['POIs',
 'one',
 'nhdflowline_en',
 'nhdflowline',
 'regionOutletDA',
 'nhruNationalIdentifier',
 'nsegmentNationalIdentifier']
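Before reading the layers, it can help to confirm that the three layers used in this notebook are present, so that a changed or partial download fails with a clear message. This is a minimal sketch: `missing_layers` is a hypothetical helper, and the `available` list is copied from the listing above rather than read from disk.

```python
def missing_layers(available, required):
    """Return the required layer names absent from available, in the order given."""
    available = set(available)
    return [name for name in required if name not in available]


# Layer names copied from the fiona.listlayers() output above.
available = [
    "POIs", "one", "nhdflowline_en", "nhdflowline",
    "regionOutletDA", "nhruNationalIdentifier", "nsegmentNationalIdentifier",
]
required = ["POIs", "nhruNationalIdentifier", "nsegmentNationalIdentifier"]

missing = missing_layers(available, required)
if missing:
    raise ValueError(f"Geodatabase is missing expected layers: {missing}")
print("All required layers present")
```

In the notebook, `available` would be replaced by `fiona.listlayers(gdb_path)`.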

In [50]:
poi_gdf = gpd.read_file(gdb_path, layer='POIs', encoding='utf-8')
gf_hru_gdf = gpd.read_file(gdb_path, layer='nhruNationalIdentifier', encoding='utf-8')
gf_seg_gdf = gpd.read_file(gdb_path, layer='nsegmentNationalIdentifier', encoding='utf-8')



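Join the GNIS names from the POI layer to the segment and HRU identifier tables by matching each table's `POI_ID` to the POI layer's `COMID`. First, subset each table to the columns needed and cast the key columns to integers.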

In [51]:
# .copy() makes an independent DataFrame, which avoids pandas' SettingWithCopyWarning.
gf_seg = gf_seg_gdf[['seg_id_nat', 'POI_ID']].copy()
gf_seg['POI_ID'] = gf_seg['POI_ID'].astype(int)


In [52]:
# .copy() makes an independent DataFrame, which avoids pandas' SettingWithCopyWarning.
gf_hru = gf_hru_gdf[['hru_id_nat', 'POI_ID']].copy()
gf_hru['POI_ID'] = gf_hru['POI_ID'].astype(int)


In [53]:
# .copy() makes an independent DataFrame, which avoids pandas' SettingWithCopyWarning.
poi = poi_gdf[['COMID', 'GNIS_NAME']].copy()
poi['COMID'] = poi['COMID'].astype(int)


In [59]:
gf_seg = gf_seg.join(poi.set_index('COMID'), on='POI_ID')
gf_hru = gf_hru.join(poi.set_index('COMID'), on='POI_ID')
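A left join like the one above can silently duplicate rows if `COMID` is not unique, and it leaves `GNIS_NAME` empty where no POI matches. The sketch below shows one way to check for both with `DataFrame.merge`, using its `validate` and `indicator` parameters as an alternative to `join`. The toy tables are illustrative stand-ins: the first two rows reuse values that appear in this notebook's output, and the `9999` POI ID is invented to force a non-match.

```python
import pandas as pd

# Toy stand-ins for gf_seg and poi (illustrative values only).
gf_seg_demo = pd.DataFrame(
    {"seg_id_nat": [1, 2, 3], "POI_ID": [955, 1691, 9999]}
)
poi_demo = pd.DataFrame(
    {
        "COMID": [955, 1691],
        "GNIS_NAME": ["West Branch Mattawamkeag River", "Baskahegan Stream"],
    }
)

# validate="many_to_one" raises if COMID is duplicated on the right side;
# indicator=True adds a "_merge" column recording where each row matched.
joined = gf_seg_demo.merge(
    poi_demo,
    how="left",
    left_on="POI_ID",
    right_on="COMID",
    validate="many_to_one",
    indicator=True,
)

# Segments whose POI_ID had no matching COMID in the POI table.
unmatched = joined.loc[joined["_merge"] == "left_only", "seg_id_nat"].tolist()
print(unmatched)
```

Whether the real POI layer has duplicate `COMID` values or unmatched IDs is not known here; the check only makes either case visible.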

Now read in the project segment and watershed shapefiles, and join the names.

In [66]:
project_shps = list(dir.glob('**/*subset.shp'))
project_shps

[PosixPath('/Users/joshpaul/secasc_hydroviz/hydroviz_data/gis/HRU_subset.shp'),
 PosixPath('/Users/joshpaul/secasc_hydroviz/hydroviz_data/gis/Segments_subset.shp')]

In [72]:
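# Select each shapefile by name, since glob() does not guarantee ordering.
hru_gdf = gpd.read_file(next(p for p in project_shps if p.name == 'HRU_subset.shp'))
seg_gdf = gpd.read_file(next(p for p in project_shps if p.name == 'Segments_subset.shp'))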

In [76]:
hru_gdf = hru_gdf.join(gf_hru.set_index('hru_id_nat'), on='hru_id_nat')
seg_gdf = seg_gdf.join(gf_seg.set_index('seg_id_nat'), on='seg_id_nat')

In [78]:
hru_gdf.head()

| | region | hru_id_nat | geometry | POI_ID | GNIS_NAME |
|---|---|---|---|---|---|
| 0 | 17 | 93013 | POLYGON ((-1661303.115 2221244.978, -1661298.6... | 23336004 | South Fork Owyhee River |
| 1 | 17 | 93014 | MULTIPOLYGON (((-1532954.896 2215904.774, -153... | 23198872 | Dry Creek |
| 2 | 17 | 93015 | POLYGON ((-1532985.250 2215875.115, -1532984.8... | 23198872 | Dry Creek |
| 3 | 17 | 93016 | POLYGON ((-1546064.860 2230034.835, -1546065.0... | 23196836 | Jakes Creek |
| 4 | 17 | 93017 | POLYGON ((-1546065.114 2230081.069, -1546065.0... | 23196800 | Jakes Creek |


In [80]:
seg_gdf.head()

| | region | seg_id_nat | geometry | POI_ID | GNIS_NAME |
|---|---|---|---|---|---|
| 0 | 1 | 1 | LINESTRING (2101948.624 2876678.641, 2101941.3... | 955 | West Branch Mattawamkeag River |
| 1 | 1 | 2 | LINESTRING (2167789.031 2829021.852, 2167729.9... | 1691 | Baskahegan Stream |
| 2 | 1 | 3 | LINESTRING (2131936.492 2865675.020, 2131955.7... | 1933 | Mattawamkeag River |
| 3 | 1 | 4 | LINESTRING (2151719.943 2849594.051, 2151812.0... | 1945 | Mattawamkeag River |
| 4 | 1 | 5 | LINESTRING (2155981.103 2842240.715, 2155894.2... | 1947 | Baskahegan Stream |


Save both layers to a local directory, then manually zip them and move the archives to the `shp` directory in the repo. These are the final geometries that will be uploaded to GeoServer and queried via the API.

In [81]:
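tmp_dir = Path("/Users/joshpaul/secasc_hydroviz/other_gis/shp_for_geoserver")
repo_dir = Path("/Users/joshpaul/secasc_hydroviz/hydroviz")  # repo that receives the manually zipped files

# Create the output directory if it does not exist so to_file() can write into it.
tmp_dir.mkdir(parents=True, exist_ok=True)
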
seg_gdf.to_file(os.path.join(tmp_dir, 'seg.shp'))
hru_gdf.to_file(os.path.join(tmp_dir, 'hru.shp'))
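The manual zip step could also be scripted. A shapefile is several files sharing one stem (for example `.shp`, `.shx`, `.dbf`, `.prj`), and all of them belong in the archive. The following is a minimal sketch using only the standard library; `zip_shapefile` is a hypothetical helper, and the demo writes empty placeholder files rather than real shapefile components.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_shapefile(shp_path, zip_path):
    """Zip shp_path with its sidecar files (same stem, same folder); return their names."""
    shp_path = Path(shp_path)
    parts = sorted(
        p for p in shp_path.parent.glob(shp_path.stem + ".*")
        if p.is_file() and p.suffix != ".zip"  # skip any archive from a previous run
    )
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for part in parts:
            zf.write(part, arcname=part.name)  # store flat, without directories
    return [p.name for p in parts]


# Self-contained demo with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    for name in ["seg.shp", "seg.shx", "seg.dbf", "seg.prj", "seg.cpg", "hru.shp"]:
        (tmp / name).write_bytes(b"")

    zipped = zip_shapefile(tmp / "seg.shp", tmp / "seg.zip")
    with zipfile.ZipFile(tmp / "seg.zip") as zf:
        archived = sorted(zf.namelist())

print(zipped)
```

In the notebook this might be called as `zip_shapefile(tmp_dir / 'seg.shp', tmp_dir / 'seg.zip')`; moving the archive into the repo's `shp` directory would still be a separate step.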