## 2. Crosswalk Geospatial Fabric Attributes to Hydroviz Subset 

In this notebook, the NHM geospatial fabric is used to crosswalk the stream segment IDs and watershed (HRU) IDs in the project geospatial data (`Segments_subset.shp` and `HRU_subset.shp`) to attributes of the national geospatial fabric: the GNIS names in the POI (point of interest) layer and the USGS gage IDs. This associates common names and gage references with the modeled stream segments and watersheds.
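
The crosswalk is a two-hop lookup: a project segment's `seg_id_nat` finds its row in the fabric's national identifier layer, that row's `POI_ID` matches a `COMID` in the POI layer, and the POI row supplies the GNIS name and gage ID. The sketch below reproduces that chain with plain pandas on a few made-up rows; the column names mirror the ones used later in this notebook, but every value is invented for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the three inputs (all values are made up).
project_segs = pd.DataFrame({"seg_id_nat": [1, 2, 3]})
fabric_ids = pd.DataFrame({"seg_id_nat": [1, 2, 3], "POI_ID": [101, 102, 103]})
pois = pd.DataFrame({
    "COMID": [101, 102],
    "GNIS_NAME": ["River A", "Stream B"],
    "poi_gage_id": ["01010000", "0"],
})

# Hop 1: attach POI names and gage IDs to the fabric IDs (POI_ID matches COMID).
fabric_named = fabric_ids.join(pois.set_index("COMID"), on="POI_ID")

# Hop 2: attach the named fabric rows to the project segments via seg_id_nat.
crosswalk = project_segs.join(fabric_named.set_index("seg_id_nat"), on="seg_id_nat")
print(crosswalk)
```

Segment 3 points at a POI that is absent from the toy POI table, so it comes through the left joins with empty name and gage fields rather than being dropped.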

In [1]:
import os
from pathlib import Path
import fiona
import geopandas as gpd

Set the geodatabase path and list its layers. We want to read in the national identifier layers and the POI layer.

In [2]:
gdb_path = "/import/beegfs/CMIP6/jdpaul3/hydroviz_data/gis/GeospatialFabric_National.gdb"
fiona.listlayers(gdb_path)

['POIs',
 'one',
 'nhdflowline_en',
 'nhdflowline',
 'regionOutletDA',
 'nhruNationalIdentifier',
 'nsegmentNationalIdentifier']

In [3]:
poi_gdf = gpd.read_file(gdb_path, layer='POIs', encoding='utf-8')
gf_seg_gdf = gpd.read_file(gdb_path, layer='nsegmentNationalIdentifier', encoding='utf-8')

Join the POI names and gage IDs to the fabric segments by matching each segment's `POI_ID` to the POI `COMID`.

In [4]:
gf_seg = gf_seg_gdf[['seg_id_nat', 'POI_ID']].copy()
gf_seg['POI_ID'] = gf_seg['POI_ID'].astype(int)

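Selecting columns with `df[[...]]` and then assigning into the result can raise pandas' `SettingWithCopyWarning`, because pandas cannot tell whether the assignment was meant to reach the original frame. Taking an explicit `.copy()` of the subset removes that ambiguity: the subset becomes an independent frame and the original stays unchanged. A minimal sketch on made-up data:

```python
import pandas as pd

# Toy stand-in for a fabric layer (values are made up; POI_ID stored as text).
source = pd.DataFrame({"seg_id_nat": [1, 2], "POI_ID": ["101", "102"], "extra": [0, 0]})

# Take an explicit copy of the column subset, then convert the key column.
subset = source[["seg_id_nat", "POI_ID"]].copy()
subset["POI_ID"] = subset["POI_ID"].astype(int)

print(subset["POI_ID"].tolist())  # [101, 102]
print(source["POI_ID"].tolist())  # ['101', '102'] (original is untouched)
```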

In [5]:
poi = poi_gdf[['COMID', 'GNIS_NAME', 'poi_gage_id']].copy()
poi['COMID'] = poi['COMID'].astype(int)


In [6]:
gf_seg = gf_seg.join(poi.set_index('COMID'), on='POI_ID')
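
`DataFrame.join` silently duplicates segment rows if a `COMID` appears more than once in the POI layer. Assuming a one-to-one lookup is what is intended, one way to guard against that is to express the same left join with `merge` and its `validate="many_to_one"` argument, which raises `pandas.errors.MergeError` when the right-hand keys are not unique. The sketch uses made-up toy frames (`toy_seg`, `toy_poi`), and keeping the first duplicate is only an illustrative choice, not necessarily the right rule for the real POI layer.

```python
import pandas as pd

# Toy inputs (values are made up): COMID 101 appears twice in the POI table.
toy_seg = pd.DataFrame({"seg_id_nat": [1, 2], "POI_ID": [101, 102]})
toy_poi = pd.DataFrame({
    "COMID": [101, 101, 102],
    "GNIS_NAME": ["River A", "River A (dup)", "Stream B"],
    "poi_gage_id": ["0", "0", "0"],
})

duplicates_detected = False
try:
    toy_seg.merge(toy_poi, how="left", left_on="POI_ID", right_on="COMID",
                  validate="many_to_one")
except pd.errors.MergeError as err:
    duplicates_detected = True
    print("Duplicate COMIDs in POI table:", err)

# Illustrative fix: keep the first row per COMID, then the validated merge passes.
deduped = toy_poi.drop_duplicates(subset="COMID", keep="first")
joined = toy_seg.merge(deduped, how="left", left_on="POI_ID", right_on="COMID",
                       validate="many_to_one").drop(columns="COMID")
print(joined)
```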

Now read in the project segment shapefile and join the names and gage IDs on `seg_id_nat`.

In [7]:
seg_gdf = gpd.read_file("/import/beegfs/CMIP6/jdpaul3/hydroviz_data/gis/Segments_subset.shp")

In [8]:
seg_gdf = seg_gdf.join(gf_seg.set_index('seg_id_nat'), on='seg_id_nat')
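
Because `join` performs a left join, a project segment whose `seg_id_nat` is missing from the fabric simply ends up with empty name and gage fields. To count such segments explicitly, one option is `merge(..., indicator=True)`, which adds a `_merge` column labelling each row `both` or `left_only`. A sketch on toy frames (made-up values, not the project files):

```python
import pandas as pd

# Toy stand-ins (values are made up): segment 3 has no fabric record.
project = pd.DataFrame({"seg_id_nat": [1, 2, 3]})
fabric = pd.DataFrame({"seg_id_nat": [1, 2], "GNIS_NAME": ["River A", "Stream B"]})

# Left merge with an indicator column recording where each row was found.
checked = project.merge(fabric, on="seg_id_nat", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "seg_id_nat"]
print(f"{len(unmatched)} segment(s) without a fabric match: {unmatched.tolist()}")
```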

Clean up the gage IDs: add a `USGS-` prefix and mark segments without a gage as `NA`.

In [10]:
# add "USGS-" prefix to 'poi_gage_id'
seg_gdf['GAUGE_ID'] = seg_gdf['poi_gage_id'].apply(lambda x: f"USGS-{x}")

# a poi_gage_id of 0 marks a segment with no gage: replace "USGS-0" with "NA"
seg_gdf['GAUGE_ID'] = seg_gdf['GAUGE_ID'].replace("USGS-0", "NA")

# keep only the columns needed downstream
seg_gdf = seg_gdf[['seg_id_nat', 'GNIS_NAME', 'GAUGE_ID', 'geometry']]
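
The cleanup above only recognises the `0` sentinel. A gage ID that is missing after the joins (for example, when a segment's `POI_ID` has no match in the POI layer) would instead be formatted as `USGS-nan`. The hypothetical helper `format_gage_ids` below is one way to treat both cases as "no gage". It assumes `poi_gage_id` holds text, so the sentinel compares as the string `"0"`; if the column were numeric, the leading zeros of USGS site numbers would already be lost before this step.

```python
import pandas as pd


def format_gage_ids(gage_ids: pd.Series) -> pd.Series:
    """Hypothetical helper: prefix gage IDs with 'USGS-', using 'NA' for no gage.

    Both the text sentinel "0" and missing values are treated as "no gage".
    """
    as_text = gage_ids.astype(str)
    has_gage = gage_ids.notna() & (as_text != "0")
    # Keep the prefixed ID where a gage exists, otherwise substitute "NA".
    return ("USGS-" + as_text).where(has_gage, "NA")


# Toy input (made-up ID): a gage ID, the "0" sentinel, and a missing value.
toy = pd.Series(["01010000", "0", None])
print(format_gage_ids(toy).tolist())  # ['USGS-01010000', 'NA', 'NA']
```

Applied to this notebook, the call would be `seg_gdf['GAUGE_ID'] = format_gage_ids(seg_gdf['poi_gage_id'])` in place of the two assignments above.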

In [11]:
seg_gdf.head()

|   | seg_id_nat | GNIS_NAME | GAUGE_ID | geometry |
|---|---|---|---|---|
| 0 | 1 | West Branch Mattawamkeag River |  | LINESTRING (2101948.624 2876678.641, 2101941.3... |
| 1 | 2 | Baskahegan Stream |  | LINESTRING (2167789.031 2829021.852, 2167729.9... |
| 2 | 3 | Mattawamkeag River |  | LINESTRING (2131936.492 2865675.020, 2131955.7... |
| 3 | 4 | Mattawamkeag River |  | LINESTRING (2151719.943 2849594.051, 2151812.0... |
| 4 | 5 | Baskahegan Stream |  | LINESTRING (2155981.103 2842240.715, 2155894.2... |


Save the result to the `xwalk` subdirectory for further processing.

In [12]:
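out_dir = Path("/import/beegfs/CMIP6/jdpaul3/hydroviz_data/gis/xwalk")  # avoid shadowing the built-in dir()
out_dir.mkdir(parents=True, exist_ok=True)  # create the subdirectory if it does not exist yet
seg_gdf.to_file(out_dir / 'seg.shp')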
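
Shapefile attributes are stored in a dBASE (`.dbf`) table, whose field names are limited to 10 characters, so longer column names get truncated on write. The columns kept here (`seg_id_nat`, `GNIS_NAME`, `GAUGE_ID`) fit, but if more attributes are carried through later, a quick pre-write check such as the hypothetical helper below can flag names that would be shortened.

```python
def long_field_names(columns, limit=10):
    """Hypothetical helper: return column names longer than the DBF field-name limit.

    The geometry column is skipped because it is not written as a DBF field.
    """
    return [name for name in columns if name != "geometry" and len(name) > limit]


print(long_field_names(["seg_id_nat", "GNIS_NAME", "GAUGE_ID", "geometry"]))  # []
print(long_field_names(["seg_id_nat", "poi_gage_id", "geometry"]))  # ['poi_gage_id']
```

In the notebook this would be called as `long_field_names(seg_gdf.columns)` just before `to_file`.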