# U.S. State Boundaries into the Spatial Feature Registry

#### This code is in progress. It registers U.S. state boundaries into the Spatial Feature Registry (SFR, within the Data Distilleries GC2 instance) using the workflow below. All source data are retained unaltered, three registration fields are added (_id, reg_date, reg_source), and the data are exported to a GeoJSON file. The GeoJSON file is then uploaded to ScienceBase to document the final data as represented in the SFR. Data are currently uploaded to the SFR manually, with plans to automate this step in the future.

#### General workflow involves:
     1: Retrieve Data From Source (ScienceBase Item: 58259697e4b01fad86db263f)
     2: Create GeoDataFrame and identify native CRS
     3: Define Variables needed throughout process
     4: Create new ScienceBase item to describe registration process
     5: Build and export GeoJSON representation of the data.  This process adds registration fields that document the registration (reg_source, which points to the new SB item) and a registered UUID (_id).
     6: Upload GeoJSON file to new ScienceBase item to document what was registered into the SFR, along with additional information about when and how registration occurred.  This process will likely change as we introduce a more systematic way of tracking provenance.  During this step the user also uploads the data to GC2 (SFR schema).  Currently this upload is done manually through the UI.
     

Code by: Daniel Wieferich (USGS)

Date: 20180330

In [1]:
#Import Needed Packages
import geopandas as gpd
import urllib.request as ur
import subprocess
import geojson
from sfr_load_utils import *

#### Step 1: Retrieve data from source

In [None]:
### Step 1: Retrieve Dataset from ScienceBase
#States Dataset stored at https://www.sciencebase.gov/catalog/item/58259697e4b01fad86db263f

#Define URL of the zipped file geodatabase download
downloadUrl = 'https://prd-tnm.s3.amazonaws.com/StagedProducts/GovtUnit/GDB/National_GovernmentUnits.zip'
#Download the government units file to the working directory
ur.urlretrieve(downloadUrl, 'National_GovernmentUnits.zip')
#Unzip the file in the working directory (requires 7-Zip installed at this Windows path)
subprocess.call(r'"C:\Program Files\7-Zip\7z.exe" x ' + 'National_GovernmentUnits.zip')
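The unzip step above depends on 7-Zip being installed at a fixed Windows path. As a hedged alternative, the following minimal sketch does the same download and extraction with only the Python standard library. The function names `download_and_extract` and `extract_archive` are hypothetical helpers introduced for illustration and are not part of `sfr_load_utils`.

```python
import urllib.request
import zipfile
from pathlib import Path


def extract_archive(zip_path, extract_dir="."):
    """Extract every member of a zip archive and return the member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(extract_dir)
        return archive.namelist()


def download_and_extract(url, zip_path, extract_dir="."):
    """Download a zip archive (skipped if the file already exists), then extract it."""
    zip_path = Path(zip_path)
    if not zip_path.exists():
        urllib.request.urlretrieve(url, str(zip_path))
    return extract_archive(zip_path, extract_dir)
```

Under these assumptions, `download_and_extract(downloadUrl, 'National_GovernmentUnits.zip')` would replace both the `urlretrieve` and `subprocess.call` lines and should work on any operating system.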

#### Step 2: Import geodatabase layer into GeoDataFrame and identify native CRS

In [2]:
#Create GeoDataFrame from geodatabase
df = gpd.read_file('gu/GU.gdb', layer='GU_StateOrTerritory')

In [3]:
#Eventually we will need a coded method to extract the EPSG number (used as a variable later).
#This might be tricky given the format in which the CRS is returned.
df.crs

{'init': 'epsg:4269'}
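One possible way to extract the EPSG number in code is sketched below. It assumes the CRS arrives either as a dict like the one shown above, as a string containing `epsg:<number>`, or as an object exposing a `to_epsg()` method (as pyproj CRS objects returned by newer geopandas versions do). The helper name `epsg_from_crs` is hypothetical.

```python
import re


def epsg_from_crs(crs):
    """Return the EPSG code as a string, or None if it cannot be determined."""
    # Newer geopandas versions return a pyproj CRS object with to_epsg().
    to_epsg = getattr(crs, "to_epsg", None)
    if callable(to_epsg):
        code = to_epsg()
        return str(code) if code is not None else None
    # Older geopandas versions return a dict such as {'init': 'epsg:4269'}.
    if isinstance(crs, dict):
        crs = crs.get("init", "")
    match = re.search(r"epsg:(\d+)", str(crs), flags=re.IGNORECASE)
    return match.group(1) if match else None


epsg_example = {"code": epsg_from_crs({"init": "epsg:4269"})}
print(epsg_example)  # {'code': '4269'}
```

The result has the same shape as the user-defined `epsg` variable in Step 3, so `epsg_from_crs(df.crs)` could in principle replace the hard-coded value.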

In [4]:
df.head()

Unnamed: 0,AREASQKM,DATA_SECURITY,DISTRIBUTION_POLICY,FCODE,GLOBALID,GNIS_ID,GNIS_NAME,LOADDATE,PERMANENT_IDENTIFIER,POPULATION,SHAPE_Area,SHAPE_Length,SOURCE_DATADESC,SOURCE_DATASETID,SOURCE_FEATUREID,SOURCE_ORIGINATOR,STATE_FIPSCODE,STATE_NAME,geometry
0,62755.50521,5,E4,61100,{77DBC4B4-2548-4034-9018-2365E87D4B35},1779805,State of West Virginia,2016-10-05T00:00:00,94dc5bf9-9061-4c53-b8cb-8d00fc458c77,1852994,6.49387,20.741888,"2016 TIGER/Line Shapefile, Current State and E...",b832675b-9050-4e36-bb8f-8439ba729501,54,U.S. Census Bureau,54,West Virginia,(POLYGON ((-81.74725400021896 39.0953789997328...
1,170310.259667,5,E4,61100,{75AD4740-6CB2-47A5-B692-4B3797FC1693},294478,State of Florida,2016-10-05T00:00:00,387d6a6e-f7ba-4ec2-9940-316a03c8a5e2,18801310,15.700129,31.672645,"2016 TIGER/Line Shapefile, Current State and E...",b832675b-9050-4e36-bb8f-8439ba729501,12,U.S. Census Bureau,12,Florida,(POLYGON ((-86.38864600030621 30.9941809998075...
2,149995.384423,5,E4,61100,{9BE689BA-78E3-4C9C-A3C5-DFAB670AEA52},1779784,State of Illinois,2016-10-05T00:00:00,5521002d-ae9a-4a26-a38a-a08210488246,12830632,15.852607,22.531785,"2016 TIGER/Line Shapefile, Current State and E...",b832675b-9050-4e36-bb8f-8439ba729501,17,U.S. Census Bureau,17,Illinois,(POLYGON ((-91.18529500015256 40.6378030004636...
3,169634.99266,5,E4,61100,{50373FFB-392D-45FA-AC81-AC29E5C9606E},1779806,State of Wisconsin,2016-10-05T00:00:00,9e88b3f5-0d44-4f3c-bf25-58ea502b8fe5,5686986,19.244609,23.449681,"2016 TIGER/Line Shapefile, Current State and E...",b832675b-9050-4e36-bb8f-8439ba729501,55,U.S. Census Bureau,55,Wisconsin,(POLYGON ((-92.88706699959437 45.6441479999844...
4,254799.39945,5,E4,61100,{64108400-253F-4675-B712-6D06E95F3FB2},1155107,State of Oregon,2016-10-05T00:00:00,29296cc8-8b7b-4970-9cf8-e6bbc2eec3de,3831074,28.568477,25.977417,"2016 TIGER/Line Shapefile, Current State and E...",b832675b-9050-4e36-bb8f-8439ba729501,41,U.S. Census Bureau,41,Oregon,(POLYGON ((-124.0654480001576 45.7830539997688...


#### Step 3: Define Variables

In [5]:
#User Defined Variables
epsg = {'code':'4269'}
expected_geom_type = 'MultiPolygon'
outfile_name = 'state_boundaries'
source_sbitem = '58259697e4b01fad86db263f'
list_tags = ['Jurisdictional Units','BIS Spatial Feature Registry','United States']
date = '2018-04-02'
data_name = 'U.S. State Boundaries'


#### Step 4: Create SB Item to describe SFR Registration 

In [6]:
#Build SB Item to house SFR GeoJSON File, including description of item.  
#This step outputs source_uri (uri to the new sb item that describes the data) to be included as registration information.

#Turns list of tags into json format accepted by SB
sb_tags = build_sb_tags(list_tags)
#Create SB session and log in
sb = sb_login()   
#Creates JSON needed to build and describe new SB item
item_info = sfr_item_info_from_incomplete_sb_item(sb,source_sbitem, sb_tags, date, data_name)
print (item_info)
#Builds new SB item
new_item = build_new_sfr_sbitem(sb,item_info)
#The URI of the new SB item is retrieved in the next cell. It is inserted into the GeoJSON so the SFR has a direct connection to documentation.
#This step may not be needed once provenance capabilities are built.


{'title': 'Spatial Feature Registration Files for U.S. State Boundaries', 'parentId': '55fafaf5e4b05d6c4e501b81', 'summary': 'U.S. State Boundaries data registered into the spatial feature registry. Source data is documented at https://www.sciencebase.gov/catalog/item/58259697e4b01fad86db263f', 'tags': [{'type': 'Subject', 'name': 'Jurisdictional Units'}, {'type': 'Subject', 'name': 'BIS Spatial Feature Registry'}, {'type': 'Subject', 'name': 'United States'}], 'dates': [{'type': 'creation', 'dateString': '2018-04-02', 'label': 'Creation'}], 'purpose': 'These spatial data were ingested into the Spatial Feature Registry (SFR) data system within the Biological Information System.', 'webLinks': [{'type': 'webLink', 'typeLabel': 'Web Link', 'uri': 'https://www.sciencebase.gov/catalog/item/58259697e4b01fad86db263f', 'rel': 'related', 'title': 'source data documentation'}]}
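The source of `build_sb_tags` is not shown here, but the printed `item_info` suggests each tag name becomes a dict with a `type` of `'Subject'`. The sketch below reproduces that shape. It is an assumption based only on the output above, and `build_sb_tags_sketch` is a hypothetical name rather than the function in `sfr_load_utils`.

```python
def build_sb_tags_sketch(tag_names, tag_type="Subject"):
    """Turn a list of tag names into the list-of-dicts form seen in item_info['tags']."""
    return [{"type": tag_type, "name": name} for name in tag_names]


example_tags = build_sb_tags_sketch(
    ["Jurisdictional Units", "BIS Spatial Feature Registry", "United States"]
)
print(example_tags)
```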


In [7]:
source_uri = str(new_item['link']['url'])
print (source_uri)

https://www.sciencebase.gov/catalog/item/5ac25756e4b0e2c2dd0aa0f9


#### Step 5: Build and export GeoJSON representation of data.  Add registration id and source_uri (newly created SB item). Verify that the correct number of features was included in the GeoJSON dataset.

In [8]:
collection = df_to_geojson(df, epsg, source_uri, expected_geom_type)
print (verify_correct_count(collection, df))

#The GeoJSON is exported and zipped in the next cell, then added to the SB item in Step 6

Correct number of features
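The internals of `df_to_geojson` and `verify_correct_count` are not shown. As a rough illustration of the registration idea, the sketch below adds `_id`, `reg_date`, and `reg_source` to plain GeoJSON-style feature dicts and checks the feature count. Both function names are hypothetical, and the real helpers may differ (for example in how they handle geometry types or the CRS).

```python
import uuid
from datetime import date


def add_registration_fields(features, reg_source, reg_date=None):
    """Return a FeatureCollection whose features carry _id, reg_date and reg_source."""
    reg_date = reg_date or date.today().isoformat()
    registered = []
    for feature in features:
        # Copy the properties so the source features stay unaltered.
        properties = dict(feature.get("properties") or {})
        properties["_id"] = str(uuid.uuid4())
        properties["reg_date"] = reg_date
        properties["reg_source"] = reg_source
        registered.append({**feature, "properties": properties})
    return {"type": "FeatureCollection", "features": registered}


def count_matches(collection, expected_count):
    """True when the collection holds exactly the expected number of features."""
    return len(collection["features"]) == expected_count
```

With a GeoDataFrame, `expected_count` would be `len(df)`, mirroring the check printed above.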


In [9]:
file = export_geojson(outfile_name, collection)
outfile_zip = zip_geojson(outfile_name)
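For readers without access to `sfr_load_utils`, the export and zip steps can be approximated with the standard library. This is a minimal sketch under the assumption that the collection is a plain JSON-serializable dict. `export_and_zip` is a hypothetical name and is not the implementation of `export_geojson` or `zip_geojson`.

```python
import json
import os
import zipfile


def export_and_zip(outfile_name, collection):
    """Write the collection to <outfile_name>.geojson (UTF-8) and zip it."""
    geojson_path = outfile_name + ".geojson"
    zip_path = outfile_name + ".zip"
    # ensure_ascii=False keeps non-ASCII characters as UTF-8 text in the file.
    with open(geojson_path, "w", encoding="utf-8") as handle:
        json.dump(collection, handle, ensure_ascii=False)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store only the file name inside the archive, not the full path.
        archive.write(geojson_path, arcname=os.path.basename(geojson_path))
    return zip_path
```

Writing the file explicitly as UTF-8 matches the encoding that has to be specified when the data are uploaded to GC2.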

#### Step 6: Upload GeoJSON file to ScienceBase Item and also upload to GC2 using UI (make sure to specify UTF-8 encoding and MultiPolygon).

In [10]:
sb.upload_file_to_item(new_item, outfile_zip)

{'dates': [{'dateString': '2018-04-02',
   'label': 'Creation',
   'type': 'creation'}],
 'distributionLinks': [{'files': [{'contentType': 'application/zip',
     'name': 'state_boundaries.zip',
     'size': 11537430,
     'title': None}],
   'name': 'SpatialFeatureR.zip',
   'rel': 'alternate',
   'title': 'Download Attached Files',
   'type': 'downloadLink',
   'typeLabel': 'Download Link',
   'uri': 'https://www.sciencebase.gov/catalog/file/get/5ac25756e4b0e2c2dd0aa0f9'}],
 'files': [{'checksum': None,
   'contentEncoding': None,
   'contentType': 'application/zip',
   'dateUploaded': '2018-04-02T16:16:50Z',
   'downloadUri': 'https://www.sciencebase.gov/catalog/file/get/5ac25756e4b0e2c2dd0aa0f9?f=__disk__53%2Fdf%2F03%2F53df03d4cc0abfe474ce2035b4457e2b7e9e05cd',
   'imageHeight': None,
   'imageWidth': None,
   'name': 'state_boundaries.zip',
   'originalMetadata': None,
   'pathOnDisk': '__disk__53/df/03/53df03d4cc0abfe474ce2035b4457e2b7e9e05cd',
   'processToken': None,
   'proces

In [None]:
#Currently the new SB item needs some additional information added after the upload.
#The UI can be used for this for now, but in the future we will want to build as much as possible into this process.