# Introduction
The Biogeographic Characterization Branch of CSAS&L partners with many organizations to produce the official inventory of protected open space in the United States, the Protected Areas Database of the United States (PAD-US). Working with federal, state, local, national and nongovernmental organizations, the PAD-US group assembles, checks and produces integrated information that describes public open space and other protected areas and delineates their boundaries. The resulting national inventory is a key resource for informing decisions about conservation, recreation or land use planning at different scales and across administrative boundaries.

This notebook provides an overview of that inventory's assets and foundational data management practices. It also provides an entry point for exploring the interplay of managed lands, national conservation policy and resource management decisions. The notebook has three goals. The first is to provide easy access to BCB data assets associated with protected areas and managed lands in general. The second is to document programmatic ways to quickly inventory and explore those assets. The third is to provide code examples for working with and analyzing these data.

## Data Management Plan Focus
The latest iteration of this notebook has been reorganized around the sections of the PAD-US Data Management Plan (DMP) written in early 2017. We've copied sections of that text here. For each one, we then looked for a way to verify its concepts, or to pull information that backs them up, using code. Ideally, everything in the DMP should have some form of validation. In some cases, we've made notes about how parts of the data management process might be approached differently.

The sections in the notebook follow the data management lifecycle organization of the DMP. We use the "bcb-dm" GitHub repo, where this notebook lives, to create specific issues to be worked on as part of this data management planning and execution process. Each issue carries two labels: "Protected Areas" and a label naming the DMP section or lifecycle stage it belongs to (e.g. "DMP Acquisition"). Those stages are the organizing principle for this notebook. The code cell below each major section heading searches the GitHub API for that section's open issues by label and displays them dynamically.

```python
import requests
from json2html import json2html
from IPython.display import HTML
```

```python
# Set up some parameters for this notebook
_gc2BaseURL = "https://gc2.datadistillery.org/api/v1/sql/bcb"
_sbCatalogBaseURL = "https://www.sciencebase.gov/catalog/item/"

# Crude function to retrieve and print open "Protected Areas" issues that carry
# a given section label, using the GitHub search API
def getOpenIssuesForSection(label):
    query = 'repo:usgs-bcb/bcb-dm is:issue is:open label:"Protected Areas" label:"' + label + '"'
    response = requests.get("https://api.github.com/search/issues", params={"q": query})
    response.raise_for_status()
    for issue in response.json()["items"]:
        print(issue["title"])
        print(issue["body"])
        print(issue["html_url"])
        print("----")
```

# Product Information
Information about the PAD-US product as a whole can be found in the ScienceBase collection item. That item is built from a PAD-US metadata document and houses the data files and distribution methods for the product. The DMP's Product Name, Product Description, Product Owner, and Roles and Responsibilities should all be available from that item.
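One way to validate this in code is to check that each DMP element maps to a populated field on the item. The sketch below assumes a particular mapping:

* Product Name → `title`
* Product Description → `body`
* Product Owner and Roles and Responsibilities → `contacts`

That mapping and the helper names are hypothetical. They are not part of the DMP or the ScienceBase API. The sketch runs on a trimmed, hand-built dictionary shaped like the JSON returned for the collection item. The same functions could be applied to the `padusCollection` dictionary fetched later in this section.

```python
# Hypothetical mapping from DMP "Product Information" elements to fields in the
# ScienceBase item JSON; this pairing is an assumption made for illustration.
DMP_TO_SCIENCEBASE = {
    "Product Name": "title",
    "Product Description": "body",
    "Product Owner": "contacts",
    "Roles and Responsibilities": "contacts",
}


def check_dmp_product_info(item):
    """Return {DMP element: bool} saying whether the mapped item field is non-empty."""
    return {element: bool(item.get(field)) for element, field in DMP_TO_SCIENCEBASE.items()}


def contacts_by_type(item):
    """Group contact names by their ScienceBase contact type (e.g. 'Point of Contact')."""
    grouped = {}
    for contact in item.get("contacts", []):
        grouped.setdefault(contact.get("type", "(untyped)"), []).append(contact.get("name"))
    return grouped


# Trimmed, hand-built stand-in for the collection item JSON
sample_item = {
    "title": "Protected Areas Database of the United States (PAD-US)",
    "body": "The USGS Protected Areas Database of the United States (PAD-US) is ...",
    "contacts": [
        {"type": "Point of Contact", "name": "Lisa Johnson"},
        {"type": "Metadata Contact", "name": "Mason Croft"},
        {"type": "Distributor", "name": "U.S. Geological Survey - ScienceBase"},
    ],
}

print(check_dmp_product_info(sample_item))
print(contacts_by_type(sample_item))
```

Grouping contacts by `type` shows which role each name fills (Point of Contact, Metadata Contact, Distributor, and so on). That is closer to what a Roles and Responsibilities section asks for than a bare presence check.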

```python
getOpenIssuesForSection("DMP Product Info")
```

```text
Review and edit the links on PAD-US catalog items to make sure they are clear and appropriate
Links between items cataloging the various ways of accessing PAD-US and links to other related information are important aspects of documenting the system. The links on the main ScienceBase collection item that is used as a primary information point for PAD-US do not make complete sense. ScienceBase doesn't offer quite enough with regard to classifying or documenting web links, and neither does the FGDC metadata standard. However, we should make better use of title and type on these links to make it more clear what they are for and how they should be used in application.

We might also consider moving or copying the reference links from the abstract into the webLinks for added utility. Some of those references could be to other ScienceBase items where they can be described in better detail and provide access to reference materials.
https://github.com/usgs-bcb/bcb-dm/issues/26
----
Review mai
```

```python
# Get the PAD-US (v1.4) ScienceBase collection item for summary
padusCollection = requests.get(
    _sbCatalogBaseURL + "56bba648e4b08d617f657960",
    params={"format": "json", "fields": "title,body,purpose,contacts,webLinks"},
).json()

HTML(json2html.convert(json=padusCollection))
```

| Field | Value |
|---|---|
| id | 56bba648e4b08d617f657960 |
| title | Protected Areas Database of the United States (PAD-US) |
| link | https://www.sciencebase.gov/catalog/item/56bba648e4b08d617f657960 (rel: self) |
| relatedItems | https://www.sciencebase.gov/catalog/itemLinks?itemId=56bba648e4b08d617f657960 (rel: related) |

body:

> The USGS Protected Areas Database of the United States (PAD-US) is the nation's inventory of protected areas, including public open space and voluntarily provided, private protected areas, identified as an A-16 National Geospatial Data Asset in the Cadastral Theme (http://www.fgdc.gov/ngda-reports/NGDA_Datasets.html). PAD-US is an ongoing project with several published versions of a spatial database of areas dedicated to the preservation of biological diversity, and other natural, recreational or cultural uses, managed for these purposes through legal or other effective means. The geodatabase maps and describes public open space and other protected areas. Most areas are public lands owned in fee; however, long-term easements, leases, and agreements or administrative designations documented in agency management plans may be included. The PAD-US database strives to be a complete “best available” inventory of protected areas (lands and waters) including data provided by managing agencies and organizations. The dataset is built in collaboration with several partners and data providers (http://gapanalysis.usgs.gov/padus/stewards/). See Supplemental Information Section of this metadata record for more information on partnerships and links to major partner organizations. As this dataset is a compilation of many data sets; data completeness, accuracy, and scale may vary. Federal and state data are generally complete, while local government and private protected area coverage is about 50% complete, and depends on data management capacity in the state. For completeness estimates by state: http://www.protectedlands.net/partners. As the federal and state data are reasonably complete; focus is shifting to completing the inventory of local gov and voluntarily provided, private protected areas.
>
> The PAD-US geodatabase contains over twenty-five attributes and four feature classes to support data management, queries, web mapping services and analyses: Marine Protected Areas (MPA), Fee, Easements and Combined. The data contained in the MPA Feature class are provided directly by the National Oceanic and Atmospheric Administration (NOAA) Marine Protected Areas Center (MPA, http://marineprotectedareas.noaa.gov ) tracking the National Marine Protected Areas System. The Easements feature class contains data provided directly from the National Conservation Easement Database (NCED, http://conservationeasement.us ) The MPA and Easement feature classes contain some attributes unique to the sole source databases tracking them (e.g. Easement Holder Name from NCED, Protection Level from NOAA MPA Inventory). The "Combined" feature class integrates all fee, easement and MPA features as the best available national inventory of protected areas in the standard PAD-US framework. In addition to geographic boundaries, PAD-US describes the protection mechanism category (e.g. fee, easement, designation, other), owner and managing agency, designation type, unit name, area, public access and state name in a suite of standardized fields. An informative set of references (i.e. Aggregator Source, GIS Source, GIS Source Date) and "local" or source data fields provide a transparent link between standardized PAD-US fields and information from authoritative data sources. The areas in PAD-US are also assigned conservation measures that assess management intent to permanently protect biological diversity: the nationally relevant "GAP Status Code" and global "IUCN Category" standard. A wealth of attributes facilitates a wide variety of data analyses and creates a context for data to be used at local, regional, state, national and international scales.
>
> More information about specific updates and changes to this PAD-US version can be found in the Data Quality Information section of this metadata record as well as on the PAD-US website, http://gapanalysis.usgs.gov/padus/data/history/.) Due to the completeness and complexity of these data, it is highly recommended to review the Supplemental Information Section of the metadata record as well as the Data Use Constraints, to better understand data partnerships as well as see tips and ideas of appropriate uses of the data and how to parse out the data that you are looking for. For more information regarding the PAD-US dataset please visit, http://gapanalysis.usgs.gov/padus/. To find more data resources as well as view example analysis performed using PAD-US data visit, http://gapanalysis.usgs.gov/padus/resources/. The PAD-US dataset and data standard are compiled and maintained by the USGS Gap Analysis Program, http://gapanalysis.usgs.gov/ . For more information about data standards and how the data are aggregated please review the “Standards and Methods Manual for PAD-US,” http://gapanalysis.usgs.gov/padus/data/standards/ .

purpose:

> Purpose: The mission of the USGS Gap Analysis Program (GAP) is providing state, regional and national assessments of the conservation status of native vertebrate species and natural land cover types and facilitating the application of this information to land management activities. The PAD-US geodatabase is required to organize and assess the management status (i.e. apply GAP Status Codes) of elements of biodiversity protection. The goal of GAP is to 'keep common species common' by identifying species and plant communities not adequately represented in existing conservation lands. Common species are those not currently threatened with extinction. By identifying their habitats, gap analysis gives land managers and policy makers the information they need to make better-informed decisions when identifying priority areas for conservation. In cooperation with UNEP-World Conservation Monitoring Centre, GAP ensures PAD-US also supports global analyses to inform policy decisions by maintaining World Database for Protected Areas (WDPA) Site Codes and data for International Union for the Conservation of Nature (IUCN) categorized protected areas in the United States. GAP seeks to increase the efficiency and accuracy of PAD-US updates by leveraging resources in protected areas data aggregation and maintenance as described in "A Map of the Future", published following the PAD-US Design Project (July, 2009) available at: http://gapanalysis.usgs.gov/padus/vision/ with updates coming soon. While PAD-US was originally developed to support the GAP Mission stated above, the dataset is robust and has been expanded to support the conservation, recreation and public health communities as well. Additional applications become apparent over time. See the GAP Website http://gapanalysis.usgs.gov/padus/resources/ or the companion site http://protectedlands.net/uses for more information.

contacts:

| type | name | contactType | jobTitle | organization | email | hours | officePhone | faxPhone | mailAddress |
|---|---|---|---|---|---|---|---|---|---|
| Point of Contact | Lisa Johnson | person | PAD-US Coordinator | USGS Gap Analysis Program - Boise State University Cooperator | LisajJohnson@boisestate.edu | 9:00 am - 5:00 pm MST, M-F | 208-874-3102 | 208-426-4370 | 1910 University Drive - MS 1935, Boise, Idaho 83725, USA |
| Originator | US Geological Survey (USGS) Gap Analysis Program (GAP) | | | | | | | | |
| Metadata Contact | Mason Croft | person | PAD-US Technical Specialist | USGS Gap Analysis Program - Boise State University Cooperator | masoncroft@boisestate.edu | 9:00 am - 5:00 pm MST | 208-301-8288 | | Boise State University, 1910 University Drive - MS 1935, Boise, Idaho 83725, USA |
| Publisher | USGS Gap Analysis Program (GAP) | | | | | | | | |
| Distributor | U.S. Geological Survey - ScienceBase | organization | | | sciencebase@usgs.gov | | 1-888- 275-8747 | | Denver Federal Center, Building 810, Mail Stop 302, Denver, Colorado 80225, USA |

webLinks (every link has hidden: False and rel: related):

| type | title | uri |
|---|---|---|
| Online Link | | http://gapanalysis.usgs.gov/PADUS/ |
| Online Link | | http://gapanalysis.usgs.gov/PADUS |
| Online Link | | http://gapanalysis.usgs.gov/padus/data/ |
| Online Link | | https://doi.org/10.5066/F7G73BSZ |
| webLink | The File downloads in a .zip format | https://doi.org/10.5066/F7G73BSZ |
| webLink | The File downloads in a .zip format | http://gapanalysis.usgs.gov/PADUS |
| webLink | PAD-US 1.4 - All Protected Areas web service: | https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/Comb_Protected_Green/MapServer |
| webLink | PAD-US 1.4 - Terrestrial Protected Areas - Gray web service: | https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/Fee_Protected_Gray/MapServer |
| webLink | PAD-US 1.4 - Terrestrial Protected Areas - Green web service: | https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/Fee_Protected_Green/MapServer |
| webLink | PAD-US 1.4 - Easements web service: | https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/Category_Ease/MapServer |
| webLink | PAD-US 1.4 - Fee, Easement, Other, MPA web service: | https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/Category_Fee_Ease_Oth_MPA/MapServer |
| webLink | PAD-US 1.4 - Fee, Easement, Other web service: | https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/Category_Fee_Ease_Oth/MapServer |
| webLink | PAD-US 1.4 - Public Open Space web service: | https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/Public/MapServer |
| webLink | PAD-US 1.4 - Public and Private Open Space web service: | https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/Public_Private/MapServer |
| webLink | PAD-US 1.4 - Public Access web service: | https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/Public_Access/MapServer |
| webLink | PAD-US 1.4 - General Agency Level web service: | https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/General_Agency_Level/MapServer |
| webLink | PAD-US 1.4 - Mid Agency Level web service: | https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/SimpleAgency/MapServer |
| webLink | PAD-US 1.4 - Fine Agency Level web service: | https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/Fine_Agency_Level/MapServer |
| webLink | PAD-US 1.4 - Federal Management Agencies web service: | https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/FederalManagers/MapServer |
| webLink | PAD-US 1.4 - Federal, Tribal, Other web service: | https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/Fed_Tribal_Other/MapServer |
| webLink | PAD-US 1.4 - DOI, Other Federal, Tribal web service: | https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/DOI_Tribal_Other/MapServer |
| webLink | PAD-US 1.4 - Simple Designation Type web service: | https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/SimpleDesignationType/MapServer |
| webLink | PAD-US 1.4 - Detailed Designation Type web service: | https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/DesignationType/MapServer |
| webLink | PAD-US 1.4 - Protected Areas by Manager web service: | https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/Protected_Areas_by_Manager/MapServer |
| webLink | PAD-US 1.4 - Protected Biodiversity Areas web service: | https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/Protected_Biodiversity_Areas/MapServer |
| webLink | PAD-US 1.4 - Protected Multiple Use Areas web service: | https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/Protected_Multiple_Use_Areas/MapServer |
| webLink | PAD-US 1.4 - GAP Status Code web service: | https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/GAP_Status_Code/MapServer |
| webLink | PAD-US 1.4 - IUCN Category web service: | https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/IUCNCategories/MapServer |
| webLink | PAD-US 1.4 - IUCN Category and Other Conservation Areas web service: | https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/IUCNCategories_OtherConservationAreas/MapServer |
| mapapp (typeLabel: Mapping Application) | PAD-US Map Viewer | https://maps.usgs.gov/padus/ |


### -- Evaluation
The PAD-US collection item is probably the best basis for documenting the product as a whole, and it is in reasonably good shape. We can pull information from it with the ScienceBase API and display it anywhere.
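The listing above also shows some of the link problems raised in issue #26:

* The same DOI and the same `gapanalysis.usgs.gov/PADUS` address each appear more than once.
* Several links have no title.
* The title "The File downloads in a .zip format" is attached to URIs that do not end in .zip.

Below is a minimal sketch for flagging the first two problems. It assumes that URIs differing only by a trailing slash point to the same resource. The function names are hypothetical, and the sample list is copied from a subset of the item's `webLinks` above.

```python
def normalize_uri(uri):
    """Assumption: URIs differing only by surrounding spaces or a trailing slash are the same target."""
    return uri.strip().rstrip("/")


def find_duplicate_links(web_links):
    """Return {normalized URI: [titles]} for every URI listed more than once."""
    seen = {}
    for link in web_links:
        seen.setdefault(normalize_uri(link["uri"]), []).append(link.get("title") or "(no title)")
    return {uri: titles for uri, titles in seen.items() if len(titles) > 1}


def find_untitled_links(web_links):
    """Return the URIs of links that have no title at all."""
    return [link["uri"] for link in web_links if not link.get("title")]


# Subset of the collection item's webLinks, copied from the listing above
sample_links = [
    {"type": "Online Link", "uri": "http://gapanalysis.usgs.gov/PADUS/"},
    {"type": "Online Link", "uri": "http://gapanalysis.usgs.gov/PADUS"},
    {"type": "Online Link", "uri": "https://doi.org/10.5066/F7G73BSZ"},
    {"type": "webLink", "uri": "https://doi.org/10.5066/F7G73BSZ",
     "title": "The File downloads in a .zip format"},
    {"type": "webLink", "uri": "http://gapanalysis.usgs.gov/PADUS",
     "title": "The File downloads in a .zip format"},
]

for uri, titles in find_duplicate_links(sample_links).items():
    print(uri, titles)
print(find_untitled_links(sample_links))
```

Against the live item, the same functions can be given `padusCollection["webLinks"]`. The result could serve as a running checklist for that issue.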

# Acquire Stage

```python
getOpenIssuesForSection("DMP Acquisition")
```

```text
Document PAD-US source data in ScienceBase
We can use the spreadsheet of PAD-US source data (issue #4) to create much of the information needed to document PAD-US sources and then further flesh the items out as needed with additional details.
https://github.com/usgs-bcb/bcb-dm/issues/27
----
Standardize documentation of data provider names and contact information in PAD-US source data
As we get source data documented as discrete catalog items, we can work on standardizing names and contact information for source providers to provide a usable metadata resource for the system as a whole.
https://github.com/usgs-bcb/bcb-dm/issues/23
----
```


## Data Acquisition Factors
What factor(s) influence the acquisition of new data by your product? 

* Priorities determined by CSAS&L product owner/team – GAP (PAD-US) determines what data are acquired for inclusion in PAD-US, in part by available opportunities through partnerships.  State Data Steward Survey Project results are due Sept 30th of the Fiscal Year.  That content, and the revised PAD-US “Partners” webpage (along with companion site at www.ProtectedLands.net) help document and manage update priorities.
* Priorities determined by CSAS&L Program
* Priorities determined by USGS
* Priorities determined by existing opportunities with major collaborators/partners
* Priorities determined by major stakeholders
* Data acquisition may be opportunistic (for example, the Trust for Public Land (TPL) ParkScore project helps PAD-US ‘catch up’ on its local government inventory through data deliveries expected between 2017 and 2020)


### -- Evaluation
It's not entirely clear why a list of how priorities are determined is the focus of a section on data acquisition factors. Other things that could be considered here might include:

* Update frequency on the part of PAD-US data sources (something that could be made explicit with future dates on ScienceBase Items documenting PAD-US sources and then used in laying out update schedules for the product)
* Identification of issues in the data by users/stakeholders needing correction (could be flagged at the metadata level for a source dataset initially or in the data themselves, allowing for dynamic reporting in something like this notebook or for work scheduling)  
* Identification of new system requirements that call for evolving the data model to accommodate new concepts/properties (depending on how we structured this work, new properties could be added to the data model and the system could then report on progress toward completion)
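The first bullet could be made operational if each ScienceBase item documenting a PAD-US source carried an expected next-update date. The sketch below assumes two things:

* ScienceBase-style `dates` entries, i.e. dictionaries with `type` and `dateString` keys.
* A hypothetical date type named "Next Update".

Neither that date type nor these source items exist yet, and the titles and dates in the sample are placeholders.

```python
from datetime import date


def due_for_update(source_items, today, date_type="Next Update"):
    """Return titles of items whose `date_type` date falls on or before `today`.

    `date_type` defaults to a hypothetical "Next Update" label; items without
    such a date are skipped rather than treated as due.
    """
    due = []
    for item in source_items:
        for entry in item.get("dates", []):
            if entry.get("type") == date_type and date.fromisoformat(entry["dateString"]) <= today:
                due.append(item["title"])
                break  # one matching date is enough
    return due


# Placeholder source items; titles and dates are invented for illustration
sample_sources = [
    {"title": "Example state parks source", "dates": [{"type": "Next Update", "dateString": "2017-09-30"}]},
    {"title": "Example county open space source", "dates": [{"type": "Next Update", "dateString": "2018-06-01"}]},
    {"title": "Example undated source", "dates": []},
]

print(due_for_update(sample_sources, today=date(2017, 12, 1)))
```

Sources with no date are silently skipped here. A real schedule would probably want to list them separately as needing a date.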

If a focus of this section is on priority setting, we could put a mechanism in place to capture this information in a structured way. If we begin documenting PAD-US sources as discrete items in ScienceBase, we could make this type of reasoning explicit within our data management system. Some of the bullets from the DMP here are a little bit vague and could be improved with some thinking about how we designate the positioning of data sources in the priority queue. We could likely use the expedient of tags from a controlled vocabulary on items in a PAD-US sources collection to indicate how decisions are being made. In that way, we could then call up a report at any time to dynamically display prioritized lists of sources we are acquiring for PAD-US.
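Below is a minimal sketch of such a report. It assumes each source item carries ScienceBase-style `tags` (dictionaries with `scheme` and `name` keys) drawn from a hypothetical "PAD-US acquisition priority" scheme. The scheme's terms paraphrase the DMP bullets. Their ranking is invented for illustration, since deciding the real order is exactly the policy question raised above. In practice the items would come from a ScienceBase collection of PAD-US sources; that query is not shown.

```python
# Hypothetical controlled vocabulary: priority drivers paraphrased from the DMP list,
# with an invented rank (1 = highest priority)
PRIORITY_SCHEME = "PAD-US acquisition priority"
PRIORITY_RANK = {
    "product owner": 1,
    "program": 2,
    "usgs": 3,
    "partner opportunity": 4,
    "stakeholder": 5,
    "opportunistic": 6,
}


def priority_of(item):
    """Best (lowest) rank among the item's tags in the priority scheme; None if untagged."""
    ranks = [
        PRIORITY_RANK[tag["name"]]
        for tag in item.get("tags", [])
        if tag.get("scheme") == PRIORITY_SCHEME and tag.get("name") in PRIORITY_RANK
    ]
    return min(ranks) if ranks else None


def prioritized_sources(items):
    """Return (rank, title) pairs sorted by rank, with untagged sources last."""
    ranked = [(priority_of(item), item["title"]) for item in items]
    return sorted(ranked, key=lambda pair: (pair[0] is None, pair[0] or 0, pair[1]))


# Placeholder source items for illustration
sample_tagged_sources = [
    {"title": "Source C", "tags": []},
    {"title": "Source A", "tags": [{"scheme": PRIORITY_SCHEME, "name": "opportunistic"}]},
    {"title": "Source B", "tags": [{"scheme": PRIORITY_SCHEME, "name": "stakeholder"},
                                   {"scheme": PRIORITY_SCHEME, "name": "product owner"}]},
]

for rank, title in prioritized_sources(sample_tagged_sources):
    print(rank, title)
```

Taking the best rank when a source has several tags is one design choice among several. A weighted score would also work, but it would need agreed weights.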

## Data Providers
Who are the data providers for the product?

* USGS
* Other DOI agencies (NPS, FWS, BLM, BIA, USBR, BOEM)
* Other federal agencies outside DOI (USFS, NRCS, DoD, U.S. Army COE, NOAA)
* State
* Local agencies
* NGOs (The Nature Conservancy, Trust for Public Land, Ducks Unlimited)
* Private sector
* Academic/research institutions
* Citizen science initiatives
* Personal/individual contributors

```python
# Currently, we can partially get at this question by pulling distinct values of the GIS_Src attribute out of one of the PAD-US data services currently online
padGISSources = requests.get("https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/Protected_Areas_by_Manager/MapServer/0/query?where=0%3D0&outFields=GIS_Src&returnGeometry=false&returnTrueCurves=false&returnIdsOnly=false&returnCountOnly=false&returnZ=false&returnM=false&returnDistinctValues=true&f=pjson").json()

for feature in padGISSources["features"]:
    print(feature["attributes"]["GIS_Src"])
```

```text
Sumter_Co_Parcels.shp
Ducks Unlimited - digitized by eye from NLT map on website
The Nature Conservancy -Texas Chapter
COMBINATION OF DIGITAL BOUNDARIES PROVIDED BY ST. JOHNS RIVER WATER MANAGEMENT DISTRICT. 05/2012, A*
SCANNED TAX MAP
Chesapeake Bay Foundation
DEED + SURVEY
New York State Office of Parks Recreation and Historic Preservation
TPWD_LWRCRP2012.shp/Harris, County of
Town of York Digital Parcel Data
Carteret_Parcels2011.shp
PPL
ScenicGalvestonINC_parcels.shp/GalvestonCAD
Broomfield County Open Space and Trails
New Jersey Department of Environmental Protection
lee_parcels_2013_03_05.shp
Pima County D.O.T. Technical Services
Neil Jordan - The Nature Conservancy SC Chapter
8.4, 7.12
Unita_County_Bear_River_Park_2012.gdb
CHAGRIN VALLEY ENGINEERING
West Virginia Agricultural Land Protection Authority (Matt Monroe)
GPS, DXF
Survey, York Parcel Data, Color DOQ
DIGITAL BOUNDARIES PROVIDED BY SOUTH FLORIDA WATER MANAGEMENT DISTRICT. 04/2008 AND 07/2008
New Hampshire Water Supply Lan
```
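The list above stops mid-value. A distinct-value query against a map service can also be capped by the service's maximum record count. ArcGIS REST query responses can report that case with an `exceededTransferLimit` flag, so a helper can return the values together with a truncation warning. This sketch works on the parsed JSON, such as `padGISSources`. The helper name and the sample response are hypothetical.

```python
def distinct_values(query_response, field):
    """Return (sorted distinct non-null values of `field`, truncated flag) from an ArcGIS REST query response."""
    values = sorted({
        feature["attributes"][field]
        for feature in query_response.get("features", [])
        if feature["attributes"].get(field) is not None
    })
    # The server sets exceededTransferLimit when more records matched than it returned
    truncated = bool(query_response.get("exceededTransferLimit", False))
    return values, truncated


# Hypothetical response shaped like padGISSources
sample_response = {
    "features": [
        {"attributes": {"GIS_Src": "SCANNED TAX MAP"}},
        {"attributes": {"GIS_Src": "DEED + SURVEY"}},
        {"attributes": {"GIS_Src": None}},
    ],
    "exceededTransferLimit": True,
}

values, truncated = distinct_values(sample_response, "GIS_Src")
print(values)
if truncated:
    print("Warning: the service returned a partial list of values.")
```

If the warning fires on the live service, the full list would have to be retrieved some other way, for example by paging, if the layer supports it.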

```python
# Agg_Src might also be a useful attribute, but we need to dig in further to understand where this information comes from and how it is maintained
padAggregatorSources = requests.get("https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/Protected_Areas_by_Manager/MapServer/0/query?where=0%3D0&outFields=Agg_Src&returnGeometry=false&returnTrueCurves=false&returnIdsOnly=false&returnCountOnly=false&returnZ=false&returnM=false&returnDistinctValues=true&f=pjson").json()

for feature in padAggregatorSources["features"]:
    print(feature["attributes"]["Agg_Src"])
```

```text
PADUS_State_Parks_and_Historic_Sites_2012.gdb
GAP_PADUS1_4Fee_USFS_ALP_S_USA.BasicOwnership.gdb/BasicOwnership
GAP_PADUS1_4Designation_FWSSpecialDesignation_preprocess
GAP_PADUS1_4Designation_USFS_ALP_S_USA.WildScenicRiver
Alabama Department of Conservation and Natural Resources (ADCNR)_WMAOutlines2013.shp
GAP_PADUS1_4Easements_NRCS_easement_a_extract
The Trust for Public Land
PADUS_Campbell_County_2012.gdb
NPS_Lands_nps_tracts.shp
PADUS_City_of_Sheridan_2012.gdb
AGRC_SGID10_Archive.CADASTRE.PADUS_Submission2012.sde
PADUS_WGFD_2012.gdb
CALMIT
GAP_PADUS1_4Designation_Proclamation_NPS_Boundary.shp
PADUS_City_of_Casper_2012.gdb
TPL_Conservation_Almanac_State_Template/Conservation_Almanac_Database_US_Nov2011.gdb
USGS_Pacific_Protected_Areas_Database
UGA_NARSAL_GAConservationLands2012.gdb
GAP_PADUS1_4Designation_BLM_NOC_WSR
MFC_SchoolTrustLands2012.shp
Alabama Department of Conservation and Natural Resources (ADCNR)_PublicFishingLakes2013.shp
Missouri Resource Assessment Partnership (MoRAP)
```

### -- Evaluation

The current set of information in the final database doesn't really answer the question of who is providing the data. This property, and the underlying spreadsheet the information comes from, appear to hold a mix of information, some of which looks like internal notes or content that should perhaps live someplace else.

## Data Agreements

All data submitted for PAD-US must be suitable for distribution in the public domain. 

The following statement is included in program announcements soliciting proposals from state and non-profit data stewards and clearly states the public nature of the data provided for inclusion in PAD-US: 

*Dissemination of results: All award recipients shall complete proposed work within the required time frame, results should be published in a peer-reviewed form, and all data resulting from the project shall be released to the public domain in a timely fashion. The Government may publish, reproduce, and use all technical data developed as a result of this award in any manner and for any purpose, without limitation, and may authorize others to do the same.  Data generated as a part of work funded under this program is not subject to a proprietary period of exclusive data access.  Any data generated must be made available to the USGS as soon as it is available. The USGS reserves a royalty-free, nonexclusive and irrevocable license to reproduce, publish, or otherwise use, and to authorize others to use the data for Government purposes. Any project funded in whole or part with funds obtained under this program shall fall under this clause.*

### -- Evaluation
This comment goes a little beyond data management into program policy. Several sidebar discussions leading up to our working group session suggested that some data providers, including Federal agencies, have said that their data are not in the public domain, or that we have agreed not to post original data received from those agencies. This practice (still to be verified) seems to be at odds with this statement of the foundational agreement for sharing data with USGS. I think we need to get to the bottom of what our posture is on this and make sure we are working with a consistent policy that is in keeping with USGS direction.

## Data Types
How are incoming data received? 

* Geodatabases
* Shapefiles
* Excel spreadsheets
* Google spreadsheets
* Word documents

### -- Evaluation
Verifying the plan against reality, and reporting on it, will be much easier once PAD-US sources are cataloged. We can get at some of this now by looking at the GIS_Src listing above, where we can see "gdb" and "shp" files along with other types that are not listed here.
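A quick way to start tallying data types from what is already in the service is to classify the distinct GIS_Src (or Agg_Src) strings by the file extensions they contain. This is a minimal sketch: the `TYPE_PATTERNS` list and its labels are our own assumptions, and strings without a recognizable extension (organization names, "DEED + SURVEY", and so on) simply fall into an unclassified bucket.

```python
import re
from collections import Counter

# Hypothetical mapping from patterns in source strings to data-type labels;
# extend it as new formats turn up in the GIS_Src / Agg_Src listings.
TYPE_PATTERNS = [
    ("File geodatabase", re.compile(r"\.gdb\b", re.IGNORECASE)),
    ("Shapefile", re.compile(r"\.shp\b", re.IGNORECASE)),
    ("SDE connection", re.compile(r"\.sde\b", re.IGNORECASE)),
    ("CAD/GPS", re.compile(r"\b(dxf|gps)\b", re.IGNORECASE)),
]

def classify_source(value):
    """Return the first data-type label whose pattern matches the source string."""
    for label, pattern in TYPE_PATTERNS:
        if pattern.search(value):
            return label
    return "Unclassified (organization name, note, etc.)"

def summarize_source_types(values):
    """Count how many distinct source strings fall under each data-type label."""
    return Counter(classify_source(v) for v in values)

# Usage with a few of the values printed above
sample = ["Carteret_Parcels2011.shp", "Unita_County_Bear_River_Park_2012.gdb", "GPS, DXF", "SCANNED TAX MAP"]
print(summarize_source_types(sample))
```

Run against the full distinct-value response, the unclassified count gives a rough sense of how many sources carry no machine-readable hint about their format.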

It is important to note that being able to report explicitly on data types is not merely a matter of checking a box for the DMP. As we develop semi-automated workflows for processing incoming data, we can organize those processors into classes based on the basic mechanics of reading and working with different kinds of data.
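One way to organize processors by data type is a small registry keyed on file suffix, so that each incoming file is routed to the reader that understands its format. The sketch below is hypothetical: the registry, the decorator, and the reader names are ours, and the reader bodies only suggest that they might wrap `geopandas.read_file` for vector data or `pandas.read_excel` for spreadsheets (both imported lazily so the lookup logic works without them installed).

```python
from pathlib import Path

# Hypothetical registry: file suffix -> function that reads that kind of submission
READERS = {}

def reader_for(*suffixes):
    """Decorator that registers a reader function for one or more file suffixes."""
    def register(func):
        for suffix in suffixes:
            READERS[suffix.lower()] = func
        return func
    return register

@reader_for(".shp", ".gdb")
def read_vector(path):
    import geopandas  # lazy import; only needed when a vector file is actually read
    return geopandas.read_file(path)

@reader_for(".xlsx", ".xls")
def read_spreadsheet(path):
    import pandas  # lazy import; only needed when a spreadsheet is actually read
    return pandas.read_excel(path)

def get_reader(path):
    """Look up the reader for a path by its suffix, raising if the type is unknown."""
    suffix = Path(path).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"No reader registered for {suffix!r} ({path})") from None
```

An unregistered type (a Word document, for instance) raises immediately, which doubles as a report of formats no processor yet handles.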

What information coming into PAD-US is contained in Excel spreadsheets, Google spreadsheets, and Word documents? Presumably not spatial information. Perhaps attribution associated with the designated areas, or metadata?

## Method of Accepting Data
How do you accept incoming data? 

* Email
* DropBox (Boise State University is using DropBox, not USGS)
* Google Drive
* FTP site

### -- Evaluation
This text from the DMP appears to indicate that essentially all data are sent to the PAD-US data manager as files, which are then stored somewhere for processing. All source material needs to make some type of "online debut" associated with the ScienceBase Items that catalog the data sources. This could take a number of different forms:

* Link(s) to a reliable endpoint where data can be accessed from the provider as an online service or API
* Link(s) to a reliable location where data can be downloaded from a provider's repository
* File cached with the ScienceBase Items or stored in online storage that ScienceBase "talks to" (e.g., Amazon S3)
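The three forms of "online debut" listed above could be captured as a simple typed record attached to each source Item. This is a hypothetical sketch (the `DebutKind` and `SourceDebut` names and the allowed URL schemes are assumptions), meant only to show that each source resolves to exactly one reachable location of a known kind.

```python
from dataclasses import dataclass
from enum import Enum

class DebutKind(Enum):
    """The three forms of 'online debut' listed above."""
    PROVIDER_SERVICE = "provider service or API"
    PROVIDER_DOWNLOAD = "provider download"
    CACHED_FILE = "file cached with the ScienceBase Item"

@dataclass(frozen=True)
class SourceDebut:
    """Hypothetical record linking a ScienceBase source Item to where its data live."""
    sciencebase_item_id: str
    kind: DebutKind
    url: str

    def __post_init__(self):
        # Each debut must point somewhere processing code can reach (HTTP(S) or S3)
        if not self.url.startswith(("https://", "http://", "s3://")):
            raise ValueError(f"Unsupported URL scheme for {self.url!r}")

# Usage with placeholder values
debut = SourceDebut("hypothetical-item-id", DebutKind.PROVIDER_DOWNLOAD, "https://example.org/nps_tracts.zip")
print(debut.kind.value)
```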

We need source data to be online for a number of reasons:

1. It's important for the reproducibility principle that we be able to link users back to exactly where data records in our system came from so that they can understand and trust the final product we are presenting
2. We need to move toward fully processing source data end to end with code that operates against the actual source data, and those source data need to be "live" somewhere the code can access them
3. We may need to reprocess source data through time for any number of reasons as we continue to evolve the data model

## Storage of Acquired Data
Incoming data are stored locally on PAD-US Development Team computers and backed up to servers at Boise State University.  Local computer back-ups are also stored on site.

### -- Evaluation
As stated in the last evaluation section, source (acquired) data need a reliable online home, likely with the ScienceBase Items that catalog their existence. Storing the data in ScienceBase, now a USGS Trusted Digital Repository, will help ensure their availability, since ScienceBase already addresses the requirements of a Certified and Accredited USGS information system.

## Use of Data Standards
Many PAD-US data stewards provide data in some state of compliance with the [PAD-US Standards Manual](http://gapanalysis.usgs.gov/padus/data/standards/) (in press). Data from other stewards must be translated into the PAD-US schema by the Development Team, following the standards.

### -- Evaluation
As we put tools in place for regularized processing of PAD-US data, we should be able to report on how well incoming datasets align with the PAD-US schema. We could show summary reports of the datasets found to be in complete alignment and those that require specific actions to bring them into alignment. This type of report could be useful feedback to data providers. We do similar things in other data integration activities, such as assembling State Species of Greatest Conservation Need (SGCN) data, where we report to the states on how submitted species names align with taxonomic authorities.
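A summary report of this kind could start as a comparison between the fields each submitted dataset carries and the fields the PAD-US schema requires. In this sketch, `REQUIRED_FIELDS` is an illustrative subset built from attributes mentioned in this notebook plus `State_Nm`, our assumed name for the "State Name" attribute; the authoritative list lives in the PAD-US Standards Manual.

```python
# Illustrative subset of PAD-US fields (State_Nm is an assumed name, not confirmed here)
REQUIRED_FIELDS = {"GAPCdSrc", "GIS_Src", "Agg_Src", "State_Nm"}

def alignment_report(datasets, required=REQUIRED_FIELDS):
    """Split submitted datasets into fully aligned ones and ones needing work.

    `datasets` maps a dataset name to the collection of field names it contains.
    Returns (aligned, needs_work): a sorted list of aligned dataset names, and a dict
    mapping each remaining dataset to the sorted list of required fields it lacks.
    """
    aligned, needs_work = [], {}
    for name, fields in sorted(datasets.items()):
        missing = sorted(set(required) - set(fields))
        if missing:
            needs_work[name] = missing
        else:
            aligned.append(name)
    return aligned, needs_work

# Usage with hypothetical dataset names and field lists
aligned, needs_work = alignment_report({
    "state_a_2017.gdb": REQUIRED_FIELDS | {"Comments"},
    "state_b_2017.shp": {"GIS_Src", "Agg_Src"},
})
print(aligned, needs_work)
```

The `needs_work` dictionary is already in a shape that could be turned into per-provider feedback.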

## Metadata for Acquired Data
FGDC-compliant metadata are a component of the PAD-US State Data Steward Network projects, periodically supported by USGS. In addition, a Final Report (template provided) is required with the delivery of state geodatabase updates. Reports are thoroughly reviewed.

### -- Evaluation
These artifacts, formal metadata documents and final reports, should be housed with the PAD-US source data items in ScienceBase, where they provide potentially valuable documentation of the provenance of the records that end up in our database. When uploading FGDC metadata to these items, care should be taken to either a) ensure that the metadata explicitly document the item as source material for PAD-US, or b) if they do not, prevent them from being processed dynamically to generate the ScienceBase Item's information. The ScienceBase Items that describe PAD-US sources should be explicit about what they are, so that they are not mistaken (when discovered via public search, etc.) for a duplicate of the provider's own data. For instance, if someone runs a Google search for "nps boundaries" and finds the PAD-US source item we have documented for the National Park Service boundaries we integrate, the item should make abundantly clear what it is and provide a link to the provider's system to get the original.
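Option a) could be supported by a simple pre-upload check on the FGDC CSDGM record. This sketch looks for a phrase in the citation title, abstract, and purpose elements (`idinfo/citation/citeinfo/title`, `idinfo/descript/abstract`, `idinfo/descript/purpose`); matching on the phrase "PAD-US" is an assumed, deliberately crude heuristic, and a person still needs to read the record.

```python
import xml.etree.ElementTree as ET

# FGDC CSDGM elements where a dataset's identity and purpose are usually described
CHECK_PATHS = [
    "idinfo/citation/citeinfo/title",
    "idinfo/descript/abstract",
    "idinfo/descript/purpose",
]

def mentions_padus_source(fgdc_xml, phrase="PAD-US"):
    """Return True if the title, abstract, or purpose mention `phrase` (case-insensitive)."""
    root = ET.fromstring(fgdc_xml)  # root element is <metadata>
    texts = [root.findtext(path) or "" for path in CHECK_PATHS]
    return any(phrase.lower() in text.lower() for text in texts)

# Usage with a tiny hand-written record
sample = (
    "<metadata><idinfo><citation><citeinfo><title>NPS Tracts</title></citeinfo></citation>"
    "<descript><abstract>Boundaries.</abstract>"
    "<purpose>Source material for PAD-US 1.4.</purpose></descript></idinfo></metadata>"
)
print(mentions_padus_source(sample))
```

Records that fail the check would be candidates for option b), keeping their content from overwriting the ScienceBase Item description.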

What does the review of final reports consist of? Does the review itself generate any useful information that should be recorded in some way as part of the data management process?

## Checking In Datasets
When datasets are received, they are saved in a folder for the pertinent PAD-US version, organized by data provider and submission type/date. Datasets are then checked by a QA/QC script that looks for adherence to the PAD-US Standards Manual (including common data issues and required information) and outputs a .txt report that is shared with the data provider.
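The check-and-report step described above might look roughly like the following. This is not the actual PAD-US QA/QC script; it is a minimal sketch in which the only rule is "required fields must be present and non-empty", standing in for the fuller set of rules in the Standards Manual, and the report layout is our own.

```python
from pathlib import Path

def check_records(records, required_fields):
    """Return human-readable problems found in a list of attribute dictionaries.

    A record fails a field when the field is absent, None, or a blank string.
    """
    problems = []
    for index, record in enumerate(records):
        for field in required_fields:
            value = record.get(field)
            if value is None or (isinstance(value, str) and not value.strip()):
                problems.append(f"Record {index}: missing value for required field {field}")
    return problems

def write_report(problems, path):
    """Write a plain-text report, with a summary line first, to share with the provider."""
    lines = [f"QA/QC report: {len(problems)} problem(s) found"] + problems
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")

# Usage with hypothetical records
records = [{"GIS_Src": "Carteret_Parcels2011.shp", "GAPCdSrc": "GAP"}, {"GIS_Src": "", "GAPCdSrc": None}]
for problem in check_records(records, ["GIS_Src", "GAPCdSrc"]):
    print(problem)
```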

### -- Evaluation
The "folder" should be replaced by the ScienceBase Item containing the source data, and the output of scripted processes checking alignment with the PAD-US schema should be stored on those items as well. As we work toward continuous data integration, the versioning dynamic (post 2.0) will change. Ideally, file object versioning would be built into the ScienceBase Repository, but that is not currently the case. We might be able to accomplish versioning by leveraging the ScienceBase connection to Amazon S3, but ScienceBase has no built-in functionality that exposes that [feature](http://docs.aws.amazon.com/AmazonS3/latest/dev/Versioning.html) of S3. We would then process the latest version of a submitted dataset and point back to that dataset as the source in the provenance of final records in PAD-US at any given point. We could also follow the approach used in other data systems that now have automated or semi-automated, code-based processing into integrated data systems (SGCN and OBIS), where titles on files "attached" to ScienceBase Items indicate the specific file object to be processed as source material, as distinct from other files on that item. In any case, we need to bring this part of the process into the open; we will then be able to report on and evaluate it via code.
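The file-title convention mentioned above can be applied with a few lines of code once an Item's JSON is in hand. In this sketch, the title string "PAD-US Source Data" is hypothetical, and the `files` list with `name` and `title` keys reflects our understanding of the ScienceBase Item JSON (e.g., `https://www.sciencebase.gov/catalog/item/<id>?format=json`); it should be checked against a live response.

```python
def select_source_file(item_json, source_title="PAD-US Source Data"):
    """Pick the one file on a ScienceBase Item whose title marks it as source material.

    `source_title` is a hypothetical convention. Raises LookupError unless exactly
    one attached file carries that title, so ambiguous Items are flagged, not guessed.
    """
    matches = [f for f in item_json.get("files", []) if f.get("title") == source_title]
    if len(matches) != 1:
        raise LookupError(f"Expected one file titled {source_title!r}, found {len(matches)}")
    return matches[0]

# Usage with a hand-built Item dictionary
item = {"files": [
    {"name": "final_report.pdf", "title": "Final Report"},
    {"name": "nps_tracts.zip", "title": "PAD-US Source Data"},
]}
print(select_source_file(item)["name"])
```

Requiring exactly one match is a design choice: it makes a second, competing "source" file an error to be resolved rather than something silently processed.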

## Communication Process
Do you have an established process for communicating the status and/or completeness of the acquisition process with data providers and/or your product team? 

* The PAD-US Development Team receives data submissions and final reports in draft form from PAD-US State Data Steward award recipients 2-3 months before end-of-FY project deadlines (September 30).
* Draft geodatabases and final reports are saved in a “Draft” folder within a folder for each state, stored on PAD-US Development Team desktops and a backup external hard drive, with periodic updates sent to the BSU backup server.
* Written comments / edits are sent via email with a copy to the PAD-US Development Team, all correspondence saved in a folder for each state. 
* Final geodatabase deliveries and final reports are sent to the PAD-US Development Team and saved in a folder for each state, with back-ups sent to the BSU server.
* Receipt of data submission is acknowledged via email by the PAD-US Coordinator (saved in state email folder) with an expected date of PAD-US publication provided.  
* The PAD-US Development Team reviews final deliverables for incorporation into PAD-US, often requiring additional communication with the data steward.
* ProtectedLands.net (developed by GAP Cooperator GreenInfo Network) is a resource for more information about PAD-US, including a table listing all State Data Steward contacts, a map showing the year of last update, and an estimate of inventory completeness.
* A summary of all completed deliverables, or approved extensions, is provided to the USGS Contract Officer and/or PAD-US Data Manager at the end of each FY (i.e., project cycle).
* All data providers receive announcements of PAD-US publications via email from the PAD-US Coordinator.

Communicating to users (outside the PAD-US Development Team and data providers):
* The PAD-US Development Team provides USGS with content related to PAD-US updates for USGS Technical Announcements and Fact Sheets.
* The PAD-US Development Team hosts PAD-US Overview webinars to increase awareness and offer technical support.

### -- Evaluation
There is a lot to be done here, but once the sources are documented with persistent identifiers, we can begin building a communication protocol around that resource. Aspects of the system that are currently external (e.g., the ProtectedLands.net listing of data stewards) need to move to a core USGS process.

# Process Stage

In [41]:
getOpenIssuesForSection("DMP Processing")

## Processing Data
Do you work with received data in the format provided, or do you process the data in some way (e.g. convert or import it to some other format)?
If data are not submitted in the PAD-US schema following PAD-US Standards, they are converted following QA/QC review. A common processing requirement is intersecting the data submission with the official PAD-US State boundary file (Census BAS States, or equivalent) to attribute “State Name”.

### -- Evaluation
Overall, evidence of the processing steps should be available as code that does QA/QC work or takes action to align data with the target PAD-US schema. For instance, if a chunk of code performs the spatial operation that finds the state boundaries a given protected area designation lies within or intersects with (whatever the method), that source code should be referenceable online (e.g., in a GitHub repo). We should be able to cite that code as part of the provenance for what happened to the data on their way into the final system. We can then delineate precisely what those processing steps are in a notebook like this or in any number of reports, build on and improve them over time, and provide full transparency into the process of producing these data.
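As an example of the kind of small, citable function meant here, the sketch below assigns a state name by intersecting a protected area with state polygons using Shapely. The "largest overlap wins" rule is our assumption, not the documented PAD-US method (areas straddling a state line might instead be split), and areas are only meaningful when all geometries share a projected coordinate system.

```python
from shapely.geometry import Polygon

def assign_state(area_geom, states):
    """Return the name of the state polygon that shares the most area with `area_geom`.

    `states` maps state names to Shapely geometries in the same projected CRS as
    `area_geom`. Returns None if the area overlaps no state (touching edges have zero area).
    """
    best_name, best_area = None, 0.0
    for name, state_geom in states.items():
        overlap = area_geom.intersection(state_geom).area
        if overlap > best_area:
            best_name, best_area = name, overlap
    return best_name

# Usage with two toy square "states" and a protected area straddling their shared edge
toy_states = {
    "A": Polygon([(0, 0), (10, 0), (10, 10), (0, 10)]),
    "B": Polygon([(10, 0), (20, 0), (20, 10), (10, 10)]),
}
protected_area = Polygon([(8, 2), (14, 2), (14, 4), (8, 4)])  # 4 units^2 in A, 8 in B
print(assign_state(protected_area, toy_states))
```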

## Processing Workflow

PAD-US is developed in partnership with many organizations, including coordination groups at the federal level, lead organizations for each state, and a number of national and other nongovernmental organizations whose work is closely related to PAD-US. PAD-US is developed in two main processes: lands integration among federal agencies (and some national nonprofits); and state by state inventories that are rolled into the national PAD-US structure by GAP.

### -- Evaluation
To a certain extent, we should deal separately with the workflows that are within our management purview and those that happen before data are processed by USGS into PAD-US. Processing workflows within our control begin at the point where source data make their "debut" in our source data repository. From that point, we need to work toward a fully transparent, code-driven workflow that processes data into the final product as efficiently as possible. Before that point, we have a variety of partnership development activities to cultivate data providers and help them both improve their own internal processes and give us a better input product. A feedback loop connects these stages through QA/QC and other processing, where we share information to help make improvements over time.

## GAP Code Source Annotation
The GAPCdSrc is an attribute in the final PAD-US schema that contains specific annotation on how the GAP Code was assigned to a record. It is part of the "local knowledge" that is important to how PAD-US functions. The following code block shows the unique values for this attribute.

In [42]:
# Unique GAPCdSrc values
padGAPCodeSources = requests.get("https://gis1.usgs.gov/arcgis/rest/services/PADUS1_4/Protected_Areas_by_Manager/MapServer/0/query?where=0%3D0&outFields=GAPCdSrc&returnGeometry=false&returnTrueCurves=false&returnIdsOnly=false&returnCountOnly=false&returnZ=false&returnM=false&returnDistinctValues=true&f=pjson").json()

for feature in padGAPCodeSources["features"]:
    print(feature["attributes"]["GAPCdSrc"])

Florida Natural Areas Inventory
2013
DCBR-GAP
ADCNR
DCNBR-GAP
KSNPC
IDFG
United States Fish and Wildlife Service
VA-DCR-DNH
DcNR-GAP
FL DEP, Div. of Recreation and Parks
GAP - NHNM
GAP - NPS
GAP-NCNHP
Virginia Heritage Program
ALDCNR
WADFW_WADNR
Interagency Wilderness Steering Committee
GAP - NOAA
GAP - FWS
CNHP - following GAP dichotomous key
GreenInfo Network
GAP - AKNHP
NCED
GAP - USFS
WDFW-SPRC
GAP - TNC
Nevada Natural Heritage Program
GAP - NV State Lands
ORBIC
TNC
WDFW
The Nature Conservancy - Eastern Resource Office
Pima County GIS Department
ONHI
DCNR-GAP
Portland P&R
GAP
GAP - Default
DCNR_GAP
dcnr-gap
Revised Default per 11/10/2010 email between Lisa Duarte and Michelle Fink
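The listing above includes values such as "DCNR-GAP", "DCNR_GAP", "dcnr-gap", and "DcNR-GAP" that look like spelling variants of one source. A first pass at flagging such variants is to normalize case, underscores, hyphens, and spacing and group values that collapse to the same key. This is a sketch under that assumption; it will not catch variants like "DCNBR-GAP" or "DCBR-GAP", which would still need human review.

```python
import re
from collections import defaultdict

def normalize_code_source(value):
    """Uppercase and collapse runs of whitespace, underscores, and hyphens to a single '-'."""
    return re.sub(r"[\s_\-]+", "-", value.strip().upper())

def group_variants(values):
    """Map each normalized key to the original spellings behind it (keys with 2+ spellings only)."""
    groups = defaultdict(set)
    for value in values:
        groups[normalize_code_source(value)].add(value)
    return {key: sorted(spellings) for key, spellings in groups.items() if len(spellings) > 1}

# Usage with a few of the GAPCdSrc values printed above
values = ["DCNR-GAP", "DcNR-GAP", "DCNR_GAP", "dcnr-gap", "DCNBR-GAP", "GAP - NPS", "GAP"]
print(group_variants(values))
```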


# Preserve Stage

In [33]:
getOpenIssuesForSection("DMP Preservation")

Figure out disposition of legacy PAD-US files on Amazon storage
The Protected Areas notebook shows a listing of files that we need to figure out how to handle from older PAD-US work. If we need to keep these, we should catalog them and store them in ScienceBase for greater sustainability and ease of reference.
https://github.com/usgs-bcb/bcb-dm/issues/3
----


## Preservation of Final Data
Where will the final data packages be preserved?

The data are stored in ScienceBase. ScienceBase maintains multiple backups of data.

In [17]:
# This code block shows older PAD-US files stored in an Amazon S3 bucket that was used for some time to store and distribute GAP data of various kinds.

import mmap
import re

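# Match each path fragment containing "PADUS", up to and including the first "."
pattern = re.compile(rb'(\.\W+)?([^.]?PADUS[^.]*?\.)')

# Open in binary mode: the bytes pattern is searched against a read-only memory map of the file
with open("data/usgs-gap-data-ls.txt", "rb") as gapfiles: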
    with mmap.mmap(gapfiles.fileno(), 0, access=mmap.ACCESS_READ) as m:
        for match in pattern.findall(m):
            print(match[1].replace(b'\n', b' '))

b' PADUS/ByLCC/PAD-US_LCC_00.'
b' PADUS/ByLCC/PAD-US_LCC_01.'
b' PADUS/ByLCC/PAD-US_LCC_02.'
b' PADUS/ByLCC/PAD-US_LCC_03.'
b' PADUS/ByLCC/PAD-US_LCC_04.'
b' PADUS/ByLCC/PAD-US_LCC_05.'
b' PADUS/ByLCC/PAD-US_LCC_06.'
b' PADUS/ByLCC/PAD-US_LCC_07.'
b' PADUS/ByLCC/PAD-US_LCC_08.'
b' PADUS/ByLCC/PAD-US_LCC_09.'
b' PADUS/ByLCC/PAD-US_LCC_10.'
b' PADUS/ByLCC/PAD-US_LCC_11.'
b' PADUS/ByLCC/PAD-US_LCC_12.'
b' PADUS/ByLCC/PAD-US_LCC_13.'
b' PADUS/ByLCC/PAD-US_LCC_14.'
b' PADUS/ByLCC/PAD-US_LCC_15.'
b' PADUS/ByLCC/PAD-US_LCC_16.'
b' PADUS/ByLCC/PAD-US_LCC_17.'
b' PADUS/ByLCC/PAD-US_LCC_19.'
b' PADUS/ByLCC/PAD-US_LCC_20.'
b' PADUS/ByLCC/PAD-US_LCC_21.'
b' PADUS/ByLCC/PADUS1_2LCC/PAD-US_LCC_00.'
b' PADUS/ByLCC/PADUS1_2LCC/PAD-US_LCC_01.'
b' PADUS/ByLCC/PADUS1_2LCC/PAD-US_LCC_02.'
b' PADUS/ByLCC/PADUS1_2LCC/PAD-US_LCC_03.'
b' PADUS/ByLCC/PADUS1_2LCC/PAD-US_LCC_04.'
b' PADUS/ByLCC/PADUS1_2LCC/PAD-US_LCC_05.'
b' PADUS/ByLCC/PADUS1_2LCC/PAD-US_LCC_06.'
b' PADUS/ByLCC/PADUS1_2LCC/PAD-US_LCC_07.'
b' PA
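Because the regular expression above stops at the first ".", it truncates file names and drops extensions. To catalog these legacy files (per the open issue above), a line-based parse that keeps full keys and counts them by folder may be easier to work with. This sketch assumes the listing has the `aws s3 ls --recursive` layout (date, time, size, then the object key on each line); the actual format of `usgs-gap-data-ls.txt` should be confirmed before relying on it.

```python
from collections import Counter
from pathlib import PurePosixPath

def padus_keys(listing_lines, marker="PADUS"):
    """Yield full object keys containing `marker`, assuming 'date time size key' lines."""
    for line in listing_lines:
        parts = line.split(maxsplit=3)
        if len(parts) == 4 and marker in parts[3]:
            yield parts[3].rstrip()  # the key keeps inner spaces; drop the trailing newline

def count_by_folder(keys):
    """Count keys per parent 'folder' to help size the cataloging task."""
    return Counter(str(PurePosixPath(key).parent) for key in keys)

# Usage against the same listing file (format assumed, see above):
# with open("data/usgs-gap-data-ls.txt") as listing:
#     print(count_by_folder(padus_keys(listing)))
```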

# Publish/Share Stage

## Distribution Platforms
What platform(s) will be used to distribute the data?
PAD-US 1.4 is available from this ScienceBase record: https://www.sciencebase.gov/catalog/item/56bba648e4b08d617f657960. It is also available from the PAD-US Data Download page (http://gapanalysis.usgs.gov/padus/data/download/); the download options on that page come from the ScienceBase record. A map viewer has also been developed. The landing page for the viewer is http://gapanalysis.usgs.gov/padus/viewer/, and the final viewer has this URL: https://maps.usgs.gov/padus/
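Since the download options come from the ScienceBase record, the attached files can also be listed programmatically. This sketch requests the Item as JSON (`format=json`) and reads its `files` list; the `name`, `size`, and `url` keys reflect our understanding of the ScienceBase Item JSON, and large distributions may be exposed under other keys of the response, so check a live response before depending on it.

```python
ITEM_URL = "https://www.sciencebase.gov/catalog/item/56bba648e4b08d617f657960"

def fetch_item(url=ITEM_URL):
    """Fetch a ScienceBase Item as parsed JSON (needs network access)."""
    import requests  # imported here so list_files can be used offline on saved JSON
    response = requests.get(url, params={"format": "json"}, timeout=60)
    response.raise_for_status()
    return response.json()

def list_files(item_json):
    """Return (name, size, url) tuples for the files attached directly to an Item."""
    return [(f.get("name"), f.get("size"), f.get("url")) for f in item_json.get("files", [])]

# Usage (requires network access):
# for name, size, url in list_files(fetch_item()):
#     print(name, size, url)
```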