The FWS work plan species list originated from a published PDF file. Since then, information and assistance that USGS can provide has been assembled from across Mission Areas and Science Centers. Much of this has been consolidated into one core spreadsheet that we treat here as our master source (sources/Prelisting Science USGS Master_19Mar2018.xlsx). The worksheets in the spreadsheet contain various kinds of information that we work with elsewhere in these notebooks. The main listing we refer to is in the "FWS 7 Year Workplan Species" worksheet, which has been enhanced over time with an additional field of species guilds used for organizational purposes.

This notebook digests the spreadsheet into a data structure that is easier to work with in Python throughout this system.

In [1]:
import pandas as pd
import numpy as np
import bispy
from IPython.display import display
import json

bis_utils = bispy.bis.Utils()

import pickle

In [6]:
# Open up the cache of ECOS info for use (the with block closes the file on exit)
with open("../cache/ecos.json", "r") as f:
    cached_ecos_data = json.load(f)

# Quick function to retrieve a value from a cached ECOS scraped record, matched on scientific name.
# return_var is either "ECOS Link" (Search URL recorded in processing metadata) or "ITIS TSN".
def ecos_bits(name, return_var="ECOS Link"):
    ecos_scraped_record = next((r for r in cached_ecos_data if r["ecos_species_summary"]["Scientific Name"] == name), None)
    if ecos_scraped_record is None:
        return None
    if return_var == "ECOS Link":
        return ecos_scraped_record["processing_metadata"]["api"]
    if return_var == "ITIS TSN":
        return ecos_scraped_record["ecos_species_summary"]["ITIS TSN"]
    # Unrecognized return_var: return None instead of failing on an unassigned variable
    return None

spp_ecos_links = pd.read_excel(
    "../sources/AdditionalSourceData.xlsx",
    sheet_name="Extracted Species ECOS Links"
)

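def lookup_name(name):
    # Return the Lookup Name for a scientific name, or None when the name is not in the worksheet
    matches = spp_ecos_links.loc[spp_ecos_links['Scientific Name'] == name, 'Lookup Name']
    return matches.iloc[0] if not matches.empty else None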
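
The `ecos_bits` helper scans the whole cache on every call, and it is called twice per species. A minimal sketch of an alternative, assuming the cache is a list of records shaped like the ones read above: build a dictionary keyed on scientific name once, then look values up directly. The names `sample_cache`, `build_ecos_index`, and `ecos_value` are hypothetical and used only for illustration; the single sample record reuses values that appear in this notebook's output.

```python
# Hypothetical stand-in for the contents of ../cache/ecos.json.
# Only the fields that ecos_bits reads are included.
sample_cache = [
    {
        "ecos_species_summary": {
            "Scientific Name": "Tympanuchus pallidicinctus",
            "ITIS TSN": "175838",
        },
        "processing_metadata": {"api": "https://ecos.fws.gov/ecp/species/1924"},
    }
]


def build_ecos_index(records):
    """Map each scientific name to its first matching cached record."""
    index = {}
    for record in records:
        name = record["ecos_species_summary"]["Scientific Name"]
        # setdefault keeps the first record seen, like next() in ecos_bits
        index.setdefault(name, record)
    return index


def ecos_value(index, name, return_var="ECOS Link"):
    """Return the ECOS link or ITIS TSN for a name, or None if unavailable."""
    record = index.get(name)
    if record is None:
        return None
    if return_var == "ECOS Link":
        return record["processing_metadata"]["api"]
    if return_var == "ITIS TSN":
        return record["ecos_species_summary"]["ITIS TSN"]
    return None


ecos_index = build_ecos_index(sample_cache)
print(ecos_value(ecos_index, "Tympanuchus pallidicinctus"))
print(ecos_value(ecos_index, "Tympanuchus pallidicinctus", "ITIS TSN"))
print(ecos_value(ecos_index, "Not in cache"))
```

With a few hundred species the linear scan is unlikely to be a bottleneck, so this is mainly a readability choice: the index makes the "first match wins" rule explicit.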


In [7]:
spp_list = pd.read_excel("../sources/Prelisting Science USGS Master_19Mar2018.xlsx", sheet_name="FWS 7 Year Workplan Species", usecols="A:G")
# Replace NaN with None and strip whitespace from text columns
spp_list_clean = spp_list.replace({np.nan:None}).apply(lambda x: x.str.strip() if x.dtype == "object" else x)
# Add the ECOS link, ITIS TSN, and lookup name for each species
spp_list_clean["ECOS Link"] = spp_list_clean.apply(lambda x: ecos_bits(x["Scientific Name"]), axis=1)
spp_list_clean["ITIS TSN"] = spp_list_clean.apply(lambda x: ecos_bits(x["Scientific Name"], "ITIS TSN"), axis=1)
spp_list_clean["Lookup Name"] = spp_list_clean.apply(lambda x: lookup_name(x["Scientific Name"]), axis=1)
# Repeat the cleanup so the newly added columns are covered as well
spp_list = spp_list_clean.replace({np.nan:None}).apply(lambda x: x.str.strip() if x.dtype == "object" else x)
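
The cleanup in this cell matters because the records are cached as JSON afterwards: strict JSON has no NaN value, so missing cells need to become None (written as null), and stray whitespace in text is best removed before names are compared. A rough sketch of the same idea applied to one plain record, such as an item from `to_dict(orient='records')`. The helper `clean_value` is hypothetical, the padded whitespace in the sample is added for illustration, and this is not the notebook's own code path.

```python
import json
import math


def clean_value(value):
    """Strip strings and turn float NaN into None; leave other values alone."""
    if isinstance(value, str):
        return value.strip()
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


# Illustrative record; the extra spaces and the NaN mimic raw spreadsheet cells
sample_record = {
    "Guild": " Birds ",
    "Scientific Name": "Tympanuchus pallidicinctus",
    "Proposed FWS Decision Timeframe (Fiscal Year)": 2017,
    "Bin": float("nan"),
}

cleaned = {key: clean_value(value) for key, value in sample_record.items()}

# allow_nan=False makes json.dumps raise on NaN, so a successful call
# confirms that the cleaned record is strict JSON
print(json.dumps(cleaned, allow_nan=False))
```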


In [8]:
# Cache the array of retrieved documents and return/display a random sample for verification
display(bis_utils.doc_cache("../cache/workplan_species.json", spp_list.to_dict(orient='records')))

{'Doc Cache File': 'cache/workplan_species.json',
 'Number of Documents in Cache': 363,
 'Document Number 27': {'Guild': 'Birds',
  'Species Name (Common)': 'lesser prairie-chicken',
  'Scientific Name': 'Tympanuchus pallidicinctus',
  'Lead FWS Regional Office': 'Region 2 - Southwest',
  'Proposed FWS Decision Timeframe (Fiscal Year)': 2017,
  'Range': 'CO, KS, NM, OK, TX',
  'Bin': None,
  'ECOS Link': 'https://ecos.fws.gov/ecp/species/1924',
  'ITIS TSN': '175838',
  'Lookup Name': 'Tympanuchus pallidicinctus'}}
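
The `bispy` source is not shown in this notebook, so the following is only a guess at the behavior that the output above suggests: write the records to a JSON file, then report the file name, the record count, and one randomly chosen record. `doc_cache_sketch` is a hypothetical stand-in, not the `bispy` implementation, and it writes to a temporary directory rather than the notebook's cache folder.

```python
import json
import os
import random
import tempfile


def doc_cache_sketch(cache_path, records):
    """Write records to a JSON file and return a short summary for checking."""
    with open(cache_path, "w") as f:
        json.dump(records, f)

    summary = {
        "Doc Cache File": cache_path,
        "Number of Documents in Cache": len(records),
    }
    if records:
        # Pick one record at random so a person can eyeball it
        sample_number = random.randrange(len(records))
        summary[f"Document Number {sample_number}"] = records[sample_number]
    return summary


# One illustrative record using values from the output above
records = [
    {"Scientific Name": "Tympanuchus pallidicinctus", "ITIS TSN": "175838"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    cache_file = os.path.join(tmp_dir, "workplan_species.json")
    summary = doc_cache_sketch(cache_file, records)
    # Read the file back to confirm that the records survive the round trip
    with open(cache_file) as f:
        round_trip = json.load(f)

print(summary["Number of Documents in Cache"])
```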