The spreadsheet of species on the petition list included links to the FWS Environmental Conservation Online System (ECOS) for most of the species. After working through a few issues connecting to the FWS Threatened and Endangered Species System (TESS) API, I found it necessary to scrape the ECOS web pages for some additional information. ECOS has multiple identifiers for species that seem to be used in different parts of its data model, and not all of them are readily available through the APIs. The public web pages seem to assemble this information from various places through a back-end app of some kind, but I could not find a single API that exposes everything. To reliably understand and work with the connections to other systems that FWS folks have put together, we needed to parse some information out of the human-readable web pages into usable data. This code does that using BeautifulSoup.

In [1]:
from bis2 import dd
from IPython.display import display
from datetime import datetime
from bis import tess
import requests
from bs4 import BeautifulSoup

In [2]:
bisDB = dd.getDB("bis")
esaWPSpecies = bisDB["FWS ESA Work Plan Species"]

Right now, I'm just extracting the ITIS TSN and the names from the HTML, and pulling the SPCODE value out of the URL when it has one. The code handles pages that have no common name and pages where no ITIS TSN is identified.

In [3]:
# Select records that have not been scraped yet and whose ECOS link is not NaN
# (missing links come through from the spreadsheet as NaN).
query = {
    "ECOS Scrape": {"$exists": False},
    "Submitted Data.Species Record Reference": {"$ne": float("nan")},
}

for record in esaWPSpecies.find(query):
    url = record["Submitted Data"]["Species Record Reference"]
    ecosScrape = {"url": url}

    # Some ECOS links carry the species code as a query parameter.
    if "spcode=" in url:
        ecosScrape["SPCODE"] = url.split("spcode=")[1].split("&")[0]

    # Skip pages that fail to load so an error page is never stored as species data.
    response = requests.get(url, timeout=30)
    if not response.ok:
        continue
    soup = BeautifulSoup(response.content, "lxml")

    # The page title reads "Species Profile for Common name (Scientific name)",
    # or just the scientific name when there is no common name.
    title = soup.find("title").get_text().replace("Species Profile for ", "").strip()
    if "(" in title:
        ecosScrape["Common Name"] = title.split("(")[0].strip()
        ecosScrape["Scientific Name"] = title.split("(")[1].replace(")", "").strip()
    else:
        ecosScrape["Scientific Name"] = title

    # The taxonomy block links to ITIS; the TSN is the value of the second
    # query parameter in that link. Not every page has this block.
    itisDiv = soup.find("div", {"class": "taxonomy new-row"})
    itisLink = itisDiv.find("a", href=True) if itisDiv is not None else None
    if itisLink is not None:
        ecosScrape["TSN"] = itisLink["href"].split("&")[1].split("=")[1]

    esaWPSpecies.update_one({"_id": record["_id"]}, {"$set": {"ECOS Scrape": ecosScrape}})
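
The string handling in the loop above depends on the position of parameters in each URL and on a fixed title layout. As a rough sketch of a less position-dependent approach, the same extraction could be factored into two small helpers built on the standard library's `urllib.parse`. The helper names `query_param` and `split_profile_title`, the `example.org` URLs, and the `search_value` parameter name are all assumptions made for illustration; they are not taken from ECOS or ITIS documentation, and the real pages may be laid out differently.

```python
from urllib.parse import urlparse, parse_qs

TITLE_PREFIX = "Species Profile for "


def query_param(url, name):
    """Return the first value of query parameter `name`, or None if absent.

    Hypothetical helper: matches the parameter name case-insensitively so the
    result does not depend on parameter order or capitalization.
    """
    params = parse_qs(urlparse(url).query)
    for key, values in params.items():
        if key.lower() == name.lower():
            return values[0]
    return None


def split_profile_title(title):
    """Split a page title into (common_name, scientific_name).

    Hypothetical helper: assumes titles look like
    "Species Profile for Common name (Scientific name)" or, when there is no
    common name, "Species Profile for Scientific name".
    """
    text = title.replace(TITLE_PREFIX, "").strip()
    if "(" not in text:
        return None, text
    common, _, rest = text.partition("(")
    rest = rest.strip()
    if rest.endswith(")"):
        rest = rest[:-1]  # drop only the closing parenthesis of the outer pair
    return common.strip(), rest.strip()


# Illustrative inputs only; these are not real ECOS or ITIS links.
print(query_param("https://example.org/profile?spcode=A0GC", "spcode"))
print(split_profile_title("Species Profile for Gopher tortoise (Gopherus polyphemus)"))
```

Splitting on the first parenthesis and trimming only the final one keeps a scientific name that itself contains parentheses in one piece, which the `split("(")[1]` approach would truncate. Whether ECOS titles actually contain such names is something I have not checked.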