# Pipeline Stage 3.2
The ITIS stage of the SGCN pipeline kicks off all the other stages, which then operate independently. Because ITIS provides a number of additional search vectors and is the primary determinant of whether an SGCN species is placed on the National List, we run this stage first. It assembles additional names to search for in other sources, and identifiers to use in at least one case (the USFWS Environmental Conservation Online System).

All of the SppIn information processors use a single function, process_sppin_source_search_term(), built to handle the slightly different operations in each case. It can operate against a message queue, retrieving a single message and processing it through to completion. When running the process locally, I first retrieve all messages from a given queue and then process them in parallel. The number of messages we can process at once is limited by both the number of concurrent connections we can make to our database and the number of HTTP requests we want to send to the ITIS (or any other) API. In a Lambda environment, the database connection limit should no longer be a factor, but we will still need to throttle the number of concurrent requests we send to third-party APIs.
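The throttling concern described above can be sketched with a bounded semaphore that caps how many worker threads are inside an API call at once, independent of the size of the worker pool. This is only an illustration under stated assumptions: `MAX_API_CONNECTIONS`, `fake_api_call`, and `throttled_call` are hypothetical names, and the fake call stands in for a real HTTP request; none of this is part of pysgcn.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Assumed cap on concurrent third-party requests (hypothetical value)
MAX_API_CONNECTIONS = 3
_api_slots = threading.BoundedSemaphore(MAX_API_CONNECTIONS)

# Bookkeeping used only to observe how many calls overlap
_lock = threading.Lock()
_active = 0
peak_active = 0


def fake_api_call(name):
    """Hypothetical stand-in for an HTTP request to a taxonomic API."""
    global _active, peak_active
    with _lock:
        _active += 1
        peak_active = max(peak_active, _active)
    time.sleep(0.01)  # simulate network latency
    with _lock:
        _active -= 1
    return f"MESSAGE PROCESSED: Scientific Name:{name}"


def throttled_call(name):
    """Wait for a free slot before calling out, so at most
    MAX_API_CONNECTIONS requests are in flight at any time."""
    with _api_slots:
        return fake_api_call(name)


names = ["Rallus limicola", "Cervus canadensis", "Sida elliottii"] * 4

# The pool has more workers than API slots; the semaphore does the throttling
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(throttled_call, names))
```

Separating the pool size from the API cap means the number of workers can be tuned for database or local work without changing how hard a third-party service is hit.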

The ITIS process sends many new messages out onto other queues. It sends every name it encounters to the message queues for the other SppIn information gatherers: both the original name from the SGCN source and any additional names in the ITIS records (not necessarily all synonyms at this point, just the names encountered through search or by following through to valid taxonomic records). It also sends any names that it does not find to a WoRMS queue.
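The fan-out described above might look roughly like the following sketch. It is one reading of the paragraph, not the pysgcn implementation: the queue names other than the idea of a WoRMS queue are hypothetical, the queues are modeled as plain in-memory lists, and the scientific names are placeholders.

```python
from collections import defaultdict

# Hypothetical queue names; only "mq_itis_check" appears in this notebook
DOWNSTREAM_QUEUES = ["mq_example_gatherer_a", "mq_example_gatherer_b"]
WORMS_QUEUE = "mq_example_worms_check"


def fan_out(original_name, itis_names, queues):
    """Route the names from one ITIS lookup onto downstream queues.

    itis_names holds the names found in ITIS records for this search;
    an empty list is taken to mean ITIS had no match.
    """
    # De-duplicate while keeping order, original name first
    names = []
    for name in [original_name, *itis_names]:
        if name not in names:
            names.append(name)

    # Every encountered name goes to each of the other gatherers
    for queue_name in DOWNSTREAM_QUEUES:
        queues[queue_name].extend(names)

    # Assumption: a name with no ITIS match is also tried against WoRMS
    if not itis_names:
        queues[WORMS_QUEUE].append(original_name)


queues = defaultdict(list)
fan_out("Genus species", ["Genus species", "Genus species var. alpha"], queues)
fan_out("Unmatched name", [], queues)
```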

Both the ITIS and WoRMS processors also send messages containing taxonomic authority summary information when they encounter a usable valid record. These messages are used to add properties to the SGCN master table indicating whether or not a scientific name should be placed on the "SGCN National List."

In [1]:
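import pysgcn
from joblib import Parallel, delayed
from tqdm import tqdm

# Pipeline object that exposes the message queue store and the processors
sgcn = pysgcn.sgcn.Sgcn()

# Queue to drain and the SppIn source handled in this stage
mq = "mq_itis_check"
sppin_source = "itis"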

In [2]:
messages = sgcn.sql_mq.get_all_records("mq", mq)

In [3]:
%%time
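# Process every queued message on 8 threads; the work is I/O-bound
# (database and HTTP calls), so threads are preferred over processes
Parallel(n_jobs=8, prefer="threads")(
    delayed(sgcn.process_sppin_source_search_term)
    (
        message_queue=mq,
        sppin_source=sppin_source,
        message_id=message["id"],
        message_body=message["body"]
    ) for message in tqdm(messages)
)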

100%|██████████| 34656/34656 [32:28<00:00, 17.79it/s]  


CPU times: user 15min 1s, sys: 4min 30s, total: 19min 31s
Wall time: 32min 29s


['MESSAGE PROCESSED: Scientific Name:Fimbristylis littoralis var. littoralis',
 'MESSAGE PROCESSED: Scientific Name:Panicum dichotomum var. nitidum',
 'MESSAGE PROCESSED: Scientific Name:Trichostomum tenuirostre var. gemmiparum',
 'MESSAGE PROCESSED: Scientific Name:Seligeria donniana',
 'MESSAGE PROCESSED: Scientific Name:Stygobromus onondagaensis',
 'MESSAGE PROCESSED: Scientific Name:Sphagnum capillifolium',
 'MESSAGE PROCESSED: Scientific Name:Scaphiopus holbrookii',
 'MESSAGE PROCESSED: Scientific Name:Zosteractis interminata',
 'MESSAGE PROCESSED: Scientific Name:Cyperus flavicomus',
 'MESSAGE PROCESSED: Scientific Name:Forsstroemia producta',
 'MESSAGE PROCESSED: Scientific Name:Rallus limicola',
 'MESSAGE PROCESSED: Scientific Name:Cervus canadensis',
 'MESSAGE PROCESSED: Scientific Name:Sida elliottii',
 'MESSAGE PROCESSED: Scientific Name:Cypripedium candidum',
 'MESSAGE PROCESSED: Scientific Name:Syntrichia papillosa',
 'MESSAGE PROCESSED: Scientific Name:Berula erecta var. 