# Find how many specimens of each species are in the Museum of Victoria collection

In another notebook we harvested a list of species from the Museum of Victoria using their collection API and saved the results as a CSV file.

Here we'll search for specimens matching each of the species and save the total number of records.

We'll use these search parameters:

* `recordtype` which we'll set to 'specimen'
* `taxon` which we'll set to the species' taxon name
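
To make those two parameters concrete, here's a minimal sketch of the query string they produce. It uses the standard library's `urlencode` rather than `requests`, so it runs without touching the API. The taxon name is taken from the species list, and the variable names `example_params` and `example_url` are just for illustration.

```python
from urllib.parse import urlencode

SEARCH_URL = 'https://collections.museumsvictoria.com.au/api/search'

# The two search parameters described above, using one species as an example
example_params = {
    'recordtype': 'specimen',
    'taxon': 'Nephila edulis'
}

# Encode the parameters and attach them to the search endpoint
example_url = f'{SEARCH_URL}?{urlencode(example_params)}'
print(example_url)
```

Spaces in the taxon name are encoded for us, so there's no need to escape them by hand.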

In [13]:
import requests
from tqdm.auto import tqdm
import pandas as pd

In [14]:
SEARCH_URL = 'https://collections.museumsvictoria.com.au/api/search'

Load the CSV file containing the list of species.

In [18]:
df_species = pd.read_csv('museum-victoria-species.csv')
df_species.head()

Unnamed: 0,id,taxon_name,common_name
0,species/15849,Nephila edulis,Australian Golden Orb-weaver Spider
1,species/16848,Enypniastes eximia,Swimming sea cucumber
2,species/8608,Cryptocheilus bicolor,Spider Hunting Wasp
3,species/15155,Argiope trifasciata,Banded Garden Spider
4,species/12426,Stanwellia grisea,Melbourne Trapdoor Spider


In [16]:
def get_totals(params):
    '''
    Get the total number of results and pages returned by a search.
    '''
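    response = requests.get(SEARCH_URL, params=params)
    # Stop with a clear HTTP error rather than a KeyError on missing headers
    response.raise_for_status()
    # The total results and pages values are in the API response's headers!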
    total_results = int(response.headers['Total-Results'])
    total_pages = int(response.headers['Total-Pages'])
    return (total_results, total_pages)

def get_specimen_totals(species):
    '''
    Find the number of specimens matching each species.
    '''
    params = {
        'recordtype': 'specimen'
    }
    total_specimens = []
    for s in tqdm(species):
        params['taxon'] = s['taxon_name']
        total_results, _ = get_totals(params)
        s['total_specimens'] = total_results
        total_specimens.append(s)
    return total_specimens
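
Some taxon names appear more than once in the species list (for example `Leptoceridae`), so the loop above can ask the API the same question twice. As a rough sketch of one way around that, the variant below remembers totals it has already seen and leaves the input records unchanged. `count_specimens`, `fetch_total` and `fake_fetch` are hypothetical names, not part of the original notebook; in practice `fetch_total` might be a thin wrapper around `get_totals`.

```python
def count_specimens(species, fetch_total):
    '''
    Return copies of the species records with a total_specimens field added.
    fetch_total is a hypothetical function that takes a taxon name and
    returns the number of matching specimens.
    '''
    cache = {}  # taxon name -> total, so repeated taxa need only one lookup
    results = []
    for record in species:
        taxon = record['taxon_name']
        if taxon not in cache:
            cache[taxon] = fetch_total(taxon)
        # Copy the record instead of modifying the caller's dictionary
        results.append({**record, 'total_specimens': cache[taxon]})
    return results


# Stand-in for the API so the sketch runs offline; it just records each call
calls = []

def fake_fetch(taxon):
    calls.append(taxon)
    return len(taxon)

sample = [
    {'id': 'species/8483', 'taxon_name': 'Leptoceridae'},
    {'id': 'species/8494', 'taxon_name': 'Leptoceridae'},
    {'id': 'species/8463', 'taxon_name': 'Amphipoda'},
]
counted = count_specimens(sample, fake_fetch)
```

With three records but only two distinct taxon names, the stand-in is called twice rather than three times.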

In [17]:
specimens = get_specimen_totals(df_species.to_dict('records'))


In [19]:
df_specimens = pd.DataFrame(specimens)

Show the twenty species with the most specimens!

In [21]:
# Sort the dataframe by total_specimens (largest first) then show a slice of the first 20 records
df_specimens.sort_values(by='total_specimens', ascending=False)[:20]

Unnamed: 0,id,taxon_name,common_name,total_specimens
922,species/8463,Amphipoda,Amphipod,20628
1146,species/8483,Leptoceridae,Caddisfly,16629
1005,species/8494,Leptoceridae,Caddisfly larva,16629
1046,species/15127,Chrysomelidae,Eucalyptus Leaf Beetle,11515
1358,species/8532,Castiarina,Jewel Beetle,9609
1013,species/8492,Hydropsychidae,Caddisfly larva,8327
1155,species/8480,Hydropsychidae,Caddisfly,8327
364,species/15892,Ophiurida,Brittle Star,8293
1234,species/8360,Litoria ewingii,Brown Tree Frog,6040
1159,species/8468,Ostracoda,Seed Shrimp,5921
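
Notice that some taxa, such as `Leptoceridae`, appear twice with identical totals because two species records share one taxon name. If you'd rather rank each taxon once, one possible approach is to group by `taxon_name` first. The sketch below uses three rows copied from the results above; the `common_names` column and the `by_taxon` name are made up for this example.

```python
import pandas as pd

# Three rows copied from the results above
df_sample = pd.DataFrame([
    {'taxon_name': 'Leptoceridae', 'common_name': 'Caddisfly', 'total_specimens': 16629},
    {'taxon_name': 'Leptoceridae', 'common_name': 'Caddisfly larva', 'total_specimens': 16629},
    {'taxon_name': 'Amphipoda', 'common_name': 'Amphipod', 'total_specimens': 20628},
])

# One row per taxon: join the common names and keep a single total
by_taxon = (
    df_sample.groupby('taxon_name', as_index=False)
    .agg(
        common_names=('common_name', ' / '.join),
        total_specimens=('total_specimens', 'first')
    )
    .sort_values(by='total_specimens', ascending=False)
)
```

Taking the `first` total is safe here only because rows that share a taxon name were counted with the same search.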


## What next?

* How might you visualise these results?
* Could we include other taxonomic data to group the species?
* How could we get an image of each species (selected at random from matching specimens)? 