# 1. Exploring Wellcome Collection's APIs

Wellcome Collection has a few public APIs which can be used to fetch things like works, images, and concepts. They all live behind the following base URL:

In [1]:
base_url = "https://api.wellcomecollection.org/catalogue/v2/"

The APIs are primarily built to serve the [Wellcome Collection website](https://wellcomecollection.org/), but they are also available for anyone to use! They're a great way to get access to the data that Wellcome Collection has about its collections programmatically.

## 1.1 Making requests

We can make requests to that base API URL using the `requests` library. Let's have a look at the `/works` endpoint first.

In [2]:
import requests

response = requests.get(base_url + "works")
response.status_code

200

The response has a `200` status code, which indicates that the works API has responded successfully.
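Other status codes signal problems; `404`, for example, means the requested resource was not found. As a minimal sketch using only Python's standard library, the helpers below classify and describe a status code. The names `is_success` and `describe_status` are illustrative and are not part of `requests` or the API.

```python
from http import HTTPStatus


def is_success(status_code: int) -> bool:
    """Return True for any 2xx status code."""
    return 200 <= status_code < 300


def describe_status(status_code: int) -> str:
    """Return the code followed by its standard reason phrase."""
    status = HTTPStatus(status_code)
    return f"{status.value} {status.phrase}"


print(describe_status(200), is_success(200))
print(describe_status(404), is_success(404))
```

In a longer script, calling `raise_for_status()` on a `requests` response object is a common alternative: it raises an exception for 4xx and 5xx responses instead of letting them pass silently.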

Let's have a look at the fields it gives us.

In [3]:
list(response.json())

['type', 'pageSize', 'totalPages', 'totalResults', 'results', 'nextPage']

Let's look at everything _except_ the `results` field for now:

In [4]:
for key, value in response.json().items():
    if key != "results":
        print(key, value)

type ResultList
pageSize 10
totalPages 116058
totalResults 1160574
nextPage https://api.wellcomecollection.org/catalogue/v2/works?page=2


1,160,574 works! That's a lot of works. We're only seeing 10 in this response though, because the `pageSize` is set to 10 by default.
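The pagination fields are related: `totalPages` is `totalResults` divided by `pageSize`, rounded up. The sketch below checks that arithmetic against the values in the response and shows one possible way to walk through pages by following `nextPage`. The function names are illustrative, `fetch` stands for any callable that takes a URL and returns the parsed JSON, and the idea that `nextPage` is missing on the last page is an assumption.

```python
import math


def expected_total_pages(total_results: int, page_size: int) -> int:
    """Number of pages needed to hold all results."""
    return math.ceil(total_results / page_size)


def iter_results(first_url, fetch, max_pages=3):
    """Yield results page by page, following `nextPage` links.

    `fetch` is any callable mapping a URL to a parsed JSON response,
    e.g. `lambda url: requests.get(url).json()`.
    `max_pages` caps the number of requests made.
    """
    url = first_url
    pages_seen = 0
    while url is not None and pages_seen < max_pages:
        page = fetch(url)
        yield from page["results"]
        # Assumption: the last page has no `nextPage` field.
        url = page.get("nextPage")
        pages_seen += 1


print(expected_total_pages(1160574, 10))
```

Passing `fetch` in as an argument keeps the paging logic separate from the HTTP call, which also makes it easy to try out on a couple of hand-written pages.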

Let's have a look at the fields in the first result.

In [5]:
results = response.json()["results"]
first_result = results[0]
list(first_result)

['physicalDescription',
 'workType',
 'alternativeTitles',
 'id',
 'title',
 'type',
 'availabilities']

And here's the full first result, with all of its values.

In [6]:
first_result

{'physicalDescription': '[2],51,[1]p. ; 80.',
 'workType': {'id': 'a', 'label': 'Books', 'type': 'Format'},
 'alternativeTitles': [],
 'id': 'a222wwjt',
 'title': "The pigeon-Pye, or, a King's coronation, proper materials for forming an oratorio, opera, or play, according to the modern taste: To Be Represented in Opposition to the Dragon of Wantley. By an admirer of bad composition, and author of - nothing.",
 'type': 'Work',
 'availabilities': [{'id': 'online',
   'label': 'Online',
   'type': 'Availability'}]}

## 1.2 Requesting individual works

We can make requests for individual works by adding an ID to the end of our works API URL. Here's the first work again, but this time we're requesting it by ID.

In [7]:
first_work_id = results[0]["id"]
work_url = base_url + "works/" + first_work_id
work_url

'https://api.wellcomecollection.org/catalogue/v2/works/a222wwjt'

In [8]:
response = requests.get(work_url).json()
response

{'physicalDescription': '[2],51,[1]p. ; 80.',
 'workType': {'id': 'a', 'label': 'Books', 'type': 'Format'},
 'alternativeTitles': [],
 'id': 'a222wwjt',
 'title': "The pigeon-Pye, or, a King's coronation, proper materials for forming an oratorio, opera, or play, according to the modern taste: To Be Represented in Opposition to the Dragon of Wantley. By an admirer of bad composition, and author of - nothing.",
 'type': 'Work',
 'availabilities': [{'id': 'online',
   'label': 'Online',
   'type': 'Availability'}]}

As expected, the data is the same as the first result in the previous response.
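If you look up works by ID often, the URL construction can be wrapped in a small helper. This is only a sketch: `work_url` is a hypothetical name, and quoting the ID with `urllib.parse.quote` is a precaution rather than something the API is known to require.

```python
from urllib.parse import quote

base_url = "https://api.wellcomecollection.org/catalogue/v2/"


def work_url(work_id: str) -> str:
    """Build the URL for a single work from its ID."""
    # quote() escapes any characters that are not safe in a URL path.
    return base_url + "works/" + quote(work_id, safe="")


print(work_url("a222wwjt"))
```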

## 1.3 Sorting and searching

By default, works are sorted by the alphabetical order of their IDs (so we're seeing `a222wwjt` first, followed by other works starting with `a22...`).

In [9]:
response = requests.get(base_url + "works").json()

for work in response["results"]:
    print(work["id"])

a222wwjt
a222zvge
a2239muq
a223speg
a2242545
a22526q9
a2262ru9
a226rz35
a227dajt
a227y9ye


We can add a `query` query parameter to our request to see results sorted by relevance. Let's search for works that contain the word "horse".

In [10]:
response = requests.get(base_url + "works", params={"query": "horse"}).json()
for i, result in enumerate(response["results"]):
    print(f"{i+1}. {result['title']}")
    print(f"   https://wellcomecollection.org/works/{result['id']}")
    print()

1. Shire horse stud book.
   https://wellcomecollection.org/works/pgwnkf2h

2. Horse restrained in horse-box for injection
   https://wellcomecollection.org/works/v4t9q43f

3. Horse doctor giving medicine to a horse, German, 18th century
   https://wellcomecollection.org/works/wb4aqmdf

4. Horse foetuses: five figures showing the foetus of a horse during the gestation period, with dissections of its abdomen and stomach demonstrating the foetal circulation system. Engraving by T. Cowan after B. Herring, ca. 1860.
   https://wellcomecollection.org/works/c3xsp8nx

5. Reports on African horse-sickness / [by J.A. Nunn].
   https://wellcomecollection.org/works/eck73b9q

6. An elegy upon his honoured friend Mr. James Herewyn, unfortunately slain by a fall from his horse.
   https://wellcomecollection.org/works/fqsyrgnu

7. Cavalarice. Or the English horseman: contayning all the art of horsemanship, asmuch as is necessary for any man to vnderstand, whether hee be horse-breeder, horse-ryder, ho

Here, the results are sorted by how relevant they are to the search term.
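Behind the scenes, `requests` turns the `params` dictionary into a query string appended to the URL. The sketch below does something similar with the standard library's `urlencode`, so the resulting URL can be inspected without making a request. `build_url` is an illustrative name, not part of the API or of `requests`.

```python
from urllib.parse import urlencode

base_url = "https://api.wellcomecollection.org/catalogue/v2/"


def build_url(endpoint: str, params: dict) -> str:
    """Join the base URL, an endpoint, and URL-encoded query parameters."""
    query_string = urlencode(params)
    url = base_url + endpoint
    return f"{url}?{query_string}" if query_string else url


print(build_url("works", {"query": "horse"}))
```

For a real request, the `url` attribute of the `requests` response object shows the URL that was actually sent.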

We can also sort the results by other fields. Let's try sorting by when the works were produced, using the `production.dates` field. We'll also add an `include` parameter to our request, so that we can see the `production.dates` field in the results.

In [11]:
response = requests.get(
    base_url + "works",
    params={
        "query": "horse",
        "sort": "production.dates",
        "include": "production",
    },
).json()

The `production` field is an array of `ProductionEvent` objects, each of which has:
- a `label`
- a list of `agents`
- a list of `dates`
- a list of `places`

We can see those for the first result in the list like this:

In [12]:
response["results"][0]["production"]

[{'label': 'Mid 14th century',
  'agents': [],
  'dates': [{'label': 'Mid 14th century', 'type': 'Period'}],
  'type': 'ProductionEvent',
  'places': []}]
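Because `agents`, `dates`, and `places` are lists that may be empty (as `agents` and `places` are here), indexing straight into them can raise an `IndexError`. A defensive accessor is one option. `first_date_label` below is a hypothetical helper, shown on a copy of the record above.

```python
def first_date_label(work: dict):
    """Return the label of the first production date found, or None."""
    for event in work.get("production", []):
        for date in event.get("dates", []):
            return date.get("label")
    return None


# A copy of the `production` field shown above.
example_work = {
    "production": [
        {
            "label": "Mid 14th century",
            "agents": [],
            "dates": [{"label": "Mid 14th century", "type": "Period"}],
            "type": "ProductionEvent",
            "places": [],
        }
    ]
}

print(first_date_label(example_work))
print(first_date_label({"production": []}))
```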

Internally, each `ProductionEvent` date has a start and an end (because often we don't know _exactly_ when a work was produced). The works for our request are sorted by the earliest _start_ date in their `production.dates` field.

In [13]:
for i, result in enumerate(response["results"]):
    print(f"{i+1}. {result['title']}")
    print(f"   {result['production'][0]['dates'][0]['label']}")
    print(f"   https://wellcomecollection.org/works/{result['id']}")
    print()

1. Galenic miscellany
   Mid 14th century
   https://wellcomecollection.org/works/t4rdykqn

2. Middle English medical miscellany, including receipts and charms (Leech-Books, VI)
   Late 14th-early 15th century
   https://wellcomecollection.org/works/y7xcant4

3. Medical Compendium in English
   15th century
   https://wellcomecollection.org/works/rs9qgwh8

4. Works on regimen
   Late 15th Century
   https://wellcomecollection.org/works/d8ge66h7

5. Collection of remedies for the diseases of horses
   c.1500
   https://wellcomecollection.org/works/rt2w4mjw

6. Giordano Ruffo, <i>Libro dell'infirmita dei cavalli</i>, and other texts
   c.1500
   https://wellcomecollection.org/works/aqhz7ht6

7. A skeleton sits backwards on a horse waiting for a blacksmith, dressed as a fool, to finish his job. Woodcut.
   [1494-1497]
   https://wellcomecollection.org/works/v5keep94

8. Antiquities of Rome. Album of engravings, 15--.
   [between 1500 and 1599]
   https://wellcomecollection.org/works/gr7nv

Those results are in ascending order, but we can also change the `sortOrder` to give us newer works first.

In [14]:
response = requests.get(
    base_url + "works",
    params={
        "query": "horse",
        "sort": "production.dates",
        "sortOrder": "desc",
        "include": "production",
    },
).json()

for i, result in enumerate(response["results"]):
    print(f"{i+1}. {result['title']}")
    print(f"   {result['production'][0]['dates'][0]['label']}")
    print(f"   https://wellcomecollection.org/works/{result['id']}")
    print()

1. George Stubbs : 'all done from nature' / [curated by Paul Bonaventura, Martin Postle and Anthony Spira].
   2019
   https://wellcomecollection.org/works/a8pxrhjn

2. The world I fell out of / Melanie Reid ; foreword by Andrew Marr.
   2019
   https://wellcomecollection.org/works/k99u64v4

3. Reefer madness comics / edited and designed by Craig Yoe ; introduction written by Craig Yoe with Steven Thompson.
   2018
   https://wellcomecollection.org/works/tj4pk7qj

4. Inferno : a doctor's ebola story / Steven Hatch, M.D.
   2017
   https://wellcomecollection.org/works/pbvnushb

5. Animal metropolis : histories of human-animal relations in urban Canada / edited by Joanna Dean, Darcy Ingram, and Christabelle Sethna.
   [2017]
   https://wellcomecollection.org/works/q736twd3

6. The state of Grace / Rachael Lucas.
   2017
   https://wellcomecollection.org/works/dra4bt4m

7. Insider trading : how mortuaries, medicine and money have built a global market in human cadaver parts / Naomi Pfeffe

## 1.4 Filtering results

We can ask the API to return works between a set of dates, using the `production.dates.from` and `production.dates.to` parameters. Let's ask for works produced between 1900 and 1910.

In [15]:
response = requests.get(
    base_url + "works",
    params={
        "production.dates.from": "1900-01-01",
        "production.dates.to": "1910-01-01",
    },
).json()

response["totalResults"]

45830

Previously, we used the `production.dates` field to _sort_ our results.

Here, the results are sorted in the default order (i.e. sorted by `id`), but they're _filtered_ to only show works which were produced in the range we're interested in.
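The filter values are ISO-formatted dates (`YYYY-MM-DD`). As a sketch, a small helper can build the two parameters from a pair of years. `date_range_params` is an illustrative name, and using 1 January of each year as the boundary simply mirrors the request above.

```python
from datetime import date


def date_range_params(from_year: int, to_year: int) -> dict:
    """Build the date filter parameters for a range of years."""
    if from_year > to_year:
        raise ValueError("from_year must not be after to_year")
    return {
        "production.dates.from": date(from_year, 1, 1).isoformat(),
        "production.dates.to": date(to_year, 1, 1).isoformat(),
    }


print(date_range_params(1900, 1910))
```

The returned dictionary can be passed straight to `requests.get` as `params`.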

We can also filter by lots of other fields, like subjects! Let's ask for works about cats, by using the `subjects.label` field.

In [16]:
response = requests.get(
    base_url + "works",
    params={
        "subjects.label": "Cats",
    },
).json()

response["totalResults"]

258



## 1.5 Including extra fields in the response

We can ask the API to give us extra information in the response by adding an `include` query parameter to our request, as we did above to get our `production` events for each work.

There are lots of other fields we can request for each work:

- `identifiers`
- `items`
- `holdings`
- `subjects`
- `genres`
- `contributors`
- `production`
- `languages`
- `notes`
- `images`
- `succeededBy`
- `precededBy`
- `partOf`
- `parts`

The full documentation for each of them is available in the [API documentation](https://developers.wellcomecollection.org/api/catalogue#tag/Works/operation/getWorks).
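Several of these fields can, in principle, be requested together. The sketch below assumes that `include` accepts a comma-separated list of field names; treat that format as an assumption and check it against the API documentation. `include_param` and `KNOWN_INCLUDES` are hypothetical names, and the set simply copies the list above.

```python
# Field names copied from the list above.
KNOWN_INCLUDES = {
    "identifiers", "items", "holdings", "subjects", "genres",
    "contributors", "production", "languages", "notes", "images",
    "succeededBy", "precededBy", "partOf", "parts",
}


def include_param(*fields: str) -> str:
    """Join field names into one `include` value, rejecting unknown names."""
    unknown = [field for field in fields if field not in KNOWN_INCLUDES]
    if unknown:
        raise ValueError(f"Unknown include fields: {unknown}")
    # Assumption: the API accepts a comma-separated list here.
    return ",".join(fields)


print(include_param("subjects", "production"))
```

Checking names locally catches typos such as `subject` before a request is sent.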

Let's have a look at `subjects`, as an example.

In [17]:
response = requests.get(
    base_url + "works",
    params={
        "include": "subjects",
    },
).json()

The `subjects` field is an array of `Subject` objects, each of which has:
- a `label`
- an `id`
- a list of `concepts`, where each concept has
    - a `label`
    - an `id`
    - a `type`, e.g. `Concept`, `Period`, `Person`, `Place`

We can see those for the first result in the response like this:

In [18]:
response["results"][0]["subjects"]

[{'label': 'English drama - 18th century',
  'concepts': [{'id': 'h2ctt8e6', 'label': 'English drama', 'type': 'Concept'},
   {'id': 'nhvtuf2z', 'label': '18th century', 'type': 'Period'}],
  'id': 'nxrdce2w',
  'type': 'Subject'}]
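Nested structures like this are easy to flatten with a comprehension. As a sketch on a copy of the example above, the hypothetical helper `concept_labels` collects the label of every concept attached to a work's subjects.

```python
def concept_labels(work: dict) -> list:
    """Return the labels of all concepts across a work's subjects."""
    return [
        concept["label"]
        for subject in work.get("subjects", [])
        for concept in subject.get("concepts", [])
    ]


# A copy of the `subjects` field shown above.
example_work = {
    "subjects": [
        {
            "label": "English drama - 18th century",
            "concepts": [
                {"id": "h2ctt8e6", "label": "English drama", "type": "Concept"},
                {"id": "nhvtuf2z", "label": "18th century", "type": "Period"},
            ],
            "id": "nxrdce2w",
            "type": "Subject",
        }
    ]
}

print(concept_labels(example_work))
```

A work requested without `include=subjects` has no `subjects` key, in which case the helper returns an empty list.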

In the next notebook, we'll start requesting bigger batches of data and doing some more local data science and analysis to answer some more interesting questions. In the meantime, here are some exercises to test your understanding of what we've covered so far.


## Exercises

1. Fetch the data for the work with the id `ca5c6h4x`
2. Make a request for a work which includes all of its `genres` (these are the types/techniques of the work, e.g. `painting`, `etching`, `poster`)
3. Find the oldest and newest work about `pigs` in the collection
4. Filter the works about `pigs` to only include those that were produced in the 20th century