**Introduction**

> This tutorial showcases some basic DPLA API requests for digital objects at the item level. Before the live coding session, we need to install a few Python libraries to help us parse the results. Since we are going to manipulate the retrieved data in JSON format, libraries such as 'flatten_json' and 'pandas' will be useful to have, along with 'requests' for making the HTTP requests. If the libraries below are already installed on your machine, feel free to skip this step.


> In general, a metadata record contains objects (dictionaries in Python), arrays (lists), and strings. The type of each value in the record determines how that field has to be retrieved in Python code. The flatten_json library is useful when a researcher wants to flatten nested JSON records into a single level (for example, before loading them into a DataFrame) without worrying much about the type of each metadata field.
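To make "flattening" concrete, here is a minimal pure-Python sketch of the idea. The helper `flatten_record` is hypothetical. It only roughly mimics what the flatten_json library does by default: nested keys are joined with an underscore, and list items are keyed by their index. In the notebook itself you would call `flatten_json.flatten(record)` instead. The sample record is made up and only loosely follows the DPLA field names shown later.

```python
def flatten_record(value, parent_key="", sep="_"):
    """Recursively flatten nested dicts/lists into a single-level dict.

    Dict keys are joined with `sep`; list items use their index as the key,
    roughly like the default behavior of flatten_json.flatten().
    """
    if isinstance(value, dict):
        children = value.items()
    elif isinstance(value, list):
        children = enumerate(value)
    else:
        # A plain value (string, number, None): store it under the built-up key
        return {parent_key: value}
    items = {}
    for key, child in children:
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        items.update(flatten_record(child, new_key, sep))
    return items


# Made-up sample record, loosely shaped like a DPLA item
record = {
    "id": "abc123",
    "dataProvider": {"name": "Example Library"},
    "sourceResource": {"subject": [{"name": "Cats"}, {"name": "Glass negatives"}]},
}

flat = flatten_record(record)
for key, val in flat.items():
    print(key, "=", val)
# id = abc123
# dataProvider_name = Example Library
# sourceResource_subject_0_name = Cats
# sourceResource_subject_1_name = Glass negatives
```

Each flattened key records the full path to a value. This is what lets a nested record become the columns of a single table row.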


In [2]:
pip install flatten_json

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting flatten_json
  Downloading flatten_json-0.1.13.tar.gz (11 kB)
Building wheels for collected packages: flatten-json
  Building wheel for flatten-json (setup.py) ... done
  Created wheel for flatten-json: filename=flatten_json-0.1.13-py3-none-any.whl size=7978 sha256=2a0a269ff72663a272542b3933bebe661677a9197933ee7f8eda94377b8c7c58
  Stored in directory: /root/.cache/pip/wheels/87/c5/6d/7a772fecd8d6ebae9e60d997f74b9a96ead7d5a0f26a920090
Successfully built flatten-json
Installing collected packages: flatten-json
Successfully installed flatten-json-0.1.13


In [3]:
pip install pandas

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [4]:
pip install requests

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


> Once the libraries are installed, we import the modules so we can use them later. We import pandas as pd for ease of typing.

In [5]:
import requests
import pandas as pd
import flatten_json

> We use the requests.get() function to send an HTTP GET request to the API. The response is stored in the variable 'r'.

> What we are doing here is a simple search with the API: the parameter 'q' holds the query terms used for the search.

> In addition, 'api_key' is where you enter the key obtained from DPLA.

> More details on the syntax of HTTP requests can be found in the DPLA API documentation here: https://pro.dp.la/developers/requests

> In the future, it would be nice to define individual variables in the notebook so that users can just plug in values without worrying about the HTTP syntax.
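One way to get closer to that is to keep the search term and the key in ordinary variables and let Python encode the query string. The sketch below uses the standard library's `urllib.parse.urlencode`. The names `build_items_url`, `QUERY`, and `API_KEY` are hypothetical, and `YOUR_API_KEY` is a placeholder. Any extra parameters, such as `page_size`, should be checked against the DPLA request documentation linked above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.dp.la/v2/items"  # item search endpoint used in this tutorial

# Values a user would edit; YOUR_API_KEY is a placeholder, not a real key
QUERY = "cats"
API_KEY = "YOUR_API_KEY"


def build_items_url(query, api_key, **extra_params):
    """Return a DPLA item-search URL with properly encoded parameters."""
    params = {"q": query, "api_key": api_key}
    params.update(extra_params)  # e.g. page_size=20 (check the DPLA docs for names)
    return f"{BASE_URL}?{urlencode(params)}"


url = build_items_url(QUERY, API_KEY)
print(url)
# https://api.dp.la/v2/items?q=cats&api_key=YOUR_API_KEY

# Multi-word queries are encoded automatically (spaces become '+')
print(build_items_url("black cats", API_KEY, page_size=20))
# https://api.dp.la/v2/items?q=black+cats&api_key=YOUR_API_KEY&page_size=20
```

The requests library can do the same encoding for you: `r = requests.get(BASE_URL, params={'q': QUERY, 'api_key': API_KEY})`.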


In [6]:
r = requests.get('https://api.dp.la/v2/items?q=cats&api_key=5d03033b8bd8e60c7c34aef1d148d7f3')

> Make sure the results have been retrieved successfully, which means getting a 200 (OK) response code.

> Explanations of other response codes, useful for troubleshooting, can be found here: https://www.w3.org/Protocols/HTTP/HTRESP.html
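Instead of looking codes up by hand, you can let Python's standard library supply the standard reason phrases. The sketch below uses `http.HTTPStatus`; `describe_status` and `is_success` are hypothetical helper names. With a real `requests` response you would pass `r.status_code`. Alternatively, call `r.raise_for_status()`, which raises an exception for 4xx and 5xx responses.

```python
from http import HTTPStatus


def describe_status(code):
    """Return e.g. '404 Not Found' for a numeric HTTP status code."""
    try:
        return f"{code} {HTTPStatus(code).phrase}"
    except ValueError:
        # Not a code defined in http.HTTPStatus
        return f"{code} (unrecognized status code)"


def is_success(code):
    """True for 2xx codes, i.e. the request was handled successfully."""
    return 200 <= code < 300


for code in (200, 401, 403, 404, 429, 500):
    print(describe_status(code), "->", "success" if is_success(code) else "check the request")
```

In the notebook this would be `print(describe_status(r.status_code))`.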

In [7]:
r

<Response [200]>

> After making sure the results were downloaded successfully, we can convert the response body from JSON into Python objects using the json() method. 'output' will be our new variable holding the contents of 'r' as a Python dictionary.
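`r.json()` decodes the response body and returns ordinary Python objects (dicts, lists, strings, numbers). The standard-library equivalent is `json.loads(r.text)`. The sketch below decodes a small string that only imitates the top-level shape of a DPLA response. The keys match the `output.keys()` result in the next cells, but the values are invented.

```python
import json

# Made-up text imitating the top-level shape of a DPLA search response
raw_text = '{"count": 2, "start": 0, "limit": 10, "docs": [{"id": "a1"}, {"id": "b2"}], "facets": []}'

output = json.loads(raw_text)  # the same kind of result r.json() returns

print(type(output).__name__)          # dict
print(type(output["docs"]).__name__)  # list
print(sorted(output.keys()))          # ['count', 'docs', 'facets', 'limit', 'start']
```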

In [8]:
output = r.json()

> Display the keys of the parsed JSON, which is now a Python dictionary. This helps identify which fields might be worth examining.

> Before doing this step with other repositories (with REST APIs), you might want to perform a GET request in your browser's address bar to examine how the JSON data are structured. That gives you a bird's-eye view of the data structure and a sense of how you want to download the data you need. Some repositories limit how much data you can download during a certain period of time, so it is wise to figure out which data you need first. This also makes it easier to clean up unnecessary data later.

> There are plenty of JSON viewer extensions for commonly used browsers. For example, I used JSON Viewer for Chrome.
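If you would rather stay in the notebook, a small recursive function can give a similar bird's-eye view by listing each key with the type of its value. `describe_structure` is a hypothetical helper. For lists, it inspects only the first element, on the assumption that all items share one shape. That assumption is not guaranteed for every repository.

```python
def describe_structure(value, name="root", indent=0, max_depth=4):
    """Return lines describing the keys and value types of nested JSON-like data."""
    pad = "  " * indent
    if isinstance(value, dict):
        lines = [f"{pad}{name}: dict ({len(value)} keys)"]
        if indent < max_depth:
            for key, child in value.items():
                lines += describe_structure(child, key, indent + 1, max_depth)
    elif isinstance(value, list):
        lines = [f"{pad}{name}: list ({len(value)} items)"]
        # Assume list items share one shape and describe only the first item
        if value and indent < max_depth:
            lines += describe_structure(value[0], f"{name}[0]", indent + 1, max_depth)
    else:
        lines = [f"{pad}{name}: {type(value).__name__}"]
    return lines


# Made-up sample shaped like a search response
sample = {"count": 1, "docs": [{"id": "a1", "sourceResource": {"subject": [{"name": "Cats"}]}}]}
print("\n".join(describe_structure(sample)))
```

With the real data you would call `print("\n".join(describe_structure(output)))`. Raise `max_depth` to look deeper into the record.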

In [9]:
output.keys()

dict_keys(['count', 'docs', 'facets', 'limit', 'start'])

> First, let's take a look at 'docs', since it is semantically closest to the item descriptions/metadata we are looking for. We assign its value to the variable 'o'.

In [10]:
o = output['docs']

> Let's take a look at what we have got by displaying o.

In [11]:
o


[{'@context': 'http://dp.la/api/items/context',
  '@id': 'http://dp.la/api/items/120144529501c589e3e944ece28f986c',
  '@type': 'ore:Aggregation',
  'aggregatedCHO': '#sourceResource',
  'dataProvider': {'@id': 'http://dp.la/api/contributor/illinois-state-university',
   'exactMatch': ['http://www.wikidata.org/entity/Q558922'],
   'name': 'Illinois State University'},
  'id': '120144529501c589e3e944ece28f986c',
  'iiifManifest': 'http://digital.library.illinoisstate.edu/iiif/info/icca/8196/manifest.json',
  'ingestDate': '2022-10-27T19:49:02.165Z',
  'ingestType': 'item',
  'isShownAt': 'http://digital.library.illinoisstate.edu/cdm/ref/collection/icca/id/8196',
  'object': 'http://digital.library.illinoisstate.edu/utils/getthumbnail/collection/icca/id/8196',
  'originalRecord': {'stringValue': '<record xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><header><identifier>urn:dpla-repox.carli.illinois.edu:isu_icca:oai:digital.library.illin

> In Python, a parsed metadata record is made up of dictionaries, lists, and strings. Our job is to learn which type each field has and decide which functions to use to parse it. For example, the keys() method does not work on lists, because the positions in a list are numeric indexes rather than keys. From what I have learned, dictionaries are usually used for metadata fields that have subfields. 'docs' (the items, in more generic terms) is itself a list, but each item in it is a dictionary with subfields such as 'sourceResource' and 'dataProvider'. Lists are usually used for metadata fields with more uniform, repeatable values, such as 'subject.' However, a nested structure might be used. Taking 'subject' again as an example, each item inside the list is a dictionary: [{'name': 'Cats'}, {'name': 'Malibu (Calif.)'}, {'name': 'Corral Beach, Malibu'}, {'name': 'Malibu Historical Collection'}]
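Because 'subject' is a list whose items are dictionaries, the usual pattern is to loop over the list and pull out the 'name' of each item. The sketch below wraps that in a hypothetical helper, `field_names`, which also copes with a single dictionary or a plain string. Whether DPLA ever returns those other shapes for a given field is an assumption to check against real records.

```python
def field_names(field):
    """Collect 'name' values from a metadata field of varying shape."""
    if isinstance(field, str):
        return [field]
    if isinstance(field, dict):
        field = [field]  # treat a single dictionary like a one-item list
    names = []
    for item in field or []:  # None or an empty list yields []
        if isinstance(item, dict) and "name" in item:
            names.append(item["name"])
        elif isinstance(item, str):
            names.append(item)
    return names


# The 'subject' example from the paragraph above
subject = [{'name': 'Cats'}, {'name': 'Malibu (Calif.)'},
           {'name': 'Corral Beach, Malibu'}, {'name': 'Malibu Historical Collection'}]
print(field_names(subject))
# ['Cats', 'Malibu (Calif.)', 'Corral Beach, Malibu', 'Malibu Historical Collection']
```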

> That's why running o.keys() shows an error message: 'o' is a list, not a dictionary.

In [12]:
o.keys()

AttributeError: 'list' object has no attribute 'keys'

> Since the object is a list, you can get its length to know which indexes are valid, and then use an index to retrieve specific metadata fields.
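As a small illustration, `len()` tells you the valid index range (0 to `len(o) - 1`). `enumerate()` gives each index alongside its item, which is handy when you want to look up a specific record later. The list below is a made-up stand-in for `o`.

```python
# Made-up stand-in for o = output['docs'] (a list of dictionaries)
docs = [{"id": "a1"}, {"id": "b2"}, {"id": "c3"}]

length = len(docs)
print(length)                               # 3
print(f"valid indexes: 0 to {length - 1}")  # valid indexes: 0 to 2

# Pair each index with the record's 'id' so you know which index to use later
index_of = {}
for i, item in enumerate(docs):
    index_of[item["id"]] = i
    print(i, item["id"])
```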

In [13]:
length = len(o)
length

10

> As an example, we pick the sixth item in the list (index 5, since Python counts from 0) and retrieve the 'subject' field inside its 'sourceResource'.

In [24]:
subject = o[5]['sourceResource']['subject']
subject

[{'name': 'Cats--California'},
 {'name': 'Oakland (Calif.)--Photographs'},
 {'name': 'Glass negatives'}]
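Chained indexing like `o[5]['sourceResource']['subject']` raises a `KeyError` if a record happens to lack one of those fields. It is only an assumption that every record has a 'subject'. A more defensive alternative is to walk the path step by step with `dict.get()`. The helper `get_path` and the sample records below are hypothetical.

```python
def get_path(record, path, default=None):
    """Follow keys/indexes into nested data; return default if any step is missing."""
    current = record
    for step in path:
        if isinstance(current, dict):
            current = current.get(step)
        elif (isinstance(current, list) and isinstance(step, int)
              and 0 <= step < len(current)):
            current = current[step]
        else:
            return default
        if current is None:
            return default
    return current


# Made-up records: the second one has no 'subject'
docs = [
    {"sourceResource": {"subject": [{"name": "Cats--California"}]}},
    {"sourceResource": {"title": ["Untitled"]}},
]

for i, item in enumerate(docs):
    print(i, get_path(item, ["sourceResource", "subject"], default=[]))
# 0 [{'name': 'Cats--California'}]
# 1 []
```

In the notebook, `get_path(o[5], ['sourceResource', 'subject'])` would return the same list as the cell above, or `None` for a record without subjects.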

> Now that we know the basics of retrieving metadata fields from a metadata record, another way to get a bird's-eye view of the entire metadata structure is a pandas DataFrame. The head() method previews the first 5 rows of the DataFrame.

In [31]:
df = pd.DataFrame(o)
df.head()

,@context,@id,@type,aggregatedCHO,dataProvider,id,iiifManifest,ingestDate,ingestType,isShownAt,object,originalRecord,provider,rightsCategory,sourceResource
0,http://dp.la/api/items/context,http://dp.la/api/items/120144529501c589e3e944e...,ore:Aggregation,#sourceResource,{'@id': 'http://dp.la/api/contributor/illinois...,120144529501c589e3e944ece28f986c,http://digital.library.illinoisstate.edu/iiif/...,2022-10-27T19:49:02.165Z,item,http://digital.library.illinoisstate.edu/cdm/r...,http://digital.library.illinoisstate.edu/utils...,"{'stringValue': '<record xmlns=""http://www.ope...","{'@id': 'http://dp.la/api/contributor/il', 'ex...",Unspecified Rights Status,{'@id': 'http://dp.la/api/items/120144529501c5...
1,http://dp.la/api/items/context,http://dp.la/api/items/7fa9791a9dad59e272aa0fb...,ore:Aggregation,#sourceResource,{'@id': 'http://dp.la/api/contributor/universi...,7fa9791a9dad59e272aa0fb5eacba56b,,2022-08-09T15:07:52.194Z,item,https://exploreuk.uky.edu/catalog/xt734t6f3d29...,,"{'stringValue': '{  ""_type"" : ""item"",  ""_id""...","{'@id': 'http://dp.la/api/contributor/kdl', 'e...",Unspecified Rights Status,{'@id': 'http://dp.la/api/items/7fa9791a9dad59...
2,http://dp.la/api/items/context,http://dp.la/api/items/645bb3361585ed0424b6c3c...,ore:Aggregation,#sourceResource,{'@id': 'http://dp.la/api/contributor/pepperdi...,645bb3361585ed0424b6c3cef35b917b,,2022-08-22T16:25:15.617Z,item,http://cdm15730.contentdm.oclc.org/cdm/ref/col...,https://thumbnails.calisphere.org/clip/150x150...,"{'stringValue': '{  ""url_item"" : ""http://cdm1...","{'@id': 'http://dp.la/api/contributor/cdl', 'e...",Unspecified Rights Status,{'@id': 'http://dp.la/api/items/645bb3361585ed...
3,http://dp.la/api/items/context,http://dp.la/api/items/f5a6af1ad25a74a4e4b821a...,ore:Aggregation,#sourceResource,{'@id': 'http://dp.la/api/contributor/uc-san-d...,f5a6af1ad25a74a4e4b821a34045e930,,2022-08-22T16:25:15.617Z,item,https://library.ucsd.edu/dc/object/bb0888354q,https://thumbnails.calisphere.org/clip/150x150...,"{'stringValue': '{  ""url_item"" : ""https://lib...","{'@id': 'http://dp.la/api/contributor/cdl', 'e...",Unspecified Rights Status,{'@id': 'http://dp.la/api/items/f5a6af1ad25a74...
4,http://dp.la/api/items/context,http://dp.la/api/items/4e20c90cd2ea969c6c03796...,ore:Aggregation,#sourceResource,{'@id': 'http://dp.la/api/contributor/californ...,4e20c90cd2ea969c6c037966001b919f,,2022-08-22T16:25:15.617Z,item,https://csl.primo.exlibrisgroup.com/discovery/...,https://thumbnails.calisphere.org/clip/150x150...,"{'stringValue': '{  ""url_item"" : ""https://csl...","{'@id': 'http://dp.la/api/contributor/cdl', 'e...",Unspecified Rights Status,{'@id': 'http://dp.la/api/items/4e20c90cd2ea96...
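As the output above shows, `pd.DataFrame(o)` keeps nested fields such as 'dataProvider' and 'sourceResource' as whole dictionaries inside single cells. If you want those subfields as their own columns, `pd.json_normalize()` joins nested dictionary keys with a dot. Lists, such as 'subject', are left as-is in one cell. The records below are made up for illustration.

```python
import pandas as pd

# Made-up records with a nested dictionary and a nested list
records = [
    {"id": "a1", "dataProvider": {"name": "Library A"},
     "sourceResource": {"subject": [{"name": "Cats"}]}},
    {"id": "b2", "dataProvider": {"name": "Library B"},
     "sourceResource": {"subject": [{"name": "Dogs"}]}},
]

nested = pd.DataFrame(records)     # 'dataProvider' column holds whole dicts
flat = pd.json_normalize(records)  # nested dict keys become 'dataProvider.name' etc.

print(list(nested.columns))  # ['id', 'dataProvider', 'sourceResource']
print(list(flat.columns))    # ['id', 'dataProvider.name', 'sourceResource.subject']
```

In the notebook this would be `df_flat = pd.json_normalize(o)`. The 'subject' lists would still need a helper like the loop over 'name' values described earlier.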


> If you prefer to write all the data to a spreadsheet file and check it offline, you can do that too with the to_excel() method (writing .xlsx files requires the openpyxl package). One thing to pay attention to is that MS Excel is proprietary software. When the data has an open license, an open format such as CSV is preferable.
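CSV is one open alternative. It opens in LibreOffice Calc, Excel, or any text editor, and `DataFrame.to_csv()` needs no extra packages. The sketch below writes a small made-up DataFrame to a temporary folder and reads it back. In the notebook you would simply call `df.to_csv('test.csv', index=False)`. Nested cells (dicts and lists) are written as their text representation.

```python
import os
import tempfile

import pandas as pd

# Made-up stand-in for the DataFrame built from the API results
df = pd.DataFrame({"id": ["a1", "b2"], "rightsCategory": ["Unspecified Rights Status"] * 2})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "test.csv")
    # index=False avoids an extra unnamed index column when the file is read back
    df.to_csv(path, index=False)
    round_trip = pd.read_csv(path)

print(list(round_trip.columns))  # ['id', 'rightsCategory']
```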

In [None]:
df.to_excel('test.xlsx')

> Similarly, you can write parts of the metadata, such as the field names and values of a single record, to a CSV file with Python's built-in csv module.

In [None]:
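import csv
# 'o' is a list, so take one record (a dictionary) and write its fields
record = o[0]
with open('OUTPUT.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)  # quotes values that contain commas or newlines
    for key, value in record.items():
        writer.writerow([key, value])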
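To write one row per record instead, the standard library's `csv.DictWriter` works well. You choose the columns you want, and `extrasaction='ignore'` drops the rest. The column names below are top-level DPLA keys seen in the DataFrame above, but the records themselves are made up. Nested values would be written as their text representation.

```python
import csv
import os
import tempfile

# Made-up stand-in for o = output['docs']
docs = [
    {"id": "a1", "ingestType": "item", "rightsCategory": "Unspecified Rights Status", "extra": 1},
    {"id": "b2", "ingestType": "item"},  # missing 'rightsCategory'
]
columns = ["id", "ingestType", "rightsCategory"]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "records.csv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        # extrasaction='ignore' drops keys not in columns; restval fills missing ones
        writer = csv.DictWriter(f, fieldnames=columns, extrasaction="ignore", restval="")
        writer.writeheader()
        writer.writerows(docs)
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

print(rows[1])
# {'id': 'b2', 'ingestType': 'item', 'rightsCategory': ''}
```

In the notebook you would pass `o` to `writer.writerows()` and open a regular file path instead of a temporary folder.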