# How to query the PIA JSON API with query_api.py

## Introduction

The `query_api.py` script queries the PIA JSON API and writes the returned JSON to files in a local directory, ready for transformation to Linked Art JSON-LD.

Script variables are read from a .yaml file. Its file path is passed to the script with the `--config` argument.

## Example settings.yaml file

A settings.yaml file is specified with the script argument `--config`.

An example config file can be viewed at [example settings.yaml](/digitalobject/settings.yaml)

Example content:

```yaml
- - - # settings for digital object
- settings:
  default_lang        : en
  base_url            : https://linkedart.participatory-archives.ch/
  pia_api_uri         : https://data.participatory-archives.ch/api/v1/images
  pia_api_include     : include=collections,date,place
  page_size           : 500
  directory           : digitalobject/
  a_collection        : data/a_collection
  b_mapped            : data/b_mapped
  c_linked_art        : data/c_linked_art
  template            : template.jsonnet
...
```

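The query URL for the PIA API is constructed from the 'pia_api_uri', 'pia_api_include' and 'page_size' variables in the settings.yaml file.

JSON files are written to a local directory whose path is constructed from the 'directory' and 'a_collection' variables in the settings.yaml file. Each page of results is saved as a separate file named after its page number (e.g. `1.json`), so 'page_size' sets how many records each file holds (up to that number).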
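A minimal sketch of both constructions, written as two hypothetical helpers (`page_query_url` and `page_file_path` are not part of query_api.py). It assumes the config keys shown above and follows the same string layout as the notebook cells further down, so page 1 gives the same URL as the first line of the recorded output.

```python
from typing import Any, Mapping


def page_query_url(config: Mapping[str, Any], page: int) -> str:
    """Query URL for one page of results (same layout as the notebook's query)."""
    include = config.get("pia_api_include") or ""  # None or missing -> no include
    return (
        f"{config['pia_api_uri']}?{include}"
        f"&page[size]={config['page_size']}&page[number]={page}"
    )


def page_file_path(config: Mapping[str, Any], page: int) -> str:
    """Local file for one page: <directory><a_collection>/<page>.json."""
    return f"{config['directory']}{config['a_collection']}/{page}.json"


# values as set in the notebook's "modify the config settings" cell
example_config = {
    "pia_api_uri": "https://data.participatory-archives.ch/api/v1/images",
    "pia_api_include": "include=collections,date,place",
    "page_size": 200,
    "directory": "../digitalobject/",
    "a_collection": "data/a_collection",
}
print(page_query_url(example_config, 1))
# https://data.participatory-archives.ch/api/v1/images?include=collections,date,place&page[size]=200&page[number]=1
print(page_file_path(example_config, 1))
# ../digitalobject/data/a_collection/1.json
```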


## Files
- query_api.py - main script
- settings.py - provides the function 'query_api()', which checks that the correct script arguments have been provided


## Help
The following command lists the arguments the script accepts:
```shell
python3 query_api.py -h
```

## Example command

```shell
python3 query_api.py --config <config-file>
```
e.g.
```shell
python3 query_api.py --config ./digitalobject/settings.yaml
```
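The argument checking itself lives in settings.py, which is not shown here. As an illustration only, a `--config` argument could be declared with the standard library's `argparse`; the function name `parse_args` and the help text are assumptions, not the script's actual code. `argparse` adds the `-h` option automatically.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the command line (hypothetical sketch); --config is the settings .yaml path."""
    parser = argparse.ArgumentParser(
        description="Query the PIA JSON API and write each page of results "
        "to a local directory as JSON."
    )
    parser.add_argument(
        "--config",
        required=True,  # argparse exits with a usage message if it is missing
        help="path to the settings .yaml file, e.g. ./digitalobject/settings.yaml",
    )
    # argv=None makes argparse read sys.argv[1:]
    return parser.parse_args(argv)
```

In a script, `args = parse_args()` followed by `open(args.config)` would replace the hard-coded `config_file` used in the notebook below.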

## Load Python packages

```python
import json
import requests
import sys
import traceback
import yaml
from yaml.loader import SafeLoader
```


## Read in config file

The config file path is usually passed with the command-line argument `--config`; in this notebook it is set directly as the variable `config_file` instead.

```python
# commented out as the config file path is not read from the command line here
# (settings.py is in the local dir)
# import settings

config_file = "../digitalobject/settings.yaml"

with open(config_file) as f:
    data = yaml.load(f, Loader=SafeLoader)

# the settings mapping is the second item of the top-level YAML list
config = data[1]

config
```

```
{'settings': None,
 'default_lang': 'en',
 'base_url': 'https://linkedart.participatory-archives.ch/',
 'pia_api_uri': 'https://data.participatory-archives.ch/api/v1/images',
 'pia_api_include': 'include=collections,date,place',
 'page_size': 500,
 'directory': 'digitalobject/',
 'a_collection': 'data/a_collection',
 'b_mapped': 'data/b_mapped',
 'c_linked_art': 'data/c_linked_art',
 'template': 'template.jsonnet'}
```
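Because the settings mapping is picked out by position (`data[1]`), a change to the layout of the YAML file could silently select the wrong item. A hypothetical `check_config` helper, sketched below, fails early if any key read by the later cells is missing or if `page_size` is not a positive integer. The helper name and the validation rules are assumptions.

```python
from typing import Any, Mapping

# config keys read by the notebook cells below
REQUIRED_KEYS = ("pia_api_uri", "pia_api_include", "page_size", "directory", "a_collection")


def check_config(config: Mapping[str, Any]) -> None:
    """Raise if a required key is missing or page_size is not a positive integer."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("missing config keys: " + ", ".join(missing))
    page_size = config["page_size"]
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(page_size, bool) or not isinstance(page_size, int) or page_size < 1:
        raise ValueError(f"page_size must be a positive integer, got {page_size!r}")
```

Calling `check_config(config)` straight after `config = data[1]` would stop the notebook before any API request is made.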

## Option to modify the config settings
To try different settings, update the `config` dictionary in the next code cell, e.g. `config["directory"] = "../digitalobject/"`.

```python
config["directory"] = "../digitalobject/"  # directory to save files to

config["page_size"] = 200  # number of records to return with each API query

config["pia_api_uri"] = "https://data.participatory-archives.ch/api/v1/images"  # PIA API to query

config["pia_api_include"] = "include=collections,date,place"  # related data to include for each record returned by the API call
```
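The cell above changes `config` in place, so the values from settings.yaml are lost until the config cell is re-run. One alternative, sketched here as a hypothetical `with_overrides` helper, is to build a modified copy and reject unknown keys, so that a misspelt setting raises an error instead of being silently ignored.

```python
from typing import Any, Dict, Mapping


def with_overrides(config: Mapping[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Return a copy of config with some values replaced; config is left unchanged.

    Unknown keys raise KeyError, so a typo such as page_sise=200 is caught
    instead of silently adding a new, unused setting.
    """
    unknown = sorted(set(overrides) - set(config))
    if unknown:
        raise KeyError(f"unknown config keys: {unknown}")
    return {**config, **overrides}
```

For example, `trial = with_overrides(config, directory="../digitalobject/", page_size=200)` leaves `config` as read from the file.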

## Construct API Query URL
Get the config variables needed to construct the API query URL.

```python
# dir to store collection data files
a_collection = config["directory"] + config["a_collection"]

# number of records to return per page
page_size = config["page_size"]

# default total number of pages; the actual number is read from the API in the next step
total_pages = 1

# URI for PIA API
api_uri = config["pia_api_uri"]

# optional "include" parameter; left empty if not set in the config
api_include = ""
if config["pia_api_include"] is not None:
    api_include = config["pia_api_include"]
```

## Query API to get total pages in resultset

Make an initial query of the API to get the total number of pages, which are iterated through in the next step.

```python
# get total number of pages
query_total_pages = api_uri + "?page[number]=1&page[size]=" + str(page_size)
response = requests.get(query_total_pages)
response.raise_for_status()  # stop on an HTTP error status before decoding JSON
result = response.json()

# get total pages from API response
total_pages = result["meta"]["page"]["lastPage"]

# construct query to return data
query = api_uri + "?" + api_include + "&page[size]=" + str(page_size)

print("total pages: " + str(total_pages))
```

```
total pages: 300
```
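The cell reads `result["meta"]["page"]["lastPage"]` directly, so an error document from the API would surface as a bare `KeyError`. A sketch of a hypothetical `get_last_page` helper that raises a more descriptive error instead, assuming the response shape used above:

```python
from typing import Any, Mapping


def get_last_page(result: Mapping[str, Any]) -> int:
    """Return meta.page.lastPage from a PIA API response as an int."""
    try:
        return int(result["meta"]["page"]["lastPage"])
    except (KeyError, TypeError, ValueError) as error:
        # show the start of the response so an API error message is visible
        raise ValueError(
            f"no usable meta.page.lastPage in response: {result!r:.200}"
        ) from error


print(get_last_page({"meta": {"page": {"lastPage": 300}}}))  # 300
```

In the cell above, `total_pages = get_last_page(result)` would take the place of the direct lookup.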


## Query API and write files to local directory

```python
# QUERY API
import os

print("saving files to: " + a_collection)

# create the output directory if it does not exist yet
os.makedirs(a_collection, exist_ok=True)

# iterate through paged records
for page in range(1, total_pages + 1):
    # add page number to query
    query1 = query + "&page[number]=" + str(page)
    print("\n" + query1, end="", flush=True)
    # use try statement to pick up errors
    try:
        # query api; raise_for_status() stops on an HTTP error status (4xx/5xx)
        # before the body is decoded as JSON
        response = requests.get(query1)
        response.raise_for_status()
        json_data = response.json()

        # write file to collection data dir; "with" closes the file even on error
        with open(a_collection + "/" + str(page) + ".json", "wt") as text_file:
            text_file.write(json.dumps(json_data, indent=2))

    except Exception:
        traceback.print_exc()
        sys.exit(1)

print("script completed")
```

```
saving files to: ../digitalobject/data/a_collection

https://data.participatory-archives.ch/api/v1/images?include=collections,date,place&page[size]=200&page[number]=1
https://data.participatory-archives.ch/api/v1/images?include=collections,date,place&page[size]=200&page[number]=2
https://data.participatory-archives.ch/api/v1/images?include=collections,date,place&page[size]=200&page[number]=3
https://data.participatory-archives.ch/api/v1/images?include=collections,date,place&page[size]=200&page[number]=4
https://data.participatory-archives.ch/api/v1/images?include=collections,date,place&page[size]=200&page[number]=5
https://data.participatory-archives.ch/api/v1/images?include=collections,date,place&page[size]=200&page[number]=6
https://data.participatory-archives.ch/api/v1/images?include=collections,date,place&page[size]=200&page[number]=7
https://data.participatory-archives.ch/api/v1/images?include=collections,date,place&page[size]=200&page[number]=8
https://

https://data.participatory-archives.ch/api/v1/images?include=collections,date,place&page[size]=200&page[number]=69
https://data.participatory-archives.ch/api/v1/images?include=collections,date,place&page[size]=200&page[number]=70
https://data.participatory-archives.ch/api/v1/images?include=collections,date,place&page[size]=200&page[number]=71
https://data.participatory-archives.ch/api/v1/images?include=collections,date,place&page[size]=200&page[number]=72
https://data.participatory-archives.ch/api/v1/images?include=collections,date,place&page[size]=200&page[number]=73
https://data.participatory-archives.ch/api/v1/images?include=collections,date,place&page[size]=200&page[number]=74
https://data.participatory-archives.ch/api/v1/images?include=collections,date,place&page[size]=200&page[number]=75
https://data.participatory-archives.ch/api/v1/images?include=collections,date,place&page[size]=200&page[number]=76
https://data.participatory-archives.ch/api/v1/im

Traceback (most recent call last):
  File "<ipython-input-10-70cb9eb4af64>", line 15, in <module>
    json_data = response.json()
  File "/opt/anaconda3/lib/python3.8/site-packages/requests/models.py", line 910, in json
    return complexjson.loads(self.text, **kwargs)
  File "/opt/anaconda3/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/opt/anaconda3/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/anaconda3/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```
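The recorded run stopped when `response.json()` raised a `JSONDecodeError`. "Expecting value: line 1 column 1 (char 0)" means the response body was empty or not JSON; why the API returned such a response is not recorded here (a transient server error or timeout is one possibility). A minimal sketch, assuming such failures are worth retrying: `get_json_with_retry` repeats a request a few times with a pause, and `missing_pages` lists pages that have no saved file yet, so an interrupted run can resume instead of starting again from page 1. Both helpers, their names and the retry defaults are hypothetical. `get` is any callable that behaves like `requests.get`; `requests` exceptions subclass `OSError` and JSON decoding errors subclass `ValueError`, which is why those two are caught.

```python
import os
import time
from typing import Any, Callable, List


def get_json_with_retry(
    get: Callable[[str], Any],
    url: str,
    retries: int = 3,
    delay: float = 5.0,
) -> Any:
    """Call get(url) and decode the JSON body, retrying failed attempts.

    `get` is assumed to behave like requests.get: it returns a response
    object with .raise_for_status() and .json().
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(1, retries + 1):
        try:
            response = get(url)
            response.raise_for_status()  # HTTP 4xx/5xx status -> exception
            return response.json()  # empty or non-JSON body -> ValueError subclass
        except (OSError, ValueError) as error:
            # requests exceptions subclass OSError; JSON decoding errors subclass ValueError
            if attempt == retries:
                raise  # give up: re-raise the last error
            print(f"attempt {attempt} of {retries} failed ({error}); retrying in {delay}s")
            time.sleep(delay)


def missing_pages(directory: str, total_pages: int) -> List[int]:
    """Page numbers 1..total_pages that have no <page>.json file in directory yet."""
    return [
        page
        for page in range(1, total_pages + 1)
        if not os.path.exists(os.path.join(directory, f"{page}.json"))
    ]
```

To use them in the loop above, `requests.get(query1)` followed by `response.json()` could be replaced with `json_data = get_json_with_retry(requests.get, query1)`, and `range(1, total_pages + 1)` with `missing_pages(a_collection, total_pages)`.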