# How to map collection data to intermediate JSON format with map_data.py

## Introduction

The `map_data.py` file maps collection data in JSON format, retrieved from the PIA JSON API, to an intermediate JSON format. 


Script variables are read from a YAML file. The path to this file is passed to the script with the `--config` argument.

## Example settings.yaml file

The `settings.yaml` file is passed to the script with the `--config` argument.

An example config file can be viewed at [example settings.yaml](/digitalobject/settings.yaml).

Example content:

```yaml
- - - # settings for digital object
- settings:
  default_lang        : en
  base_url            : https://data.participatory-archives.ch/
  pia_api_uri         : https://json.participatory-archives.ch/api/v1/images
  pia_api_include     : include=collections,date,place
  page_size           : 500
  directory           : digitalobject/
  a_collection        : data/a_collection
  b_mapped            : data/b_mapped
  c_linked_art        : data/c_linked_art
  template            : template.jsonnet
...
```


## Files
- map_data.py - main script
- map_data_func.py - mapping functions used by the main script
- template.jsonnet - JSON template for intermediate JSON file to be populated with collection data
- settings.py - provides the function `map_data()`, which checks that the correct script arguments have been provided
- digitalobject/data/a_collection/*.json - collection data files from PIA JSON API
- digitalobject/data/b_mapped/*.json - output JSON files

## Help
The following command lists the arguments that the script accepts.
```
python3 map_data.py -h 
```

## Example command

```
python3 map_data.py --objtype <objecttype> --config <config-file> --template <jsonnet-template-filepath>
```
e.g.
```
python3 map_data.py --objtype set --config set/settings.yaml --template set/template.jsonnet
```

`--objtype` is one of:
- `humanmadeobject`
- `digitalobject` 
- `set`
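
The argument handling described above could be sketched with the standard `argparse` module. This is a minimal sketch and not the project's `settings.py`, whose implementation is not shown here. The function name `parse_map_data_args` is hypothetical, and treating all three arguments as required is an assumption.

```python
import argparse

# Object types listed in this README
OBJECT_TYPES = ["humanmadeobject", "digitalobject", "set"]


def parse_map_data_args(argv=None):
    """Hypothetical helper: parse --objtype, --config and --template."""
    parser = argparse.ArgumentParser(
        description="Map collection data to intermediate JSON format"
    )
    parser.add_argument("--objtype", required=True, choices=OBJECT_TYPES,
                        help="type of object to map")
    parser.add_argument("--config", required=True,
                        help="path to the settings.yaml file")
    parser.add_argument("--template", required=True,
                        help="path to the jsonnet template file")
    args = parser.parse_args(argv)
    # Returned in the order [objtype, config_file, template]
    return [args.objtype, args.config, args.template]


# Example call with the arguments from the example command
objtype, config_file, template = parse_map_data_args(
    ["--objtype", "set",
     "--config", "set/settings.yaml",
     "--template", "set/template.jsonnet"]
)
print(objtype, config_file, template)
```

Restricting `--objtype` with `choices` makes the parser reject any value outside the three object types before any mapping starts.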


## Import Python libraries and local functions

Import the Python libraries and the locally defined functions in `map_data_func.py`.

```python
import json
import os
from os import walk
from pathlib import Path
import yaml
from yaml.loader import SafeLoader

%pip install jsonnet
# mapping functions in local dir
import map_data_func
```

Note: you may need to restart the kernel to use updated packages.


## Get settings
In this notebook the call to `settings.py` is commented out and the variables are set inline instead.

```python
# settings.py in local dir
# commented out for jupyter notebook
#import settings
#[objtype, config_file, template] = settings.map_data()

objtype = 'digitalobject'
config_file = 'digitalobject/settings.yaml'
template = 'digitalobject/template.jsonnet'
```

## Read settings in configuration file

```python
# ===============================================
# READ IN CONFIG

with open(config_file) as f:
    data = yaml.load(f, Loader=SafeLoader)
    # the settings are taken from the second item of the parsed YAML list
    config = data[1]

print(config_file)

directory = config["directory"]
a_collection = directory + config["a_collection"]  # dir to store collection data files
b_mapped = directory + config["b_mapped"]  # dir to store mapped data
template = Path(template).read_text()  # jsonnet template for intermediate JSON data file
```

```
digitalobject/settings.yaml
```
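
The directory paths built from the settings can be checked in isolation. The following is a minimal sketch in which a plain dictionary stands in for the parsed YAML, using the values from the example settings file. The `PurePosixPath` variant at the end is an alternative shown for comparison (an assumption, not what the script does).

```python
from pathlib import PurePosixPath

# Values copied from the example settings file; a plain dict stands in for
# the parsed YAML so that the sketch needs no third-party packages.
config = {
    "directory": "digitalobject/",
    "a_collection": "data/a_collection",
    "b_mapped": "data/b_mapped",
}

# String concatenation, as in the script: this relies on the trailing slash
# in "directory".
a_collection = config["directory"] + config["a_collection"]
b_mapped = config["directory"] + config["b_mapped"]
print(a_collection)  # digitalobject/data/a_collection
print(b_mapped)      # digitalobject/data/b_mapped

# Alternative (an assumption, not what the script does): joining path parts
# gives the same result whether or not "directory" ends with a slash.
joined = str(PurePosixPath(config["directory"]) / config["b_mapped"])
print(joined)        # digitalobject/data/b_mapped
```

With plain concatenation, a `directory` value without the trailing slash would produce `digitalobjectdata/b_mapped`, so the slash in the settings file matters.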


## Map data and save output files

```python
# ===============================================
# ITERATE OVER FILES IN COLLECTION DATA DIR

# collect the file names in the top level of the collection data dir
files = []
for (dirpath, dirnames, filenames) in walk(a_collection):
    files.extend(filenames)
    break
files.sort()

for file in files:
    filename = os.path.basename(file)
    print("Processing file: " + filename)
    # open file
    with open(a_collection + '/' + file) as json_file:
        # load data into python dict
        data = json.load(json_file)

    # create lists for additional date, place and collection data identified in record
    # (created for every file, so they are empty when "included" is missing)
    data_dates = []
    data_places = []
    data_collections = []

    # iterate over included data and append date, place and collection data to lists
    for record in data.get("included", []):
        if record["type"] == "dates":
            data_dates.append(record)
        elif record["type"] == "places":
            data_places.append(record)
        elif record["type"] == "collections":
            data_collections.append(record)
    include_data = [data_dates, data_places, data_collections]

    # map data for digitalobject and humanmadeobject
    if objtype in ["humanmadeobject", "digitalobject"]:
        # digitalobject uses images with base_path SGV_10, humanmadeobject uses SGV_12
        base_path = "SGV_10" if objtype == "digitalobject" else "SGV_12"
        for record in data["data"]:
            if record["type"] == "images" and record["attributes"]["base_path"] == base_path:
                json_data = map_data_func.images(record, template, include_data)
                map_data_func.save_file(json_data, b_mapped, record["id"])

    # map collection data
    if objtype == 'set':
        for record in data["data"]:
            if record["type"] == "collections":
                json_data = map_data_func.set(record, template)
                map_data_func.save_file(json_data, b_mapped, record["id"])
```

```
Processing file: 1.json

 saving mapped collection data file: digitalobject/data/b_mapped/1.json
 saving mapped collection data file: digitalobject/data/b_mapped/2.json
 saving mapped collection data file: digitalobject/data/b_mapped/3.json
 saving mapped collection data file: digitalobject/data/b_mapped/4.json
 saving mapped collection data file: digitalobject/data/b_mapped/5.json
 saving mapped collection data file: digitalobject/data/b_mapped/6.json
 saving mapped collection data file: digitalobject/data/b_mapped/7.json
 saving mapped collection data file: digitalobject/data/b_mapped/8.json
 saving mapped collection data file: digitalobject/data/b_mapped/9.json
 saving mapped collection data file: digitalobject/data/b_mapped/10.json
 saving mapped collection data file: digitalobject/data/b_mapped/11.json
 saving mapped collection data file: digitalobject/data/b_mapped/12.json
 saving mapped collection data file: digitalobject/data/b_mapped/13.json
 ...
 saving mapped collection data file: digitalobject/data/b_mapped/112.json
 saving mapped collection data file: digitalobject/data/b_mapped/113.json
 saving mapped collection data file: digitalobject/data/b_mapped/114.json
 saving mapped collection data file: digitalobject/data/b_mapped/115.json
 saving mapped collection data file: digitalobject/data/b_mapped/116.json
 saving mapped collection data file: digitalobject/data/b_mapped/117.json
 saving mapped collection data file: digitalobject/data/b_mapped/118.json
 saving mapped collection data file: digitalobject/data/b_mapped/119.json
 saving mapped collection data file: digitalobject/data/b_mapped/120.json
 saving mapped collection data file: digitalobject/data/b_mapped/121.json
 saving mapped collection data file: digitalobject/data/b_mapped/122.json
 saving mapped collection data file: digitalobject/data/b_mapped/123.json
 saving mapped collection data file: digitalobject/data/b_mapped/124.json
 ...
 saving mapped collection data file: digitalobject/data/b_mapped/222.json
 saving mapped collection data file: digitalobject/data/b_mapped/223.json
 saving mapped collection data file: digitalobject/data/b_mapped/224.json
 saving mapped collection data file: digitalobject/data/b_mapped/225.json
 saving mapped collection data file: digitalobject/data/b_mapped/226.json
 saving mapped collection data file: digitalobject/data/b_mapped/227.json
 saving mapped collection data file: digitalobject/data/b_mapped/228.json
 saving mapped collection data file: digitalobject/data/b_mapped/229.json
 saving mapped collection data file: digitalobject/data/b_mapped/230.json
 saving mapped collection data file: digitalobject/data/b_mapped/231.json
 saving mapped collection data file: digitalobject/data/b_mapped/232.json
 saving mapped collection data file: digitalobject/data/b_mapped/233.json
 saving mapped collection data file: digitalobject/data/b_mapped/234.json
 ...
 saving mapped collection data file: digitalobject/data/b_mapped/332.json
 saving mapped collection data file: digitalobject/data/b_mapped/333.json
 saving mapped collection data file: digitalobject/data/b_mapped/334.json
 saving mapped collection data file: digitalobject/data/b_mapped/335.json
 saving mapped collection data file: digitalobject/data/b_mapped/336.json
 saving mapped collection data file: digitalobject/data/b_mapped/337.json
 saving mapped collection data file: digitalobject/data/b_mapped/338.json
 saving mapped collection data file: digitalobject/data/b_mapped/339.json
 saving mapped collection data file: digitalobject/data/b_mapped/340.json
 saving mapped collection data file: digitalobject/data/b_mapped/341.json
 saving mapped collection data file: digitalobject/data/b_mapped/342.json
 saving mapped collection data file: digitalobject/data/b_mapped/343.json
 saving mapped collection data file: digitalobject/data/b_mapped/344.json
 ...
 saving mapped collection data file: digitalobject/data/b_mapped/442.json
 saving mapped collection data file: digitalobject/data/b_mapped/443.json
 saving mapped collection data file: digitalobject/data/b_mapped/444.json
 saving mapped collection data file: digitalobject/data/b_mapped/445.json
 saving mapped collection data file: digitalobject/data/b_mapped/446.json
 saving mapped collection data file: digitalobject/data/b_mapped/447.json
 saving mapped collection data file: digitalobject/data/b_mapped/448.json
 saving mapped collection data file: digitalobject/data/b_mapped/449.json
 saving mapped collection data file: digitalobject/data/b_mapped/450.json
 saving mapped collection data file: digitalobject/data/b_mapped/451.json
 saving mapped collection data file: digitalobject/data/b_mapped/452.json
 saving mapped collection data file: digitalobject/data/b_mapped/453.json
 saving mapped collection data file: digitalobject/data/b_mapped/454.json
 ...
```
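
The grouping and filtering steps of the mapping loop can be tried without any input files. The sketch below runs the same logic on a small hand-written record set. The sample records, their ids, and the helper names `group_included` and `select_images` are hypothetical; only the fields the loop reads (`type`, `id`, `attributes.base_path`) are modelled.

```python
def group_included(data):
    """Split the records under "included" into dates, places and collections."""
    groups = {"dates": [], "places": [], "collections": []}
    for record in data.get("included", []):
        if record["type"] in groups:
            groups[record["type"]].append(record)
    return [groups["dates"], groups["places"], groups["collections"]]


def select_images(data, objtype):
    """Keep image records whose base_path matches the object type."""
    base_paths = {"digitalobject": "SGV_10", "humanmadeobject": "SGV_12"}
    wanted = base_paths[objtype]
    return [
        record for record in data["data"]
        if record["type"] == "images"
        and record["attributes"]["base_path"] == wanted
    ]


# Hypothetical sample shaped like the fields the loop reads
sample = {
    "data": [
        {"type": "images", "id": "1", "attributes": {"base_path": "SGV_10"}},
        {"type": "images", "id": "2", "attributes": {"base_path": "SGV_12"}},
    ],
    "included": [
        {"type": "dates", "id": "7"},
        {"type": "places", "id": "8"},
        {"type": "collections", "id": "9"},
    ],
}

dates, places, collections = group_included(sample)
print(len(dates), len(places), len(collections))  # 1 1 1
print([r["id"] for r in select_images(sample, "digitalobject")])  # ['1']
```

Using `data.get("included", [])` means that a file without an `included` section yields three empty lists rather than an error.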