# How to map collection data to intermediate JSON format with map_data.py

## Introduction

The `map_data.py` file maps collection data in JSON format, retrieved from the PIA JSON API, to an intermediate JSON format. 


Script variables are read from a YAML file. The path to this file is passed to the script with the `--config` argument.

## Example settings.yaml file

The `settings.yaml` file to use is named in the `--config` script argument.

An example config file can be viewed at [example settings.yaml](/digitalobject/settings.yaml)

Example content:

```yaml
- - - # settings for digital object
- settings:
  default_lang        : en
  base_url            : https://linkedart.participatory-archives.ch/
  pia_api_uri         : https://data.participatory-archives.ch/api/v1/images
  pia_api_include     : include=collections,date,place
  page_size           : 500
  directory           : digitalobject/
  a_collection        : data/a_collection
  b_mapped            : data/b_mapped
  c_linked_art        : data/c_linked_art
  template            : template.jsonnet
...
```


## Files
- map_data.py - main script
- map_data_func.py - mapping functions called by the main script
- template.jsonnet - JSON template for intermediate JSON file to be populated with collection data
- settings.py - use function `map_data()` to check that the correct script arguments have been provided
- objtype/data/a_collection/*.json - collection data files from PIA JSON API
- objtype/data/b_mapped/*.json - output JSON files

## Help
The following command lists the arguments that the script accepts.
```
python3 map_data.py -h 
```

## Example command

```
python3 map_data.py --objtype <objecttype> --config <config-file> --template <jsonnet-template-filepath>
```
For example:
```
python3 map_data.py --objtype set --config set/settings.yaml --template set/template.jsonnet
```

`--objtype` is one of:
- `humanmadeobject`
- `digitalobject` 
- `set`
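
The three arguments above could be validated with `argparse`. The following is a minimal sketch, not the contents of `settings.py`: the function name `parse_args` and the help strings are assumptions made for illustration.

```python
import argparse

# Object types listed in this document.
OBJTYPES = ["humanmadeobject", "digitalobject", "set"]


def parse_args(argv=None):
    """Hypothetical parser for the three documented arguments."""
    parser = argparse.ArgumentParser(
        description="Map collection data to intermediate JSON."
    )
    # choices= rejects any object type that is not in the documented list
    parser.add_argument("--objtype", required=True, choices=OBJTYPES,
                        help="type of object to map")
    parser.add_argument("--config", required=True,
                        help="path to the settings .yaml file")
    parser.add_argument("--template", required=True,
                        help="path to the jsonnet template")
    return parser.parse_args(argv)


# Parse the example command from this document.
args = parse_args([
    "--objtype", "set",
    "--config", "set/settings.yaml",
    "--template", "set/template.jsonnet",
])
print(args.objtype, args.config, args.template)
```

With `choices` set, an unknown `--objtype` value makes `argparse` print a usage message and exit instead of continuing with bad input.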

## Import Python libraries and local functions

This cell imports the Python libraries and the locally defined functions in `map_data_func.py`.

```python
import json
import os
from os import walk
from pathlib import Path
import yaml
from yaml.loader import SafeLoader

# mapping functions in local dir
import map_data_func
```




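If this cell fails with `ModuleNotFoundError: No module named 'yaml'`, the PyYAML package is not installed in the current environment (for example, install it with `pip install pyyaml`).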

## Get settings
- The use of `settings.py` is commented out so that the code can run in a Jupyter notebook.
- Inline variables are used in place of the script arguments.

```python
# settings.py in local dir
# commented out for jupyter notebook
#import settings
#[objtype, config_file, template] = settings.map_data()

objtype = 'digitalobject'
config_file = '../digitalobject/settings.yaml'
template = '../digitalobject/template.jsonnet'
```

## Read settings in configuration file

```python
# ===============================================
# READ IN CONFIG

with open(config_file) as f:
    data = yaml.load(f, Loader=SafeLoader)
    # settings are read from the second item of the parsed YAML list
    config = data[1]

directory = config["directory"]
a_collection = directory + config["a_collection"] # dir to store collection data files
b_mapped = directory + config["b_mapped"] # dir to store mapped data
template = Path(template).read_text() # jsonnet template for intermediate JSON data file
```
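
To make the path handling concrete, the sketch below builds the two data directories from the values in the example settings file. The `config` dict is hard-coded here as a stand-in for the parsed YAML, and `pathlib` is used instead of string concatenation; both are illustrative choices, not the script's own code.

```python
from pathlib import Path

# Stand-in for the mapping read from settings.yaml (values from the example file).
config = {
    "directory": "digitalobject/",
    "a_collection": "data/a_collection",
    "b_mapped": "data/b_mapped",
}

directory = Path(config["directory"])
a_collection = directory / config["a_collection"]  # input: collection data files
b_mapped = directory / config["b_mapped"]          # output: mapped JSON files

print(a_collection.as_posix())  # digitalobject/data/a_collection
print(b_mapped.as_posix())      # digitalobject/data/b_mapped
```

Both paths are relative, so they resolve against the directory the script or notebook is run from.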

```python
# ===============================================
# ITERATE OVER FILES IN COLLECTION DATA DIR

# collect file names from the top level of the collection data dir only
files = []
for (dirpath, dirnames, filenames) in walk(a_collection):
    files.extend(filenames)
    break
files.sort()

for file in files:
    filename = os.path.basename(file)
    print(filename)
    # open file and load data into python dict
    with open(os.path.join(a_collection, file)) as json_file:
        data = json.load(json_file)

    # lists for additional date, place and collection data identified in record;
    # reset for every file so that records from a previous file are not reused
    data_dates = []
    data_places = []
    data_collections = []

    # iterate over included data and append date, place and collection data to lists
    for record in data.get("included", []):
        if record["type"] == "dates":
            data_dates.append(record)
        elif record["type"] == "places":
            data_places.append(record)
        elif record["type"] == "collections":
            data_collections.append(record)
    include_data = [data_dates, data_places, data_collections]

    # map data for digitalobject and humanmadeobject
    if objtype in ["humanmadeobject", "digitalobject"]:
        data_images = []
        for record in data["data"]:
            if record["type"] == "images":
                if objtype == "digitalobject" and record["attributes"]["base_path"] == "SGV_10":
                    data_images.append(record)
                elif objtype == "humanmadeobject" and record["attributes"]["base_path"] == "SGV_12":
                    data_images.append(record)
        for record in data_images:
            json_data = map_data_func.images(record, template, include_data)
            map_data_func.save_file(json_data, b_mapped, record["id"])

    # map collection data
    if objtype == 'set':
        for record in data["data"]:
            if record["type"] == "collections":
                json_data = map_data_func.set(record, template)
                map_data_func.save_file(json_data, b_mapped, record["id"])
```
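
The grouping and filtering steps in the loop can be tried in isolation on a small in-memory record. In this sketch the sample document is invented for illustration (only the `type`, `id`, and `attributes.base_path` keys mirror what the loop reads), and `group_included` and `select_images` are hypothetical helper names, not functions from `map_data_func.py`.

```python
# Base path expected for each object type, as used in the loop above.
BASE_PATHS = {"digitalobject": "SGV_10", "humanmadeobject": "SGV_12"}


def group_included(data):
    """Return [dates, places, collections] from the optional "included" list."""
    groups = {"dates": [], "places": [], "collections": []}
    for record in data.get("included", []):
        if record["type"] in groups:
            groups[record["type"]].append(record)
    return [groups["dates"], groups["places"], groups["collections"]]


def select_images(data, objtype):
    """Return image records whose base_path matches the object type."""
    return [
        record for record in data["data"]
        if record["type"] == "images"
        and record["attributes"]["base_path"] == BASE_PATHS[objtype]
    ]


# Invented sample data, shaped like the fields the loop reads.
sample = {
    "data": [
        {"type": "images", "id": "1", "attributes": {"base_path": "SGV_10"}},
        {"type": "images", "id": "2", "attributes": {"base_path": "SGV_12"}},
    ],
    "included": [
        {"type": "dates", "id": "10"},
        {"type": "places", "id": "20"},
    ],
}

dates, places, collections = group_included(sample)
print(len(dates), len(places), len(collections))  # 1 1 0
print([r["id"] for r in select_images(sample, "digitalobject")])  # ['1']
```

Keeping the base paths in one dictionary means a new object type needs a single new entry, with no extra `elif` branch.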