# Transform Collection Data - Create Linked Art

This notebook shows how to transform mapped collection data to Linked Art JSON-LD.




# Python modules


Three Python modules will be used:
* pandas
* json
* csv

## pandas
https://pandas.pydata.org/

<pre>"pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool,
built on top of the Python programming language."</pre>

`pandas` is used to read the contents of the CSV file into a pandas dataframe for further processing in Python. A dataframe is a data structure that holds <pre>"two-dimensional, size-mutable, potentially heterogeneous tabular data."</pre>
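
As a minimal sketch of what a dataframe looks like, the example below reads a small in-memory CSV. The column names echo the collection data, but the two rows are invented for illustration and are not taken from the real file.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for the real collection CSV
sample_csv = io.StringIO(
    "Object Number,Title,Dated\n"
    "A-1,First example title,1846\n"
    "A-2,Second example title,Date unknown\n"
)

df = pd.read_csv(sample_csv)

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column names taken from the header row
print(df.dtypes)         # each column carries its own dtype
```

The header row becomes the column index, and each data row becomes one row of the dataframe.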


## json


The Python `json` module encodes and decodes JSON objects. This notebook uses it to encode Python dictionaries as JSON before printing them or writing them to file.
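
A short sketch of encoding and decoding, assuming a made-up record (the values below are placeholders, not collection data):

```python
import json

# Hypothetical record used only to illustrate encoding and decoding
record = {
    "title": "Example title",
    "creator": [{"name": "Example Artist", "role": "Artist"}],
}

encoded = json.dumps(record, indent=2)  # Python dict -> JSON text
decoded = json.loads(encoded)           # JSON text -> Python dict

print(encoded)
```

`indent=2` only affects how the text is laid out; decoding the result gives back an equal dictionary.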


## csv


<pre>The csv module implements classes to read and write tabular data in CSV format.</pre>
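
The sketch below shows `csv.DictReader`, the class used later in this notebook, on a small in-memory sample. The rows are invented for illustration.

```python
import csv
import io

# Hypothetical sample; the first line is the header row
sample = io.StringIO(
    "Object Number,Title\n"
    "A-1,First example title\n"
    "A-2,Second example title\n"
)

# Each row becomes a dict keyed by the header names; all values are strings
rows = list(csv.DictReader(sample))

for row in rows:
    print(row["Object Number"], "->", row["Title"])
```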


### Further Reading 
* https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

* https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html

* https://docs.python.org/3/library/json.html

* https://docs.python.org/3/library/csv.html

In [12]:

# https://pandas.pydata.org/

# pandas is a third-party package, so install it if the import fails
try:
    import pandas as pd
except ImportError:
    %pip install pandas
    import pandas as pd

# json and csv are part of the Python standard library and cannot be installed with pip
import json
import csv

# IPython is already available in any Jupyter kernel
from IPython.display import display, IFrame, HTML, Javascript


## Read CSV file contents using pandas

The following code demonstrates how to read CSV into a pandas dataframe.

In [17]:
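file = '../data/pma/input/pma_ruskin.csv'

# remove the BOM (byte order mark), if present: read as utf-8-sig, then rewrite as utf-8
with open(file, mode='r', encoding='utf-8-sig') as f:
    s = f.read()
with open(file, mode='w', encoding='utf-8') as f:
    f.write(s)

# read the CSV file into a pandas dataframe and show the first rows
df = pd.read_csv(file, low_memory=False)
df.head()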

Unnamed: 0,Media,Object Number,Department,Classification,Culture,Period,Display Name,Object Name,Title,Dated,Medium,Dimensions,Description,Attribution,AttributionSort,Credit Line
0,0,64-1993-1,PDP,Drawings,,,John Ruskin,Drawings,"Study of a River Bank, Beauvais, France",1846,Watercolor,H: 125mm W: 175mm,,John Ruskin,Ruskin John,
1,1,01/07/1995,PDP,Drawings,,,John Ruskin,Drawings,Beanstalk,Date unknown,"Pen and yellow-brown ink and wash, graphite pe...",Sheet: 7 3/4 x 11 1/8inches (19.7 x 28.3cm),,John Ruskin,Ruskin John,Purchased with The Herbert & Nannette Rothschi...
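
To see why the BOM is removed before the file is read again with `csv.DictReader`, the sketch below writes a small file that starts with a BOM and reads its header with two encodings. The file name and contents are made up for the example.

```python
import csv
import os
import tempfile

# Write a hypothetical CSV that starts with a UTF-8 byte order mark
path = os.path.join(tempfile.mkdtemp(), "bom_example.csv")
with open(path, "wb") as f:
    f.write(b"\xef\xbb\xbfObject Number,Title\r\nA-1,Example title\r\n")

# Plain utf-8 keeps the BOM as part of the first column name
with open(path, encoding="utf-8", newline="") as f:
    header_utf8 = next(csv.reader(f))

# utf-8-sig strips the BOM while decoding
with open(path, encoding="utf-8-sig", newline="") as f:
    header_sig = next(csv.reader(f))

print(repr(header_utf8[0]))
print(repr(header_sig[0]))
```

With the BOM left in place, the first column name would not equal `"Object Number"`, so a mapping that refers to that column would silently fail to match.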


# Mapping collection data to entities in the Linked Art data model

We will use a Python dictionary that was created in another notebook. The dictionary keys are strings that represent entities in the Linked Art data model, and the values are the names of the mapped columns in the collection data CSV file.


In [20]:
mapp =  {
    "id":"Object Number",
    "title": "Title",
    "accession_number":"Object Number",
    "accession_date": "accession_date",
    "classification" : "Classification",
    "alt_title": "",
    "notes": "Description",
    "date_created":"Dated",
    "date_created_earliest": "",
    "date_created_latest": "",
    "created_period":"Period",
    "created_dynasty":"",
    "created_inscriptions":"",
    "created_notes": "Description",
    "creator":"Display Name",
    "physical_medium": "Medium",
    "physical_style": "",
    "physical_technique": "physical_technique",
    "physical_description": "physical_description",
    "physical_dimensions": "Dimensions",
    "created_provenance": "Attribution" ,
    "credit_line": "Credit Line",
    "collection" : "Department",
    "current_status" : "",
    "homepage": ""
}
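
Some values in the mapping, such as `accession_date`, do not correspond to any column in the sample CSV. A quick check like the sketch below can list them before the transformation is run. The helper name `unmapped_columns` and the shortened mapping and header are assumptions made for illustration.

```python
def unmapped_columns(mapping, header):
    """Return mapped column names that do not appear in the CSV header."""
    return sorted({col for col in mapping.values() if col and col not in header})


# Shortened, hypothetical mapping and header for the example
example_mapp = {
    "id": "Object Number",
    "title": "Title",
    "accession_date": "accession_date",
    "alt_title": "",
}
example_header = ["Object Number", "Title", "Dated"]

missing = unmapped_columns(example_mapp, example_header)
print(missing)
```

Empty strings are skipped because they mark entities with no mapped column.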

# Create dictionary using the data mapping

The next step creates a dictionary containing artwork properties from the input data, keyed by the entries in `mapp` that represent Linked Art entities. This is required before the collection data can be transformed to JSON-LD in the process described here.

In [31]:
# open file and read into Python dictionary
collection_data = csv.DictReader(open(file, mode='r',encoding='utf-8'))

#src https://note.nkmk.me/en/python-dict-get-key-from-value/
def get_key_from_value(d, val):
    keys = [k for k, v in d.items() if v == val]
    if keys:
        return keys[0]
    return None

baseURI = "https://philamuseum.org/collection/object/"


# uses mapp to create a dictionary of mapp entities and collection data
def createObjProp(obj,mapp):
    objProp = {}
    csv_keys = list(obj.keys())
    for key in csv_keys:
        for prop in mapp:
            if key == mapp[prop]:
                if prop == "creator":
                    objProp[prop] = [{"id": baseURI +"creatorid/" + obj[mapp["id"]] ,"name": obj[key],"role":"Artist"}]
                else:
                    objProp[prop] = obj[key]
    objProp["homepage"] = ""
    if "id" in objProp:
        objProp["id"] = baseURI +  objProp["id"]
    return objProp

"""
# if you want to see the mapped data alongside the original collection data record, uncomment this section of code 
# - removing triple speech marks
# iterate through collection data 
for obj in collection_data:
    display(HTML("<H3>Collection Data Record</H3>"))
    print(json.dumps(obj,indent=2))
    display(HTML("<H3>Collection Data Record Mapped to Linked Art Entities</H3>"))
    objProp = createObjProp(obj,mapp)
    print(json.dumps(objProp, indent=2))
    break
"""
print("")
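
The mapping step can also be written as a single loop over the mapping rather than a nested loop over CSV keys and mapping entries. The sketch below is one possible reformulation, not the notebook's own function: `map_record` and the sample record are hypothetical, and the resulting keys follow the order of the mapping rather than the order of the CSV columns.

```python
def map_record(record, mapping, base_uri):
    """Map one CSV record to Linked Art entity names using the mapping."""
    mapped = {}
    for prop, column in mapping.items():
        if column not in record:
            continue  # no column mapped, or the column is absent from the CSV
        if prop == "creator":
            mapped[prop] = [{
                "id": base_uri + "creatorid/" + record[mapping["id"]],
                "name": record[column],
                "role": "Artist",
            }]
        else:
            mapped[prop] = record[column]
    mapped["homepage"] = ""
    if "id" in mapped:
        mapped["id"] = base_uri + mapped["id"]
    return mapped


# Hypothetical record and shortened mapping
example_record = {"Object Number": "A-1", "Title": "Example title", "Display Name": "Example Artist"}
example_mapping = {"id": "Object Number", "title": "Title", "accession_number": "Object Number",
                   "creator": "Display Name", "alt_title": ""}

result = map_record(example_record, example_mapping, "https://example.org/collection/object/")
print(result)
```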





## Create mapped data dictionary and write to file

In [32]:
obj_list =  []
obj_dict = {}

for record in collection_data:
    objProp = createObjProp(record,mapp)
    objProp_copy = objProp.copy()
    obj_list.append(objProp_copy)

obj_dict["records"] = obj_list
print(json.dumps(obj_dict, indent=2))  

obj_file = './data/working/object_data_mapped_example.json'
       
with open(obj_file, 'w') as f:
    json.dump(obj_dict, f, indent=2)

{
  "records": [
    {
      "id": "https://philamuseum.org/collection/object/64-1993-1",
      "accession_number": "64-1993-1",
      "collection": "PDP",
      "classification": "Drawings",
      "created_period": "",
      "creator": [
        {
          "id": "https://philamuseum.org/collection/object/creatorid/64-1993-1",
          "name": "John Ruskin",
          "role": "Artist"
        }
      ],
      "title": "Study of a River Bank, Beauvais, France",
      "date_created": "1846",
      "physical_medium": "Watercolor",
      "physical_dimensions": "H: 125mm  W: 175mm",
      "notes": "",
      "created_notes": "",
      "created_provenance": "John Ruskin",
      "credit_line": "",
      "homepage": ""
    },
    {
      "id": "https://philamuseum.org/collection/object/01/07/1995",
      "accession_number": "01/07/1995",
      "collection": "PDP",
      "classification": "Drawings",
      "created_period": "",
      "creator": [
        {
          "id": "https://philamuseum.

# Try now with your own data and mapping

* upload data file and mapping file 

In [33]:
# ipywidgets is a third-party package, so install it if the import fails
try:
    import ipywidgets as widgets
except ImportError:
    %pip install ipywidgets
    import ipywidgets as widgets

from ipywidgets import Layout, FileUpload

# IPython is already available in any Jupyter kernel
from IPython.display import display, IFrame, HTML, Javascript

import io



In [44]:
display(HTML("<h3>Upload CSV file</h3>"))
# define file upload widget; accept takes a file extension such as '.csv'
upload_csv = FileUpload(accept='.csv', multiple=False, description='Select data file')
upload_csv.style.button_color = 'orange'


upload_csv


FileUpload(value={}, accept='.csv', description='Select data file', style=ButtonStyle(button_color='orange'))

In [53]:
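# upload and write files to disk

csv_upload = "./data/working/data_upload.csv"

# write the uploaded bytes to disk (value is treated here as a dict keyed by filename)
for filename in upload_csv.value:
    content = upload_csv.value[filename]["content"]
    with open(csv_upload, 'wb') as f:
        f.write(content)

# open file and read into Python dictionary
# the file is left open because collection_data is iterated again in later cells
uploaded_file = open(csv_upload, mode='r', encoding='utf-8', newline='')
collection_data = csv.DictReader(uploaded_file)

# print the first record only; note that this consumes that record from the reader
for obj in collection_data:
    print(json.dumps(obj, indent=2))
    break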
        

{
  "Media": "0",
  "Object Number": "64-1993-1",
  "Department": "PDP",
  "Classification": "Drawings",
  "Culture": "",
  "Period": "",
  "Display Name": "John Ruskin",
  "Object Name": "Drawings",
  "Title": "Study of a River Bank, Beauvais, France",
  "Dated": "1846",
  "Medium": "Watercolor",
  "Dimensions": "H: 125mm  W: 175mm",
  "Description": "",
  "Attribution": "John Ruskin",
  "AttributionSort": "Ruskin John",
  "Credit Line": ""
}
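
The upload cell above indexes `upload_csv.value` by filename. As far as I know, that dict shape belongs to ipywidgets 7.x, while 8.x returns a tuple of dicts that each carry a `name` key; treat that version detail as an assumption and check the documentation for the installed release. The hypothetical helper below accepts either shape and is exercised on plain Python data, so it runs without a widget.

```python
def uploaded_files(value):
    """Yield (filename, bytes) pairs from a FileUpload value of either shape."""
    if isinstance(value, dict):
        # dict keyed by filename, the shape used in this notebook
        for name, item in value.items():
            yield name, bytes(item["content"])
    else:
        # sequence of dicts with a "name" key (assumed shape for newer releases)
        for item in value:
            yield item["name"], bytes(item["content"])


# Plain-data stand-ins for the two shapes
dict_style = {"data.csv": {"content": b"Object Number,Title\n"}}
tuple_style = ({"name": "data.csv", "content": memoryview(b"Object Number,Title\n")},)

print(list(uploaded_files(dict_style)))
print(list(uploaded_files(tuple_style)))
```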


In [41]:
# the mapping file is JSON, so the widget accepts '.json' files
upload_mapp = FileUpload(accept='.json', multiple=False, description='Select mapping')
upload_mapp.style.button_color = 'orange'
display(HTML("<h3>Upload mapping file</h3>"))
upload_mapp

FileUpload(value={}, accept='.json', description='Select mapping', style=ButtonStyle(button_color='orange'))

In [54]:
mapp_upload = 'data/working/mapp_upload.json'

# write the uploaded mapping file to disk
for filename in upload_mapp.value:
    if filename != "":
        content = upload_mapp.value[filename]["content"]
        with open(mapp_upload, 'wb') as f:
            f.write(content)

with open(mapp_upload) as json_file:
    # json.load() takes a file object and returns the parsed JSON as a Python object
    mapp = json.load(json_file)
    # for demonstration print json
    print(json.dumps(mapp, indent=2))
        

{
  "id": "Object Number",
  "accession_number": "Object Number",
  "accession_date": "accession_date",
  "classification": "Classification",
  "title": "Title",
  "alt_title": "",
  "notes": "Description",
  "date_created": "Dated",
  "date_created_earliest": "",
  "date_created_latest": "",
  "created_period": "Period",
  "created_dynasty": "",
  "created_inscriptions": "",
  "created_notes": "Description",
  "creator": "Display Name",
  "physical_medium": "Medium",
  "physical_style": "",
  "physical_technique": "physical_technique",
  "physical_description": "physical_description",
  "physical_dimensions": "Dimensions",
  "created_provenance": "Attribution",
  "credit_line": "Credit Line",
  "collection": "Department",
  "current_status": "",
  "homepage": ""
}
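
Because an uploaded mapping comes from outside the notebook, it may be worth checking its shape before use. The sketch below assumes the mapping should be a flat JSON object whose values are all strings, as in the example above; `load_mapping` is a hypothetical helper, not part of the notebook.

```python
import json


def load_mapping(text):
    """Parse mapping JSON and check it is a flat object of string values."""
    mapping = json.loads(text)
    if not isinstance(mapping, dict):
        raise ValueError("mapping must be a JSON object")
    bad = [key for key, value in mapping.items() if not isinstance(value, str)]
    if bad:
        raise ValueError(f"mapping values must be strings: {bad}")
    return mapping


checked = load_mapping('{"id": "Object Number", "alt_title": ""}')
print(checked)
```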


# Create dictionary containing Linked Art entities mapped to collection data

The following step takes the collection data file and the mapping to create a dictionary containing the collection data mapped to strings representing Linked Art entities.

In [55]:


baseURI = "https://example.org/collection/object/"

def createObjProp(obj,mapp):
    objProp = {}
    csv_keys = list(obj.keys())
    for key in csv_keys:
        for prop in mapp:
            if key == mapp[prop]:
                if prop == "creator":
                   
                    objProp[prop] = [{"id": baseURI +"creatorid/" + obj[mapp["id"]] ,"name": obj[key],"role":"Artist"}]
                else:
                    objProp[prop] = obj[key]
    objProp["homepage"] = ""
    
    return objProp

"""
# iterate through uploaded collection data 
for record in collection_data:
    
    display(HTML("<H3>Collection Data Record</H3>"))
    print(json.dumps(record,indent=2))
    
    display(HTML("<H3>Collection Data Record Mapped to Linked Art Entities</H3>"))
    objProp = createObjProp(record,mapp)
    print(json.dumps(objProp, indent=2))
   
"""

print("")




In [58]:
obj_list =  []
obj_dict = {}

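# collection_data is a one-shot iterator: if an earlier cell has already looped
# over every record, this loop finds nothing and "records" stays empty
for record in collection_data:
    objProp = createObjProp(record,mapp)
    objProp_copy = objProp.copy()
    obj_list.append(objProp_copy)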

obj_dict["records"] = obj_list
print(json.dumps(obj_dict, indent=2))  

obj_file = './data/working/object_data_mapped.json'
       
with open(obj_file, 'w') as f:
    json.dump(obj_dict, f, indent=2)


{
  "records": []
}
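
An empty `records` list like the one above is what you get when the `csv.DictReader` has already been read to the end by an earlier cell, which is one likely cause here. The sketch below demonstrates the behaviour on an invented in-memory sample and shows one way round it: read the records into a list once and reuse the list.

```python
import csv
import io

# Hypothetical sample data
sample = io.StringIO("Object Number,Title\nA-1,First\nA-2,Second\n")

reader = csv.DictReader(sample)
first_pass = [row["Object Number"] for row in reader]
second_pass = [row["Object Number"] for row in reader]  # reader is already exhausted

print(first_pass)
print(second_pass)

# Rewind the underlying file and keep the records in a list that can be reused
sample.seek(0)
records = list(csv.DictReader(sample))
print(len(records))
```

With a real file, re-running the cell that opens the file and creates the reader has the same effect as `seek(0)` here.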


# Summary

This notebook has taken collection data and, using a Linked Art data mapping, mapped it to Linked Art entities in a Python dictionary.

A file has been created:
- [./data/working/object_data_mapped.json](./data/working/object_data_mapped.json)

The next step is to take this mapped data file and transform the collection data to Linked Art JSON-LD.
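
As a rough preview of that next step, the sketch below turns one mapped record into a very small Linked Art style object. It is an illustration under assumptions, not the transformation used in the following notebook: `to_linked_art_stub` is a hypothetical helper, the record is invented, and only the title and accession number are carried over.

```python
import json


def to_linked_art_stub(obj_prop):
    """Build a minimal Linked Art style object from one mapped record."""
    return {
        "@context": "https://linked.art/ns/v1/linked-art.json",
        "id": obj_prop["id"],
        "type": "HumanMadeObject",
        "_label": obj_prop["title"],
        "identified_by": [
            {"type": "Name", "content": obj_prop["title"]},
            {"type": "Identifier", "content": obj_prop["accession_number"]},
        ],
    }


# Hypothetical mapped record
example = {
    "id": "https://example.org/collection/object/A-1",
    "title": "Example title",
    "accession_number": "A-1",
}

stub = to_linked_art_stub(example)
print(json.dumps(stub, indent=2))
```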