# Transforming Collections Data to Linked Art - Philadelphia Museum of Art

## Introduction

[Linked Art](https://linked.art) is a community working together to create a shared model, based on Linked Open Data, for describing art. A number of exemplars will be published to demonstrate both the process of producing Linked Art JSON-LD and its potential applications, on the following themes:
- `Transformation` - Documented transformation process - using code, documentation and possibly visualisation
- `Reconciliation` - Documented reconciliation process - matching data with an external identifier source
- `Visualisation` - Documented transformation of Linked Art JSON-LD to data visualisation

This exemplar is concerned with `Transformation` - the transformation process, from collections data to Linked Art JSON-LD.

## Aim of the Notebook
The aim of the notebook is to demonstrate how easy it is to transform collections data to Linked Art JSON-LD.

## How
The notebook provides a documented, interactive code example of the transformation process, from collections data to Linked Art using data from the Philadelphia Museum of Art (PMA). 

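## Input Data

The input is a CSV export of PMA collection records, `./data/pma/input/Ruskin_Okeeffe.csv`, read one row per artwork.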

## Attribution

- The Linked Art data model documentation has been sourced from the [Linked Art website](https://linked.art)

## Transformation Steps

### 1. Import What We Need for Notebook
- Import Python libraries

In [1]:
# Try each import first and install the package only if it is missing
try:
    import ipywidgets as widgets
except ImportError:
    !pip install ipywidgets
    import ipywidgets as widgets

from ipywidgets import Layout
from ipywidgets import FileUpload

# IPython is always available in a Jupyter kernel, so it needs no install step
from IPython.display import display, HTML, IFrame

try:
    import xmltodict
except ImportError:
    !pip install xmltodict
    import xmltodict

# json is part of the Python standard library and never needs installing
import json

try:
    import requests
except ImportError:
    !pip install requests
    import requests

# baseURI for JSON-LD document
baseURI = "https://philamuseum.org/collection/object/"


### 2. Parse CSV File

In [2]:
import csv

file = './data/pma/input/Ruskin_Okeeffe.csv'

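# Remove the byte-order mark (BOM): read with utf-8-sig, write back as plain utf-8
with open(file, mode='r', encoding='utf-8-sig') as f:
    s = f.read()
with open(file, mode='w', encoding='utf-8') as f:
    f.write(s)

allObjects = csv.DictReader(open(file, mode='r', encoding='utf-8'))

# Linked Art records, keyed by object number
all_linkedart = {}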
    

In [67]:
def getObjDesc(obj, functionCall):
    """Return the Linked Art fragment named by functionCall for one CSV row."""
    desc = {}

    # Fields available in the PMA CSV export
    irn = obj["Object Number"]
    department = obj["Department"]
    title = obj["Title"]
    datecreated = obj["Dated"]
    medium = obj["Medium"]
    classification = obj["Classification"]
    attribution = obj["Attribution"]
    credit_line = obj["Credit Line"]

    # Fields the mapping functions expect but the PMA CSV does not supply;
    # empty defaults prevent NameErrors in the branches that use them
    titAccessionNo = ""
    provenance = ""
    earliestdate = latestdate = ""
    TitObjectStatus = TitAccessionDate = PhyCollectionArea = titleObjectType = ""

    if functionCall == "member":
        desc = objMember(obj, baseURI, PhyCollectionArea)
    elif functionCall == "custody":
        desc = objCustody(TitObjectStatus)
    elif functionCall == "owner":
        desc = objOwner(obj, baseURI, irn, TitAccessionDate, TitObjectStatus)
    elif functionCall == "production":
        desc = objProd(obj, baseURI, irn, earliestdate, latestdate, datecreated)
    elif functionCall == "core":
        desc = objCore(obj, baseURI, irn, title)
    elif functionCall == "id":
        desc = objId(obj, baseURI, irn, titAccessionNo)
    elif functionCall == "names":
        desc = objNames(obj, baseURI, irn, title)
    elif functionCall == "class":
        # The CSV has no object-type field, so this only runs once one is mapped
        if titleObjectType != "":
            desc = objClass(obj, baseURI, irn, titleObjectType)
    elif functionCall == "home":
        desc = objHome(obj, baseURI, irn)
    elif functionCall == "location":
        desc = objLocation(obj, baseURI)
    elif functionCall == "ling":
        desc = objLing(obj, baseURI, irn, credit_line, provenance)
    return desc

    
    
    

### 3. Data File as Python dictionary
Each row of the data file is read as a Python dictionary keyed by column name, and these dictionaries are used to transform the collection data for each artwork to JSON-LD. The following shows a single artwork represented as a Python dictionary:

In [68]:
allObjects = csv.DictReader(open(file, mode='r',encoding='utf-8'))

for obj in allObjects:
    print(json.dumps(obj,indent=2))
    break


{
  "Media": "0",
  "Object Number": "64-1993-1",
  "Department": "PDP",
  "Classification": "Drawings",
  "Culture": "",
  "Period": "",
  "Display Name": "John Ruskin",
  "Object Name": "Drawings",
  "Title": "Study of a River Bank, Beauvais, France",
  "Dated": "1846",
  "Medium": "Watercolor",
  "Dimensions": "H: 125mm  W: 175mm",
  "Description": "",
  "Attribution": "John Ruskin",
  "AttributionSort": "Ruskin John",
  "Credit Line": "",
  "": ""
}
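Step 2 strips the byte-order mark by rewriting the input file on disk. As a minimal sketch of an alternative (the helper `load_rows` is hypothetical, not part of the notebook), the file can be decoded with Python's `utf-8-sig` codec, which drops a leading BOM while reading, and the rows can be indexed by `Object Number` in one pass:

```python
import csv
import io


def load_rows(byte_stream):
    """Read a CSV byte stream into a dict of rows keyed by "Object Number".

    Decoding with "utf-8-sig" removes a leading byte-order mark (BOM) if one
    is present, so the input file never has to be rewritten on disk.
    """
    text = io.TextIOWrapper(byte_stream, encoding="utf-8-sig", newline="")
    return {row["Object Number"]: row for row in csv.DictReader(text)}


# Illustrative in-memory CSV that starts with a BOM
sample = "\ufeffMedia,Object Number,Title\n0,64-1993-1,Study of a River Bank\n"
rows = load_rows(io.BytesIO(sample.encode("utf-8")))
print(list(rows["64-1993-1"].keys()))  # ['Media', 'Object Number', 'Title']
```

To use it on the notebook's input file, open the file in binary mode, for example `with open(file, "rb") as fh: rows = load_rows(fh)`. The sample row above is illustrative data only.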


## 4. Build the Linked Art JSON-LD files <a id="build"/>

The following steps will transform the catalogue data for all of the artworks to Linked Art JSON-LD. The transformation will be divided into sections, using different parts of the Linked Art data model.

- [4.1 Core Properties](#core)
- [4.2 Identifiers](#id)
- [4.3 Names](#name)
- [4.4 Classification](#class)
- [4.5 Home Page](#home)
- [4.6 Current Location](#location)
- [4.7 Linguistic Objects](#ling)
- [4.8 Production](#prod)
- [4.9 Acquisition](#owner)
- [4.10 Custody](#curate)
- [4.11 Membership](#member)
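Each mapping function returns a small dictionary fragment that is merged into the object's record. Note that `dict.update()` replaces a key outright, so two fragments that both carry `identified_by` (identifiers and names) would overwrite each other. A minimal sketch of a merge that appends list values instead, assuming fragments are plain dicts (`merge_fragment` is a hypothetical helper):

```python
def merge_fragment(record, fragment):
    """Merge one Linked Art fragment into a record without losing list values.

    List values are appended to an existing list under the same key;
    all other values are set directly, as dict.update() would do.
    """
    for key, value in fragment.items():
        if isinstance(value, list) and isinstance(record.get(key), list):
            record[key].extend(value)
        else:
            record[key] = value
    return record


record = {"id": "https://example.org/object/1", "type": "HumanMadeObject"}
merge_fragment(record, {"identified_by": [{"type": "Identifier", "content": "1"}]})
merge_fragment(record, {"identified_by": [{"type": "Name", "content": "Blue Lines"}]})
print([entry["type"] for entry in record["identified_by"]])  # ['Identifier', 'Name']
```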

### 4.1 Core Properties <a id="core"/>
<a id='core_properties'></a>
[Linked Art Data Model documentation](https://linked.art/model/base/#core-properties)

There are a few core properties that every resource should have for it to be a useful part of the world of Linked Open Data:

- `@context`
- `id`
- `type`
- `_label`
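A quick check for these four properties on a top-level record could look like the sketch below; `missing_core_properties` is a hypothetical helper, and treating an empty value as missing is an assumption:

```python
CORE_PROPERTIES = ("@context", "id", "type", "_label")


def missing_core_properties(record):
    """Return the core Linked Art properties that are absent or empty in a record."""
    return [prop for prop in CORE_PROPERTIES if not record.get(prop)]


example = {
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "id": "https://philamuseum.org/collection/object/1995-7-1",
    "type": "HumanMadeObject",
}
print(missing_core_properties(example))  # ['_label']
```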

--- 
    
[Back to build menu](#build)

In [69]:
def objCore(obj, baseURI, irn, title):
    core = {}

    # minimum Linked Art properties
    core["@context"] = "https://linked.art/ns/v1/linked-art.json"
    core["id"] = baseURI + "object/" + irn
    core["type"] = "HumanMadeObject"
    # _label is only added when the CSV row has a title
    if title:
        core["_label"] = title

    return core

In [70]:
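# Re-open the CSV: the reader from the previous cell has already been partly consumed
allObjects = csv.DictReader(open(file, mode='r', encoding='utf-8'))

for obj in allObjects:
    irn = obj["Object Number"]
    core = getObjDesc(obj, "core")
    all_linkedart[irn] = core
print(json.dumps(all_linkedart, indent=2))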

{
  "64-1993-1": {
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "id": "https://philamuseum.org/collection/object/object/64-1993-1",
    "type": "HumanMadeObject",
    "_label": "Study of a River Bank, Beauvais, France",
    "identified_by": [
      {
        "id": "https://philamuseum.org/collection/object/object/64-1993-1/title",
        "type": "Name",
        "_label": "Primary Title for the Object",
        "content": "Study of a River Bank, Beauvais, France",
        "classified_as": [
          {
            "id": "http://vocab.getty.edu/aat/300404670",
            "type": "Type",
            "_label": "preferred terms"
          }
        ]
      }
    ]
  },
  "1995-7-1": {
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "id": "https://philamuseum.org/collection/object/object/1995-7-1",
    "type": "HumanMadeObject",
    "_label": "Beanstalk"
  },
  "9-1982-1": {
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "id": "https://ph

### 4.2 Identifiers <a id="id"/>

[Linked Art Data Model documentation](https://linked.art/model/base/#identifiers)

Many resources of interest are also given external identifiers, such as accession numbers for objects, ORCIDs for people or groups, lot numbers for auctions, and so forth. Identifiers are represented in a very similar way to names, but use the `Identifier` class instead. Identifiers will normally have a classification that determines which sort of identifier it is, for example to distinguish internal, repository-system-assigned numbers from museum-assigned accession numbers.

As Identifiers and Names use the same `identified_by` property, the JSON will frequently have mixed classes in the array. Unlike Names, Identifiers are not part of human language and thus cannot have translations or a language associated with them.
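Because Names and Identifiers share `identified_by`, a consumer of the JSON-LD usually separates them by `type`. A minimal sketch (the helper name is hypothetical; the sample values come from the PMA output in this notebook):

```python
def split_identified_by(record):
    """Separate the Name and Identifier entries of a record's identified_by."""
    names, identifiers = [], []
    for entry in record.get("identified_by", []):
        if entry.get("type") == "Name":
            names.append(entry["content"])
        elif entry.get("type") == "Identifier":
            identifiers.append(entry["content"])
    return names, identifiers


record = {
    "identified_by": [
        {"type": "Identifier", "content": "64-1993-1"},
        {"type": "Name", "content": "Study of a River Bank, Beauvais, France"},
    ]
}
names, identifiers = split_identified_by(record)
print(names)        # ['Study of a River Bank, Beauvais, France']
print(identifiers)  # ['64-1993-1']
```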

--- 
    
[Back to build menu](#build)

In [71]:
def objId(obj, baseURI, irn, titAccessionNo):
    artwork = {"identified_by": []}

    # PMA object number, classified as a repository number
    artwork["identified_by"].append({
        "id": baseURI + "object/" + irn + "/irn",
        "type": "Identifier",
        "_label": "PMA Object Number",
        "content": irn,
        "classified_as": [{
            "id": "http://vocab.getty.edu/aat/300404621",
            "type": "Type",
            "_label": "repository numbers"
        }]
    })

    # Accession number, added only when one is supplied
    if titAccessionNo != "":
        artwork["identified_by"].append({
            "id": baseURI + "object/" + irn + "/object-number",
            "type": "Identifier",
            "_label": "PMA Accession Number for the Object",
            "content": titAccessionNo,
            "classified_as": [{
                "id": "http://vocab.getty.edu/aat/300312355",
                "type": "Type",
                "_label": "accession numbers"
            }]
        })

    return artwork

In [72]:
allObjects = csv.DictReader(open(file, mode='r',encoding='utf-8'))

for obj in allObjects:
    irn = obj["Object Number"]
    desc = getObjDesc(obj, "id")

    all_linkedart[irn].update(desc)
print(json.dumps(desc, indent=2))

{
  "identified_by": [
    {
      "id": "https://philamuseum.org/collection/object/object/testq/irn",
      "type": "Identifier",
      "_label": "PMA Object Number",
      "content": "testq",
      "classified_as": [
        {
          "id": "http://vocab.getty.edu/aat/300404621",
          "type": "Type",
          "_label": "repository numbers"
        }
      ]
    }
  ]
}


### 4.3 Names <a id="name"/>

[Linked Art Data Model documentation](https://linked.art/model/base/#names)


As the `_label` property is intended as internal documentation for the data, it is strongly recommended that every resource that should be rendered to an end user also have at least one specific name. The name could be for an object, a person, a group, an event or anything else. This pattern uses the `identified_by` property, with a `Name` resource. The value of the name is given in the content property of the `Name`.

It is somewhat unintuitive to think of a name as identifying the resource it is associated with, as names are typically not unique. However, because the name itself is uniquely identified rather than being an anonymous string, it is no longer a shared label: the particular instance of the name is uniquely associated with the resource. With this formulation, the name instance does uniquely identify the resource.

If there is more than one name given, then one of them should be `classified_as` the primary name for use. This is done by adding the `Primary Name` (aat:300404670) term to it. There should be exactly one primary name given per language.

Names are also part of human communication, and can have the Linguistic features of the model associated with them, such as having a particular language, or having translations.
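The one-primary-name-per-language rule can be checked mechanically. The sketch below is an illustration under stated assumptions, not part of any Linked Art tooling: it treats a `Name` as primary when it is `classified_as` aat:300404670, reads languages from a `language` list of objects with an `id`, and groups names that have no language together:

```python
from collections import Counter

PRIMARY_NAME = "http://vocab.getty.edu/aat/300404670"


def primary_name_counts(record):
    """Count Names classified as primary, grouped by language id (None if absent)."""
    counts = Counter()
    for entry in record.get("identified_by", []):
        if entry.get("type") != "Name":
            continue
        classes = {c.get("id") for c in entry.get("classified_as", [])}
        if PRIMARY_NAME not in classes:
            continue
        languages = [lang.get("id") for lang in entry.get("language", [])] or [None]
        for language in languages:
            counts[language] += 1
    return counts


def has_one_primary_name_per_language(record):
    """True if at least one primary name exists and no language has two."""
    counts = primary_name_counts(record)
    return bool(counts) and all(n == 1 for n in counts.values())


record = {
    "identified_by": [
        {"type": "Identifier", "content": "testq"},
        {
            "type": "Name",
            "content": "Blue Lines",
            "classified_as": [{"id": PRIMARY_NAME, "type": "Type"}],
        },
    ]
}
print(has_one_primary_name_per_language(record))  # True
```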

--- 
    
[Back to build menu](#build)

In [73]:
def objNames(obj, baseURI, irn, title):
    artwork = {"identified_by": []}

    # Primary title, classified with AAT "preferred terms"
    artwork["identified_by"].append({
        "id": baseURI + "object/" + irn + "/title",
        "type": "Name",
        "_label": "Primary Title for the Object",
        "content": title,
        "classified_as": [{
            "id": "http://vocab.getty.edu/aat/300404670",
            "type": "Type",
            "_label": "preferred terms"
        }]
    })

    # Alternate titles from an XML-style "AltTitles" table; CSV rows have no
    # "table" key, so the KeyError is caught and only the primary title is kept
    try:
        if obj["table"]["@name"] == "AltTitles":
            for x, row in enumerate(obj["table"]["tuple"], start=1):
                for atom in row:
                    if atom["@name"] == "TitAlternateTitles":
                        artwork["identified_by"].append({
                            "id": baseURI + "object/" + irn + "/alt-title-" + str(x),
                            "type": "Name",
                            "_label": "Alternate Title for the Object",
                            "content": atom["#text"],
                            "classified_as": [{
                                "id": "http://vocab.getty.edu/aat/300417227",
                                "type": "Type",
                                "_label": "alternate titles"
                            }]
                        })
    except (KeyError, TypeError):
        pass

    return artwork

In [74]:
allObjects = csv.DictReader(open(file, mode='r',encoding='utf-8'))

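for obj in allObjects:
    irn = obj["Object Number"]
    desc = getObjDesc(obj, "names")
    # Append the names to the existing identified_by list; update() would discard the identifiers
    for key, value in desc.items():
        all_linkedart[irn].setdefault(key, []).extend(value)
print(json.dumps(desc, indent=2))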

{
  "identified_by": [
    {
      "id": "https://philamuseum.org/collection/object/object/testq/title",
      "type": "Name",
      "_label": "Primary Title for the Object",
      "content": "Blue Lines",
      "classified_as": [
        {
          "id": "http://vocab.getty.edu/aat/300404670",
          "type": "Type",
          "_label": "preferred terms"
        }
      ]
    }
  ]
}


### 4.7 Statements about a Resource - Linguistic Objects <a id="ling"/>

[Linked Art Data Model Documentation](https://linked.art/model/base/#statements-about-a-resource)
    
In many cases, current data does not support the level of specificity that the full ontology allows, or the information is simply best expressed in human-readable form. For example, instead of a completely modeled set of parts with materials, many museum collection management systems allow only a single human-readable string for the "medium" or "materials statement". The same is true in many other situations, including rights or allowable usage statements, dimensions, edition statements and so forth. Any time that there is a description of the resource, with or without qualification as to the type of description, then this pattern can be used to record the descriptive text.

The pattern makes use of the `LinguisticObject` class, which identifies a particular piece of textual content. These Linguistic Objects can then be referred to by any other resource. They hold the statement's text in the `content` property, and the language of the statement (if known) in the `language` property.

Use cases for this pattern include:

- General description of the resource
- Materials statement for an object
- Attribution statement for an image
- Biography for a person
- Dimensions statement for a part of an object    
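All of these statements share one shape, so a small builder can produce them. This sketch (the `make_statement` helper is hypothetical) reuses the AAT terms from the `objLing` function in this section and adds the "brief texts" classification to every statement; the object URI is illustrative. Other CSV columns, such as `Medium`, could be passed through the same helper with an appropriate classification from the Linked Art vocabulary.

```python
BRIEF_TEXT = {"id": "http://vocab.getty.edu/aat/300418049", "type": "Type", "_label": "brief texts"}


def make_statement(object_uri, slug, label, content, classifications):
    """Build a LinguisticObject for a free-text statement about an object.

    Returns None for empty content so callers can skip blank CSV cells.
    """
    if not content:
        return None
    return {
        "id": object_uri + "/" + slug,
        "type": "LinguisticObject",
        "_label": label,
        "content": content,
        "classified_as": list(classifications) + [BRIEF_TEXT],
    }


acknowledgments = {
    "id": "http://vocab.getty.edu/aat/300026687",
    "type": "Type",
    "_label": "acknowledgments",
}
statement = make_statement(
    "https://example.org/object/1",  # illustrative object URI
    "credit-line",
    "Credit Line for the Object",
    "Gift of the Doris Bry Trust, 2016",
    [acknowledgments],
)
print(statement["id"])  # https://example.org/object/1/credit-line
```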

---

[Back to build menu](#build)

In [75]:
def objLing(obj, baseURI, irn, SumCreditLine, provenance):
    artwork = {"referred_to_by": []}

    # Credit line statement
    if SumCreditLine != "":
        artwork["referred_to_by"].append({
            "id": baseURI + "object/" + irn + "/credit-line",
            "type": "LinguisticObject",
            "_label": "Credit Line for the Object",
            "content": SumCreditLine,
            "classified_as": [
                {
                    "id": "http://vocab.getty.edu/aat/300026687",
                    "type": "Type",
                    "_label": "acknowledgments"
                },
                {
                    "id": "http://vocab.getty.edu/aat/300418049",
                    "type": "Type",
                    "_label": "brief texts"
                }
            ]
        })

    # Provenance statement
    if provenance != "":
        artwork["referred_to_by"].append({
            "id": baseURI + "object/" + irn + "/provenance-statement",
            "type": "LinguisticObject",
            "_label": "Provenance Statement about the Object",
            "content": provenance,
            "classified_as": [
                {
                    "id": "http://vocab.getty.edu/aat/300055863",
                    "type": "Type",
                    "_label": "provenance (history of ownership)"
                },
                {
                    "id": "http://vocab.getty.edu/aat/300418049",
                    "type": "Type",
                    "_label": "brief texts"
                }
            ]
        })

    return artwork

In [83]:
allObjects = csv.DictReader(open(file, mode='r',encoding='utf-8'))


for obj in allObjects:
    irn = obj["Object Number"] 
    desc = getObjDesc(obj, "ling")
    all_linkedart[irn].update(desc) 
print(json.dumps(desc,indent=2)) 

{
  "referred_to_by": [
    {
      "id": "https://philamuseum.org/collection/object/object/testq/credit-line",
      "type": "LinguisticObject",
      "_label": "Credit Line for the Object",
      "content": "Gift of the Doris Bry Trust, 2016",
      "classified_as": [
        {
          "id": "http://vocab.getty.edu/aat/300026687",
          "type": "Type",
          "_label": "acknowledgments"
        },
        {
          "id": "http://vocab.getty.edu/aat/300418049",
          "type": "Type",
          "_label": "brief texts"
        }
      ]
    }
  ]
}


### 4.8 Production <a id="prod"/>

[Linked Art Data Model Documentation](https://linked.art/model/object/production/)

The first activity in an object's lifecycle is its creation, or `Production`. The relationship to the object that was produced by the activity (`produced`) is added to the general activity model, along with the time, location and actors. This follows the base pattern for activities.
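The `objProd` function in this section only fills `begin_of_the_begin` and `end_of_the_end` from `earliestdate` and `latestdate`, which the PMA CSV does not supply, so the `Dated` value ends up only as a label. A hedged sketch of deriving the bounds, assuming `Dated` holds a plain four-digit year or a `YYYY-YYYY` range (anything else returns `None`) and that the bounds are written as `xsd:dateTime` strings in UTC; `year_timespan` is a hypothetical helper:

```python
import re

# A four-digit year, optionally followed by "-" and a second four-digit year
YEAR_RANGE = re.compile(r"^\s*(\d{4})\s*(?:-\s*(\d{4}))?\s*$")


def year_timespan(dated, timespan_id):
    """Build a TimeSpan from a value such as "1846" or "1920-1925", else None."""
    match = YEAR_RANGE.match(dated or "")
    if match is None:
        return None
    start = match.group(1)
    end = match.group(2) or start
    return {
        "id": timespan_id,
        "type": "TimeSpan",
        "_label": dated.strip(),
        "begin_of_the_begin": start + "-01-01T00:00:00Z",
        "end_of_the_end": end + "-12-31T23:59:59Z",
    }


span = year_timespan("1846", "https://example.org/object/1/production/timespan")
print(span["begin_of_the_begin"], span["end_of_the_end"])
# 1846-01-01T00:00:00Z 1846-12-31T23:59:59Z
```

Returning `None` for anything else (for example "c. 1846") keeps uncertain dates as label-only rather than guessing bounds.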

---

[Back to build menu](#build)

In [90]:
def objProd(obj, baseURI, irn, earliestdate, latestdate, datecreated):
    attr = obj["Attribution"]

    # produced_by holds a single Production activity; the actor and the
    # timespan are properties of that activity, not separate entries
    production = {
        "id": baseURI + "object/" + irn + "/production",
        "type": "Production",
        "_label": "Production of the Object",
    }

    # carried_out_by property (skipped when the attribution is blank)
    if attr != "":
        production["carried_out_by"] = [{
            "id": baseURI + "actor/" + irn,
            "type": "Actor",
            "_label": attr,
        }]

    # timespan property, added only if at least one date is present
    if any(date != "" for date in (earliestdate, latestdate, datecreated)):
        if datecreated != "":
            label = datecreated
        else:
            label = earliestdate + " - " + latestdate

        timespanObj = {
            "id": baseURI + "object/" + irn + "/production/timespan",
            "type": "TimeSpan",
            "_label": label,
        }
        if earliestdate != "":
            timespanObj["begin_of_the_begin"] = earliestdate
        if latestdate != "":
            timespanObj["end_of_the_end"] = latestdate

        production["timespan"] = timespanObj

    return {"produced_by": production}

In [87]:
allObjects = csv.DictReader(open(file, mode='r',encoding='utf-8'))

for obj in allObjects:
    irn = obj["Object Number"]
    desc = getObjDesc(obj, "production")
    all_linkedart[irn].update(desc)
print(json.dumps(desc,indent=2))

{
  "produced_by": {
    "id": "https://philamuseum.org/collection/object/object/testq/production",
    "type": "Production",
    "_label": "Production of the Object",
    "carried_out_by": [
      {
        "id": "https://philamuseum.org/collection/object/actor/testq",
        "type": "Actor",
        "_label": "Georgia O'Keeffe"
      }
    ],
    "timespan": {
      "id": "https://philamuseum.org/collection/object/object/testq/production/timespan",
      "type": "TimeSpan",
      "_label": "1974"
    }
  }
}


### 4.10 Custody <a id="curate"/>

[Linked Art Data Model documentation](https://linked.art/model/provenance/custody/#institutional-ownership-departmental-custody)

Objects are owned by legal entities, such as museum organizations or individual people. However, there may also be information about which department within a museum is responsible for the curation of the object. This is the division between acquisition (the legal ownership of the object) and custody (the responsibility for looking after the object). If the department is known, then it should be recorded either as part of the Provenance Event in which the object is acquired, or as a separate provenance event if the object was not accessioned by a department and later came under its care, or was transferred between departments. In these latter cases, the ownership does not change, only the custody of the object.

The department becomes the `current_keeper` of the object, whereas the institution is the `current_owner`.
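A minimal sketch of recording both roles, assuming the CSV `Department` code (for example `PDP`) identifies the department; the `group/<code>` URI pattern, the label text and the `custody_fragment` name are hypothetical, and the institution record is illustrative:

```python
def custody_fragment(base_uri, department_code, institution):
    """Record the institution as current_owner and a department as current_keeper.

    The department URI pattern (base_uri + "group/" + code) is hypothetical.
    """
    fragment = {"current_owner": [institution]}
    if department_code:
        fragment["current_keeper"] = [{
            "id": base_uri + "group/" + department_code,
            "type": "Group",
            "_label": "Department " + department_code,
        }]
    return fragment


museum = {
    "id": "https://example.org/group/museum",  # illustrative institution URI
    "type": "Group",
    "_label": "Philadelphia Museum of Art",
}
custody = custody_fragment("https://example.org/", "PDP", museum)
print(custody["current_keeper"][0]["id"])  # https://example.org/group/PDP
```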

---

[Back to build menu](#build)

In [None]:
def objCustody(TitObjectStatus):
    artwork = {}

    # The object counts as museum-owned if its status says so
    # (status values follow the source collection management system)
    currentowner = (
        TitObjectStatus in ('Accessioned', 'Partial Accession')
        or 'IMA-Owned' in TitObjectStatus
    )

    # If the museum does not own the object, record it only as the current keeper
    if not currentowner:
        artwork["current_keeper"] = {
            "id": "http://vocab.getty.edu/ulan/500300517",
            "type": "Group",
            "_label": "PMA",
            "classified_as": [
                {
                    "id": "http://vocab.getty.edu/aat/300312281",
                    "type": "Type",
                    "_label": "museums (institutions)"
                }
            ]
        }

    return artwork

### View the final Linked Art JSON-LD 

In [88]:
import os

# Write one JSON-LD file per object, plus one file containing every object
outdir = "./data/pma/output/json/all/"
os.makedirs(outdir, exist_ok=True)

for irn in all_linkedart:
    with open(outdir + irn + ".json", "w", encoding="utf-8") as text_file:
        text_file.write(json.dumps(all_linkedart[irn], indent=2))

with open(outdir + "allobjects_linkedart.json", "w", encoding="utf-8") as f:
    f.write(json.dumps(all_linkedart, indent=2))

In [89]:
import os
from IPython.display import display, HTML


def fn():
    # List the generated files and display a link to each one
    file_list = os.listdir(r"./data/pma/output/json/all/")

    for fname in file_list:
        display(HTML("<a href='./data/pma/output/json/all/" + fname + "'>" + fname + "</a>"))

fn()