# Catalogue Data to Linked Art - Indianapolis Museum of Art - Transform All Records

## Introduction

- [Linked Art](https://linked.art) is a community working together to create a shared model and API specification, based on Linked Open Data, for describing art.

- The [Indianapolis Museum of Art (IMA)](https://discovernewfields.org/) has transformed a sample of its catalogue data to Linked Art JSON-LD, available at https://github.com/IMAmuseum/LinkedArt 
- Data were sourced from EMu in XML format (EMu source: Catalogue, Rights, Narratives, and Locations modules)
- This Jupyter notebook uses the [Objects catalogue data](https://github.com/IMAmuseum/LinkedArt/blob/master/XML/ObjectsSample.xml) ([raw file](https://raw.githubusercontent.com/IMAmuseum/LinkedArt/master/XML/ObjectsSample.xml)) to provide a documented exemplar of the transformation process, from catalogue data to Linked Art JSON-LD. The notebook transforms every object in the file to Linked Art.


### Import What We Need for Notebook
- Import Python libraries

In [207]:
# Install the third-party packages on first run if they are missing
try:
    import ipywidgets as widgets
except ImportError:
    !pip install ipywidgets
    import ipywidgets as widgets

try:
    import IPython
except ImportError:
    !pip install ipython
    import IPython

try:
    import xmltodict
except ImportError:
    !pip install xmltodict
    import xmltodict

# json ships with the Python standard library, so it never needs installing
import json
        
from ipywidgets import Layout

def widgeText(desc, jdoc, ht):
    """Return a full-width Textarea of height `ht` showing `jdoc` as indented JSON."""
    widg = widgets.Textarea(
        value=json.dumps(jdoc, indent=2),
        placeholder="",
        description=desc,
        disabled=False,
        layout=Layout(width='100%', height=ht))
    return widg

## Data Transformation 

In [208]:
#define vars
baseURI = "https://data.discovernewfields.org/"

### Upload XML File
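- Choose a file on your local system to upload and transform to Linked Art. The upload widget and the code that reads the file are in the same cell, so after selecting a file, re-run that cell before continuing.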
- The IMA files are available to download from: https://github.com/IMAmuseum/LinkedArt/tree/master/XML
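The transformation cells read each record through `xmltodict`, which maps XML attributes to keys prefixed with `@`, element text to `#text`, and returns a dict for a single child element but a list for repeated ones. The sketch below assumes an EMu record parsed into that shape (the sample values are hypothetical, not taken from the IMA files) and shows two small helpers, `as_list` and `atom_values`, which are illustrative names that absorb those differences.

```python
# Hypothetical record in the shape xmltodict produces for an EMu <tuple>:
# attributes become "@name" keys and element text becomes "#text".
record = {
    "atom": [
        {"@name": "irn", "#text": "1032"},
        {"@name": "TitMainTitle", "#text": "Example Title"},
        {"@name": "TitAccessionNo"},  # empty element: no "#text" key
    ],
    # a single child table comes back as a dict, not a one-item list
    "table": {"@name": "AltTitles",
              "tuple": {"atom": {"@name": "TitAlternateTitles", "#text": "Another Title"}}},
}


def as_list(value):
    """Wrap a single child (dict) in a list; pass lists through; map None to []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def atom_values(rec):
    """Collect the record's top-level atoms into {name: text}, skipping empty atoms."""
    return {a["@name"]: a["#text"] for a in as_list(rec.get("atom")) if "#text" in a}


values = atom_values(record)
print(values)  # {'irn': '1032', 'TitMainTitle': 'Example Title'}
print([t["@name"] for t in as_list(record.get("table"))])  # ['AltTitles']
```

Normalising with `as_list` before iterating avoids the trap where looping over a single dict yields its keys (strings) instead of child elements.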

In [209]:
from ipywidgets import FileUpload
from IPython.display import display

upload = FileUpload(accept='.xml', multiple=False, description='Select XML file')

In [210]:
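display(upload)

obj = "UNDEFINED"

# Read the uploaded file. ipywidgets 7 stores uploads in a dict keyed by
# filename; ipywidgets 8 stores them in a tuple of dicts.
uploads = upload.value.values() if isinstance(upload.value, dict) else upload.value
for uploaded in uploads:
    # bytes() also converts the memoryview that ipywidgets 8 returns
    obj = xmltodict.parse(bytes(uploaded["content"]))

if obj == "UNDEFINED":
    print("Please select a file to transform")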


Please select a file to transform


In [212]:
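# Stop with a clear message if no file was read in the upload cell
if obj == "UNDEFINED":
    raise ValueError("No XML file loaded: select a file above and re-run the upload cell")

allObjects = obj["table"]["tuple"]
# xmltodict returns a single dict rather than a list when the file holds one record
if isinstance(allObjects, dict):
    allObjects = [allObjects]

all_linkedart = {}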


### Minimum Linked Art representation

<a id='core_properties'></a>

#### Core Properties
https://linked.art/model/base/#core-properties

There are a few core properties that every resource should have for it to be a useful part of the world of Linked Open Data:

- `@context`
- `id`
- `type`
- `_label`

The simplest possible object has a URI, a class and a label.

##### Mapping
- The `id` is a URL built by appending `object/` and the `irn` value to the base URI https://data.discovernewfields.org/ (for example, https://data.discovernewfields.org/object/1032).

- The `_label` is a human readable label, intended for developers and other people reading the data. The value is taken from the `TitMainTitle` property.
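The mapping above can be sketched as a standalone function, assuming the `irn` and `TitMainTitle` values have already been read from the record. `minimal_object` is a hypothetical helper name, and `base_uri` defaults to the `baseURI` defined earlier in this notebook.

```python
def minimal_object(irn, title, base_uri="https://data.discovernewfields.org/"):
    """Build the four core Linked Art properties for one catalogue record."""
    return {
        "@context": "https://linked.art/ns/v1/linked-art.json",
        "id": base_uri + "object/" + irn,  # base URI + "object/" + irn
        "type": "HumanMadeObject",         # CIDOC-CRM class for a human-made object
        "_label": title,                   # taken from TitMainTitle
    }


record = minimal_object("1032", "Example Title")  # hypothetical title
print(record["id"])  # https://data.discovernewfields.org/object/1032
```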

In [213]:
for obj in allObjects:
    
    minla = {}
    irn = title = ""
    
    for prop in obj["atom"]:         
        propName = prop["@name"]     
        if "#text" in prop:
            if propName == "irn":
                irn = prop["#text"]
            if propName == "TitMainTitle":
                title = prop["#text"]

    # minimum Linked Art properties
    minla["@context"] = "https://linked.art/ns/v1/linked-art.json"
    minla["id"] = baseURI + "object/" + irn
    minla["type"] = "HumanMadeObject"
    minla["_label"] = title 

    # key each record by its irn so later cells can extend it
    all_linkedart[irn] = minla

# display json in textarea
widgeText("Linked Art", all_linkedart,'150px' )


### Identifiers

https://linked.art/model/base/#identifiers

Many resources of interest are also given external identifiers, such as accession numbers for objects, ORCIDs for people or groups, lot numbers for auctions, and so forth. Identifiers are represented in a very similar way to names, but use the `Identifier` class instead. Identifiers will normally have a classification that determines which sort of identifier it is, for example to distinguish numbers assigned internally by a repository system from accession numbers assigned by the museum.

As Identifiers and Names use the same `identified_by` property, the JSON will frequently have mixed classes in the array. Unlike Names, Identifiers are not part of human language and thus cannot have translations or a language associated with them.
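Both identifiers built in the next cell share the same shape, so a small factory function could produce them. A sketch, where `make_identifier` and its parameters are hypothetical names; the labels and AAT ids are the ones used in the cell, and the accession number is invented for illustration.

```python
AAT = "http://vocab.getty.edu/aat/"


def make_identifier(object_uri, slug, label, content, aat_id, aat_label):
    """Return a Linked Art Identifier classified with a single AAT term."""
    return {
        "id": object_uri + "/" + slug,
        "type": "Identifier",
        "_label": label,
        "content": content,  # identifiers carry no language, unlike Names
        "classified_as": [{"id": AAT + aat_id, "type": "Type", "_label": aat_label}],
    }


obj_uri = "https://data.discovernewfields.org/object/1032"
identified_by = [
    make_identifier(obj_uri, "irn", "IMA at Newfields Collections Database Number for the Object",
                    "1032", "300404621", "repository numbers"),
    # hypothetical accession number, for illustration only
    make_identifier(obj_uri, "object-number", "IMA at Newfields Object Number for the Object",
                    "00.000", "300312355", "accession numbers"),
]
print([i["classified_as"][0]["_label"] for i in identified_by])
# ['repository numbers', 'accession numbers']
```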

In [214]:



for obj in allObjects:
    jdoc = {}
    # reset per record so a missing value is not carried over from the previous object
    irn = titAccessionNo = ""

    for prop in obj["atom"]:
        propName = prop["@name"]
        if "#text" in prop:
            if propName == "irn":
                irn = prop["#text"]
            if propName == "TitAccessionNo":
                titAccessionNo = prop["#text"]

                           
    jdoc["identified_by"] = []          
    jdoc["identified_by"].append({
        "id": baseURI + "object/" + irn + "/irn",
        "type": "Identifier",
        "_label": "IMA at Newfields Collections Database Number for the Object",
        "content": irn,
        "classified_as": [{
            "id": "http://vocab.getty.edu/aat/300404621",
            "type": "Type",
            "_label": "repository numbers"
                        }]
                })
                   
    jdoc["identified_by"].append({
        "id": baseURI + "object/" + irn + "/object-number",
        "type": "Identifier",
        "_label": "IMA at Newfields Object Number for the Object",
        "content": titAccessionNo,
        "classified_as": [{
            "id": "http://vocab.getty.edu/aat/300312355",
            "type": "Type",
            "_label": "accession numbers"
                        }]
                    })

    
    all_linkedart[irn].update(jdoc)

widgeText("Linked Art", all_linkedart,'200px' )


### Names

https://linked.art/model/base/#names

As the `_label` property is intended as internal documentation for the data, it is strongly recommended that every resource that should be rendered to an end user also have at least one specific name. The name could be for an object, a person, a group, an event or anything else. This pattern uses the `identified_by` property with a `Name` resource. The value of the name is given in the `content` property of the `Name`.

It is somewhat unintuitive to think of a name as identifying the resource it is associated with, as names are typically not unique. However, each name is itself a uniquely identified resource rather than an anonymous string, so it is no longer a shared label: that particular instance of the name is associated with exactly one resource. With this formulation, the name instance does uniquely identify the resource.

If more than one name is given, then one should be `classified_as` the primary name for use. This is done by adding the Primary Name term (`aat:300404670`) to it. There should be exactly one primary title given per language.

Names are also part of human communication, and can have the Linguistic features of the model associated with them, such as having a particular language, or having translations.
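A sketch of how a primary title and any alternate titles could be combined into `identified_by` entries, numbering alternates from 1. `make_names` is a hypothetical helper and the titles are invented; the number has to be converted with `str()` before it is joined to the URI string.

```python
AAT = "http://vocab.getty.edu/aat/"


def make_names(object_uri, title, alt_titles):
    """Return one primary Name plus one Name per alternate title."""
    names = [{
        "id": object_uri + "/title",
        "type": "Name",
        "_label": "Primary Title for the Object",
        "content": title,
        "classified_as": [{"id": AAT + "300404670", "type": "Type", "_label": "preferred terms"}],
    }]
    # enumerate from 1 so ids read .../alt-title-1, .../alt-title-2, ...
    for n, alt in enumerate(alt_titles, start=1):
        names.append({
            "id": object_uri + "/alt-title-" + str(n),
            "type": "Name",
            "_label": "Alternate Title for the Object",
            "content": alt,
            "classified_as": [{"id": AAT + "300417227", "type": "Type", "_label": "alternate titles"}],
        })
    return names


names = make_names("https://data.discovernewfields.org/object/1032",
                   "Example Title", ["Second Title", "Third Title"])  # hypothetical titles
print([n["id"].rsplit("/", 1)[1] for n in names])  # ['title', 'alt-title-1', 'alt-title-2']
```

Only the first entry carries the preferred-terms classification, which keeps to the rule of a single primary title.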

In [215]:
def as_list(value):
    # xmltodict returns a dict for a single child element and a list for several
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

for obj in allObjects:
    jdoc = {}
    jdoc["identified_by"] = []
    irn = title = ""

    for prop in obj["atom"]:
        propName = prop["@name"]
        if "#text" in prop:
            if propName == "irn":
                irn = prop["#text"]
            if propName == "TitMainTitle":
                title = prop["#text"]

    jdoc["identified_by"].append({
        "id": baseURI + "object/" + irn + "/title",
        "type": "Name",
        "_label": "Primary Title for the Object",
        "content": title,
        "classified_as": [{
            "id": "http://vocab.getty.edu/aat/300404670",
            "type": "Type",
            "_label": "preferred terms"
        }]
    })

    # alternate titles sit in the AltTitles table; number them from 1
    x = 0
    for table in as_list(obj.get("table")):
        if table.get("@name") != "AltTitles":
            continue
        for row in as_list(table.get("tuple")):
            for atom in as_list(row.get("atom")):
                if atom.get("@name") == "TitAlternateTitles" and "#text" in atom:
                    x += 1
                    jdoc["identified_by"].append({
                        "id": baseURI + "object/" + irn + "/alt-title-" + str(x),
                        "type": "Name",
                        "_label": "Alternate Title for the Object",
                        "content": atom["#text"],
                        "classified_as": [{
                            "id": "http://vocab.getty.edu/aat/300417227",
                            "type": "Type",
                            "_label": "alternate titles"
                        }]
                    })

    # append the names to the identifiers created in the previous cell
    all_linkedart[irn].setdefault("identified_by", []).extend(jdoc["identified_by"])
        
widgeText("Linked Art", all_linkedart,'200px' )


### Classification

https://linked.art/model/base/#types-and-classifications

CIDOC-CRM is a framework that must be extended via additional vocabularies and ontologies to be useful. The provided mechanism for doing this is the `classified_as` property, which refers to a term from a controlled vocabulary. This is in contrast to the `type` property, which is used for CIDOC-CRM defined classes and a few extensions as needed.

The `classified_as` property is thus a way to be more specific about the sort of entity, while maintaining the core information as the class using type. Controlled vocabulary entries should not be used with `type`, nor classes used with `classified_as`.
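The rule that vocabulary terms belong in `classified_as` and classes belong in `type` can be checked mechanically. A rough sketch, assuming that any `id` under the AAT namespace is a vocabulary term and that every `classified_as` entry should have `type` set to `Type`; `check_classifications` is a hypothetical helper, not part of any Linked Art tooling.

```python
AAT_PREFIX = "http://vocab.getty.edu/aat/"


def check_classifications(node, path="$"):
    """Walk a Linked Art dict and list places where terms and classes are mixed up."""
    problems = []
    if isinstance(node, dict):
        # a vocabulary term must not be used as the class in "type"
        if str(node.get("type", "")).startswith(AAT_PREFIX):
            problems.append(path + ".type holds a vocabulary term")
        for i, term in enumerate(node.get("classified_as", [])):
            # each classification should be a Type resource, not a class name
            if term.get("type") != "Type":
                problems.append(f"{path}.classified_as[{i}] is not a Type")
        for key, value in node.items():
            problems += check_classifications(value, f"{path}.{key}")
    elif isinstance(node, list):
        for i, item in enumerate(node):
            problems += check_classifications(item, f"{path}[{i}]")
    return problems


good = {"type": "HumanMadeObject",
        "classified_as": [{"id": AAT_PREFIX + "300033618", "type": "Type",
                           "_label": "paintings (visual works)"}]}
bad = {"type": AAT_PREFIX + "300033618",
       "classified_as": [{"id": "HumanMadeObject", "type": "HumanMadeObject"}]}
print(check_classifications(good))  # []
print(check_classifications(bad))
# ['$.type holds a vocabulary term', '$.classified_as[0] is not a Type']
```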

While any external vocabulary can be used, the Getty's Art and Architecture Thesaurus (AAT) is used whenever possible, both for consistency and because it is already widespread in the museum domain. The terms identified as useful are listed in the community's best-practice recommendations, and in the model documentation wherever a particular choice is essential for interoperability.

In [216]:
def as_list(value):
    # xmltodict returns a dict for a single child element and a list for several
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

# AAT terms matched against TitObjectType; when several keywords match,
# the last one in this list wins, as in a chain of if-statements
OBJECT_TYPE_TERMS = [
    ("Drawings", "300033973", "drawings (visual works)"),
    ("Multimedia", "300047910", "multimedia works"),
    ("Needlework", "300264072", "needlework (visual works)"),
    ("Paintings", "300033618", "paintings (visual works)"),
    ("Pastel", "300076922", "pastels (visual works)"),
    ("Performance", "300121445", "performance art"),
    ("Photograph", "300046300", "photographs"),
    ("Prints", "300041273", "prints (visual works)"),
    ("Sculpture", "300047090", "sculpture (visual works)"),
]

for obj in allObjects:
    jdoc = {}
    # reset per record so values are not carried over from the previous object
    irn = titleObjectType = ""
    mediaCategories = []

    for prop in obj["atom"]:
        propName = prop["@name"]
        if "#text" in prop:
            if propName == "irn":
                irn = prop["#text"]
            if propName == "TitObjectType":
                titleObjectType = prop["#text"]

    # collect every PhyMediaCategory value from the ObjectTypes table
    for table in as_list(obj.get("table")):
        if table.get("@name") == "ObjectTypes":
            for row in as_list(table.get("tuple")):
                for atom in as_list(row.get("atom")):
                    if atom.get("@name") == "PhyMediaCategory" and "#text" in atom:
                        mediaCategories.append(atom["#text"])

    # "works of art" applies to any Visual Work without a more specific match
    term = None
    if titleObjectType.startswith("Visual Work"):
        term = ("300133025", "works of art")
    for keyword, aatId, aatLabel in OBJECT_TYPE_TERMS:
        if keyword in titleObjectType:
            term = (aatId, aatLabel)

    if term is not None:
        jdoc["classified_as"] = [{
            "id": "http://vocab.getty.edu/aat/" + term[0],
            "type": "Type",
            "_label": term[1]
        }]

    # add the local media categories alongside the AAT term
    for value in mediaCategories:
        jdoc.setdefault("classified_as", []).append({
            "id": baseURI + "thesauri/type/",
            "type": "Type",
            "_label": value
        })

    all_linkedart[irn].update(jdoc)
        
widgeText("Linked Art", all_linkedart,'200px' )


### Home Page
https://linked.art/model/digital/#home-page

A very common scenario is that there is a web page about the object, perhaps managed by a collections management system. For humans, this page is much more useful than the data intended for machines. It can be referenced with the `subject_of` property, and points to a `DigitalObject` which is `classified_as` a web page, or `aat:300264578`. As with digital images, the home page can have a format of "text/html" and other properties.
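Following the paragraph above, a home page reference could be sketched as below. `home_page_ref` is a hypothetical helper, the page id is a placeholder, and the reference is typed as a `DigitalObject` as the paragraph describes (the notebook cell below types it as a `LinguisticObject`).

```python
def home_page_ref(url):
    """Describe an object's web page as a DigitalObject classified as a web page."""
    return {
        "id": url,
        "type": "DigitalObject",
        "_label": "Homepage for the Object",
        "classified_as": [{
            "id": "http://vocab.getty.edu/aat/300264578",
            "type": "Type",
            "_label": "Web pages (documents)",
        }],
        "format": "text/html",
    }


example_object = {"id": "https://data.discovernewfields.org/object/1032", "type": "HumanMadeObject"}
# placeholder page id: the real value comes from the record's EleIdentifier atom
example_object["subject_of"] = [home_page_ref("http://collection.imamuseum.org/artwork/EXAMPLE")]
print(example_object["subject_of"][0]["format"])  # text/html
```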


In [217]:
def as_list(value):
    # xmltodict returns a dict for a single child element and a list for several
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

for obj in allObjects:
    jdoc = {}
    # reset per record so values are not carried over from the previous object
    irn = homepageId = ""

    for prop in obj["atom"]:
        if "#text" in prop and prop["@name"] == "irn":
            irn = prop["#text"]

    # the web page id is stored as EleIdentifier in the Homepage table
    for table in as_list(obj.get("table")):
        if table.get("@name") == "Homepage":
            for row in as_list(table.get("tuple")):
                for atom in as_list(row.get("atom")):
                    if atom.get("@name") == "EleIdentifier" and "#text" in atom:
                        homepageId = atom["#text"]

    # skip objects without a home page rather than linking to the bare site root
    if not homepageId:
        continue

    jdoc["subject_of"] = [{
        "id": "http://collection.imamuseum.org/artwork/" + homepageId,
        "type": "LinguisticObject",
        "_label": "Homepage for the Object",
        "classified_as": [
            {
            "id": "http://vocab.getty.edu/aat/300264578",
            "type": "Type",
            "_label": "Web pages (documents)"
            },
            {
            "id": "http://vocab.getty.edu/aat/300266277",
            "type": "Type",
            "_label": "home pages"
            }
        ],
        "format": "text/html"
    }]

    all_linkedart[irn].update(jdoc)
        
widgeText("Linked Art", all_linkedart,'200px' )


### Current Location
https://linked.art/model/object/ownership/#location

- The current location of the object is given using the `current_location` property. 
- This can give a reference to a gallery or specific part of a facility, or be used for the general address of the organization where the object is currently held. 
- There are further modeling details available about [Places](https://linked.art/model/place/) on the Linked Art website.
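The decision logic for `current_location` can be isolated in a pure function, which makes it easy to test. A sketch, assuming the three EMu location levels (`LocLevel1` to `LocLevel3`) have already been read as strings; `resolve_location` and `EXAMPLE_PLACES` are hypothetical names, only two of the named places used in the notebook are included, and the gallery values in the usage are invented.

```python
BASE = "https://data.discovernewfields.org/thesauri/location/"
GALLERY = {"id": "http://vocab.getty.edu/aat/300240057", "type": "Type",
           "_label": "galleries (display spaces)"}
# subset of the LocLevel2 values handled in the notebook: value -> (slug, label)
EXAMPLE_PLACES = {
    "Grounds": ("G", "Newfields Grounds"),
    "Westerley": ("westerley", "Westerley"),
}


def resolve_location(level1, level2, level3):
    """Return the list of Place references for one object's current location."""
    places = []
    if level1 == "On Loan":
        places.append({"id": BASE + "on-loan", "type": "Place", "_label": "On Loan"})
    if "Galler" in level1 or "Suite" in level1:
        places.append({"id": BASE + level3, "type": "Place", "_label": level2,
                       "classified_as": [GALLERY]})
    if level2 in EXAMPLE_PLACES:
        slug, label = EXAMPLE_PLACES[level2]
        places.append({"id": BASE + slug, "type": "Place", "_label": label})
    # storage is a fallback only when a location was recorded but nothing matched
    if level2 and not places:
        places.append({"id": BASE + "storage", "type": "Place", "_label": "IMA Storage"})
    return places


print([p["_label"] for p in resolve_location("", "Grounds", "")])  # ['Newfields Grounds']
```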



In [218]:
def as_list(value):
    # xmltodict returns a dict for a single child element and a list for several
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def place(slug, label, gallery=False):
    # Place reference in the local location thesaurus, optionally classified as a gallery
    ref = {"id": baseURI + "thesauri/location/" + slug, "type": "Place", "_label": label}
    if gallery:
        ref["classified_as"] = [{
            "id": "http://vocab.getty.edu/aat/300240057",
            "type": "Type",
            "_label": "galleries (display spaces)"
        }]
    return ref

# LocLevel2 values that map to a named place: value -> (slug, label, is a gallery)
NAMED_PLACES = {
    "Efroymson Family Entrance": ("F02", "Efroymson Family Entrance Pavilion", False),
    "Nature Park": ("ANP", "Virginia B. Fairbanks Art & Nature Park", False),
    "Grounds": ("G", "Newfields Grounds", False),
    "Asian Visible Storage": ("K241", "Leah and Charles Reddish Gallery - Asian Visible Storage", True),
    "Westerley": ("westerley", "Westerley", False),
}

for obj in allObjects:
    jdoc = {}
    jdoc["current_location"] = []
    # reset per record so values are not carried over from the previous object
    irn = level1val = level2val = level3val = ""

    for prop in obj["atom"]:
        if "#text" in prop and prop["@name"] == "irn":
            irn = prop["#text"]

    # read the three location levels from the current location reference
    for row in as_list(obj.get("tuple")):
        if row.get("@name") == "LocCurrentLocationRef":
            for atom in as_list(row.get("atom")):
                if "#text" in atom:
                    if atom["@name"] == "LocLevel1":
                        level1val = atom["#text"]
                    elif atom["@name"] == "LocLevel2":
                        level2val = atom["#text"]
                    elif atom["@name"] == "LocLevel3":
                        level3val = atom["#text"]

    if level1val == "On Loan":
        jdoc["current_location"].append(place("on-loan", "On Loan"))

    if "Galler" in level1val or "Suite" in level1val:
        jdoc["current_location"].append(place(level3val, level2val, gallery=True))

    if level2val in NAMED_PLACES:
        slug, label, isGallery = NAMED_PLACES[level2val]
        jdoc["current_location"].append(place(slug, label, gallery=isGallery))

    # storage is the fallback only when a location was recorded but nothing above matched
    if level2val and not jdoc["current_location"]:
        jdoc["current_location"].append(place("storage", "IMA Storage"))

    all_linkedart[irn].update(jdoc)
        
widgeText("Linked Art", all_linkedart,'200px' )


### Statements about a Resource - Linguistic Objects 

https://linked.art/model/base/#statements-about-a-resource
    
In many cases, current data does not support the level of specificity that the full ontology allows, or the information is simply best expressed in human-readable form. For example, instead of a completely modeled set of parts with materials, many museum collection management systems allow only a single human-readable string for the "medium" or "materials statement". The same is true in many other situations, including rights or allowable usage statements, dimensions, edition statements and so forth. Any time that there is a description of the resource, with or without qualification as to the type of description, then this pattern can be used to record the descriptive text.

The pattern makes use of the `LinguisticObject` class, which identifies a particular piece of textual content. Other resources then refer to these Linguistic Objects. A Linguistic Object holds the statement's text in the `content` property and, if known, the language of the statement in the `language` property.

Use cases for this pattern include:

- General description of the resource
- Materials statement for an object
- Attribution statement for an image
- Biography for a person
- Dimensions statement for a part of an object    
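A sketch of the pattern as a reusable helper, assuming the statement text has already been extracted. `make_statement` is a hypothetical name, the credit line text is invented, and the language reference (AAT `300388277`, English) is an assumption shown only to illustrate the optional `language` property; the notebook itself records no language.

```python
AAT = "http://vocab.getty.edu/aat/"
BRIEF_TEXTS = {"id": AAT + "300418049", "type": "Type", "_label": "brief texts"}


def make_statement(object_uri, slug, label, text, aat_id, aat_label, language=None):
    """Wrap a human-readable statement in a LinguisticObject for referred_to_by."""
    statement = {
        "id": object_uri + "/" + slug,
        "type": "LinguisticObject",
        "_label": label,
        "content": text,
        "classified_as": [{"id": AAT + aat_id, "type": "Type", "_label": aat_label},
                          BRIEF_TEXTS],
    }
    if language is not None:
        # only add a language when it is actually known
        statement["language"] = [language]
    return statement


credit = make_statement("https://data.discovernewfields.org/object/1032", "credit-line",
                        "IMA at Newfields Credit Line for the Object",
                        "Gift of an example donor",  # hypothetical text
                        "300026687", "acknowledgments",
                        language={"id": AAT + "300388277", "type": "Language", "_label": "English"})
print([c["_label"] for c in credit["classified_as"]])  # ['acknowledgments', 'brief texts']
```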

In [219]:
for obj in allObjects:
    jdoc = {}
    jdoc["referred_to_by"] = []
    # reset per record so values are not carried over from the previous object
    irn = creditLine = provenance = ""

    for prop in obj["atom"]:
        propName = prop["@name"]
        if "#text" in prop:
            if propName == "irn":
                irn = prop["#text"]
            if propName == "SumCreditLine":
                creditLine = prop["#text"]
            if propName == "CreProvenance":
                provenance = prop["#text"]

    # add each statement only when the record has text for it
    if creditLine:
        jdoc["referred_to_by"].append({
            "id": baseURI + "object/" + irn + "/credit-line",
            "type": "LinguisticObject",
            "_label": "IMA at Newfields Credit Line for the Object",
            "content": creditLine,
            "classified_as": [
                {
                    "id": "http://vocab.getty.edu/aat/300026687",
                    "type": "Type",
                    "_label": "acknowledgments"
                },
                {
                    "id": "http://vocab.getty.edu/aat/300418049",
                    "type": "Type",
                    "_label": "brief texts"
                }
            ]
        })

    if provenance:
        jdoc["referred_to_by"].append({
            "id": baseURI + "object/" + irn + "/provenance-statement",
            "type": "LinguisticObject",
            "_label": "IMA Provenance Statement about the Object",
            "content": provenance,
            "classified_as": [
                {
                    "id": "http://vocab.getty.edu/aat/300055863",
                    "type": "Type",
                    "_label": "provenance (history of ownership)"
                },
                {
                    "id": "http://vocab.getty.edu/aat/300418049",
                    "type": "Type",
                    "_label": "brief texts"
                }
            ]
        })

    all_linkedart[irn].update(jdoc)
        
widgeText("Linked Art", all_linkedart,'200px' )


### View the final Linked Art JSON-LD representation for all objects

The next cell writes each object's Linked Art JSON-LD to its own file, saves every object to a combined file, and displays the result:

In [220]:
import os

# Write one JSON file per object, plus a combined file with every object
outDir = "./data/ima/output/json/all/"
os.makedirs(outDir, exist_ok=True)

for irn in all_linkedart:
    with open(os.path.join(outDir, irn + ".json"), "w") as text_file:
        text_file.write(json.dumps(all_linkedart[irn]))

# the combined file holds every record in all_linkedart
with open(os.path.join(outDir, "allobjects_linkedart.json"), "w") as f:
    f.write(json.dumps(all_linkedart, indent=2))

widgeText("Linked Art", all_linkedart, '400px')


In [221]:
import os
from IPython.display import display, HTML


def fn():
    # show each generated file in the output directory as a link
    file_list = sorted(os.listdir("./data/ima/output/json/all/"))

    for file in file_list:
        display(HTML("<a href='./data/ima/output/json/all/" + file + "'>" + file + "</a>"))
    

fn()
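To check the output, the per-object files could be read back and tested for the core properties (`@context`, `id`, `type`, `_label`) described at the start of the transformation. A standard-library sketch: `write_records` and `missing_core` are hypothetical helpers, and the usage writes a single invented record to a temporary directory rather than to the notebook's output folder.

```python
import json
import tempfile
from pathlib import Path

CORE = ("@context", "id", "type", "_label")


def write_records(records, out_dir):
    """Write each record to <irn>.json plus one combined file; return the per-record paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for irn, rec in records.items():
        path = out / (irn + ".json")
        path.write_text(json.dumps(rec, indent=2), encoding="utf-8")
        paths.append(path)
    (out / "allobjects_linkedart.json").write_text(json.dumps(records, indent=2), encoding="utf-8")
    return paths


def missing_core(path):
    """Return the core properties absent from one written record."""
    rec = json.loads(Path(path).read_text(encoding="utf-8"))
    return [key for key in CORE if key not in rec]


# hypothetical single-record input; the notebook would pass all_linkedart instead
sample = {"1032": {"@context": "https://linked.art/ns/v1/linked-art.json",
                   "id": "https://data.discovernewfields.org/object/1032",
                   "type": "HumanMadeObject", "_label": "Example Title"}}
with tempfile.TemporaryDirectory() as tmp:
    written = write_records(sample, tmp)
    problems = {p.name: missing_core(p) for p in written}
print(problems)  # {'1032.json': []}
```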

## View and Download the Linked Art JSON-LD Files

The combined JSON-LD file for all objects is available at:
- [All Objects as Linked Art](./data/ima/output/json/all/allobjects_linkedart.json)

## Further Reading

Visit the [Linked Art website](https://linked.art) for further information on the data model and community activities.