# Extract Collection Data - XML

This notebook shows how to read in collection data in XML format and convert it to a Python dictionary, ready for transformation to Linked Art.

Steps in this notebook:
1. Read XML file
2. Convert XML file to Python Dictionary


# Python modules


The following Python modules will be used:
* json
* xmltodict


## json


The Python `json` module encodes and decodes JSON. In this notebook it is used to encode Python dictionaries as indented JSON strings before printing them.
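
As a small illustration of that encoding step, the sketch below pretty-prints a dictionary with `json.dumps` and decodes it again with `json.loads`. The `record` dictionary is made up for this example and only imitates the shape of the data used later in the notebook.

```python
import json

# Hypothetical record, shaped like one field of the collection data
record = {
    "@name": "TitMainTitle",
    "@type": "text",
    "#text": "long-neck vase with cup mouth",
}

# Encode the dictionary as an indented JSON string for readable printing
encoded = json.dumps(record, indent=2)
print(encoded)

# Decode the JSON string back into an equal Python dictionary
decoded = json.loads(encoded)
```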


## xmltodict

The `xmltodict` module converts XML to Python dictionaries, and vice versa.
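
The conversion in both directions can be sketched as follows. The XML string is a hypothetical one-record document modelled on the example data, and the `force_list` option is an optional choice made here so that single elements still come back as lists. It is not something the rest of the notebook relies on.

```python
import xmltodict

# Hypothetical one-record document, modelled on the example data
xml = (
    '<table><tuple>'
    '<atom name="irn" type="text" size="short">1032</atom>'
    '</tuple></table>'
)

# force_list keeps "tuple" and "atom" as lists even when only one is present
options = {"force_list": ("tuple", "atom")}
doc = xmltodict.parse(xml, **options)

# Attributes are stored under "@..." keys, element text under "#text"
first_atom = doc["table"]["tuple"][0]["atom"][0]
print(first_atom["@name"], first_atom["#text"])

# unparse turns the dictionary back into an XML string;
# parsing that string again should give an equal dictionary
round_trip = xmltodict.parse(xmltodict.unparse(doc), **options)
```

Without `force_list`, an element that occurs only once is returned as a single dictionary rather than a one-item list, which is worth keeping in mind when iterating over records.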


### Further Reading 

* https://docs.python.org/3/library/json.html

* https://github.com/martinblech/xmltodict

In [1]:
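# json is part of the Python standard library, so it never needs installing
import json

try:
    import xmltodict
except ImportError:
    # xmltodict is a third-party package; install it into the notebook kernel
    %pip install xmltodict
    import xmltodict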


## Read XML file into Python dictionary using xmltodict

The following code demonstrates how to read an XML file and load its contents into a Python dictionary using `xmltodict`.

For demonstration purposes, it displays the first entry in the Python dictionary.

In [8]:
file = './data/example/ima.xml'

# open the file and parse its contents into a Python dictionary
with open(file) as fd:
    parsed = xmltodict.parse(fd.read())

# each <tuple> element under <table> holds one collection object
allObjects = parsed["table"]["tuple"]

# print the first object only, for demonstration
print(json.dumps(allObjects[0], indent=2))



{
  "atom": [
    {
      "@name": "irn",
      "@type": "text",
      "@size": "short",
      "#text": "1032"
    },
    {
      "@name": "AdmPublishWebNoPassword",
      "@type": "text",
      "@size": "short",
      "#text": "Yes"
    },
    {
      "@name": "TitAccessionNo",
      "@type": "text",
      "@size": "short",
      "#text": "60.63"
    },
    {
      "@name": "TitPreviousAccessionNo",
      "@type": "text",
      "@size": "short",
      "#text": "TR5488/1"
    },
    {
      "@name": "TitObjectStatus",
      "@type": "text",
      "@size": "short",
      "#text": "Accessioned"
    },
    {
      "@name": "TitAccessionDate",
      "@type": "text",
      "@size": "short",
      "#text": "1960-10-10"
    },
    {
      "@name": "TitMainTitle",
      "@type": "text",
      "@size": "short",
      "#text": "long-neck vase with cup mouth"
    },
    {
      "@name": "TitSeriesTitle",
      "@type": "text",
      "@size": "short"
    },
    {
      "@name": "TitCollectionTitle

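Each object in this structure is a list of `atom` entries, with the field name under `@name` and the value under `#text`. For later transformation it may be convenient to collapse that list into a flat name-to-value dictionary. The sketch below shows one way to do this. `atoms_to_dict` is a hypothetical helper, not part of `xmltodict`, and `sample` is a hand-copied subset of the output shown in this notebook.

```python
def atoms_to_dict(record):
    """Collapse a record's "atom" entries into a {name: text} dictionary."""
    atoms = record.get("atom", [])
    # A record with a single atom is parsed as a dict rather than a list
    if isinstance(atoms, dict):
        atoms = [atoms]
    # Atoms without text (empty elements) map to None
    return {atom["@name"]: atom.get("#text") for atom in atoms}


# Hand-copied subset of the first object in the example data
sample = {
    "atom": [
        {"@name": "irn", "@type": "text", "@size": "short", "#text": "1032"},
        {"@name": "TitAccessionNo", "@type": "text", "@size": "short", "#text": "60.63"},
        {"@name": "TitSeriesTitle", "@type": "text", "@size": "short"},
    ]
}

flat = atoms_to_dict(sample)
print(flat)
```

If two atoms in a record share the same `@name`, the later one overwrites the earlier one in this sketch, so it only suits data where field names are unique within a record.
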
# Try with your own XML file

If you'd like to try this with your own XML file, select the file on your local system using the widget below.

The `ipywidgets` Python module will be used for the file upload.

### Further Reading

* https://ipywidgets.readthedocs.io/

In [3]:
try:
    import ipywidgets as widgets
except ImportError:
    # ipywidgets is a third-party package; install it into the notebook kernel
    %pip install ipywidgets
    import ipywidgets as widgets

from ipywidgets import Layout, FileUpload

# IPython is always available inside a Jupyter kernel, so no install step is needed
from IPython.display import display, IFrame, HTML, Javascript

import io


## Display file upload widget

In [4]:
# define file upload widget
# accept takes file extensions such as '.xml' (the HTML file input convention)
uploader = widgets.FileUpload(accept='.xml', multiple=False, description='Select XML file')
uploader.style.button_color = 'orange'

display(uploader)



FileUpload(value={}, accept='.xml', description='Select XML file', style=ButtonStyle(button_color='orange'))

## Read contents of XML file 

The following code reads the contents of the XML file uploaded using the FileUpload widget and loads it into a Python dictionary.
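
The shape of `uploader.value` depends on the installed ipywidgets version. In ipywidgets 7 it is a dictionary keyed by filename, while in ipywidgets 8 it is a tuple of per-file dictionaries that carry a `name` key. A version-tolerant reader could be sketched as below. `iter_uploads` is a hypothetical helper, and the two sample values are hand-built stand-ins for what the widget would return.

```python
def iter_uploads(value):
    """Yield (filename, content_bytes) pairs from a FileUpload value."""
    if isinstance(value, dict):
        # ipywidgets 7: {filename: {"metadata": ..., "content": bytes}}
        for name, info in value.items():
            yield name, bytes(info["content"])
    else:
        # ipywidgets 8: ({"name": ..., "content": memoryview, ...}, ...)
        for info in value:
            yield info["name"], bytes(info["content"])


# Hand-built stand-ins for the two layouts
xml_bytes = b"<table><tuple/></table>"
value_v7 = {"ima.xml": {"metadata": {}, "content": xml_bytes}}
value_v8 = ({"name": "ima.xml", "content": memoryview(xml_bytes)},)

uploads_v7 = list(iter_uploads(value_v7))
uploads_v8 = list(iter_uploads(value_v8))
print(uploads_v7 == uploads_v8)
```

The resulting bytes can be passed directly to `xmltodict.parse`, which accepts both strings and bytes.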

### Display file contents



In [6]:
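# uploader.value maps each uploaded filename to its metadata and content
# (dictionary layout of ipywidgets 7; ipywidgets 8 returns a tuple instead)
for filename in uploader.value:
    if filename != "":
        content = uploader.value[filename]["content"]

        # parse the uploaded XML into a Python dictionary
        parsed = xmltodict.parse(content)

        # assumes the same <table>/<tuple> layout as the example file
        allObjects = parsed["table"]["tuple"]

        # print the first object only, for demonstration
        print(json.dumps(allObjects[0], indent=2))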

        

{
  "atom": [
    {
      "@name": "irn",
      "@type": "text",
      "@size": "short",
      "#text": "1032"
    },
    {
      "@name": "AdmPublishWebNoPassword",
      "@type": "text",
      "@size": "short",
      "#text": "Yes"
    },
    {
      "@name": "TitAccessionNo",
      "@type": "text",
      "@size": "short",
      "#text": "60.63"
    },
    {
      "@name": "TitPreviousAccessionNo",
      "@type": "text",
      "@size": "short",
      "#text": "TR5488/1"
    },
    {
      "@name": "TitObjectStatus",
      "@type": "text",
      "@size": "short",
      "#text": "Accessioned"
    },
    {
      "@name": "TitAccessionDate",
      "@type": "text",
      "@size": "short",
      "#text": "1960-10-10"
    },
    {
      "@name": "TitMainTitle",
      "@type": "text",
      "@size": "short",
      "#text": "long-neck vase with cup mouth"
    },
    {
      "@name": "TitSeriesTitle",
      "@type": "text",
      "@size": "short"
    },
    {
      "@name": "TitCollectionTitle