# Map Collection Data - CSV

This notebook shows how to read in collection data in CSV format and map it to a Python dictionary, ready for transformation to Linked Art.

<img width='600px'  src='media/map.png'>

Steps in this notebook:
1. Read CSV file
2. Map collection data to Python Dictionary


# Python modules


Three Python modules will be used:
* pandas
* json
* csv

## pandas
https://pandas.pydata.org/

<pre>"pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool,
built on top of the Python programming language."</pre>

`pandas` is used to read the CSV file contents into a pandas dataframe for further processing in Python. A dataframe is a data structure that holds <pre>"two-dimensional, size-mutable, potentially heterogeneous tabular data."</pre>
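
As a rough illustration of that description, the sketch below builds a small dataframe from CSV text held in memory. The column names and values are made up for the example and are not taken from the collection data.

```python
import io

import pandas as pd

# Illustrative CSV text; the rows are invented for this sketch.
csv_text = (
    "Object Number,Title,Height mm\n"
    "A-1,Example drawing,125\n"
    "A-2,Example sketch,175\n"
)

df = pd.read_csv(io.StringIO(csv_text))

# "Two-dimensional": rows and columns.
print(df.shape)

# "Heterogeneous": text columns and a numeric column sit side by side.
print(df.dtypes)

# "Size-mutable": a column can be added after the dataframe is created.
df["Department"] = "PDP"
print(list(df.columns))
```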


## json


The Python `json` module encodes and decodes JSON. In this notebook it is used to serialise Python dictionaries as formatted JSON before printing them or writing them to file.
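
A minimal sketch of encoding and decoding with `json`; the `record` dictionary is an illustrative example.

```python
import json

record = {"Title": "Study of a River Bank, Beauvais, France", "Dated": "1846"}

# Encode: Python dictionary -> JSON text, indented for readability.
encoded = json.dumps(record, indent=2)
print(encoded)

# Decode: JSON text -> Python dictionary.
decoded = json.loads(encoded)
print(decoded == record)
```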


## csv


<pre>The csv module implements classes to read and write tabular data in CSV format.</pre>


### Further Reading 
* https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

* https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html

* https://docs.python.org/3/library/json.html

* https://docs.python.org/3/library/csv.html

In [1]:

# https://pandas.pydata.org/

# pandas is a third-party package: install it if it is missing
try:
    import pandas as pd
except ImportError:
    %pip install pandas
    import pandas as pd

# json and csv are part of the Python standard library,
# so they never need to be installed with pip
import json
import csv


## Read CSV file contents using pandas

The following code demonstrates how to read CSV into a pandas dataframe.

In [2]:
file = './data/example/pma_ruskin.csv'
mpg = pd.read_csv(file,low_memory=False)

### Print first rows
`head()` returns up to the first 5 rows of the dataframe, shown here for illustration.



In [3]:
mpg.head()

Unnamed: 0,Media,Object Number,Department,Classification,Culture,Period,Display Name,Object Name,Title,Dated,Medium,Dimensions,Description,Attribution,AttributionSort,Credit Line
0,0,64-1993-1,PDP,Drawings,,,John Ruskin,Drawings,"Study of a River Bank, Beauvais, France",1846,Watercolor,H: 125mm W: 175mm,,John Ruskin,Ruskin John,
1,1,01/07/1995,PDP,Drawings,,,John Ruskin,Drawings,Beanstalk,Date unknown,"Pen and yellow-brown ink and wash, graphite pe...",Sheet: 7 3/4 x 11 1/8inches (19.7 x 28.3cm),,John Ruskin,Ruskin John,Purchased with The Herbert & Nannette Rothschi...


## Read CSV file contents using csv

The following code demonstrates how to read in CSV file data using the `csv` module.


### Remove Byte Order Mark (BOM) from CSV file 

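Before loading the CSV file with the `csv` module, remove any Byte Order Mark (BOM) from the start of the file; otherwise the BOM becomes part of the first field name. The code below reads the file with the `utf-8-sig` encoding, which drops a leading BOM, and writes the contents back to the same file as plain UTF-8. See https://stackoverflow.com/questions/8898294/convert-utf-8-with-bom-to-utf-8-with-no-bom-in-python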
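
To see why the BOM matters, the following sketch decodes the same bytes with and without BOM handling. The bytes are built in memory for illustration and no file is touched.

```python
import codecs

# Bytes as they might appear at the start of a CSV file saved with a BOM.
raw = codecs.BOM_UTF8 + "Media,Object Number\n0,64-1993-1\n".encode("utf-8")

# Plain 'utf-8' keeps the BOM as the character U+FEFF.
with_bom = raw.decode("utf-8")

# 'utf-8-sig' drops a leading BOM if one is present.
without_bom = raw.decode("utf-8-sig")

first_header_with_bom = with_bom.split(",")[0]
first_header_clean = without_bom.split(",")[0]

print(repr(first_header_with_bom))
print(repr(first_header_clean))
```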


In [4]:
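# remove BOM

# read the file; the 'utf-8-sig' codec drops a leading BOM if one is present
with open(file, mode='r', encoding='utf-8-sig') as f:
    s = f.read()

# write the contents back to the same file as plain UTF-8 (no BOM)
with open(file, mode='w', encoding='utf-8') as f:
    print(f.write(s))  # number of characters written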


668

### Read CSV contents into Python dictionary

`csv.DictReader()` returns a reader that yields one Python dictionary per row of the CSV file. The keys of each dictionary are the field names taken from the header row.
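
The sketch below shows this behaviour on a small in-memory CSV. The two rows are invented for illustration.

```python
import csv
import io

csv_text = "Object Number,Title\nA-1,Example drawing\nA-2,Example sketch\n"

reader = csv.DictReader(io.StringIO(csv_text))

# Each row is yielded as a dictionary keyed by the header names.
rows = list(reader)

print(reader.fieldnames)
print(rows[0])

# The reader is an iterator, so it is exhausted after one pass.
remaining = list(reader)
print(remaining)
```

Collecting the rows into a list, as done here, keeps them available after the underlying file has been closed.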


In [5]:
# open file and read each row into a Python dictionary
with open(file, mode='r', encoding='utf-8', newline='') as f:
    allObjects = list(csv.DictReader(f))

### Print first row
Print the first row as formatted JSON for illustration.


In [6]:
for obj in allObjects:
    print(json.dumps(obj,indent=2))
    break  

{
  "Media": "0",
  "Object Number": "64-1993-1",
  "Department": "PDP",
  "Classification": "Drawings",
  "Culture": "",
  "Period": "",
  "Display Name": "John Ruskin",
  "Object Name": "Drawings",
  "Title": "Study of a River Bank, Beauvais, France",
  "Dated": "1846",
  "Medium": "Watercolor",
  "Dimensions": "H: 125mm  W: 175mm",
  "Description": "",
  "Attribution": "John Ruskin",
  "AttributionSort": "Ruskin John",
  "Credit Line": ""
}


# Map collection data to Linked Art data model

We will now map column headings in the collection data to entities in the Linked Art data model, in preparation for transforming the collection data to Linked Art.

The mapping is shown in the following cell as a Python dictionary called `mapp`:
* keys are strings that represent entities in the Linked Art data model
* values are mapped column headings from the collection data CSV file
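
As a rough sketch of how such a mapping might be used, the code below checks a mapping against the columns of a row and then applies it. The helpers `missing_columns` and `apply_mapping` are hypothetical and are not part of the notebook or of Linked Art; treating an empty string as "no column mapped" is also an assumption.

```python
def missing_columns(mapping, columns):
    """Return {entity: column} for mapped columns that are not in `columns`."""
    available = set(columns)
    return {
        entity: column
        for entity, column in mapping.items()
        if column and column not in available
    }


def apply_mapping(row, mapping):
    """Return the values of one row keyed by Linked Art entity name.

    Entities mapped to "" or to a column the row lacks are skipped.
    """
    return {
        entity: row[column]
        for entity, column in mapping.items()
        if column and column in row
    }


# Illustrative row and a shortened mapping, not the full mapping in the notebook.
row = {
    "Object Number": "64-1993-1",
    "Title": "Study of a River Bank, Beauvais, France",
    "Dated": "1846",
}
mapping = {
    "id": "Object Number",
    "title": "Title",
    "alt_title": "",
    "accession_date": "accession_date",
}

missing = missing_columns(mapping, row.keys())
mapped = apply_mapping(row, mapping)

print(missing)
print(mapped)
```

A check like this can flag mapped values that do not match any column heading before the transformation step relies on them.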



In [7]:
mapp =  {
    "id":"Object Number",
    "accession_number":"Object Number",
    "accession_date": "accession_date",
    "classification" : "Classification",
    "title": "Title",
    "alt_title": "",
    "notes": "Description",
    "date_created":"Dated",
    "date_created_earliest": "",
    "date_created_latest": "",
    "created_period":"Period",
    "created_dynasty":"",
    "created_inscriptions":"",
    "created_notes": "Description",
    "creator":"Display Name",
    "physical_medium": "Medium",
    "physical_style": "",
    "physical_technique": "physical_technique",
    "physical_description": "physical_description",
    "physical_dimensions": "Dimensions",
    "created_provenance": "Attribution" ,
    "credit_line": "Credit Line",
    "collection" : "Department",
    "current_status" : "",
    "homepage": ""
}



# display transposed dataframe of data mapping
display(pd.DataFrame(mapp, index=[0]).T)



| | 0 |
|---|---|
| id | Object Number |
| accession_number | Object Number |
| accession_date | accession_date |
| classification | Classification |
| title | Title |
| alt_title | |
| notes | Description |
| date_created | Dated |
| date_created_earliest | |
| date_created_latest | |


## Write mapping to file

In [8]:
# write to file 
mapp_file = './data/example/mapping.json'
       
with open(mapp_file, 'w') as f:
    f.write(json.dumps(mapp,indent=2))
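
A mapping saved this way can be read back with `json.load`. The sketch below is a self-contained round trip: it writes to a temporary directory rather than to `./data/example`, and the shortened `mapping` dictionary is illustrative.

```python
import json
import os
import tempfile

mapping = {"id": "Object Number", "title": "Title", "alt_title": ""}

# A temporary directory stands in for the notebook's data folder.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "mapping.json")

    # Write the mapping as indented JSON.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(mapping, f, indent=2)

    # Read it back into a Python dictionary.
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded == mapping)
```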

# Try with your own CSV file

If you'd like to try this with your own CSV file, select the file on your local system using the widget below.

The `ipywidgets` Python module will be used for the file upload.

### Further Reading

https://ipywidgets.readthedocs.io/

In [16]:
# ipywidgets is a third-party package: install it if it is missing
try:
    import ipywidgets as widgets
except ImportError:
    %pip install ipywidgets
    import ipywidgets as widgets

# IPython is always available inside a Jupyter notebook
from IPython.display import display, HTML

import io


## Display file upload widget

In [17]:
# define file upload widget
# 'accept' takes a file extension such as '.csv' (not a glob pattern such as '*.csv')
uploader = widgets.FileUpload(accept='.csv', multiple=False, description='Select CSV file')
uploader.style.button_color = 'orange'

display(uploader)



FileUpload(value={}, accept='.csv', description='Select CSV file', style=ButtonStyle(button_color='orange'))

## Read contents of CSV file 

The following code reads the contents of the CSV file uploaded using the FileUpload widget and loads it into a `pandas` dataframe.
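
The shape of `uploader.value` depends on the installed version of `ipywidgets`. The sketch below is a hypothetical helper, `uploaded_files`, that accepts either shape. It assumes that version 7 returns a dictionary keyed by filename and that version 8 returns a sequence of dictionaries with `name` and `content` keys; check the documentation for the version you have installed. The sample values are hand-built stand-ins, so no widget is needed to run it.

```python
def uploaded_files(value):
    """Yield (filename, content_bytes) pairs from a FileUpload value."""
    if isinstance(value, dict):
        # Assumed ipywidgets 7 shape: {filename: {"content": bytes, ...}}
        for name, info in value.items():
            yield name, bytes(info["content"])
    else:
        # Assumed ipywidgets 8 shape: sequence of dicts with "name" and "content"
        for info in value:
            yield info["name"], bytes(info["content"])


csv_bytes = b"id,title\n1,Example\n"

# Hand-built stand-ins for the two assumed shapes.
v7_value = {"objects.csv": {"content": csv_bytes}}
v8_value = ({"name": "objects.csv", "content": memoryview(csv_bytes)},)

files_v7 = list(uploaded_files(v7_value))
files_v8 = list(uploaded_files(v8_value))

print(files_v7 == files_v8)
```

The bytes returned for each file can then be wrapped in `io.BytesIO` and passed to `pd.read_csv`.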

### Display file contents

`dataframe.head()` displays the first 5 rows for illustration. Pass a number to `.head()` to display more rows, e.g. `.head(20)`.

In [18]:
file_uploaded = False

# in ipywidgets 7, uploader.value is a dictionary keyed by filename
for filename in uploader.value:

    if filename != "":
        file_uploaded = True
        content = uploader.value[filename]["content"]

        # read file content into pandas dataframe
        dataframe = pd.read_csv(io.BytesIO(content))

        display(dataframe.head())


Unnamed: 0,id,accession_number,share_license_status,tombstone,current_location,title,title_in_original_language,series,series_in_original_language,creation_date,...,digital_description,wall_description,external_resources,citations,catalogue_raisonne,url,image_web,image_print,image_full,updated_at
0,74570,2018.41,CC0,"Profile Portrait of a Man, 18th century. Attri...",,Profile Portrait of a Man,,,,18th century,...,This portrait bust is a counterproof: the reve...,,"{'wikidata': [], 'internet_archive': ['https:/...",,,https://clevelandart.org/art/2018.41,https://openaccess-cdn.clevelandart.org/2018.4...,https://openaccess-cdn.clevelandart.org/2018.4...,https://openaccess-cdn.clevelandart.org/2018.4...,2021-06-29 06:35:50.582000
1,74572,2018.42,CC0,"The Temptation of St. Anthony, 19th century. J...",,The Temptation of St. Anthony,,,,19th century,...,Boilly’s scene represents the Temptations of S...,,"{'wikidata': [], 'internet_archive': []}",,,https://clevelandart.org/art/2018.42,https://openaccess-cdn.clevelandart.org/2018.4...,https://openaccess-cdn.clevelandart.org/2018.4...,https://openaccess-cdn.clevelandart.org/2018.4...,2020-11-04 19:07:40.307000
2,74576,2015.446,CC0,"The Pipes of Pan, 1865. Mariano Fortuny (Spani...",,The Pipes of Pan,,,,1865,...,,,"{'wikidata': [], 'internet_archive': []}",,,https://clevelandart.org/art/2015.446,https://openaccess-cdn.clevelandart.org/2015.4...,https://openaccess-cdn.clevelandart.org/2015.4...,https://openaccess-cdn.clevelandart.org/2015.4...,2021-01-05 10:00:05.748000
3,74580,2018.43,CC0,"Costume Study, before 1852. Henri Lehmann (Fre...",,Costume Study,,,,before 1852,...,Lehmann’s drawing belongs to a pair in which t...,,"{'wikidata': [], 'internet_archive': []}","Lehmann, Henri, and Marie-Madeleine Aubrun. He...",,https://clevelandart.org/art/2018.43,https://openaccess-cdn.clevelandart.org/2018.4...,https://openaccess-cdn.clevelandart.org/2018.4...,https://openaccess-cdn.clevelandart.org/2018.4...,2021-03-27 12:12:37.772000
4,74581,2018.44,CC0,"Nude Study of an Old Man, before 1852. Henri L...",,Nude Study of an Old Man,,,,before 1852,...,In preparation for executing largescale painti...,,"{'wikidata': [], 'internet_archive': []}","Lehmann, Henri, and Marie-Madeleine Aubrun. He...",,https://clevelandart.org/art/2018.44,https://openaccess-cdn.clevelandart.org/2018.4...,https://openaccess-cdn.clevelandart.org/2018.4...,https://openaccess-cdn.clevelandart.org/2018.4...,2021-03-27 12:12:37.779000


In [19]:
if not file_uploaded:
    display(HTML("<h1 style='color:orange'>Please upload a CSV file to continue</h1>"))

else:
    # get column headings
    columns = dataframe.columns.tolist()

    # add an empty first option so the dropdowns start with nothing selected
    columns.insert(0, "")

    print(columns)

['', 'id', 'accession_number', 'share_license_status', 'tombstone', 'current_location', 'title', 'title_in_original_language', 'series', 'series_in_original_language', 'creation_date', 'creation_date_earliest', 'creation_date_latest', 'creators', 'culture', 'technique', 'support_materials', 'department', 'collection', 'type', 'measurements', 'state_of_the_work', 'edition_of_the_work', 'creditline', 'copyright', 'inscriptions', 'exhibitions', 'provenance', 'find_spot', 'related_works', 'former_accession_numbers', 'fun_fact', 'digital_description', 'wall_description', 'external_resources', 'citations', 'catalogue_raisonne', 'url', 'image_web', 'image_print', 'image_full', 'updated_at']


### Map collection data to Linked Art data model

In [20]:

   
# strings representing Linked Art entities
la_entities = ["","id", "accession_number", "accession_date",  "classification" ,  "title",  "alt_title",  
             "notes",  "date_created", "date_created_earliest",  "date_created_latest",  
             "created_period", "created_dynasty", "created_inscriptions", "created_notes",  
             "creator", "physical_medium",  "physical_style",  "physical_technique",  "physical_description",  
             "physical_dimensions",  "created_provenance",  "credit_line",  "collection" ,  "current_status" ,  
             "homepage"   ]

# dictionary to hold mapping
mapp = {}

# function to act on change event
def on_change(change):
    if change['type'] == 'change' and change['name'] == 'value':
        
        if la_dropdown.value != "": 
            mapp[la_dropdown.value] = input_dropdown.value
        
            print(json.dumps(mapp, indent=2))

# only proceed if file uploaded
if file_uploaded:

    # create dropdown lists from the column names and the Linked Art entities
    input_dropdown = widgets.Dropdown(options=columns, description='Input Data')
    la_dropdown = widgets.Dropdown(options=la_entities, description="Linked Art")

    display(la_dropdown)
    display(input_dropdown)

    # record a mapping each time a column is selected in the Input Data dropdown
    input_dropdown.observe(on_change)

    

Dropdown(description='Linked Art', options=('', 'id', 'accession_number', 'accession_date', 'classification', …

Dropdown(description='Input Data', options=('', 'id', 'accession_number', 'share_license_status', 'tombstone',…

{
  "accession_number": "accession_number"
}
{
  "accession_number": "accession_number",
  "accession_date": ""
}
{
  "accession_number": "accession_number",
  "accession_date": "",
  "title": "title"
}
{
  "accession_number": "accession_number",
  "accession_date": "",
  "title": "title",
  "date_created_earliest": "creation_date_earliest"
}
{
  "accession_number": "accession_number",
  "accession_date": "",
  "title": "title",
  "date_created_earliest": "creation_date_earliest",
  "date_created_latest": "creation_date_latest"
}
{
  "accession_number": "accession_number",
  "accession_date": "",
  "title": "title",
  "date_created_earliest": "creation_date_earliest",
  "date_created_latest": "creation_date_latest",
  "creator": "creators"
}
{
  "accession_number": "accession_number",
  "accession_date": "",
  "title": "title",
  "date_created_earliest": "creation_date_earliest",
  "date_created_latest": "creation_date_latest",
  "creator": "creators",
  "homepage": "url"
}
{
  "access

### Display mapping

In [23]:
display(pd.DataFrame(mapp, index=[0]).T)


| | 0 |
|---|---|
| accession_number | accession_number |
| accession_date | |
| title | title |
| date_created_earliest | creation_date_earliest |
| date_created_latest | creation_date_latest |
| creator | creators |
| homepage | url |
| id | id |


### Write mapping to file

In [24]:
# save mapping to file for reuse later

# write to JSON file
mapp_file = './data/working/mapping.json'

with open(mapp_file, 'w') as f:
    f.write(json.dumps(mapp, indent=2))

print(json.dumps(mapp, indent=2))

{
  "accession_number": "accession_number",
  "accession_date": "",
  "title": "title",
  "date_created_earliest": "creation_date_earliest",
  "date_created_latest": "creation_date_latest",
  "creator": "creators",
  "homepage": "url",
  "id": "id"
}


## Summary

In this notebook you have seen how collection data in CSV format can be mapped to entities in the Linked Art data model, prior to transformation of the collection data.

You can also create a mapping file from your own CSV collection data and save it for use in the transformation notebook.