# Extract Collection Data - CSV

This notebook shows how to read collection data in CSV format and convert it to a Python dictionary, ready for transformation to Linked Art.

Steps in this notebook:
1. Read CSV file
2. Convert CSV file to Python Dictionary




# Python modules


Three Python modules will be used:
* pandas
* json
* csv

## pandas
https://pandas.pydata.org/

<pre>"pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool,
built on top of the Python programming language."</pre>

`pandas` is used to read the CSV file contents into a pandas dataframe for further processing in Python. A dataframe is a data structure that holds <pre>"two-dimensional, size-mutable, potentially heterogeneous tabular data."</pre>
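
To make the quoted description concrete, here is a minimal sketch of a small dataframe. It assumes `pandas` is already installed; the three rows are hypothetical sample values, not read from the notebook's data file.

```python
import pandas as pd

# Hypothetical three-row table: a numeric column, a text column and a partly empty column
df = pd.DataFrame(
    {
        "id": [74539, 74540, 74554],
        "title": ["A Miller's Carriage", "Leda and the Swan", "Un Borreau (An Executioner)"],
        "creation_date": ["c. 1895", None, "c. 1848"],
    }
)

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # heterogeneous: each column has its own type

# Size-mutable: a column can be added after the dataframe is created
df["type"] = "Drawing"
```

Each column keeps its own type, which is what "potentially heterogeneous" refers to.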


## json


The Python `json` module encodes and decodes JSON. This notebook uses it to encode each row as formatted JSON before printing.
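
As a small illustration of encoding and decoding, the sketch below round-trips a dictionary through `json.dumps()` and `json.loads()`. The `record` dictionary is a hypothetical sample, not the notebook's data.

```python
import json

# Hypothetical record, loosely modelled on a single collection object
record = {"id": "74539", "title": "A Miller's Carriage", "creation_date": "c. 1895"}

# Encode the dictionary as an indented JSON string for readable printing
encoded = json.dumps(record, indent=2)
print(encoded)

# Decode the JSON string back into a Python dictionary
decoded = json.loads(encoded)
```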


## csv


<pre>The csv module implements classes to read and write tabular data in CSV format.</pre>
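
A minimal sketch of reading and writing with the `csv` module, using in-memory text in place of a file. The sample rows in `csv_text` are hypothetical.

```python
import csv
import io

# Hypothetical in-memory CSV text standing in for a file on disk
csv_text = 'id,title\n1,"Study of a River Bank, Beauvais, France"\n2,Beanstalk\n'

# csv.reader yields each row as a list of strings and respects quoted commas
rows = list(csv.reader(io.StringIO(csv_text)))
print(rows)

# csv.writer writes rows back out, quoting only the fields that need it
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
round_tripped = buffer.getvalue()
```

The quoted title contains commas, yet it is read as a single field.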


### Further Reading 
* https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

* https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html

* https://docs.python.org/3/library/json.html

* https://docs.python.org/3/library/csv.html

In [1]:

# https://pandas.pydata.org/

# pandas is a third-party package: install it if it is not already available
try:
    import pandas as pd
except ImportError:
    %pip install pandas
    import pandas as pd

# json and csv are part of the Python standard library, so no installation is needed
import json
import csv


## Read CSV file contents using pandas

The following code reads a CSV file into a pandas dataframe.

In [2]:
file = '../data/cma/input/data.csv'
mpg = pd.read_csv(file,low_memory=False)

### Print first 5 rows
`head()` returns the first 5 rows by default; they are displayed here for illustration.



In [3]:
mpg.head()

Unnamed: 0,id,accession_number,share_license_status,tombstone,current_location,title,title_in_original_language,series,series_in_original_language,creation_date,...,digital_description,wall_description,external_resources,citations,catalogue_raisonne,url,image_web,image_print,image_full,updated_at
0,74539,2015.449,CC0,"A Miller's Carriage, c. 1895. Albert-Charles L...",,A Miller's Carriage,,,,c. 1895,...,,,"{'wikidata': [], 'internet_archive': ['https:/...",,,https://clevelandart.org/art/2015.449,https://openaccess-cdn.clevelandart.org/2015.4...,https://openaccess-cdn.clevelandart.org/2015.4...,https://openaccess-cdn.clevelandart.org/2015.4...,2021-06-29 06:35:50.572000
1,74540,2015.451,CC0,"Leda and the Swan. Adolphe Yvon (French, 1817-...",,Leda and the Swan,,,,,...,In the late 1520s Michelangelo made a painting...,,"{'wikidata': [], 'internet_archive': []}",,,https://clevelandart.org/art/2015.451,https://openaccess-cdn.clevelandart.org/2015.4...,https://openaccess-cdn.clevelandart.org/2015.4...,https://openaccess-cdn.clevelandart.org/2015.4...,2020-11-04 19:07:39.161000
2,74554,2015.447,CC0,"Un Borreau (An Executioner), c. 1848. Auguste ...",,Un Borreau (An Executioner),,,,c. 1848,...,,,"{'wikidata': [], 'internet_archive': []}",,,https://clevelandart.org/art/2015.447,https://openaccess-cdn.clevelandart.org/2015.4...,https://openaccess-cdn.clevelandart.org/2015.4...,https://openaccess-cdn.clevelandart.org/2015.4...,2021-03-27 12:12:37.752000
3,74570,2018.41,CC0,"Profile Portrait of a Man, 18th century. Attri...",,Profile Portrait of a Man,,,,18th century,...,This portrait bust is a counterproof: the reve...,,"{'wikidata': [], 'internet_archive': ['https:/...",,,https://clevelandart.org/art/2018.41,https://openaccess-cdn.clevelandart.org/2018.4...,https://openaccess-cdn.clevelandart.org/2018.4...,https://openaccess-cdn.clevelandart.org/2018.4...,2021-06-29 06:35:50.582000
4,74572,2018.42,CC0,"The Temptation of St. Anthony, 19th century. J...",,The Temptation of St. Anthony,,,,19th century,...,Boilly’s scene represents the Temptations of S...,,"{'wikidata': [], 'internet_archive': []}",,,https://clevelandart.org/art/2018.42,https://openaccess-cdn.clevelandart.org/2018.4...,https://openaccess-cdn.clevelandart.org/2018.4...,https://openaccess-cdn.clevelandart.org/2018.4...,2020-11-04 19:07:40.307000


## Read CSV file contents using csv

The following code demonstrates how to read in CSV file data using the `csv` module.


### Remove Byte Order Mark (BOM) from CSV file 

As an initial step, remove any Byte Order Mark (BOM) from the CSV file before loading it into a Python dictionary; otherwise the BOM becomes part of the first field name. See https://stackoverflow.com/questions/8898294/convert-utf-8-with-bom-to-utf-8-with-no-bom-in-python


In [4]:
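# remove BOM

# open the file; the 'utf-8-sig' codec strips a leading BOM if one is present
with open(file, mode='r', encoding='utf-8-sig') as f:
    s = f.read()

# write the contents back to the same file as plain UTF-8 (this overwrites the original)
with open(file, mode='w', encoding='utf-8') as f:
    chars_written = f.write(s)

# display the number of characters written
chars_written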


80557121
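
The effect of the `utf-8-sig` codec can be seen on a small in-memory example. This is a sketch with hypothetical sample bytes; it does not touch the data file.

```python
# Hypothetical CSV bytes that start with a UTF-8 byte order mark
raw = "id,title\n1,Beanstalk\n".encode("utf-8-sig")

# Decoding as plain 'utf-8' keeps the BOM as an invisible character on the first field name
with_bom = raw.decode("utf-8")

# Decoding as 'utf-8-sig' strips the BOM if present, and is harmless if it is absent
without_bom = raw.decode("utf-8-sig")

print(repr(with_bom.split(",")[0]))
print(repr(without_bom.split(",")[0]))
```

If the BOM is left in place, the first key produced by `csv.DictReader()` carries that hidden character, so a lookup by the plain field name fails.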

### Read CSV contents into Python dictionary

Use `csv.DictReader()` to read each row as a Python dictionary whose keys are the field names taken from the header row.


In [5]:
# open file and read into Python dictionary
allObjects = csv.DictReader(open(file, mode='r',encoding='utf-8'))
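
The behaviour of `csv.DictReader()` can be checked on a small in-memory sample. In this sketch, `csv_text` and its two rows are hypothetical stand-ins for the file contents.

```python
import csv
import io

# Hypothetical CSV text with the same shape as a collection export
csv_text = (
    "id,title,type\n"
    "74539,A Miller's Carriage,Drawing\n"
    "74540,Leda and the Swan,Drawing\n"
)

reader = csv.DictReader(io.StringIO(csv_text))

# The header row supplies the dictionary keys
print(reader.fieldnames)

# The reader is an iterator: each row is yielded once, as a dictionary
first_row = next(reader)
remaining = list(reader)
print(first_row)
print(len(remaining))
```

Because the reader is consumed as it is iterated, wrap it in `list()` if the rows need to be traversed more than once.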

### Print first `row` in dictionary
Print the first row as indented JSON for illustration; `break` stops the loop after one row.


In [6]:
for obj in allObjects:
    print(json.dumps(obj,indent=2))
    break  

{
  "id": "74539",
  "accession_number": "2015.449",
  "share_license_status": "CC0",
  "tombstone": "A Miller's Carriage, c. 1895. Albert-Charles Lebourg (French, 1849-1928). Black and white chalk with stumping ; sheet: 33.2 x 49.7 cm (13 1/16 x 19 9/16 in.). The Cleveland Museum of Art, Bequest of Muriel Butkin 2015.449",
  "current_location": "",
  "title": "A Miller's Carriage",
  "title_in_original_language": "",
  "series": "",
  "series_in_original_language": "",
  "creation_date": "c. 1895",
  "creation_date_earliest": "1890",
  "creation_date_latest": "1900",
  "creators": "Albert-Charles Lebourg (French, 1849-1928), artist",
  "culture": "France, 19th-20th century",
  "technique": "Black and white chalk with stumping ",
  "support_materials": "gray laid paper",
  "department": "Drawings",
  "collection": "DR - French",
  "type": "Drawing",
  "measurements": "Sheet: 33.2 x 49.7 cm (13 1/16 x 19 9/16 in.)",
  "state_of_the_work": "",
  "edition_of_the_work": "",
  "creditline":

# Try with your own CSV file

If you'd like to try this with your own CSV file, select the file on your local system using the widget below.

The `ipywidgets` Python module will be used for the file upload.

### Further Reading

https://ipywidgets.readthedocs.io/

In [7]:
try:
    import ipywidgets as widgets
except ImportError:
    %pip install ipywidgets
    import ipywidgets as widgets

from ipywidgets import Layout, FileUpload

# IPython is always available inside a Jupyter notebook, so no installation step is needed
from IPython.display import display, IFrame, HTML, Javascript


import io


## Display file upload widget

In [14]:
# define file upload widget
uploader = widgets.FileUpload(accept='*.csv', multiple=False, description='Select CSV file')
uploader.style.button_color = 'orange'

display(uploader)



FileUpload(value={}, accept='*.csv', description='Select CSV file', style=ButtonStyle(button_color='orange'))

## Read contents of CSV file 

The following code reads the contents of the CSV file uploaded using the FileUpload widget and loads it into a `pandas` dataframe.

### Display file contents

`dataframe.head()` displays the first 5 rows by default. Pass a number to `.head()` to display more rows, e.g. `.head(20)`; the cell below uses `.head(50)`.
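
The two ideas used in the next cell, wrapping uploaded bytes in `io.BytesIO` and limiting the display with `.head()`, can be tried without the widget. In this sketch the `content` bytes and the `sample` name are hypothetical, and `pandas` is assumed to be installed.

```python
import io

import pandas as pd

# Hypothetical uploaded content: CSV data as raw bytes, as a file upload would supply it
content = b"Object Number,Title\n" + b"".join(
    f"{i}-2000-1,Untitled {i}\n".encode("utf-8") for i in range(1, 11)
)

# io.BytesIO wraps the bytes in a file-like object that pd.read_csv can read
sample = pd.read_csv(io.BytesIO(content))

default_rows = sample.head()  # first 5 rows by default
more_rows = sample.head(8)    # first 8 rows
```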

In [17]:
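# uploader.value is treated here as a dictionary keyed by file name (ipywidgets 7 behaviour);
# ipywidgets 8 changed this structure, so the loop may need adapting there
for filename in uploader.value:
    if filename != "":
        content = uploader.value[filename]["content"]

        # read file content into pandas dataframe
        dataframe = pd.read_csv(io.BytesIO(content))

        display(dataframe.head(50))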


Unnamed: 0,Media,Object Number,Department,Classification,Culture,Period,Display Name,Object Name,Title,Dated,...,Unnamed: 16370,Unnamed: 16371,Unnamed: 16372,Unnamed: 16373,Unnamed: 16374,Unnamed: 16375,Unnamed: 16376,Unnamed: 16377,Unnamed: 16378,Unnamed: 16379
0,0,64-1993-1,PDP,Drawings,,,John Ruskin,Drawings,"Study of a River Bank, Beauvais, France",1846,...,,,,,,,,,,
1,1,1995-7-1,PDP,Drawings,,,John Ruskin,Drawings,Beanstalk,Date unknown,...,,,,,,,,,,
2,0,9-1982-1,AA,Paintings,,,Georgia O'Keeffe,Paintings,Mask with Golden Apple,1921,...,,,,,,,,,,
3,0,48-1993-1,AA,Paintings,,,Georgia O'Keeffe,Paintings,African Mask and Apple,,...,,,,,,,,,,
4,0,250-2003-1,AA,Paintings,,,Georgia O'Keeffe,Paintings,"Red Hills with Pedernal, White Clouds",1936,...,,,,,,,,,,
5,0,250-2003-1 frame,AA,Frames,,,Georgia O'Keeffe,Frames,"Frame for: Red Hills with Pedernal, White Clouds",1936,...,,,,,,,,,,
6,0,250-2003-2,AA,Paintings,,,Georgia O'Keeffe,Paintings,"The Barns, Lake George",1926,...,,,,,,,,,,
7,1,1942-12-1,AA,Paintings,,,Georgia O'Keeffe,Paintings,White Petunias with Salvia,,...,,,,,,,,,,
8,1,1944-95-4,AA,Paintings,,,Georgia O'Keeffe,Paintings,Peach and Glass,1927,...,,,,,,,,,,
9,3,1949-18-109,AA,Paintings,,,Georgia O'Keeffe,Paintings,Red Hills and Bones,1941,...,,,,,,,,,,
