## Welcome!

[OPenn](https://openn.library.upenn.edu/) contains complete sets of high-resolution archival images of manuscripts from the University of Pennsylvania Libraries and other institutions, along with machine-readable TEI P5 descriptions and technical metadata. All materials on this site are in the public domain or released under Creative Commons licenses as Free Cultural Works.

In this notebook we'll have a preliminary look at data and images harvested from OPenn. What kind of data, images, and files can we access in OPenn? We'll introduce the differences between administrative and descriptive metadata, how to calculate basic shape/stats of the data, as well as how to access and download images and data in OPenn. Other notebooks will explore working with OPenn [data]() and [images]().

* [Import What We Need](#Import-What-We-Need)
* [Load the Data](#Load-the-Data)
* [Review the Data](#Review-the-Data)
* [Access Images and Metadata of a Manuscript](#Access-Images-and-Metadata-of-a-Manuscript)
* [Access Metadata for Manuscripts in a Collection](#Access-Metadata-for-Manuscripts-in-a-Collection)
* [Need Help?](#Need-Help?)
* [Credits](#Credits)

<div class="alert alert-block alert-warning">
<p><b>Yellow blocks like this provide additional information about Python and Jupyter notebooks.</b></p>
    
<p>If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!</p>

<p>
    Some tips:
    <ul>
        <li>Code cells have boxes around them.</li>
        <li>To run a code cell click on the cell and then hit <b>Shift+Enter</b>. The <b>Shift+Enter</b> combo will also move you to the next cell, so it's a quick way to work through the notebook.</li>
        <li>While a cell is running a <b>*</b> appears in the square brackets next to the cell. Once the cell has finished running the asterix will be replaced with a number.</li>
        <li>In most cases you'll want to start from the top of notebook and work your way down running each cell in turn. Later cells might depend on the results of earlier ones.</li>
        <li>To edit a code cell, just click on it and type stuff. Remember to run the cell once you've finished editing.</li>
    </ul>
</p>

<p><b>Is this thing on?</b> If you can't edit or run any of the code cells, you might be viewing a static (read only) version of this notebook. Click here to <a href="https://mybinder.org/v2/gh/GLAM-Workbench/national-museum-australia/master?urlpath=lab%2Ftree%2Fexplore_collection_object_over_time.ipynb">load a <b>live</b> version</a> running on Binder.</p>
</div>

## Import What We Need
<div>
    <p>In order to use this notebook, you first need to `import` modules and packages from Python.</p>
<div class="alert alert-block alert-warning">
<p>These modules and packages are units of code with specific tools or skills that we use in the script. If you're running this notebook on your computer, you may need to first `import` these modules within your Python interpreter. Find assistance for that <a href="https://packaging.python.org/tutorials/installing-packages/">here</a>.</p>
    </div>

In [12]:
# PIP is a package manager for Python packages. This command installs the list of libraries contained in the `requirements.txt` file.
!pip install -r requirements.txt

# Requests is a Python library for sending HTTP requests
import requests

# Pandas is a Python package that provides numerous tools for data analysis
import pandas as pd 

# Beautiful Soup is a Python library for pulling data out of HTML and XML files
from bs4 import BeautifulSoup

# OS is a Python module for interacting with the operating system
import os 





## Load the Data

All OPENN data is available at [https://openn.library.upenn.edu/Data/](https://openn.library.upenn.edu/Data/). OPenn Data is grouped into **curated collections** or by the **repository** from which it originates. 

In this notebook we will only work with records from the **Bibliotheca Philadelphiensis** project, referred to as **BiblioPhilly**. This project, pursued by the Philadelphia Area Consortium of Special Collections Libraries, contains digital editions of more than 400 western European medieval and early modern codices, plus selected leaves and cuttings from the following PACSCL member institutions.

We'll download the CSV for the BiblioPhilly collection, which contains information about the contents. 

In [13]:
# Convert to a dataframe
collections_contents = pd.read_csv("https://openn.library.upenn.edu/Data/bibliophilly_contents.csv")

# Return the first five rows
collections_contents.head()

Unnamed: 0,curated_collection,document_id,path,repository_id,metadata_type,title,added,document_created,document_updated
0,bibliophilly,4221,0023/lewis_e_018,23,TEI,Liber de vinis,2017-05-10T14:47:02+00:00,2017-05-10T14:27:01+00:00,2018-08-17T19:07:56+00:00
1,bibliophilly,4222,0023/lewis_e_057,23,TEI,Carmen in honorem Beatae Mariae Virginis,2017-05-10T20:19:36+00:00,2017-05-10T18:22:19+00:00,2018-08-17T19:15:35+00:00
2,bibliophilly,4223,0023/lewis_e_083,23,TEI,Historia belli civilis inter Caesarem et Pompeium,2017-05-10T20:19:42+00:00,2017-05-10T18:52:40+00:00,2018-08-17T19:18:33+00:00
3,bibliophilly,4225,0023/lewis_e_009,23,TEI,Processional; Astronomical Text binding fragment,2017-05-10T20:19:47+00:00,2017-05-10T19:10:48+00:00,2018-08-17T19:07:03+00:00
4,bibliophilly,4226,0023/lewis_e_003,23,TEI,Canon super almanach; De 12 signis et eorum na...,2017-05-10T20:19:52+00:00,2017-05-10T19:24:04+00:00,2018-08-17T19:05:44+00:00


## Review the Data

Now that we have this dataset in a dataframe, we can manipulate it.

This dataset contains administrative metadata about the manuscripts in the collection, which provides information to help manage the digital object, such as when and how it was created, file type and other technical information. What administrative metadata fields are in this dataframe?

In [14]:
# Print the number of rows in the dataframe
print('There are {:,} items in this collection from OPENN.'.format(collections_contents.shape[0]))

# Retrieve the column names and add it to list
collections_contents.columns.to_list()

There are 3,534 items in this collection from OPENN.


['curated_collection',
 'document_id',
 'path',
 'repository_id',
 'metadata_type',
 'title',
 'added',
 'document_created',
 'document_updated']

There are **nine** columns in this dataset: 
* **'curated_collection'** includes the name of the collection(s) with which the manuscript is associated. (In this case, every manuscript will be part of the 'bibliophilly' collection.) 
* **document_id'** includes the unique number for the manuscript within OPenn.
* **'path'** includes the file path for the metadata and images of the manuscript. 
* **'repository_id'** includes the number referring to the mansucript's holding institution. (Repositories and their corresponding ID numbers are listed at [https://openn.library.upenn.edu/Repositories.html](https://openn.library.upenn.edu/Repositories.html).
* **'metadata_type'** includes the structure of the file in which the metadata for a manuscript is contained. (In this case, every item will be 'TEI' - more on that later.) 
* **'title'** includes a name of the manuscript 
* **'added'** includes the date the manuscript metadata was first added to OPenn.
* **'document_created'** includes the date the manuscript metadata was first created on OPenn.
* **'document_updated'** includes the date the manuscript metadata was last updated.

## Access the Data and Images of a Manuscript 

Data on OPENN can be accessed in a number of ways, including the HTTP web site. Now that we have some information about how to access the metadata and images files, we'll use **requests** to download metadata and images for the first manuscript in this dataset. 

In [15]:
# Select the first row in the dataframe and get the value of the `path` column
path = collections_contents.iloc[0]['path']

# Split the `path` string on the "/", and save the two halves as separate values
repo_num = path.split("/")[0]
ms_name = path.split("/")[1]

# Return the current working directory for where to save the downloaded files
directory_path = os.getcwd()
try: # Try to make a directory for saving all OPENN data
    os.mkdir(f'{directory_path}/data')
except:
    pass
try: # Try to make a directory for saving the files for any manuscripts from this repository
    os.mkdir(f'{directory_path}/data/{repo_num}')
except:
    pass

# Use this as the full path to save the data in the correct repository and manuscript folder 
full_path = f'{directory_path}/data/{repo_num}/{ms_name}'

# Send a GET request to the URL for the TEI file for a particular manuscript and save it to this folder. 
with open(f'{full_path}/{ms_name}_TEI.xml', 'wb') as f:
    f.write(requests.get(f'http://openn.library.upenn.edu/Data/{repo_num}/{ms_name}/data/{ms_name}_TEI.xml').content)
print('Getting TEI...')

# Open the newly-saved TEI file for the manuscript and save the content as variable `soup` for parsing
with open(f'{full_path}/{ms_name}_TEI.xml') as f:
    soup = BeautifulSoup(f, 'xml')  # specify 'xml' so it parses data correctly

# Returns a list of all the `graphic` tags in the XML fle
    img_details = soup.find_all('graphic')
    # Make an empty list called img_names in which to store the image URLs
    img_names = []
    # For each `graphic` tag 
    for img in img_details:
        # If the URL attribute contains '_web.jpg'
        if '_web.jpg' in img.get('url'):
            # Append the URL attribute value to the img_names list, except for the first 3 characters
            img_names.append(img.get('url')[4:])
    print('Getting images...')

    # For each URL attribute value 
    for img in img_names:
        # Open an image file at the full manuscript path
        with open(f'{full_path}/{img}', 'wb') as f:
            # Send a GET request to the URL for the JPG file for a particular manuscript and save it to the img folder 
            f.write(requests.get(f'http://openn.library.upenn.edu/Data/{repo_num}/{ms_name}/data/web/{img}').content)
print('Images saved.')

Getting TEI...
Getting images...
Images saved.


In [16]:
#import shutil
#shutil.rmtree(f'{full_path}')

Each manuscript in OPenn is encoded as a TEI file to describe characteristics of the manuscript as machine-readable text. TEI is used to organize text into a strict "document tree". The entire document is considered the "root element", with other features, such as sections, chapters, pages, paragraphs, titles, etc., branching off of the root. It is this strict tree structure that makes it possible to reliably search a TEI document and to apply stylesheets for display to the user. (Read more about this at [https://cdrh.unl.edu/articles/basicguide/TEI](https://cdrh.unl.edu/articles/basicguide/TEI).

TEI may be customized to fit the needs of the project. Many tags are universal on OPenn, but we know these are consistently used across the BiblioPhilly collection. Some of these tags appear multiple times:  
* <summary> includes a brief summary of the salient features of a manuscript’s textual, material and artistic contents.
* <author> includes the Authority name for an author of the manuscript.
* <persName> includes the Authority name for a former owner of the manuscript.
* <supportDesc> includes the matterial on which the manuscript is written. 
* <extent> includes information about how the leaves and pages of the manuscript are numbered
* <scriptNote> includes information about the type of lettering used in the manuscript. 
* <origDate> includes a narrative date range for the manuscript's origin. (This tag may include additional attributes 'notBefore', 'when', and 'notAfter'.)   
* <origPlace> includes a narrative geographical location for the manuscript's origin. 
* <keywords> includes A defined list of keywords used in the BiblioPhilly project are listed [here](https://docs.google.com/spreadsheets/d/1U6Xk39Pr3UYvbpUjN2SC6rJjJ26Y5F-QMVX_Dq0SENM/edit#gid=1871258630)
    
Now that we know which tags we can look for, let's extract this data. We may do this in a few ways:
* For tags that will only appear once, we can access the text part of the named tag with **.get_text()**
* For attributes of a named tag, we can access the text part of the named attribute by **.find('tag')['attribute]**
* For tags that will appear more than once, we can access the text of each tag/attribute and store all in a list

In [17]:
# Access the text part of the named tag
summary = soup.summary.get_text()
extent = soup.extent.get_text()
scriptNote = soup.scriptNote.get_text()
origPlace = soup.origPlace.get_text()

# Access the text part of the named attribute of the named tag
supportDesc = soup.find('supportDesc')['material']
origDate_start = soup.find('origDate')['notBefore']
origDate_end = soup.find('origDate')['notAfter']

# Make an empty list called `keywords` in which to store the keywords
keywords = []
# Find all uses of the tag `term`
all_keywords = soup.find_all('term')
# For each `term` in the list:
for word in all_keywords: 
    # Access the text part of the `term` tag
    keyword = word.get_text()
    # Append the text to the `keywords` list
    keywords.append(keyword)

# Make an empty list called `owners` in which to store the names
owners = []
# Find all uses of the tag `persName` within the tag `msItem`
all_owners = soup.find('msItem').find_all('persName')
# For each `persName` in the list:
for persName in all_owners: 
    # Access the text part of the `persName` tag
    owner = persName.get_text()
    # Append the text to the `owners` list
    owners.append(owner)

try: # Not all manuscripts have known authors, so try this first: 
    # Make an empty list called `authors` in which to store the authors
    authors = []
    # Find all uses of the tag `author`
    all_authors = soup.find_all('author')
    # For each `author` in the list:
    for auth in all_authors: 
        # Access the text part of the `author` tag
        author = auth.get_text()
        # Append the text to the `authors` list
        authors.append(author)
except: # If it doesn't have an author, assign an empty string as the value of `authors`
    authors = ''

# Print the values of each variable
print('Summary: ', summary)
print('Author(s): ', authors)
print('Owner(s): ', owners)
print('Support Desc: ', supportDesc)
print('Extent: ', extent)
print('Script Note: ', scriptNote)
print('OrigDate_Start: ', origDate_start)
print('OrigDate_End: ', origDate_end)
print('OrigPlace: ', origPlace)
print('Keywords: ', keywords)

Summary:  This manuscript is an early 15th-century German translation of Villanova's "Liber de vinis," a medical treatise on the uses of wine.
Author(s):  ['Arnaldus de Villanova']
Owner(s):  ['Quaritch, Bernard', 'Lewis, John Frederick, 1860-1932', 'Lewis, Anne Baker']
Support Desc:  paper
Extent:  ii+24+i; 187 x 135 mm 
Script Note:  Gothic--cursiva
OrigDate_Start:  1400
OrigDate_End:  1415
OrigPlace:  Germany
Keywords:  ['15th century', 'German', 'Germany', 'Science -- Medicine', 'Gothic', 'Paper', 'Treatise', 'Gloss']


## Access Metadata for Manuscripts in a Collection

Now that we know how to do this for one manuscript, we'll repeat the process with all the manuscripts and save the descriptive metadata alongside the administrative metadata in the **bibliophilly** dataframe. 

*Note: This script does NOT download all the images, but the code to do so for all manuscript items is commented out below.*

In [18]:
# Create a list of column names to add to the dataframe
new_columns = ['Summary','Author(s)','Owner(s)','Support Desc','Extent','Script Note','OrigDate_When','OrigDate_Start','OrigDate_End','OrigPlace','Keywords','ImgNames']
# For each column name in the list: 
for title in new_columns:
    # Add the column to the dataframe with a blank string as the value for each row
    collections_contents[title] = '' 

In [19]:
# Print the column names in the dataframe as a list
print(collections_contents.columns.values)

['curated_collection' 'document_id' 'path' 'repository_id' 'metadata_type'
 'title' 'added' 'document_created' 'document_updated' 'Summary'
 'Author(s)' 'Owner(s)' 'Support Desc' 'Extent' 'Script Note'
 'OrigDate_When' 'OrigDate_Start' 'OrigDate_End' 'OrigPlace' 'Keywords'
 'ImgNames']


In [20]:
# Iterate over each index/row in the dataframe 
for idx, row in collections_contents.iterrows():
    print(path)
    path = collections_contents.iloc[idx]['path']
    repo_num = path.split("/")[0]
    ms_name = path.split("/")[1]
    directory_path = os.getcwd()
    full_path = f'{directory_path}/data/{repo_num}/{ms_name}'
    
    try:
        os.mkdir(f'{directory_path}/data')
    except:
        pass
    try: 
        os.mkdir(f'{directory_path}/data/{repo_num}')
    except:
        pass
    try:
        os.mkdir(f'{full_path}')
    except: 
        print("Path already exists!")

    with open(f'{full_path}/{ms_name}_TEI.xml', 'wb') as xml_file:
        xml_file.write(requests.get(f'http://openn.library.upenn.edu/Data/{repo_num}/{ms_name}/data/{ms_name}_TEI.xml').content)

    with open(f'{full_path}/{ms_name}_TEI.xml') as xml_file:
        soup = BeautifulSoup(xml_file, 'xml')  # specify 'xml' so it parses data correctly

    img_details = soup.find_all('graphic')
    img_names = []
    for img in img_details:
        if '_web.jpg' in img.get('url'):
            img_names.append(img.get('url')[4:])

    #for img in img_names:
    #    with open(f'{full_path}/{img}', 'wb') as image:
    #        image.write(requests.get(f'http://openn.library.upenn.edu/Data/{repo_num}/{ms_name}/data/web/{img}').content)
            
    summary = soup.summary.get_text()
    # Not all manuscripts have each tag, so leave the value as an empty string if the tag cannot be found
    try: 
        extent = soup.extent.get_text()
    except:
        extent = ''
    try:
        scriptNote = soup.scriptNote.get_text()
    except:
        scriptNote = ''
    try:  
        origPlace = soup.origPlace.get_text()
    except: 
        origPlace = ''
    try:
        supportDesc = soup.find('supportDesc')['material']
    except:
        supportDesc = ''
    try:
        origDate_when = soup.find('origDate')['when']
    except:
        origDate_when = ''
        try:
            origDate_start = soup.find('origDate')['notBefore']
            origDate_end = soup.find('origDate')['notAfter']
        except:
            origDate_start = ''
            origDate_end = ''

    keywords = []
    all_keywords = soup.find_all('term')
    for word in all_keywords: 
        keyword = word.get_text()
        keywords.append(keyword)

    owners = []
    all_owners = soup.find('msItem').find_all('persName')
    for persName in all_owners: 
        owner = persName.get_text()
        owners.append(owner)
        
    try:
        authors = []
        all_authors = soup.find_all('author')
        for auth in all_authors: 
            author = auth.get_text()
            authors.append(author)
    except:
        authors = ''
    # Set the value for a row, column label pair as the associated variable 
    collections_contents.at[idx, 'Summary'] = summary
    collections_contents.at[idx, 'Author(s)'] = authors
    collections_contents.at[idx, 'Owner(s)'] = owners
    collections_contents.at[idx, 'Support Desc'] = supportDesc
    collections_contents.at[idx, 'Extent'] = extent
    collections_contents.at[idx, 'Script Note'] = scriptNote
    collections_contents.at[idx, 'OrigDate_Start'] = origDate_start
    collections_contents.at[idx, 'OrigDate_End'] = origDate_end
    collections_contents.at[idx, 'OrigDate_When'] = origDate_when
    collections_contents.at[idx, 'OrigPlace'] = origPlace
    collections_contents.at[idx, 'Keywords'] = keywords
    collections_contents.at[idx, 'ImgNames'] = img_names

0023/lewis_e_018
What didn't work here?
0023/lewis_e_018
What didn't work here?
0023/lewis_e_057
What didn't work here?
0023/lewis_e_083
What didn't work here?
0023/lewis_e_009
What didn't work here?
0023/lewis_e_003
What didn't work here?
0023/lewis_e_005
What didn't work here?
0023/lewis_e_125
What didn't work here?
0023/lewis_e_049
What didn't work here?
0023/lewis_e_126
What didn't work here?
0023/lewis_e_154
What didn't work here?
0023/lewis_e_184
What didn't work here?
0023/lewis_e_195
What didn't work here?
0023/lewis_e_199
What didn't work here?
0023/lewis_e_008
What didn't work here?
0023/lewis_e_014
What didn't work here?
0023/lewis_e_017
What didn't work here?
0023/lewis_e_033
What didn't work here?
0023/lewis_e_039
What didn't work here?
0023/lewis_e_040
What didn't work here?
0023/lewis_e_030
What didn't work here?
0023/lewis_e_002
What didn't work here?
0023/lewis_e_087
What didn't work here?
0023/lewis_e_047
What didn't work here?
0023/lewis_e_050
What didn't work here?


0023/lewis_e_150
What didn't work here?
0023/lewis_e_148
What didn't work here?
0023/lewis_e_175
What didn't work here?
0023/lewis_e_177
What didn't work here?
0023/lewis_e_176
What didn't work here?
0023/lewis_e_138
What didn't work here?
0023/lewis_e_140
What didn't work here?
0023/lewis_e_153
What didn't work here?
0023/lewis_e_159
What didn't work here?
0023/lewis_e_251
What didn't work here?
0023/lewis_e_157
What didn't work here?
0023/lewis_e_161
What didn't work here?
0028/ms_232_011
What didn't work here?
0028/ms_439_016
What didn't work here?
0028/ms_484_011
What didn't work here?
0028/ms_484_020
What didn't work here?
0023/lewis_e_235b
What didn't work here?
0023/lewis_e_250
What didn't work here?
0023/lewis_e_241
What didn't work here?
0023/lewis_e_219_1
What didn't work here?
0023/lewis_e_240
What didn't work here?
0023/lewis_e_186
What didn't work here?
0023/lewis_e_179
What didn't work here?
0028/ms_484_021
What didn't work here?
0027/cpp_10a_231
What didn't work here?
00

0003/BMC_MS51
What didn't work here?
0003/BMC_MS54
What didn't work here?
0023/lewis_e_202
What didn't work here?
0003/BMC_MS41
What didn't work here?
0023/lewis_e_107
What didn't work here?
0023/widener_003
What didn't work here?
0023/widener_007
What didn't work here?
0003/BMC_MS32
What didn't work here?
0003/BMC_MS34
What didn't work here?
0003/BMC_MS05
What didn't work here?
0006/harris_045
What didn't work here?
0006/harris_043
What didn't work here?
0006/harris_047a
What didn't work here?
0006/harris_001
What didn't work here?
0006/harris_047b
What didn't work here?
0031/1945_65_5
What didn't work here?
0031/1945_65_3
What didn't work here?
0031/1883_53
What didn't work here?
0023/widener_009
What didn't work here?
0023/widener_008
What didn't work here?
0023/widener_004
What didn't work here?
0023/widener_001
What didn't work here?
0023/lewis_e_m_001_002
What didn't work here?
0023/lewis_e_m_001_003
What didn't work here?
0023/lewis_e_m_001_004
What didn't work here?
0023/lewis_

0023/lewis_e_m_009_007
What didn't work here?
0023/lewis_e_m_009_008
What didn't work here?
0023/lewis_e_m_009_008a
What didn't work here?
0023/lewis_e_m_009_008b
What didn't work here?
0023/lewis_e_m_009_008c
What didn't work here?
0023/lewis_e_m_009_009
What didn't work here?
0023/lewis_e_m_009_010
What didn't work here?
0023/lewis_e_m_009_011
What didn't work here?
0023/lewis_e_m_009_012
What didn't work here?
0023/lewis_e_m_009_013
What didn't work here?
0023/lewis_e_m_009_014
What didn't work here?
0023/lewis_e_m_009_014a
What didn't work here?
0023/lewis_e_m_009_016a-d
What didn't work here?
0023/lewis_e_m_009_016a
What didn't work here?
0023/lewis_e_m_009_016b
What didn't work here?
0023/lewis_e_m_009_017
What didn't work here?
0023/lewis_e_m_009_017a
What didn't work here?
0023/lewis_e_m_009_018
What didn't work here?
0023/lewis_e_m_009_019
What didn't work here?
0023/lewis_e_m_009_020
What didn't work here?
0023/lewis_e_m_009_021
What didn't work here?
0023/lewis_e_m_009_022
W

0023/lewis_e_m_017_003a
What didn't work here?
0023/lewis_e_m_017_004
What didn't work here?
0023/lewis_e_m_017_005
What didn't work here?
0023/lewis_e_m_017_006
What didn't work here?
0023/lewis_e_m_017_007
What didn't work here?
0023/lewis_e_m_017_008
What didn't work here?
0023/lewis_e_m_017_009
What didn't work here?
0023/lewis_e_m_017_010
What didn't work here?
0023/lewis_e_m_017_011
What didn't work here?
0023/lewis_e_m_017_012
What didn't work here?
0023/lewis_e_m_017_013
What didn't work here?
0023/lewis_e_m_017_014
What didn't work here?
0023/lewis_e_m_017_015
What didn't work here?
0023/lewis_e_m_017_016
What didn't work here?
0023/lewis_e_m_017_017
What didn't work here?
0023/lewis_e_m_017_018
What didn't work here?
0023/lewis_e_m_017_019
What didn't work here?
0023/lewis_e_m_017_020
What didn't work here?
0023/lewis_e_m_017_021
What didn't work here?
0023/lewis_e_m_017_022
What didn't work here?
0023/lewis_e_m_017_023
What didn't work here?
0023/lewis_e_m_018_001-003
What d

What didn't work here?
0023/lewis_e_m_027_002
What didn't work here?
0023/lewis_e_m_027_003
What didn't work here?
0023/lewis_e_m_027_004
What didn't work here?
0023/lewis_e_m_027_005
What didn't work here?
0023/lewis_e_m_027_007
What didn't work here?
0023/lewis_e_m_027_008
What didn't work here?
0023/lewis_e_m_027_009-010
What didn't work here?
0023/lewis_e_m_027_011
What didn't work here?
0023/lewis_e_m_027_012
What didn't work here?
0023/lewis_e_m_027_013
What didn't work here?
0023/lewis_e_m_027_014
What didn't work here?
0023/lewis_e_m_027_015
What didn't work here?
0023/lewis_e_m_027_016
What didn't work here?
0023/lewis_e_m_027_017
What didn't work here?
0023/lewis_e_m_027_018
What didn't work here?
0023/lewis_e_m_027_019
What didn't work here?
0023/lewis_e_m_027_020
What didn't work here?
0023/lewis_e_m_027_021
What didn't work here?
0023/lewis_e_m_027_022-023
What didn't work here?
0023/lewis_e_m_027_024
What didn't work here?
0023/lewis_e_m_027_025
What didn't work here?
002

What didn't work here?
0023/lewis_e_m_042_029
What didn't work here?
0023/lewis_e_m_043_002a
What didn't work here?
0023/lewis_e_m_043_002b
What didn't work here?
0023/lewis_e_m_043_002c
What didn't work here?
0023/lewis_e_m_043_003
What didn't work here?
0023/lewis_e_m_043_004
What didn't work here?
0023/lewis_e_m_043_005
What didn't work here?
0023/lewis_e_m_043_006
What didn't work here?
0023/lewis_e_m_043_007
What didn't work here?
0023/lewis_e_m_043_007a
What didn't work here?
0023/lewis_e_m_043_008
What didn't work here?
0023/lewis_e_m_043_009
What didn't work here?
0023/lewis_e_m_043_009a
What didn't work here?
0023/lewis_e_m_043_009b
What didn't work here?
0023/lewis_e_m_043_010
What didn't work here?
0023/lewis_e_m_043_011
What didn't work here?
0023/lewis_e_m_043_012
What didn't work here?
0023/lewis_e_m_043_013
What didn't work here?
0023/lewis_e_m_043_014
What didn't work here?
0023/lewis_e_m_043_017
What didn't work here?
0023/lewis_e_m_043_018
What didn't work here?
0023/

0023/lewis_e_m_064_022
What didn't work here?
0023/lewis_e_m_064_023
What didn't work here?
0023/lewis_e_m_064_024
What didn't work here?
0023/lewis_e_m_064_025
What didn't work here?
0023/lewis_e_m_064_026
What didn't work here?
0023/lewis_e_m_064_027
What didn't work here?
0023/lewis_e_m_064_028
What didn't work here?
0031/1945_65_11
What didn't work here?
0031/1945_65_14
What didn't work here?
0023/lewis_e_m_065_001
What didn't work here?
0023/lewis_e_m_065_002
What didn't work here?
0023/lewis_e_m_065_003
What didn't work here?
0023/lewis_e_m_065_004
What didn't work here?
0023/lewis_e_m_065_005
What didn't work here?
0023/lewis_e_m_065_006
What didn't work here?
0023/lewis_e_m_065_007
What didn't work here?
0023/lewis_e_m_065_008
What didn't work here?
0023/lewis_e_m_065_009
What didn't work here?
0023/lewis_e_m_065_010
What didn't work here?
0023/lewis_e_m_065_011
What didn't work here?
0023/lewis_e_m_065_012
What didn't work here?
0023/lewis_e_m_065_013
What didn't work here?
00

0023/lewis_e_m_071_010-013
What didn't work here?
0023/lewis_e_m_071_014-015
What didn't work here?
0023/lewis_e_m_072_001
What didn't work here?
0023/lewis_e_m_072_002
What didn't work here?
0023/lewis_e_m_072_003
What didn't work here?
0023/lewis_e_m_072_004-010
What didn't work here?
0023/lewis_e_m_072_011a
What didn't work here?
0023/lewis_e_m_072_011b
What didn't work here?
0023/lewis_e_m_072_012
What didn't work here?
0023/lewis_e_m_072_013-014
What didn't work here?
0023/lewis_e_m_072_015-016
What didn't work here?
0023/lewis_e_m_072_017-021
What didn't work here?
0023/lewis_e_m_072_022
What didn't work here?
0023/lewis_e_m_072_023-024
What didn't work here?
0023/lewis_e_m_073_001-004
What didn't work here?
0023/lewis_e_m_073_005
What didn't work here?
0023/lewis_e_m_073_006
What didn't work here?
0023/lewis_e_m_073_007-009
What didn't work here?
0023/lewis_e_m_073_010-013
What didn't work here?
0023/lewis_e_m_073_014
What didn't work here?
0023/lewis_e_m_073_015-020
What didn't

0023/lewis_t073
What didn't work here?
0023/lewis_t074
What didn't work here?
0023/lewis_t075
What didn't work here?
0023/lewis_t076
What didn't work here?
0023/lewis_t077
What didn't work here?
0023/lewis_t078a
What didn't work here?
0023/lewis_t078b
What didn't work here?
0023/lewis_t079
What didn't work here?
0023/lewis_t080
What didn't work here?
0023/lewis_t081
What didn't work here?
0023/lewis_t082
What didn't work here?
0023/lewis_t083
What didn't work here?
0023/lewis_t084
What didn't work here?
0023/lewis_t086
What didn't work here?
0023/lewis_t087
What didn't work here?
0023/lewis_t088
What didn't work here?
0023/lewis_t090
What didn't work here?
0023/lewis_t091
What didn't work here?
0023/lewis_t092
What didn't work here?
0023/lewis_t093
What didn't work here?
0023/lewis_t094
What didn't work here?
0023/lewis_t095
What didn't work here?
0023/lewis_t096
What didn't work here?
0023/lewis_t097
What didn't work here?
0023/lewis_t098
What didn't work here?
0023/lewis_t099
What di

0023/lewis_t458
What didn't work here?
0023/lewis_t459
What didn't work here?
0023/lewis_t460
What didn't work here?
0023/lewis_t461
What didn't work here?
0023/lewis_t462
What didn't work here?
0023/lewis_t463
What didn't work here?
0023/lewis_t464
What didn't work here?
0023/lewis_t465
What didn't work here?
0023/lewis_t469
What didn't work here?
0023/lewis_t470
What didn't work here?
0023/lewis_t471
What didn't work here?
0023/lewis_t472
What didn't work here?
0023/lewis_t473
What didn't work here?
0023/lewis_t474
What didn't work here?
0023/lewis_t475
What didn't work here?
0023/lewis_t476
What didn't work here?
0023/lewis_t477
What didn't work here?
0023/lewis_t478
What didn't work here?
0023/lewis_t479
What didn't work here?
0023/lewis_t480
What didn't work here?
0023/lewis_t481
What didn't work here?
0023/lewis_t482
What didn't work here?
0023/lewis_t483
What didn't work here?
0023/lewis_t484
What didn't work here?
0023/lewis_t485
What didn't work here?
0023/lewis_t496
What didn

0023/lewis_t537
What didn't work here?
0023/lewis_t538
What didn't work here?
0023/lewis_t539
What didn't work here?
0023/lewis_t540
What didn't work here?
0023/lewis_t541
What didn't work here?
0023/lewis_t542
What didn't work here?
0023/lewis_t543
What didn't work here?
0023/lewis_t544
What didn't work here?
0023/lewis_t545
What didn't work here?
0023/lewis_t546
What didn't work here?
0023/lewis_t547
What didn't work here?
0023/lewis_t548
What didn't work here?
0023/lewis_t549
What didn't work here?
0023/lewis_t550
What didn't work here?
0023/lewis_t551
What didn't work here?
0023/lewis_t552
What didn't work here?
0023/lewis_t553
What didn't work here?
0023/lewis_t554
What didn't work here?
0023/lewis_t555
What didn't work here?
0023/lewis_t556
What didn't work here?
0023/lewis_t557
What didn't work here?
0023/lewis_t558
What didn't work here?
0023/lewis_t559
What didn't work here?
0023/lewis_t560
What didn't work here?
0023/lewis_t561
What didn't work here?
0023/lewis_t562
What didn

0001/ljs447
What didn't work here?
0001/ljs449
What didn't work here?
0001/ljs450
What didn't work here?
0001/ljs451
What didn't work here?
0001/ljs452
What didn't work here?
0001/ljs453
What didn't work here?
0001/ljs455
What didn't work here?
0001/ljs456
What didn't work here?
0001/ljs457
What didn't work here?
0001/ljs458
What didn't work here?
0001/ljs459
What didn't work here?
0001/ljs460
What didn't work here?
0001/ljs462
What didn't work here?
0001/ljs463
What didn't work here?
0001/ljs464
What didn't work here?
0001/ljs465
What didn't work here?
0001/ljs466
What didn't work here?
0001/ljs467
What didn't work here?
0001/ljs468
What didn't work here?
0001/ljs469
What didn't work here?
0001/ljs46
What didn't work here?
0001/ljs470
What didn't work here?
0001/ljs471
What didn't work here?
0001/ljs472
What didn't work here?
0001/ljs473
What didn't work here?
0001/ljs474
What didn't work here?
0001/ljs475
What didn't work here?
0001/ljs476
What didn't work here?
0001/ljs477
What didn

0002/mscodex105
What didn't work here?
0002/mscodex1060
What didn't work here?
0002/mscodex1061
What didn't work here?
0002/mscodex1063
What didn't work here?
0002/mscodex1065
What didn't work here?
0002/mscodex1068
What didn't work here?
0002/mscodex106
What didn't work here?
0002/mscodex1070
What didn't work here?
0002/mscodex1071
What didn't work here?
0002/mscodex1073
What didn't work here?
0002/mscodex1076
What didn't work here?
0002/mscodex1077
What didn't work here?
0002/mscodex1078
What didn't work here?
0002/mscodex1079
What didn't work here?
0002/mscodex107
What didn't work here?
0002/mscodex1080
What didn't work here?
0002/mscodex1081
What didn't work here?
0002/mscodex1082
What didn't work here?
0002/mscodex1083
What didn't work here?
0002/mscodex1084
What didn't work here?
0002/mscodex1085
What didn't work here?
0002/mscodex1086
What didn't work here?
0002/mscodex1087
What didn't work here?
0002/mscodex1088
What didn't work here?
0002/mscodex1089
What didn't work here?
000

0002/mscodex1386
What didn't work here?
0002/mscodex1390
What didn't work here?
0002/mscodex1391
What didn't work here?
0002/mscodex1392
What didn't work here?
0002/mscodex1393
What didn't work here?
0002/mscodex1395
What didn't work here?
0002/mscodex1399
What didn't work here?
0002/mscodex1401
What didn't work here?
0002/mscodex1402
What didn't work here?
0002/mscodex1403
What didn't work here?
0002/mscodex1404
What didn't work here?
0002/mscodex1405
What didn't work here?
0002/mscodex1410
What didn't work here?
0002/mscodex1411
What didn't work here?
0002/mscodex1413
What didn't work here?
0002/mscodex1418
What didn't work here?
0002/mscodex141
What didn't work here?
0002/mscodex1420
What didn't work here?
0002/mscodex1422
What didn't work here?
0002/mscodex1423
What didn't work here?
0002/mscodex1424
What didn't work here?
0002/mscodex1425
What didn't work here?
0002/mscodex1427
What didn't work here?
0002/mscodex1428
What didn't work here?
0002/mscodex1429
What didn't work here?
0

0002/mscodex469
What didn't work here?
0002/mscodex48
What didn't work here?
0002/mscodex492
What didn't work here?
0002/mscodex498
What didn't work here?
0002/mscodex50
What didn't work here?
0002/mscodex52
What didn't work here?
0002/mscodex53
What didn't work here?
0002/mscodex543
What didn't work here?
0002/mscodex54
What didn't work here?
0002/mscodex555
What didn't work here?
0002/mscodex55
What didn't work here?
0002/mscodex562
What didn't work here?
0002/mscodex564
What didn't work here?
0002/mscodex568
What didn't work here?
0002/mscodex56
What didn't work here?
0002/mscodex57
What didn't work here?
0002/mscodex581
What didn't work here?
0002/mscodex58
What didn't work here?
0002/mscodex597
What didn't work here?
0002/mscodex59
What didn't work here?
0002/mscodex60
What didn't work here?
0002/mscodex611
What didn't work here?
0002/mscodex614
What didn't work here?
0002/mscodex615
What didn't work here?
0002/mscodex61
What didn't work here?
0002/mscodex620
What didn't work here

0002/mscodex98
What didn't work here?
0002/mscodex991
What didn't work here?
0002/mscodex99
What didn't work here?
0002/mscodex9
What didn't work here?
0002/mscoll105
What didn't work here?
0002/mscoll106
What didn't work here?
0002/mscoll196
What didn't work here?
0002/mscoll197
What didn't work here?
0002/mscoll270
What didn't work here?
0002/mscoll49_f72
What didn't work here?
0002/mscoll49_f81
What didn't work here?
0002/mscoll49_f84
What didn't work here?
0002/mscoll591_f10
What didn't work here?
0002/mscoll591_f11
What didn't work here?
0002/mscoll591_f12
What didn't work here?
0002/mscoll591_f13
What didn't work here?
0002/mscoll591_f14
What didn't work here?
0002/mscoll591_f16
What didn't work here?
0002/mscoll591_f17
What didn't work here?
0002/mscoll591_f18
What didn't work here?
0002/mscoll591_f19
What didn't work here?
0002/mscoll591_f1
What didn't work here?
0002/mscoll591_f20
What didn't work here?
0002/mscoll591_f21
What didn't work here?
0002/mscoll591_f22
What didn't w

0002/mscoll764_item121
What didn't work here?
0002/mscoll764_item122
What didn't work here?
0002/mscoll764_item123
What didn't work here?
0002/mscoll764_item124
What didn't work here?
0002/mscoll764_item125
What didn't work here?
0002/mscoll764_item126
What didn't work here?
0002/mscoll764_item127
What didn't work here?
0002/mscoll764_item128
What didn't work here?
0002/mscoll764_item129
What didn't work here?
0002/mscoll764_item130
What didn't work here?
0002/mscoll764_item132
What didn't work here?
0002/mscoll764_item133
What didn't work here?
0002/mscoll764_item134
What didn't work here?
0002/mscoll764_item135
What didn't work here?
0002/mscoll764_item136
What didn't work here?
0002/mscoll764_item137
What didn't work here?
0002/mscoll764_item138
What didn't work here?
0002/mscoll764_item140
What didn't work here?
0002/mscoll764_item144
What didn't work here?
0002/mscoll764_item146
What didn't work here?
0002/mscoll764_item147
What didn't work here?
0002/mscoll764_item148
What didn't

In [22]:
# Return the first five rows
collections_contents.head(5)

Now that we've augmented the original BiblioPhilly collection data, let's save all the data as a new CSV file. 

In [None]:

# Write the dataframe to a comma-separated values (csv) files
collections_contents.to_csv("data/collections_contents_w_metadata.csv", index=False)

# Need Help?
<div class="alert alert-block alert-warning">
    <p>For additional Python and Digital Scholarship resources:</p>
    <ul>
        <li><a href"https://www.w3schools.com/python/pandas/default.asp">Pandas Tutorial from W3 Schools</a></li>
        <li><a href="https://guides.library.upenn.edu/digital-scholarship">Center for Research Data and Digital Scholarship</a></li>
    </ul>
    <p>For help with this notebook:</p>    
<ul>
    <li>If you encounter any errors in this notebook, you can open an issue on GitHub or email estene@upenn.edu and reference this notebook.</li>

<li>If you encounter any errors while working with the BiblioPhilly metadata, you can email dorp@upenn.edu.</li>

<li>If you encounter issues with accessing data from OPenn, visit
    <a href="https://openn.library.upenn.edu/TechnicalReadMe.html">OPenn</a></li>
    </ul>
</div>

----

# Credits

Created by [Emily Esten](https://www.library.upenn.edu/people/staff/emily-esten) and [Dot Porter](https://www.library.upenn.edu/people/staff/dot-porter). 

Judaica Digital Humanities at the <a href="http://library.upenn.edu">Penn Libraries</a> (also referred to as Judaica DH) is a robust program of projects and tools for experimental digital scholarship with Judaica collections, informed by digital humanities, Jewish studies, and cultural heritage approaches. Visit our [website](judaicadh.library.upenn.edu).

The dataset for this notebook works with items from the **Bibliotheca Philadelphiensis** project. Members of the [Philadelphia Area Consortium of Special Collections Libraries (PACSCL)](http://pacscl.org/) catalogued and digitized medieval Western European manuscripts with the generous support of the [Council on Library and Information Resources (CLIR)](https://www.clir.org/), via its Digitizing Hidden Special Collections and Archives initiative. All images have been released into the public domain. More information about the collection can be found at [https://bibliophilly.library.upenn.edu/](https://bibliophilly.library.upenn.edu/). 

This notebook references existing code and Jupyter notebooks, including: 
* [GLAM Workbench for the National Museum of Australia](https://doi.org/10.5281/zenodo.3544747) sponsored by the [Humanities, Arts and Social Sciences (HASS) Data Enhanced Virtual Lab](https://tinker.edu.au/).
* [Library of Congress Data Exploration: IIIF](https://github.com/LibraryOfCongress/data-exploration/blob/26510c3f4da0bc85dfa87e82141173b1830e9d64/IIIF.ipynb).
* Gustavo Candela, María Dolores Sáez, Pilar Escobar, Manuel Marco-Such, & Rafael C.Carrasco. (2020, May 8). hibernator11/notebook-iiif-images: release1.1 (Version 1.1). Zenodo. [http://doi.org/10.5281/zenodo.3816611](https://zenodo.org/badge/latestdoi/255172461). 
* [Genes for Project Cognoma](https://github.com/cognoma/genes/blob/721204091a96e55de6dcad165d6d8265e67e2a48/2.process.py)
* https://mindtrove.info/jupyter-tidbit-image-gallery/