Simple tools to make the most of collection control functionality in ArchivesSpace
- Standalone SQL scripts to:
  - analyze collection control data before update
  - prepare update spreadsheets
  - report on data after update
- Standalone Python scripts to analyze EAD (Coming Soon)
- Data cleaning tools (In Progress)
- Standalone Python scripts to make bulk updates to collection control data
- Simple GUIs which package standalone scripts into easy-to-use interfaces
- Spreadsheet templates to record collection control data for upload
- FAQ - Logistics, APIs, SQL, etc. (In Progress)
- Screencast tutorials (In Progress)
- Suggestions for further study
- Test all scripts thoroughly before running in production
- ...How to install a TEST instance of AS if you don't already have one...
- Use SQL queries to analyze and report on ArchivesSpace collection control data; don't make changes to the database via SQL unless you know what you're doing
- Instead, use or modify the included Python scripts to update collection control data quickly and in bulk via the ArchivesSpace API
- Analyze your data to identify remediation needs
- Generate data for cleaning with OpenRefine or other tools
- Use data returned from queries to make updates via the ArchivesSpace API
- Run reports on holdings once collection control data is added to ArchivesSpace
- Identify issues with current collection control data in ArchivesSpace and create reports once these issues have been addressed
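As a rough illustration of the "update via the API, not via SQL" advice above, the sketch below shows how a script might prepare an ArchivesSpace login request and turn the response into the session header sent with later calls. It uses only the Python standard library so it runs without extra modules. The function names, the `http://localhost:8089` address and the credentials are placeholders for illustration, not part of the included scripts.

```python
import json
import urllib.parse
import urllib.request


def build_login_request(api_url, username, password):
    """Build (but do not send) a login request for the ArchivesSpace API."""
    url = "{}/users/{}/login".format(api_url.rstrip("/"),
                                     urllib.parse.quote(username))
    # The password travels in the form-encoded POST body.
    body = urllib.parse.urlencode({"password": password}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def session_headers(login_response_text):
    """Turn the JSON login response into the header used on later calls."""
    token = json.loads(login_response_text)["session"]
    return {"X-ArchivesSpace-Session": token}


# Placeholder values; point these at a TEST instance first.
request = build_login_request("http://localhost:8089", "admin", "secret")
print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` and passing the decoded response body to `session_headers` gives the header to attach to every later call. With the `requests` module the same login is a single `requests.post(url, data={"password": ...})` call.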
- ArchivesSpace 1.5+ (NOT TESTED ON AS 2.0+)
- Access to the ArchivesSpace database, including login credentials (host name, database name, port, username, password)
- Administrator access to your computer is likely required
- LibreOffice: https://www.libreoffice.org - free and open source; works particularly well for CSVs; Excel tends to reformat barcodes (for example, by dropping leading zeros or converting long numbers to scientific notation), so avoid it if possible, especially when making changes to containers
- Standalone scripts: SQL Client
  - Software Recommendations:
    - MySQL Workbench - https://dev.mysql.com/downloads/workbench/
    - HeidiSQL - https://www.heidisql.com/download.php
- GUI: Python 3.4+, `pymysql` module
  - Software Recommendation:
    - Anaconda - https://www.continuum.io/downloads. Anaconda is a free, open source Python distribution which comes with a number of useful modules for data analysis and manipulation. The `requests`, `pandas`, `lxml` and `pymysql` modules are among hundreds of Python add-ons which can easily be installed via the Anaconda Navigator interface. See https://docs.continuum.io/anaconda/ for full documentation and installation instructions.
- Reporting Scripts: Python 3.4+, `pandas` module (http://pandas.pydata.org)
- Gets a list of container profiles with titles and URIs
- Container profile URIs can be used to add or update top containers via the AS API
- Gets a list of locations with titles and URIs
- Location URIs can be used to add or update top containers via the AS API
- Retrieve a list of archival objects for a given collection, with parent-child relationships indicated
- Archival Object URIs can be used when attaching top container instances (see get_top_containers.sql) to archival objects via the AS API
- Gets a list of existing top containers, with location and container profile data
- Top container URIs can be used in conjunction with Archival Object URIs (see get_archival_objects.sql) to create top container instances via the AS API
- Retrieve a list of archival object instances (a container list, essentially) for a given collection
- Retrieves a list of access restrictions at the archival object level
- Retrieves a list of access restrictions at the resource level
- Retrieves a list of begin and end dates for access restrictions
- Retrieves a list of machine-actionable access restrictions
- Get all restrictions that end on a user-defined date
- Using a list of EAD IDs as input, retrieve access notes about a group of resources
- Using a list of barcodes as input, retrieve information about attached archival objects
- Packages all of the above scripts into an easy-to-use GUI. You must have login credentials to run queries. You must also know your repository's assigned number in the ArchivesSpace database and, for some scripts, the EAD ID of the collection you want to analyze (note: EAD ID could be changed to identifier)
- To run GUI from Anaconda, open Anaconda Navigator, click on Environments tab, select ...etc.
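The queries above return database IDs that map onto the URIs the AS API expects. A minimal sketch of that mapping follows. The query text assumes a `container_profile` table with `id` and `name` columns, so check it against your own database. The helper names and sample rows are hypothetical and are not taken from the included scripts.

```python
# Assumed table and column names; verify against your ArchivesSpace database.
CONTAINER_PROFILE_QUERY = "SELECT id, name FROM container_profile ORDER BY name"


def container_profile_uri(profile_id):
    return "/container_profiles/{}".format(profile_id)


def location_uri(location_id):
    return "/locations/{}".format(location_id)


def top_container_uri(repo_id, container_id):
    return "/repositories/{}/top_containers/{}".format(repo_id, container_id)


def archival_object_uri(repo_id, archival_object_id):
    return "/repositories/{}/archival_objects/{}".format(repo_id,
                                                         archival_object_id)


def profiles_with_uris(rows):
    """Convert (id, name) rows from the query into title/URI pairs."""
    return [{"title": name, "uri": container_profile_uri(profile_id)}
            for profile_id, name in rows]


# Hypothetical rows, shaped like the result of CONTAINER_PROFILE_QUERY.
sample_rows = [(1, "Flat box"), (2, "Record carton")]
for entry in profiles_with_uris(sample_rows):
    print(entry["title"], entry["uri"])
```

Top container and archival object URIs are scoped to a repository, which is why the repository's assigned number is needed alongside the record ID.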
Clean up messy data retrieved from queries - upload this data to ArchivesSpace via API...
- OpenRefine: http://openrefine.org/ - free and open source data cleaning tool...
- LibreOffice: https://www.libreoffice.org - free and open source; works particularly well for CSVs; Excel tends to reformat barcodes (for example, by dropping leading zeros or converting long numbers to scientific notation), so avoid it if possible, especially when making changes to containers
- Python 3.4+: https://www.python.org/downloads/
  - Software Recommendation:
    - Anaconda - https://www.continuum.io/downloads. Anaconda is a free, open source Python distribution which comes with a number of useful modules for data analysis and manipulation. The `requests`, `pandas`, `lxml` and `pymysql` modules are among hundreds of Python add-ons which can easily be installed via the Anaconda Navigator interface. See https://docs.continuum.io/anaconda/ for full documentation and installation instructions.
- Python `pandas` module (included with Anaconda installation; see further reading section for instructions on how to install third-party modules in your main Python installation)
- Possible Uses:
- Break out locations data (ranges, shelf numbers, etc.) that was combined into a single field during ASpace import
- Normalize box and folder numbering
- Cluster terms for containers to create a definitive container profile list from existing data
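Two of the uses listed above, splitting a combined location field and normalizing box and folder numbering, can be sketched in a few lines of plain Python. The input formats shown (`"Range 3, Section B, Shelf 12"` and `"Box 007"`) are assumptions about what messy data might look like. Real data will likely need its own patterns, and the function names are illustrative only.

```python
import re


def normalize_indicator(raw):
    """Drop a leading 'Box'/'Folder' label and leading zeros from a number."""
    text = raw.strip()
    text = re.sub(r"^(box|folder)\s*", "", text, flags=re.IGNORECASE)
    stripped = text.lstrip("0")
    # Keep a single zero if the value was all zeros.
    return stripped or "0"


def split_location(combined):
    """Split 'Label value, Label value' text into a dict keyed by label."""
    parts = {}
    for chunk in combined.split(","):
        words = chunk.strip().split(None, 1)
        if len(words) == 2:
            parts[words[0].lower()] = words[1].strip()
    return parts


print(normalize_indicator("Box 007"))
print(split_location("Range 3, Section B, Shelf 12"))
```

Values are kept as strings throughout so that indicators such as `12a` survive unchanged.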
Quickly add collection control data to ArchivesSpace using spreadsheets and the ArchivesSpace API
- ArchivesSpace version 1.5+ (NOT TESTED ON AS 2.0+)
- Access to ArchivesSpace API
- Python 3.4+: https://www.python.org/downloads/
  - Software Recommendation:
    - Anaconda - https://www.continuum.io/downloads. Anaconda is a free, open source Python distribution which comes with a number of useful modules for data analysis and manipulation. The `requests`, `pandas`, `lxml` and `pymysql` modules are among hundreds of Python add-ons which can easily be installed via the Anaconda Navigator interface. See https://docs.continuum.io/anaconda/ for full documentation and installation instructions.
- Python `requests` module (included with Anaconda installation; see further reading section for instructions on how to install third-party modules in your main Python installation)
- LibreOffice: https://www.libreoffice.org - free and open source; works particularly well for CSVs; Excel tends to reformat barcodes (for example, by dropping leading zeros or converting long numbers to scientific notation), so avoid it if possible, especially when making changes to containers
Add container profiles to ArchivesSpace
- Use this spreadsheet to enter your container profile data
- This script takes the data from your container_profile_template spreadsheet and posts to ArchivesSpace
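To give a sense of what such a script does, here is a minimal sketch that reads spreadsheet rows saved as CSV and builds one container profile record per row. The column headers (`name`, `extent_dimension`, `dimension_units`, `height`, `width`, `depth`) are assumed for illustration and may not match the actual container_profile_template. The function names are likewise hypothetical.

```python
import csv
import io


def container_profile_payload(row):
    """Build a container profile record from one spreadsheet row."""
    return {
        "jsonmodel_type": "container_profile",
        "name": row["name"],
        "extent_dimension": row["extent_dimension"],
        "dimension_units": row["dimension_units"],
        # csv.DictReader returns strings, so dimensions are left as text.
        "height": row["height"],
        "width": row["width"],
        "depth": row["depth"],
    }


def payloads_from_csv(csv_text):
    """Parse CSV text (header row first) into a list of records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [container_profile_payload(row) for row in reader]


# Hypothetical spreadsheet content exported as CSV.
sample = (
    "name,extent_dimension,dimension_units,height,width,depth\n"
    "Record carton,width,inches,10,12.5,15\n"
)
for payload in payloads_from_csv(sample):
    print(payload)
```

Each record would then be sent as JSON in a POST to `/container_profiles`, with the session header obtained at login.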
Add locations data to ArchivesSpace
- Use this spreadsheet to enter your location data
- This script takes the data from your completed locations_template spreadsheet and posts to ArchivesSpace
- Use this spreadsheet to enter your location profile data
- This script takes the data from your completed location_profiles_template spreadsheet and posts to ArchivesSpace
- Use this spreadsheet to enter your top container data
- Suggestion: if possible, work collection-by-collection to upload container data, and associate the containers for each collection with their archival objects before moving on to the next collection.
- This script takes the data from your completed top_container_template spreadsheet and posts to ArchivesSpace
- Use this spreadsheet to enter your top container instance data
- This script takes the data from your completed tc_instance_template spreadsheet and posts to ArchivesSpace
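The two steps above, creating top containers and then attaching them to archival objects, might look roughly like the sketch below. The function names, default values and sample URIs are illustrative assumptions, not the included scripts' own code. The record shapes follow the ArchivesSpace 1.5+ container model as far as I know it, so compare them with a record exported from your own TEST instance before relying on them.

```python
def top_container_payload(indicator, barcode, container_type="box",
                          profile_uri=None, location_uri=None,
                          start_date=None):
    """Build a top container record; profile and location are optional."""
    record = {
        "jsonmodel_type": "top_container",
        "indicator": indicator,
        "type": container_type,
        "barcode": barcode,
    }
    if profile_uri:
        record["container_profile"] = {"ref": profile_uri}
    if location_uri:
        record["container_locations"] = [{
            "jsonmodel_type": "container_location",
            "status": "current",
            "start_date": start_date,
            "ref": location_uri,
        }]
    return record


def add_instance(archival_object, top_container_uri,
                 instance_type="mixed_materials"):
    """Return a copy of the archival object with a new container instance."""
    instance = {
        "jsonmodel_type": "instance",
        "instance_type": instance_type,
        "sub_container": {
            "jsonmodel_type": "sub_container",
            "top_container": {"ref": top_container_uri},
        },
    }
    updated = dict(archival_object)
    updated["instances"] = list(archival_object.get("instances", [])) + [instance]
    return updated


# Hypothetical values for illustration.
box = top_container_payload("1", "39002000000001",
                            profile_uri="/container_profiles/1",
                            location_uri="/locations/7",
                            start_date="2017-07-01")
record = add_instance({"title": "Correspondence", "instances": []},
                      "/repositories/2/top_containers/40")
print(box["container_locations"][0]["ref"], len(record["instances"]))
```

A top container record is posted to `/repositories/<repo number>/top_containers`. An instance is added by fetching the full archival object, appending to its `instances` list, and posting the whole record back to the archival object's own URI.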
Add machine-actionable restrictions to ArchivesSpace
- Use this spreadsheet to enter your restriction data, at either the resource or archival object levels
- This script takes the data from your completed restrictions_template spreadsheet and posts to ArchivesSpace
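As a hedged sketch of what a machine-actionable restriction looks like in a record, the function below builds an access restriction note with begin and end dates and a local restriction type. The function name and the sample values (including `RestrictedSpecColl`) are assumptions for illustration. Check the structure against the machine-actionable restriction specification and a note exported from your own TEST instance.

```python
def restriction_note(text, begin=None, end=None, restriction_types=None):
    """Build an access restriction note with machine-actionable fields."""
    restriction = {}
    if begin:
        restriction["begin"] = begin
    if end:
        restriction["end"] = end
    if restriction_types:
        restriction["local_access_restriction_type"] = list(restriction_types)
    return {
        "jsonmodel_type": "note_multipart",
        "type": "accessrestrict",
        "subnotes": [{"jsonmodel_type": "note_text", "content": text}],
        "rights_restriction": restriction,
    }


# Hypothetical restriction data, as it might appear in one spreadsheet row.
note = restriction_note("Closed until 2050.", begin="2017-01-01",
                        end="2050-01-01",
                        restriction_types=["RestrictedSpecColl"])
print(note["rights_restriction"])
```

The note would be appended to the `notes` list of a resource or archival object record, and the full record posted back to its URI.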
- This demo is for Windows 10
- Installing third-party Python modules: https://python4astronomers.github.io/installation/packages.html and https://docs.python.org/3/installing/
- Python 3 Syntax: https://docs.python.org/3/tutorial/
- SQL Syntax: https://dev.mysql.com/doc/refman/5.7/en/tutorial.html
- Great archives-specific intro to Python - https://practicaltechnologyforarchives.org/issue7_wiedeman/
- ArchivesSpace API reference: http://archivesspace.github.io/archivesspace/api/
- Machine-Actionable restriction specification: http://bit.ly/2uhHVlO
- Yale Libguide: http://guides.library.yale.edu/archivesspace/ASpaceContainerManagement
- Manuals and Training Resources: https://sites.google.com/site/archivesspacetraining/archivesspace-manuals--training-resources
- NYU ArchivesSpace Manual: http://bit.ly/2tmGNvL
- ArchivesSpace 1.5 Webinar: http://archivesspace.org/recording-and-slides-for-v1-5-0-release-webinar/
- ArchivesSpace Developer Screencasts: https://www.youtube.com/playlist?list=PLJFitFaE9AY_DDlhl3Kq_vFeX27F1yt6I