Tools for bulk upload of Thoth works to Internet Archive using thoth-dissemination

thoth-pub/iabulkupload

iabulkupload

The tools in this repository can be used for bulk upload of Thoth publishers' back catalogues to Internet Archive via the Thoth Dissemination Service.

This README records the steps taken to upload the OBP and punctum back catalogues to the Thoth Archiving Network collection on 2022-11-28/29.

At the time of this upload, the tools lived in a subfolder, iabulkupload, of the Thoth Dissemination Service repository thoth-dissemination. The steps below are worded accordingly; for example, "parent folder" and ../config.env refer to the thoth-dissemination checkout.

See also the README for the Thoth Dissemination Service itself.

Steps to upload

  1. Check out a clean copy of Thoth Dissemination Service v0.1.0 into the parent folder thoth-dissemination.
  2. Ensure that the appropriate Internet Archive credentials are present in ../config.env.
  3. In the parent folder thoth-dissemination, build the Thoth Dissemination Service v0.1.0 Docker image, named testdissem, by running:
       docker build . -t testdissem
  4. Ensure that the Thoth IDs (and short names) of the desired publishers are present in ./obtain_work_ids.py.
  5. Create the lists of Thoth IDs of the works to be uploaded (one [publisher]_list.txt file per publisher) by running:
       ./obtain_work_ids.py
  6. For each list, start the upload process by running:
       ./bulkupload.sh [publisher]_list.txt 2>> disseminator.log
  7. Check ./disseminator.log for any ERROR messages. If necessary, cancel the upload process with Ctrl+C. Once the errors are resolved, the upload process can be restarted; work IDs that were already uploaded successfully will be skipped.
  8. Once the upload process completes, check that every work ID in the ./[publisher]_list.txt files also appears in ./uploaded.txt.
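The contents of ./bulkupload.sh are not reproduced in this README. As a rough illustration of the behaviour the upload steps imply (one disseminator run per work ID, with each successful ID appended to ./uploaded.txt so that a restarted run skips it), here is a hypothetical Python sketch. It is not the actual script: the function names are illustrative, and it assumes one work ID per line in the list file and that a zero exit status from the Docker command means the upload succeeded.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List, Set


def docker_command(work_id: str) -> List[str]:
    # Same invocation the README quotes from ./bulkupload.sh (image name testdissem).
    return ["docker", "run", "--rm", "testdissem", "./disseminator.py",
            "--work", work_id, "--platform", "InternetArchive"]


def upload_via_docker(work_id: str) -> bool:
    # stderr is inherited from this process, so launching it with
    # `2>> disseminator.log` would still collect the disseminator's log output.
    # Assumption: exit status 0 means the work was uploaded.
    return subprocess.run(docker_command(work_id)).returncode == 0


def upload_pending(work_ids: Iterable[str], uploaded: Set[str],
                   upload_one: Callable[[str], bool],
                   record: Callable[[str], None]) -> List[str]:
    """Upload every ID not already in `uploaded`; return the IDs that succeeded."""
    succeeded: List[str] = []
    for raw in work_ids:
        work_id = raw.strip()
        if not work_id or work_id in uploaded:
            continue  # blank line, or already uploaded on an earlier run
        if upload_one(work_id):
            record(work_id)  # persist at once, so a Ctrl+C loses nothing already done
            uploaded.add(work_id)
            succeeded.append(work_id)
    return succeeded


def main(list_file: str, uploaded_file: str = "uploaded.txt") -> None:
    done_path = Path(uploaded_file)
    done = set(done_path.read_text().split()) if done_path.exists() else set()

    def record(work_id: str) -> None:
        # Append one ID per line, matching the assumed format of uploaded.txt.
        with done_path.open("a") as f:
            f.write(work_id + "\n")

    ids = Path(list_file).read_text().splitlines()
    upload_pending(ids, done, upload_via_docker, record)
```

For example, main("obp_list.txt") would process one list. Recording each ID as soon as it succeeds, rather than at the end of the run, is what makes cancelling and restarting safe: a restart repeats at most the work that was in progress.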
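The log and completeness checks in the last two steps are manual. A small helper along these lines could automate them. It is a hypothetical sketch, assuming the file names used above (./disseminator.log, ./uploaded.txt and files matching *_list.txt in the current folder), whitespace-separated work IDs, and that error lines contain the literal string ERROR.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Set


def error_lines(log_lines: Iterable[str]) -> List[str]:
    """Every log line containing the literal string ERROR."""
    return [line.rstrip("\n") for line in log_lines if "ERROR" in line]


def missing_uploads(lists: Dict[str, Iterable[str]],
                    uploaded: Iterable[str]) -> Dict[str, List[str]]:
    """Per list name, the work IDs that are absent from uploaded.txt."""
    done: Set[str] = {w.strip() for w in uploaded if w.strip()}
    result: Dict[str, List[str]] = {}
    for name, ids in lists.items():
        missing = [w.strip() for w in ids if w.strip() and w.strip() not in done]
        if missing:  # only report lists that still have outstanding works
            result[name] = missing
    return result


def check(folder: str = ".") -> None:
    base = Path(folder)
    log_path = base / "disseminator.log"
    if log_path.exists():
        for line in error_lines(log_path.read_text().splitlines()):
            print("log:", line)
    uploaded_path = base / "uploaded.txt"
    uploaded = uploaded_path.read_text().split() if uploaded_path.exists() else []
    lists = {p.name: p.read_text().split() for p in sorted(base.glob("*_list.txt"))}
    for name, ids in missing_uploads(lists, uploaded).items():
        print(f"{name}: {len(ids)} work ID(s) not yet uploaded, e.g. {ids[:3]}")
```

Calling check() after a run prints any logged errors and, for each list file, the IDs still outstanding; a fully uploaded list produces no output.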

Alternative credentials handling

Instead of filling out ../config.env in step 2, the credentials can be supplied as environment variables, provided ./bulkupload.sh is modified. Replace line 30 (docker run --rm testdissem ./disseminator.py --work $work_id --platform InternetArchive) with either of the following:

  • pass the credentials directly to the Docker container as environment variables:
      docker run --env ia_s3_secret=[xxx] --env ia_s3_access=[yyy] --rm testdissem ./disseminator.py --work $work_id --platform InternetArchive
  • use the undockerised run method given in the comment on line 33, having first set the credentials as environment variables in the shell (export ia_s3_secret=[xxx]; export ia_s3_access=[yyy]):
      python3 ../disseminator.py --work $work_id --platform InternetArchive
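Docker also accepts --env with a variable name and no value, in which case the value is copied from the environment of the calling shell (in shell form: docker run --env ia_s3_secret --env ia_s3_access --rm testdissem ...). Combined with the exports from the second option, this keeps the secret values off the docker command line itself. The following hypothetical Python sketch shows that variant using the variable names from the README; the wrapper functions are illustrative and not part of thoth-dissemination.

```python
import os
import subprocess
from typing import List, Mapping

# Credential variable names as used in the README.
CREDENTIAL_VARS = ("ia_s3_secret", "ia_s3_access")


def missing_credentials(env: Mapping[str, str] = os.environ) -> List[str]:
    """Names of credential variables that are unset or empty."""
    return [name for name in CREDENTIAL_VARS if not env.get(name)]


def docker_command_with_env(work_id: str) -> List[str]:
    cmd = ["docker", "run"]
    for name in CREDENTIAL_VARS:
        # `--env NAME` without `=value`: Docker copies NAME from the calling
        # environment, so the secret never appears in the argument list.
        cmd += ["--env", name]
    cmd += ["--rm", "testdissem", "./disseminator.py",
            "--work", work_id, "--platform", "InternetArchive"]
    return cmd


def upload(work_id: str) -> int:
    missing = missing_credentials()
    if missing:
        # Fail early instead of starting a container without credentials.
        raise RuntimeError("set these environment variables first: " + ", ".join(missing))
    return subprocess.run(docker_command_with_env(work_id)).returncode
```

Checking for missing variables up front matters here because Docker silently leaves a name-only --env variable unset in the container if it is not set in the calling environment.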
