# Download ORACC JSON Files
This script downloads open data from the Open Richly Annotated Cuneiform Corpus ([ORACC](http://oracc.org)) in `json` format. The JSON files are made available in a ZIP file. For a description of the various JSON files included in the ZIP see the [open data](http://oracc.museum.upenn.edu/doc/opendata) page on [ORACC](http://oracc.org). 

In [1]:
import pandas as pd   
import requests
import io
import tqdm
import json
import os
import zipfile
import errno
import time

# Create Download Directory
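Create three directories: `jsonzip` for the downloaded ZIP files, `projectsdata` for the extracted JSON files, and `projectsmetadata` for the project list. If a directory already exists, do nothing.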

For the code, see [Stack Overflow](http://stackoverflow.com/questions/18973418/os-mkdirpath-returns-oserror-when-directory-does-not-exist).

In [2]:
ROOT_PATH = os.getcwd()

PROJECTS_METADATA_PATH = os.path.join('projectsmetadata')

CSV_PROJECTS_DF = os.path.join(PROJECTS_METADATA_PATH, 'projects.csv')
LIST_OF_PROJECTS = os.path.join(PROJECTS_METADATA_PATH, 'projects.txt')

ZIP_PATH = os.path.join(os.getcwd(), 'jsonzip')
EXTRACT_PATH = os.path.join(os.getcwd(), 'projectsdata')

In [3]:
# Create each directory; ignore the error only if the directory already exists.
try:
    os.mkdir(ZIP_PATH)
except OSError as exc:
    if exc.errno != errno.EEXIST:
        raise

try:
    os.mkdir(EXTRACT_PATH)
except OSError as exc:
    if exc.errno != errno.EEXIST:
        raise

try:
    os.mkdir(PROJECTS_METADATA_PATH)
except OSError as exc:
    if exc.errno != errno.EEXIST:
        raise
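
On Python 3.2 and later, the same "create if missing" behaviour is also available through the `exist_ok` argument of `os.makedirs`. The sketch below is an alternative, not the notebook's own code: the helper name `ensure_dirs` is hypothetical, and the demo runs inside a temporary directory so that it does not touch the real folders.

```python
import os
import tempfile

def ensure_dirs(*paths):
    """Create each directory if it is missing; existing ones are left alone."""
    for path in paths:
        os.makedirs(path, exist_ok=True)
    return list(paths)

# Demo in a throwaway location (assumption: same three folder names as above)
with tempfile.TemporaryDirectory() as tmp:
    demo_dirs = [os.path.join(tmp, name)
                 for name in ('jsonzip', 'projectsdata', 'projectsmetadata')]
    ensure_dirs(*demo_dirs)
    ensure_dirs(*demo_dirs)   # second call finds the directories and does nothing
    all_exist = all(os.path.isdir(d) for d in demo_dirs)
```

Note that `exist_ok=True` only tolerates an existing directory; if a regular file with the same name is in the way, `os.makedirs` still raises `FileExistsError`.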

# Get up-to-date list of existing projects and subprojects

The list is read from [The Oracc Project List](https://oracc.museum.upenn.edu/projectlist.html).

In [4]:
projects_url = 'https://oracc.museum.upenn.edu/projectlist.html'
response = requests.get(projects_url, verify=False)

response_data = response.text

lines_in_html = response_data.split('\n')

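projects_dict = {}
run_shortcuts = []  # shortcuts already seen, in page order

idx = 0
for line in lines_in_html:
    # Only lines containing a relative link (href="./...") are treated as projects
    if 'href="./' in line:
        line_parts = line.split('href="./')
        line_parts_2 = line_parts[1].split('">')
        project_shortcut = line_parts_2[0]
        # Subprojects are named PROJECT-SUBPROJECT in the ZIP file names
        project_shortcut = project_shortcut.replace('/', '-')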
        if project_shortcut in run_shortcuts:
            continue
        else:
            line_parts_3 = line_parts_2[1].split('</a>')
            project_name = line_parts_3[0]
            projects_dict[idx] = {'name': project_name, 'shortcut': project_shortcut, 'project_json_link': f'https://oracc.museum.upenn.edu/json/{project_shortcut}.zip'}
            run_shortcuts.append(project_shortcut)
            idx += 1
            
print('Up-to-date list of projects has been created.')

# Extracting projects to csv file projects.csv
projects_df = pd.DataFrame.from_dict(projects_dict)
projects_df = projects_df.transpose()
projects_df.to_csv(CSV_PROJECTS_DF)
print(f'File projects.csv has been saved to {CSV_PROJECTS_DF}.')

# Extracting list of projects to txt file projects.txt
with open(LIST_OF_PROJECTS, 'w', encoding='utf-8') as txt_file:
    txt_file.write('\n'.join(run_shortcuts))
print(f'File projects.txt has been saved to {LIST_OF_PROJECTS}.')



Up-to-date list of projects has been created.
File projects.csv has been saved to projectsmetadata\projects.csv.
File projects.txt has been saved to projectsmetadata\projects.txt.
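
Splitting lines on `href="./` works as long as every link sits on its own line. As a hedged alternative, the same extraction can be sketched with the standard library's `html.parser`, which does not depend on line breaks. The class name `ProjectLinkParser`, the function `parse_project_list`, and the sample HTML are all hypothetical; the real page layout is assumed to use `<a href="./PROJECT">Name</a>` links, as the code above implies.

```python
from html.parser import HTMLParser

class ProjectLinkParser(HTMLParser):
    """Collect shortcut -> name pairs from <a href="./..."> links."""

    def __init__(self):
        super().__init__()
        self.projects = {}     # insertion-ordered; first occurrence wins
        self._current = None   # shortcut of the link being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get('href') or ''
        if tag == 'a' and href.startswith('./'):
            # Same normalisation as above: '/' becomes '-'
            self._current = href[2:].replace('/', '-')
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._current is not None:
            name = ''.join(self._text).strip()
            self.projects.setdefault(self._current, name)
            self._current = None

def parse_project_list(html):
    parser = ProjectLinkParser()
    parser.feed(html)
    return parser.projects

# Made-up sample, not copied from the ORACC page
sample_html = '''
<p><a href="./adsd">Astronomical Diaries</a></p>
<p><a href="./adsd/adart1">ADART 1</a></p>
<p><a href="./adsd">Duplicate link</a></p>
<p><a href="https://example.org">Not a project</a></p>
'''
sample_projects = parse_project_list(sample_html)
```

Because the result is a dictionary keyed by shortcut, duplicates are dropped without a separate `run_shortcuts` list.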


# Parse the file with text IDs
The following code reads the project list saved in `projects.txt` and pulls out the project names, one per line. Accidental spaces at the beginning and the end of each line are removed, as are blank lines.

The original version of this step read a file with text IDs: each line was split at the first space (everything after the first space was ignored), and the last 8 characters of the text ID were removed to leave only the project name.

FV note: only projects are needed here, not texts, so the removal of the last 8 characters is skipped.

In [5]:
with open(LIST_OF_PROJECTS, 'r', encoding='utf-8') as f:
    # Strip surrounding whitespace and skip blank lines
    projects = [line.strip() for line in f if line.strip()]
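
For reference, the text-ID handling described in this section (split at the first space, then drop the last 8 characters) can be sketched as follows. This is an illustration only and is not used in this notebook; the function name and the sample lines are hypothetical, and the IDs are assumed to end in a slash, one letter, and six digits.

```python
def project_from_text_id(line):
    """Return the project part of a line such as 'dcclt/P123456 some note'."""
    line = line.strip()
    if not line:
        return None                      # blank line: nothing to extract
    text_id = line.split(' ', 1)[0]      # keep everything before the first space
    return text_id[:-8]                  # drop '/', the letter, and six digits

# Made-up sample lines
sample_lines = ['  dcclt/P123456 lexical text ', '', 'cams/gkab/P348623']
sample_names = [name for name in map(project_from_text_id, sample_lines) if name]
```

In this form a subproject keeps its slash (`cams/gkab`); it would still need the `/` to `-` replacement used earlier before it matches a ZIP file name.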

In [6]:
for project in projects:
    print(project)

adsd
adsd-adart1
adsd-adart2
adsd-adart3
adsd-adart5
adsd-adart6
aemw
aemw-alalakh-idrimi
aemw-amarna
akklove
amgg
ario
armep
arrim
asbp
asbp-ninmed
asbp-rlasb
atae
atae-assur
atae-burmarina
atae-durkatlimmu
atae-durszarrukin
atae-guzana
atae-huzirina
atae-imgurenlil
atae-kalhu
atae-kunalia
atae-mallanate
atae-marqasu
atae-nineveh
atae-samal
atae-szibaniba
atae-tilbarsip
atae-tuszhan
babcity
balt
blms
borsippa
btmao
btto
cams-akno
cams-barutu
cams-gkab
cams-ludlul
cams-selbi
cams-tlab
cdli
ckst
cmawro
cmawro-cmawr1
cmawro-cmawr2
cmawro-cmawr3
cmawro-maqlu
contrib-amarna
contrib-jacobsen
contrib-lambert
ctij
dcclt
dcclt-ebla
dcclt-jena
dcclt-nineveh
dcclt-signlists
dccmt
dsst
ecut
edlex
eisl
epsd2
etcsri
glass
hbtin
iraq
iraq-iraq85
kish
kish-fieldmus
kish-fieldmus-fmod
kish-mathaffield
nere
nimrud
obel
obmc
obta
ogsl
oimea
osl
pnao
qcat
riao
ribo
ribo-bab7scores
ribo-babylon10
ribo-babylon2
ribo-babylon3
ribo-babylon4
ribo-babylon5
ribo-babylon6
ribo-babylon7
ribo-babylon8
ribo-sources

# Download the ZIP Files
For each project in the list, download the entire project (all its JSON files) from `http://build-oracc.museum.upenn.edu/json/`. The file is called `PROJECT.zip` (for instance, `dcclt.zip`). For subprojects the file is called `PROJECT-SUBPROJECT.zip` (for instance, `cams-gkab.zip`).

For larger projects (such as [DCCLT](http://oracc.org/dcclt)) the ZIP file may be 25 MB or more. Downloading may take some time, and it may be necessary to download the file in chunks. The `iter_content()` method of a `requests` response takes care of that, provided the request is made with `stream=True`.

Although downloading the entire ZIP file is time-consuming, it makes processing the individual files much more efficient, and the code is less likely to break because of interruptions in connectivity.


In [7]:
CHUNK = 16 * 1024
for project in tqdm.tqdm(projects):
    url = "http://build-oracc.museum.upenn.edu/json/" + project + ".zip"
    file = 'jsonzip/' + project + '.zip'
    # stream=True defers the body so that iter_content() really reads it in chunks
    with requests.get(url, verify=False, stream=True) as r:
        if r.status_code == 200:
            print("Downloading " + url + " saving as " + file)
            with open(file, 'wb') as f:
                for c in r.iter_content(chunk_size=CHUNK):
                    f.write(c)
            print('Waiting 3 seconds in order not to overload the ORACC server')
            time.sleep(3)
        else:
            print(url + " does not exist.")



Downloading http://build-oracc.museum.upenn.edu/json/adsd.zip saving as jsonzip/adsd.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/adsd-adart1.zip saving as jsonzip/adsd-adart1.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/adsd-adart2.zip saving as jsonzip/adsd-adart2.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/adsd-adart3.zip saving as jsonzip/adsd-adart3.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/adsd-adart5.zip saving as jsonzip/adsd-adart5.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/adsd-adart6.zip saving as jsonzip/adsd-adart6.zip
Waiting 3 seconds in order not to overload the ORACC server


  5%|▌         | 7/139 [00:28<07:09,  3.26s/it]

http://build-oracc.museum.upenn.edu/json/aemw.zip does not exist.




Downloading http://build-oracc.museum.upenn.edu/json/aemw-alalakh-idrimi.zip saving as jsonzip/aemw-alalakh-idrimi.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/aemw-amarna.zip saving as jsonzip/aemw-amarna.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/akklove.zip saving as jsonzip/akklove.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/amgg.zip saving as jsonzip/amgg.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/ario.zip saving as jsonzip/ario.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/armep.zip saving as jsonzip/armep.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/arrim.zip saving as jsonzip/arrim.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/asbp.zip saving as jsonzip/asbp.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/asbp-ninmed.zip saving as jsonzip/asbp-ninmed.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/asbp-rlasb.zip saving as jsonzip/asbp-rlasb.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/atae.zip saving as jsonzip/atae.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/atae-assur.zip saving as jsonzip/atae-assur.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/atae-burmarina.zip saving as jsonzip/atae-burmarina.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/atae-durkatlimmu.zip saving as jsonzip/atae-durkatlimmu.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/atae-durszarrukin.zip saving as jsonzip/atae-durszarrukin.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/atae-guzana.zip saving as jsonzip/atae-guzana.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/atae-huzirina.zip saving as jsonzip/atae-huzirina.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/atae-imgurenlil.zip saving as jsonzip/atae-imgurenlil.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/atae-kalhu.zip saving as jsonzip/atae-kalhu.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/atae-kunalia.zip saving as jsonzip/atae-kunalia.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/atae-mallanate.zip saving as jsonzip/atae-mallanate.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/atae-marqasu.zip saving as jsonzip/atae-marqasu.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/atae-nineveh.zip saving as jsonzip/atae-nineveh.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/atae-samal.zip saving as jsonzip/atae-samal.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/atae-szibaniba.zip saving as jsonzip/atae-szibaniba.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/atae-tilbarsip.zip saving as jsonzip/atae-tilbarsip.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/atae-tuszhan.zip saving as jsonzip/atae-tuszhan.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/babcity.zip saving as jsonzip/babcity.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/balt.zip saving as jsonzip/balt.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/blms.zip saving as jsonzip/blms.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/borsippa.zip saving as jsonzip/borsippa.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/btmao.zip saving as jsonzip/btmao.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/btto.zip saving as jsonzip/btto.zip
Waiting 3 seconds in order not to overload the ORACC server


 29%|██▉       | 41/139 [03:20<06:09,  3.77s/it]

http://build-oracc.museum.upenn.edu/json/cams-akno.zip does not exist.




Downloading http://build-oracc.museum.upenn.edu/json/cams-barutu.zip saving as jsonzip/cams-barutu.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/cams-gkab.zip saving as jsonzip/cams-gkab.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/cams-ludlul.zip saving as jsonzip/cams-ludlul.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/cams-selbi.zip saving as jsonzip/cams-selbi.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/cams-tlab.zip saving as jsonzip/cams-tlab.zip
Waiting 3 seconds in order not to overload the ORACC server


 34%|███▍      | 47/139 [03:42<04:50,  3.16s/it]

http://build-oracc.museum.upenn.edu/json/cdli.zip does not exist.




Downloading http://build-oracc.museum.upenn.edu/json/ckst.zip saving as jsonzip/ckst.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/cmawro.zip saving as jsonzip/cmawro.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/cmawro-cmawr1.zip saving as jsonzip/cmawro-cmawr1.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/cmawro-cmawr2.zip saving as jsonzip/cmawro-cmawr2.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/cmawro-cmawr3.zip saving as jsonzip/cmawro-cmawr3.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/cmawro-maqlu.zip saving as jsonzip/cmawro-maqlu.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/contrib-amarna.zip saving as jsonzip/contrib-amarna.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/contrib-jacobsen.zip saving as jsonzip/contrib-jacobsen.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/contrib-lambert.zip saving as jsonzip/contrib-lambert.zip
Waiting 3 seconds in order not to overload the ORACC server


 41%|████      | 57/139 [04:21<04:01,  2.95s/it]

http://build-oracc.museum.upenn.edu/json/ctij.zip does not exist.




Downloading http://build-oracc.museum.upenn.edu/json/dcclt.zip saving as jsonzip/dcclt.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/dcclt-ebla.zip saving as jsonzip/dcclt-ebla.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/dcclt-jena.zip saving as jsonzip/dcclt-jena.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/dcclt-nineveh.zip saving as jsonzip/dcclt-nineveh.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/dcclt-signlists.zip saving as jsonzip/dcclt-signlists.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/dccmt.zip saving as jsonzip/dccmt.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/dsst.zip saving as jsonzip/dsst.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/ecut.zip saving as jsonzip/ecut.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/edlex.zip saving as jsonzip/edlex.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/eisl.zip saving as jsonzip/eisl.zip
Waiting 3 seconds in order not to overload the ORACC server


 49%|████▉     | 68/139 [05:11<04:11,  3.54s/it]

http://build-oracc.museum.upenn.edu/json/epsd2.zip does not exist.




Downloading http://build-oracc.museum.upenn.edu/json/etcsri.zip saving as jsonzip/etcsri.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/glass.zip saving as jsonzip/glass.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/hbtin.zip saving as jsonzip/hbtin.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/iraq.zip saving as jsonzip/iraq.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/iraq-iraq85.zip saving as jsonzip/iraq-iraq85.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/kish.zip saving as jsonzip/kish.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/kish-fieldmus.zip saving as jsonzip/kish-fieldmus.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/kish-fieldmus-fmod.zip saving as jsonzip/kish-fieldmus-fmod.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/kish-mathaffield.zip saving as jsonzip/kish-mathaffield.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/nere.zip saving as jsonzip/nere.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/nimrud.zip saving as jsonzip/nimrud.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/obel.zip saving as jsonzip/obel.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/obmc.zip saving as jsonzip/obmc.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/obta.zip saving as jsonzip/obta.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/ogsl.zip saving as jsonzip/ogsl.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/oimea.zip saving as jsonzip/oimea.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/osl.zip saving as jsonzip/osl.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/pnao.zip saving as jsonzip/pnao.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/qcat.zip saving as jsonzip/qcat.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/riao.zip saving as jsonzip/riao.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/ribo.zip saving as jsonzip/ribo.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/ribo-bab7scores.zip saving as jsonzip/ribo-bab7scores.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/ribo-babylon10.zip saving as jsonzip/ribo-babylon10.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/ribo-babylon2.zip saving as jsonzip/ribo-babylon2.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/ribo-babylon3.zip saving as jsonzip/ribo-babylon3.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/ribo-babylon4.zip saving as jsonzip/ribo-babylon4.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/ribo-babylon5.zip saving as jsonzip/ribo-babylon5.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/ribo-babylon6.zip saving as jsonzip/ribo-babylon6.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/ribo-babylon7.zip saving as jsonzip/ribo-babylon7.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/ribo-babylon8.zip saving as jsonzip/ribo-babylon8.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/ribo-sources.zip saving as jsonzip/ribo-sources.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/rimanum.zip saving as jsonzip/rimanum.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/rinap.zip saving as jsonzip/rinap.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/rinap-rinap1.zip saving as jsonzip/rinap-rinap1.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/rinap-rinap2.zip saving as jsonzip/rinap-rinap2.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/rinap-rinap3.zip saving as jsonzip/rinap-rinap3.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/rinap-rinap4.zip saving as jsonzip/rinap-rinap4.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/rinap-rinap5.zip saving as jsonzip/rinap-rinap5.zip
Waiting 3 seconds in order not to overload the ORACC server


 77%|███████▋  | 107/139 [08:01<01:51,  3.49s/it]

http://build-oracc.museum.upenn.edu/json/rinap-rinap5p1.zip does not exist.




Downloading http://build-oracc.museum.upenn.edu/json/rinap-scores.zip saving as jsonzip/rinap-scores.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/rinap-sources.zip saving as jsonzip/rinap-sources.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/saao.zip saving as jsonzip/saao.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/saao-aebp.zip saving as jsonzip/saao-aebp.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/saao-knpp.zip saving as jsonzip/saao-knpp.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/saao-saa01.zip saving as jsonzip/saao-saa01.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/saao-saa02.zip saving as jsonzip/saao-saa02.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/saao-saa03.zip saving as jsonzip/saao-saa03.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/saao-saa04.zip saving as jsonzip/saao-saa04.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/saao-saa05.zip saving as jsonzip/saao-saa05.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/saao-saa06.zip saving as jsonzip/saao-saa06.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/saao-saa07.zip saving as jsonzip/saao-saa07.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/saao-saa08.zip saving as jsonzip/saao-saa08.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/saao-saa09.zip saving as jsonzip/saao-saa09.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/saao-saa10.zip saving as jsonzip/saao-saa10.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/saao-saa11.zip saving as jsonzip/saao-saa11.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/saao-saa12.zip saving as jsonzip/saao-saa12.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/saao-saa13.zip saving as jsonzip/saao-saa13.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/saao-saa14.zip saving as jsonzip/saao-saa14.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/saao-saa15.zip saving as jsonzip/saao-saa15.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/saao-saa16.zip saving as jsonzip/saao-saa16.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/saao-saa17.zip saving as jsonzip/saao-saa17.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/saao-saa18.zip saving as jsonzip/saao-saa18.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/saao-saa19.zip saving as jsonzip/saao-saa19.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/saao-saa20.zip saving as jsonzip/saao-saa20.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/saao-saa21.zip saving as jsonzip/saao-saa21.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/saao-saas2.zip saving as jsonzip/saao-saas2.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/suhu.zip saving as jsonzip/suhu.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/tcma.zip saving as jsonzip/tcma.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/tsae.zip saving as jsonzip/tsae.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/urap.zip saving as jsonzip/urap.zip
Waiting 3 seconds in order not to overload the ORACC server




Downloading http://build-oracc.museum.upenn.edu/json/xcat.zip saving as jsonzip/xcat.zip
Waiting 3 seconds in order not to overload the ORACC server


100%|██████████| 139/139 [10:41<00:00,  4.61s/it]
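
If a download is interrupted, a half-written file can be left in `jsonzip` under its final name. One way to guard against that, sketched here under stated assumptions, is to write to a temporary `.part` file and rename it only once the last chunk has arrived. The function names `save_chunks` and `download_zip` are hypothetical, the `timeout` value is an arbitrary choice, and unlike the cell above this sketch does not pass `verify=False`. The demo uses fake chunks so that it runs without network access.

```python
import os
import tempfile

def save_chunks(chunks, dest):
    """Write an iterable of byte chunks to dest via a temporary '.part' file."""
    part = dest + '.part'
    written = 0
    with open(part, 'wb') as f:
        for chunk in chunks:
            if chunk:                  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    os.replace(part, dest)             # rename only after the last chunk is written
    return written

def download_zip(url, dest, chunk_size=16 * 1024, timeout=60):
    """Stream url to dest; return bytes written, or None if the status is not 200."""
    import requests                    # third-party; not needed for the demo below
    with requests.get(url, stream=True, timeout=timeout) as r:
        if r.status_code != 200:
            return None
        return save_chunks(r.iter_content(chunk_size=chunk_size), dest)

# Offline demo: a list of byte strings stands in for r.iter_content()
with tempfile.TemporaryDirectory() as tmp:
    dest = os.path.join(tmp, 'demo.zip')
    written = save_chunks([b'PK\x03\x04', b'', b'payload'], dest)
    dest_exists = os.path.exists(dest)
    part_left = os.path.exists(dest + '.part')
```

If an exception is raised while chunks are being written, only the `.part` file remains, so a later extraction step that looks for `.zip` files will not pick up the incomplete download.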


# Extracting ZIP files

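The function `extract_and_delete_zip()` goes through the files in `jsonzip`, extracts every `.zip` archive into `projectsdata`, and deletes the archive once it has been extracted.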

In [8]:
def extract_and_delete_zip():
    """Extract every .zip file in ZIP_PATH into EXTRACT_PATH, then delete it."""
    zipped_projects = os.listdir(ZIP_PATH)
    for z_file in zipped_projects:
        if z_file.endswith('.zip'):
            with zipfile.ZipFile(os.path.join(ZIP_PATH, z_file), 'r') as zip_ref:
                zip_ref.extractall(EXTRACT_PATH)

            # Remove the archive only after it has been extracted
            os.remove(os.path.join(ZIP_PATH, z_file))

            print(f"File {z_file} has been extracted to folder projectsdata and deleted.")
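
A more defensive variant is sketched below, on the assumption that a failed download may occasionally leave a corrupt archive behind. It skips files that `zipfile` cannot read, keeps them on disk for inspection, and deletes an archive only after a clean extraction. The function name `extract_valid_zips` and the demo file names are hypothetical; the demo runs in a temporary directory.

```python
import os
import tempfile
import zipfile

def extract_valid_zips(zip_dir, extract_dir):
    """Extract readable .zip files from zip_dir; return (extracted, skipped) names."""
    extracted, skipped = [], []
    for name in sorted(os.listdir(zip_dir)):
        if not name.endswith('.zip'):
            continue
        path = os.path.join(zip_dir, name)
        try:
            with zipfile.ZipFile(path) as zf:
                if zf.testzip() is not None:   # a member failed its CRC check
                    raise zipfile.BadZipFile(name)
                zf.extractall(extract_dir)
        except zipfile.BadZipFile:
            skipped.append(name)               # keep the file for inspection
            continue
        os.remove(path)                        # delete only after a clean extract
        extracted.append(name)
    return extracted, skipped

# Demo with one valid archive and one file that is not a ZIP at all
with tempfile.TemporaryDirectory() as tmp:
    zip_dir = os.path.join(tmp, 'jsonzip')
    out_dir = os.path.join(tmp, 'projectsdata')
    os.makedirs(zip_dir)
    with zipfile.ZipFile(os.path.join(zip_dir, 'good.zip'), 'w') as zf:
        zf.writestr('good/catalogue.json', '{}')
    with open(os.path.join(zip_dir, 'bad.zip'), 'wb') as f:
        f.write(b'not a zip archive')
    extracted, skipped = extract_valid_zips(zip_dir, out_dir)
    good_exists = os.path.exists(os.path.join(out_dir, 'good', 'catalogue.json'))
    bad_kept = os.path.exists(os.path.join(zip_dir, 'bad.zip'))
```

Returning the two lists instead of printing makes it easy to report which projects need to be downloaded again.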

In [9]:
extract_and_delete_zip()

File adsd-adart1.zip has been extracted to folder projectsdata and deleted.
File adsd-adart2.zip has been extracted to folder projectsdata and deleted.
File adsd-adart3.zip has been extracted to folder projectsdata and deleted.
File adsd-adart5.zip has been extracted to folder projectsdata and deleted.
File adsd-adart6.zip has been extracted to folder projectsdata and deleted.
File adsd.zip has been extracted to folder projectsdata and deleted.
File aemw-alalakh-idrimi.zip has been extracted to folder projectsdata and deleted.
File aemw-amarna.zip has been extracted to folder projectsdata and deleted.
File akklove.zip has been extracted to folder projectsdata and deleted.
File amgg.zip has been extracted to folder projectsdata and deleted.
File ario.zip has been extracted to folder projectsdata and deleted.
File armep.zip has been extracted to folder projectsdata and deleted.
File arrim.zip has been extracted to folder projectsdata and deleted.
File asbp-ninmed.zip has been extracted t