### This notebook is an exemplar which demonstrates transferring zip files between a Box folder and Savio scratch to run OCR on images using Tesseract (inside a Singularity container)

(Tested with boxsdk 2.0.0a2 on a Python 3.5 kernel.)
Install with: `pip install -Iv boxsdk==2.0.0a2`


_This software is available under the terms of the Educational Community License, Version 2.0 (ECL 2.0). This software is Copyright 2016 The Regents of the University of California, Berkeley ("Berkeley")._

The text of the ECL license is reproduced below.

Educational Community License, Version 2.0
*************************************
Copyright 2016 The Regents of the University of California, Berkeley ("Berkeley")

Educational Community License, Version 2.0, April 2007

The Educational Community License version 2.0 ("ECL") consists of the
Apache 2.0 license, modified to change the scope of the patent grant in
section 3 to be specific to the needs of the education communities using
this license. The original Apache 2.0 license can be found at:[http://www.apache.org/licenses/LICENSE-2.0]

### Notebook configuration section
The target and source directories, script file names, and other values set here are used as parameters in the processing below.

In [23]:

boxProjectFolder = 'Court Downloads Ayotte Ellias'
boxResultsFolder = 'OCR-zips'
boxFileList = ['7.zip', '80003.zip', '605.zip']

projectname = 'chench_test3_7_80003_605'
runFolder = '/global/scratch/mmanning/chench/'

tesseractimage = '/global/scratch/mmanning/tesseract4.img'
tesseractdatadir = '/opt/tessdata/'
pdfnamelist = []

scratchDataDirectory = '/global/scratch/mmanning/chench/test3/'
tesseractScratchDataDirectory = '/scratch/'

SINGULARITYCMD = 'singularity exec -B /global/scratch/mmanning/chench/test3/:/scratch/  /global/scratch/mmanning/tesseract4.img'

gsCommandScript = runFolder + 'gsCommandScript.sh'
t4CommandScript = runFolder + 't4CommandScript.sh'
slurmScript = runFolder + 'slurmscript.sh'



### Box Authorization

Function to store the OAuth2 refresh token in a local file. This can be modified to use a keychain or another secure store as required.

In [24]:
def store_tokens(access_token, refresh_token):
    """Callback for storing refresh tokens. (For now we ignore access tokens)."""
    with open('apptoken.cfg', 'w') as f:
        f.write(refresh_token.strip())

OAuth2 information is read from a local file with three lines, one parameter per line: client id, client secret, and redirect URI.
The client id and client secret are defined in the Box application created for this notebook. Create the application at the Box Developers site: https://berkeley.app.box.com/developers/services/edit/

The redirect URI can be any site that requires validation. Run the bootstrap notebook to create the initial tokens, which are then continually refreshed.

In [25]:
import os

CLIENT_ID = None
CLIENT_SECRET = None
REDIRECT_URI = None

# folder where box token config file resides
os.chdir('/global/home/users/mmanning')


# Read app info from text file
with open('app.cfg', 'r') as app_cfg:
    CLIENT_ID = app_cfg.readline()
    CLIENT_SECRET = app_cfg.readline()
    REDIRECT_URI = app_cfg.readline()
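
The three-line layout of `app.cfg` can be checked before the values reach the SDK. The following is a minimal sketch, not part of the original notebook: the helper name `read_config_lines` is hypothetical, the demonstration values are placeholders, and it assumes one value per line in the order client id, client secret, redirect URI.

```python
import os
import tempfile


def read_config_lines(path, expected):
    """Return the first `expected` lines of a config file, stripped of whitespace."""
    with open(path, 'r') as cfg:
        values = [cfg.readline().strip() for _ in range(expected)]
    if not all(values):
        raise ValueError('config file %s must contain %d non-empty lines' % (path, expected))
    return values


# Demonstration with a throwaway file (placeholder values, not real credentials).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'app.cfg')
    with open(demo_path, 'w') as f:
        f.write('my-client-id\nmy-client-secret\nhttps://example.org/callback\n')
    client_id, client_secret, redirect_uri = read_config_lines(demo_path, 3)
    print(client_id, redirect_uri)
```

Stripping at read time means the later `.strip()` calls become harmless rather than required, and a short or empty file fails early with a clear message.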

The refresh token is read from a local file. This token was created by running the bootstrap notebook which requires the user to validate with CalNet Authentication Service credentials, then stores the returned auth and refresh tokens in the same config files.

In [26]:
REFRESH_TOKEN = None

# Read the refresh token from text file
with open('apptoken.cfg', 'r') as apptoken_cfg:
    REFRESH_TOKEN = apptoken_cfg.readline()

__Perform authentication__  
then create the Box client.  
Verify the client is working by retrieving the name of the user's root folder in Box.

In [27]:
from boxsdk import OAuth2
from boxsdk import Client

# Do OAuth2 authorization.
oauth = OAuth2(
    client_id=CLIENT_ID.strip(),
    client_secret=CLIENT_SECRET.strip(),
    refresh_token=REFRESH_TOKEN.strip(),
    store_tokens=store_tokens
)

client = Client(oauth)

root_folder = client.folder(folder_id='0').get()
print ("folder name: ", root_folder['name'] )

items = client.folder(folder_id='0').get_items(limit=100, offset=0)
#print ("items: ", items )

folder name:  All Files


### Utility functions

__function to find folder id by folder name.__  
The current SDK does not have a 'find by name' function, so this uses the search call and treats anything other than exactly one match as not found.

In [67]:
def find_folder_id(folder_name):
    folderlist = client.search(query=folder_name, result_type='folder', limit=10, offset=0)
    
    if len(folderlist) == 0 or len(folderlist) > 1:
        print('folder not found: ', folder_name)
        return 0
    else:
        return folderlist[0]['id']

In [68]:
import re

def natural_sort_key(s, _nsre=re.compile('([0-9]+)')):
    return [int(text) if text.isdigit() else text.lower()
            for text in re.split(_nsre, s)]  
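
The key function above splits a name into text and digit runs so that page numbers compare as integers. A short illustration, repeating the function so the block runs on its own; the file names are made up to mirror the per-page output seen later in the notebook.

```python
import re


def natural_sort_key(s, _nsre=re.compile('([0-9]+)')):
    return [int(text) if text.isdigit() else text.lower()
            for text in re.split(_nsre, s)]


pages = ['Main Document-10.txt', 'Main Document-2.txt', 'Main Document-1.txt']

plain_order = sorted(pages)                          # character-by-character comparison
natural_order = sorted(pages, key=natural_sort_key)  # digit runs compared as numbers
print(plain_order)
print(natural_order)
```

With the plain sort, page 10 lands between pages 1 and 2; with the natural key the pages come out in reading order, which is what the merge step relies on.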

__function to return all files in directory tree.__

In [58]:
from scandir import scandir
import os
def scantreeForFiles(path):
    """Recursively yield the path of every file under the given directory."""
    for entry in scandir(path):
        if entry.is_dir(follow_symlinks=False):
            # recurse with this function's own name (the original called an undefined scantree)
            yield from scantreeForFiles(entry.path)
        else:
            yield entry.path
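
On Python 3.5 and later the same traversal is available from the standard library as `os.scandir`, so the separate `scandir` package should only be needed on older interpreters. A sketch under that assumption; `scan_tree_for_files` is an illustrative name, and the directory layout is created only for the demonstration.

```python
import os
import tempfile


def scan_tree_for_files(path):
    """Recursively yield the path of every non-directory entry under `path`."""
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            yield from scan_tree_for_files(entry.path)
        else:
            yield entry.path


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'SDNY', '605'))
    for rel in ('top.pdf', os.path.join('SDNY', '605', 'doc.pdf')):
        open(os.path.join(root, rel), 'w').close()
    found = sorted(os.path.relpath(p, root) for p in scan_tree_for_files(root))
    print(found)
```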


__function to return list of all folders in directory tree.__

In [59]:
from scandir import scandir
import os
def scandirForFolders(path, dirlist):
    """Recursively append the path of every sub-directory under path to dirlist."""
    for entry in scandir(path):
        if entry.is_dir(follow_symlinks=False):
            dirlist.append(entry.path)
            scandirForFolders(entry.path, dirlist)    


__Validate all the task log files produced by ht_helper__

In [28]:
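def validateTaskResults(fileroot, totalTasks):
    # each task log is named <fileroot>.<taskNumber>,
    # where fileroot is job-name.jobId.log
    
    errorList = []
    
    # tasks are numbered 0 .. totalTasks-1, one per line of the task script
    for i in range(totalTasks):
        fn = fileroot + '.' + str(i)
        retval = None   # reset so a missing log is not judged by the previous task's code
        if os.path.exists(fn):
            # the last line of the log is the exit code written by 'echo $?'
            out = !tail -1 {fn}
            retval = out[0]
            #print ('return code: ', out[0])
        else:
            print ('warning: log file not available: ', fn)
        
        if ( retval != '0' ):
            errorList.append(i)
            
    return errorList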
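
The same check can be written without the IPython `!tail` shell escape, which makes it usable outside a notebook. This is a sketch rather than a drop-in replacement: `last_line` and `failed_tasks` are hypothetical names, it assumes tasks are numbered from 0 and that each log ends with the exit code printed by `echo $?`, and unlike `tail -1` it ignores trailing blank lines.

```python
import os
import tempfile


def last_line(path):
    """Return the last non-empty line of a text file, or None if there is none."""
    last = None
    with open(path, 'r', errors='replace') as f:
        for line in f:
            if line.strip():
                last = line.strip()
    return last


def failed_tasks(fileroot, total_tasks):
    """Return task numbers whose log is missing or does not end with '0'."""
    failures = []
    for i in range(total_tasks):
        log_name = '{}.{}'.format(fileroot, i)
        if not os.path.exists(log_name) or last_line(log_name) != '0':
            failures.append(i)
    return failures


with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, 'demo.123.log')
    for task, body in enumerate(['page 1\n0\n', 'error\n1\n']):
        with open('{}.{}'.format(root, task), 'w') as f:
            f.write(body)
    result = failed_tasks(root, 3)   # task 2 has no log file at all
    print(result)
```

Here a missing log counts as a failure, on the reasoning that a task with no log cannot be shown to have succeeded.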


### Retrieve the file(s)  from the Box folder.
Currently the Box SDK does not have an option for finding a folder by name, so to find a specific folder you need to loop through all the items in the listing below and match on the name. Once you have found the folder and retrieved its id, you can save that id for subsequent runs. Another option is to take the id from the URL in the web client, but the approach below is more flexible for now.

In [12]:
import os
import shutil 


os.chdir(scratchDataDirectory)
print ('current working directory: ', os.getcwd())

# test folder contents
items = client.folder(folder_id='0').get_items(limit=20, offset=0)
if type(items) is list:
    print ('number of files in top folder: ', len(items) )
    
    targetfolderId = ''
    for item in items:
        if item['type'] == 'folder':
            print('folder name: ', item['name'])
            if item['name'] == boxProjectFolder:
                targetfolderId = item['id']
                print('targetfolderId: ', targetfolderId)
        
    # targetfolderId starts as '', so test for a non-empty value rather than None
    if targetfolderId:
        tgtitems = client.folder(folder_id=targetfolderId).get_items(limit=200, offset=0)
        if type(tgtitems) is list:
            print ('number of files in target folder: ', len(tgtitems) ) 
        
        # download files
        for tgtitem in tgtitems:
            if tgtitem['type'] != 'folder' and tgtitem['name'] in boxFileList:
                print('downloading: ', tgtitem['name'])
                # the with-block closes the file even if the download fails
                with open(scratchDataDirectory + tgtitem['name'], 'wb') as newfile:
                    client.file(file_id=tgtitem['id']).download_to(newfile)
        print('downloading completed. ')
    else:
        print('target folder not found: ', boxProjectFolder)
        

current working directory:  /global/scratch/mmanning/chench/test3
number of files in top folder:  20
folder name:  401 gen 24 feb
folder name:  actest
folder name:  actest2
folder name:  actest_401_gen_24_feb
folder name:  backupTest
folder name:  cista Santa ISa
folder name:  Court Downloads Ayotte Ellias
targetfolderId:  18727240276
folder name:  HearstWAVE
folder name:  latin_1603
folder name:  photoscandemo
folder name:  sdktest
folder name:  tassie
folder name:  TesseractExperiment
folder name:  TesseractNotebook
folder name:  test for Photoscan
folder name:  test77
folder name:  ThisIsATest
folder name:  UCLA_Demo_23Feb
number of files in target folder:  189
downloading:  605.zip
downloading:  7.zip
downloading:  80003.zip
downloading completed. 
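
The name-matching loop in the cell above can be factored into a small function. This is only a sketch: `find_item_id` is a hypothetical helper, and plain dictionaries stand in for the items returned by `get_items()`, which the notebook indexes with the same `'type'`, `'name'`, and `'id'` keys.

```python
def find_item_id(items, name, item_type='folder'):
    """Return the id of the first item with the given type and name, else None."""
    for item in items:
        if item['type'] == item_type and item['name'] == name:
            return item['id']
    return None


# Plain dictionaries stand in for the objects returned by get_items().
listing = [
    {'type': 'file', 'name': 'notes.txt', 'id': '11'},
    {'type': 'folder', 'name': 'sdktest', 'id': '22'},
    {'type': 'folder', 'name': 'OCR-zips', 'id': '33'},
]
print(find_item_id(listing, 'OCR-zips'))
print(find_item_id(listing, 'missing'))
```

Returning `None` for a missing folder lets the caller distinguish "not found" from a valid id with a simple `is None` test.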


__unzip the files__

In [13]:
import zipfile
def unzip(source_filename, dest_dir):
    with zipfile.ZipFile(source_filename) as zf:
        print('extractall: ', source_filename)
        zf.extractall(dest_dir)
    print('extractall completed. ')

In [14]:
import glob

for filename in glob.glob('*.zip'):
    print('unzip: ', filename)
    unzip(filename, scratchDataDirectory)
    #remove the zip file
    os.remove(filename)
print('zip processing completed. ')

unzip:  7.zip
extractall:  7.zip
extractall completed. 
unzip:  605.zip
extractall:  605.zip
extractall completed. 
unzip:  80003.zip
extractall:  80003.zip
extractall completed. 
zip processing completed. 


__SLURM job script__ (savio_normal QoS)

In [18]:
# batch script
batchtemplate = '#!/bin/bash -l  \n\
# Job name: \n\
#SBATCH --job-name=' + projectname + '\n\
# \n\
# Account: \n\
#SBATCH --account=ac_scsguest \n\
# \n\
# Partition: \n\
#SBATCH --partition=savio2 \n\
# \n\
## Scale by increasing the number of nodes \n\
#SBATCH --nodes=5  \n\
## DO NOT change ntasks-per-node setting as T4 also distributes across cores \n\
#SBATCH --ntasks-per-node=6 \n\
#SBATCH --qos=savio_normal \n\
# \n\
# Wall clock limit: \n\
#SBATCH --time={} \n\
# \n\
## Command(s) to run: \n\
module load gcc openmpi  \n\
/global/home/groups/allhands/bin/ht_helper.sh  -t {} -n1 -s1 -vL \n' 
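
The template above mixes string concatenation (for the job name) with two positional `{}` fields (time limit and task file). One possible alternative, shown here as a sketch, is a single template with named fields. `BATCH_TEMPLATE` and `render_batch_script` are illustrative names, and the comment lines of the original script are omitted for brevity; the `#SBATCH` options are the ones used in the notebook.

```python
BATCH_TEMPLATE = """#!/bin/bash -l
#SBATCH --job-name={job_name}
#SBATCH --account=ac_scsguest
#SBATCH --partition=savio2
#SBATCH --nodes={nodes}
#SBATCH --ntasks-per-node=6
#SBATCH --qos=savio_normal
#SBATCH --time={time_limit}
module load gcc openmpi
/global/home/groups/allhands/bin/ht_helper.sh -t {task_file} -n1 -s1 -vL
"""


def render_batch_script(job_name, time_limit, task_file, nodes=5):
    """Fill in the named fields of the batch template."""
    return BATCH_TEMPLATE.format(job_name=job_name, time_limit=time_limit,
                                 task_file=task_file, nodes=nodes)


script = render_batch_script('chench_test3_7_80003_605', '04:30:00',
                             '/global/scratch/mmanning/chench/gsCommandScript.sh')
print(script)
```

Named fields make it harder to swap the time limit and the task file by accident, and they expose the node count as a parameter for scaling.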


__Remove special characters from filenames__  
It is not yet clear which characters need to be handled here; for now only the dollar sign is removed.

In [63]:
import re
import os
for entry in scantreeForFiles(scratchDataDirectory):
    filename, file_extension = os.path.splitext(entry)
    if ( entry.endswith('.pdf') and re.search(r'\$', entry)):
        print ('special characters in filename: ', entry)
        os.rename(entry, re.sub(r'[\$]', '', entry))
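
Since the set of problem characters is still open, the sketch below shows one way to keep that set in a single place and to apply it to the file name only, leaving directory names untouched. The character list is an assumption (characters that are awkward inside double-quoted shell commands), and `sanitize_path` is a hypothetical helper; it returns the cleaned path without renaming anything.

```python
import os
import re

# Assumed set of characters that are awkward inside double-quoted shell commands.
# This list is illustrative; extend or shrink it to suit the data.
UNSAFE_CHARS = re.compile(r'[$`"\\!]')


def sanitize_path(path):
    """Return `path` with unsafe characters removed from the file name only."""
    directory, basename = os.path.split(path)
    return os.path.join(directory, UNSAFE_CHARS.sub('', basename))


print(sanitize_path('/scratch/SDNY/605/Exhibit $5 "A".pdf'))
```

The result can be passed to `os.rename(entry, sanitize_path(entry))` once the character set has been agreed on.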

### Create script to convert all pdf files in working directory to images


__need to handle dollar signs in filenames here (grumble, grumble...)__

In [17]:
import glob, os
import shutil 

# Ghostscript executable is inside the container.
# TEMPLATE: gs -dBATCH -dNOPAUSE -dQUIET -sDEVICE=png16m -sOutputFile=/scratch/test/output/test-%d.png -r300 /scratch/test/germanocr.pdf
SINGULARITYCMD = 'singularity exec -B {}:/scratch/ /global/scratch/mmanning/tesseract4.img ' 
GHOSTSCRIPTCMD = 'gs -dBATCH -dNOPAUSE -dQUIET -sDEVICE=png16m -sOutputFile=\"{}-%d.png\" -r300 \"{}\" ;  echo $?'

os.chdir(scratchDataDirectory)
print ('current working directory: ', os.getcwd())

scmd = SINGULARITYCMD.format(scratchDataDirectory)

# total number of ghostscript tasks
gsCommandTotal = 0

with open(gsCommandScript, 'w') as f:  

    for entry in scantreeForFiles(scratchDataDirectory):
        filename, file_extension = os.path.splitext(entry)
        if ( entry.endswith('.pdf')):
            relativepath1 = entry[len(scratchDataDirectory):]
            relativepath2 = filename[len(scratchDataDirectory):]
            gcmd = GHOSTSCRIPTCMD.format(tesseractScratchDataDirectory+relativepath2, tesseractScratchDataDirectory+relativepath1 )
            f.write(scmd + gcmd + '\n')
            gsCommandTotal += 1
    
    
#set time limit for this batch run
outputbatchscript = batchtemplate.format('04:30:00',  gsCommandScript)
with open(slurmScript, 'w') as f:  
    f.write(outputbatchscript)

current working directory:  /global/scratch/mmanning/chench/test3
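
Each line of the task script pairs a host path with its location inside the container, where the scratch data directory is bound to `/scratch/`. The sketch below builds one such line for a single PDF so the path mapping is easy to inspect. `ghostscript_task` is an illustrative name and the example PDF path is made up; the command options are the ones from the template above.

```python
import os

HOST_DIR = '/global/scratch/mmanning/chench/test3/'
CONTAINER_DIR = '/scratch/'
IMAGE = '/global/scratch/mmanning/tesseract4.img'


def ghostscript_task(pdf_path):
    """Build one task line that renders `pdf_path` to PNG pages inside the container."""
    relative_pdf = os.path.relpath(pdf_path, HOST_DIR)
    relative_stem = os.path.splitext(relative_pdf)[0]
    return ('singularity exec -B {host}:{cont} {image} '
            'gs -dBATCH -dNOPAUSE -dQUIET -sDEVICE=png16m '
            '-sOutputFile="{cont}{stem}-%d.png" -r300 "{cont}{pdf}" ;  echo $?'
            ).format(host=HOST_DIR, cont=CONTAINER_DIR, image=IMAGE,
                     stem=relative_stem, pdf=relative_pdf)


line = ghostscript_task('/global/scratch/mmanning/chench/test3/SDNY/605/226/Main Document.pdf')
print(line)
```

The trailing `echo $?` writes the exit code as the last line of the task log, which is what the validation step reads back.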


__Execute the task script with ht_helper__

In [4]:
os.chdir(runFolder)
print ('current working directory: ', os.getcwd())

out = !sbatch slurmscript.sh   
    
print ('Execute ghostscript output: ', out ) 
jobId =  out[0].split()[3]
print (jobId)

current working directory:  /global/scratch/mmanning/chench
Execute ghostscript output:  ['Submitted batch job 1339343']
1339343
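
Taking the fourth whitespace-separated word works for the message shown above but raises an unhelpful `IndexError` if `sbatch` prints anything else, such as an error. A slightly more defensive sketch follows; `parse_job_id` is a hypothetical helper and it assumes the `Submitted batch job N` wording seen in the output above.

```python
import re


def parse_job_id(sbatch_output):
    """Extract the numeric job id from sbatch's 'Submitted batch job N' message."""
    for line in sbatch_output:
        match = re.search(r'Submitted batch job (\d+)', line)
        if match:
            return match.group(1)
    raise ValueError('no job id found in sbatch output: {!r}'.format(sbatch_output))


print(parse_job_id(['Submitted batch job 1339343']))
```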


In [5]:
# print the users queue and the job status by id
!squeue -u $username
print('--------------------------------')
!scontrol show job $jobId

squeue: option requires an argument -- 'u'
Try "squeue --help" for more information
--------------------------------
JobId=1339343 JobName=chench_test3_7_80003_605
   UserId=mmanning(42702) GroupId=ucb(501) MCS_label=N/A
   Priority=1001 Nice=0 Account=ac_scsguest QOS=savio_normal
   JobState=FAILED Reason=NonZeroExitCode Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=1:0
   RunTime=00:00:01 TimeLimit=04:30:00 TimeMin=N/A
   SubmitTime=2017-05-23T10:39:21 EligibleTime=2017-05-23T10:39:21
   StartTime=2017-05-23T10:39:23 EndTime=2017-05-23T10:39:24 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=savio2 AllocNode:Sid=ln000:94408
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=n0132.savio2,n0152.savio2,n0162.savio2,n0183.savio2,n0186.savio2
   BatchHost=n0132.savio2
   NumNodes=5 NumCPUs=120 NumTasks=30 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=120,mem=312.50G,node=5
   Socks/Node=* NtasksPerN:B:S:C=6:0:*:* CoreSpec=*
   MinCP

__Check all task log files for bad exit code__  
task numbers align with lines in the task script  
check the log file of tasks in the returned array of failures  

In [16]:
import glob, os
print ('current working directory: ', os.getcwd())

jobId = '1339347'
gsCommandTotal = 2761

fileroot = projectname + '.' + jobId + '.log'
tasklist = validateTaskResults(fileroot, gsCommandTotal)
print ('these tasks in task script failed: ', tasklist)


current working directory:  /global/scratch/mmanning/chench
these tasks in task script failed:  [467, 479, 481]


__Remove task logs after any errors have been resolved__

In [19]:
 
filter = fileroot + '*'
print ('filter: ', filter)
#for f in glob.glob(filter):
#    os.remove(f)

filter:  chench_test3_7_80003_605.1339347.log*


### Create script to ocr all png files in working directory to text

In [20]:
import glob, os
os.chdir(scratchDataDirectory)
print ('current working directory: ', os.getcwd())
# template: tesseract --tessdata-dir /opt/tessdata /scratch/germanocr_Page_01.png  germanout  -l deu
TCMD = '  tesseract --tessdata-dir /opt/tessdata \"{}\" \"{}\"  -l eng;  echo $?'
#

scmd = SINGULARITYCMD.format(scratchDataDirectory)
# total number of tesseract tasks
t4CommandTotal = 0

with open(t4CommandScript, 'w') as f:

    for entry in scantreeForFiles(scratchDataDirectory):
        if ( entry.endswith('.png')):
            filename, file_extension = os.path.splitext(entry)
            relativepath1 = entry[len(scratchDataDirectory):]
            relativepath2 = filename[len(scratchDataDirectory):]
            tcmd = TCMD.format(tesseractScratchDataDirectory+relativepath1, tesseractScratchDataDirectory+relativepath2 )
            #print(scmd + tcmd)
            f.write(scmd + tcmd + '\n')
            t4CommandTotal += 1
    
    
#set time limit for this batch run
outputbatchscript = batchtemplate.format('15:00:00',  t4CommandScript)
with open(slurmScript, 'w') as f:  
    f.write(outputbatchscript)

current working directory:  /global/scratch/mmanning/chench/test3


__Execute the task script with ht_helper__

In [21]:
os.chdir(runFolder)
print ('current working directory: ', os.getcwd())

out = !sbatch slurmscript.sh   
    
print ('Execute tesseract4 output: ', out ) 
jobId =  out[0].split()[3]
print (jobId)

current working directory:  /global/scratch/mmanning/chench
Execute tesseract4 output:  ['Submitted batch job 1339651']
1339651


In [22]:
# print the users queue and the job status by id
!squeue -u $username
print('--------------------------------')
!scontrol show job $jobId

squeue: option requires an argument -- 'u'
Try "squeue --help" for more information
--------------------------------
JobId=1339651 JobName=chench_test3_7_80003_605
   UserId=mmanning(42702) GroupId=ucb(501) MCS_label=N/A
   Priority=1000 Nice=0 Account=ac_scsguest QOS=savio_normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:22 TimeLimit=15:00:00 TimeMin=N/A
   SubmitTime=2017-05-23T14:30:44 EligibleTime=2017-05-23T14:30:44
   StartTime=2017-05-23T14:30:48 EndTime=2017-05-24T05:30:48 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=savio2 AllocNode:Sid=ln000:94408
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=n0086.savio2,n0087.savio2,n0088.savio2,n0091.savio2,n0101.savio2
   BatchHost=n0086.savio2
   NumNodes=5 NumCPUs=120 NumTasks=30 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=120,mem=312.50G,node=5
   Socks/Node=* NtasksPerN:B:S:C=6:0:*:* CoreSpec=*
   MinCPUsNode=6 M

__Check all task log files for bad exit code__

In [30]:

os.chdir(runFolder)
print ('current working directory: ', os.getcwd())

fileroot = projectname + '.' + jobId + '.log'
#tasklist = validateTaskResults(fileroot, 10) first check a small subset
tasklist = validateTaskResults(fileroot, t4CommandTotal)
print ('these tasks in task script failed: ', tasklist)

# Remove task logs
#filter = fileroot + '*'
#for f in glob.glob(filter):
#    os.remove(f)

current working directory:  /global/scratch/mmanning/chench
these tasks in task script failed:  []


### Merge text files and upload to Box

In [64]:
from scandir import scandir
dirlist = []

scandirForFolders(scratchDataDirectory, dirlist)

print("num dirs: ", len(dirlist) ) 

# show the first few entries; slicing is safe when there are fewer than 10
for d in dirlist[:10]: 
    print("dir: ", d ) 


num dirs:  2075
dir:  /global/scratch/mmanning/chench/test3/SDNY
dir:  /global/scratch/mmanning/chench/test3/SDNY/605
dir:  /global/scratch/mmanning/chench/test3/SDNY/605/226
dir:  /global/scratch/mmanning/chench/test3/SDNY/605/49
dir:  /global/scratch/mmanning/chench/test3/SDNY/605/381
dir:  /global/scratch/mmanning/chench/test3/SDNY/605/143
dir:  /global/scratch/mmanning/chench/test3/SDNY/605/417
dir:  /global/scratch/mmanning/chench/test3/SDNY/605/132
dir:  /global/scratch/mmanning/chench/test3/SDNY/605/393
dir:  /global/scratch/mmanning/chench/test3/SDNY/605/369


__check that for every .png there is a .txt in each directory__

In [65]:
missingResultList = []
for currentdir in dirlist:
    os.chdir(currentdir)
    #print ('current working directory: ', os.getcwd())
    
    
    # check that every png has a matching txt result
    for filename in os.listdir(os.getcwd()):
        if  os.path.isfile(filename)  and filename.endswith('.png'):
            fn, fe = os.path.splitext(filename)
            if not os.path.exists(fn + '.txt'):
                missingResultList.append(currentdir + '/' + filename)
                print ('missing result: ', currentdir + '/' + filename )
print("missingResultList size: ", len(missingResultList) ) 

missingResultList size:  0


__merge text files into original documents__

In [69]:
from shutil import copyfile

for currentdir in dirlist:
    os.chdir(currentdir)
    #print ('current working directory: ', os.getcwd())
    pdfnamelist = []
    
    # get a list of all pdf names
    for filename in os.listdir(os.getcwd()):
        if  os.path.isfile(filename)  and filename.endswith(".pdf"):
            #print("filename: ", filename ) 
            fn, fe = os.path.splitext(filename)
            pdfnamelist.append(fn)
    #print("pdfnamelist size: ", len(pdfnamelist) ) 
    
    for name in pdfnamelist:
        mergeList = []
        for filename in os.listdir('.'):
            
            if filename.endswith(".txt") and filename.startswith(name): 
                #print("filename: ", filename)
                mergeList.append(filename)
                
        #print('mergeList: ', mergeList)
        alltextfilename = ''.join([ currentdir,"/",name,'_ALL.txt'])
        
        if (len(mergeList) > 1):
            sortedList = sorted(mergeList, key = natural_sort_key)
            print('sortedList: ', sortedList)

            # alltextfilename was already built above, before the branch
            with open(alltextfilename, 'w', encoding="utf-8") as outfile:
                for fname in sortedList:
                    with open(''.join([currentdir,"/", fname]), encoding="utf-8" ) as infile:
                        for line in infile:
                            outfile.write(line)
        elif (len(mergeList) == 1):
            # if file is only one page, just copy to _ALL.txt so it is included in results
            print('single file: ', mergeList[0])
            copyfile(mergeList[0], alltextfilename)
        else: 
            print('empty mergeList on file: ', name)        


sortedList:  ['Main Document-1.txt', 'Main Document-2.txt']
sortedList:  ['Main Document-1.txt', 'Main Document-2.txt', 'Main Document-3.txt', 'Main Document-4.txt', 'Main Document-5.txt', 'Main Document-6.txt', 'Main Document-7.txt']
single file:  Main Document-1.txt
sortedList:  ['Main Document-1.txt', 'Main Document-2.txt', 'Main Document-3.txt', 'Main Document-4.txt', 'Main Document-5.txt', 'Main Document-6.txt']
sortedList:  ['Main Document-1.txt', 'Main Document-2.txt']
single file:  Main Document-1.txt
sortedList:  ['Main Document-1.txt', 'Main Document-2.txt', 'Main Document-3.txt', 'Main Document-4.txt', 'Main Document-5.txt', 'Main Document-6.txt', 'Main Document-7.txt', 'Main Document-8.txt', 'Main Document-9.txt', 'Main Document-10.txt', 'Main Document-11.txt', 'Main Document-12.txt', 'Main Document-13.txt', 'Main Document-14.txt', 'Main Document-15.txt', 'Main Document-16.txt', 'Main Document-17.txt', 'Main Document-18.txt', 'Main Document-19.txt', 'Main Document-20.txt', 
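
The merge cell selects page files with `startswith(name)`, which would also pick up an earlier `_ALL.txt` file or the pages of another PDF whose name begins with the same text. A stricter match is sketched below. It assumes the page files are named `<pdf name>-<page number>.txt`, as produced by the Ghostscript `-%d` output pattern followed by Tesseract; `page_files_for` is a hypothetical helper and the file listing is invented for the demonstration.

```python
import re


def page_files_for(pdf_stem, filenames):
    """Return the '<stem>-<page>.txt' files for one PDF, ordered by page number."""
    pattern = re.compile(re.escape(pdf_stem) + r'-(\d+)\.txt$')
    pages = []
    for filename in filenames:
        match = pattern.match(filename)
        if match:
            pages.append((int(match.group(1)), filename))
    return [filename for _, filename in sorted(pages)]


listing = ['Main Document-10.txt', 'Main Document-2.txt', 'Main Document-1.txt',
           'Main Document_ALL.txt', 'Main Document Exhibit-1.txt', 'Main Document.pdf']
print(page_files_for('Main Document', listing))
```

Because the page number is captured as an integer, the result is already in page order and needs no separate natural sort.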

__verify counts__

In [71]:
os.chdir(scratchDataDirectory)
print("number of pdfs in set: " ) 

!find . -name "*.pdf" | wc -l

print("number of merged text files in set: " ) 

!find . -name "*_ALL.txt" | wc -l


number of pdfs in set: 
2761
number of merged text files in set: 
2758
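
The two counts above differ by three, so some PDFs have no merged text file. The sketch below lists which ones, which may be quicker than comparing counts by eye. `pdfs_without_merged_text` is a hypothetical helper; it assumes the merged file sits next to its PDF and is named `<pdf name>_ALL.txt`, as in the merge cell.

```python
import os
import tempfile


def pdfs_without_merged_text(root):
    """Return PDFs under `root` that have no '<name>_ALL.txt' beside them."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        present = set(filenames)
        for filename in filenames:
            stem, ext = os.path.splitext(filename)
            if ext == '.pdf' and stem + '_ALL.txt' not in present:
                missing.append(os.path.join(dirpath, filename))
    return sorted(missing)


with tempfile.TemporaryDirectory() as root:
    for name in ('a.pdf', 'a_ALL.txt', 'b.pdf'):
        open(os.path.join(root, name), 'w').close()
    missing = [os.path.relpath(p, root) for p in pdfs_without_merged_text(root)]
    print(missing)
```

In the notebook this would be called as `pdfs_without_merged_text(scratchDataDirectory)` before the cleanup cell removes the PDFs.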


In [72]:
print("num dirs: ", len(dirlist) ) 

for currentdir in dirlist:
    os.chdir(currentdir)
    print ('current working directory: ', os.getcwd())
    
    # remove every file except the merged _ALL.txt results (pdf, png, per-page txt)
    for currentFile in os.listdir(os.getcwd()):
        if os.path.isfile(currentFile) and not currentFile.endswith('_ALL.txt'):
            os.remove(os.path.join(currentdir, currentFile))
    
    # rename the merged results from <name>_ALL.txt to <name>.txt
    for currentFile in os.listdir(os.getcwd()):
        if os.path.isfile(currentFile) :
            newname = currentFile.replace('_ALL.txt', '.txt')
            os.rename(currentFile, newname)
    

num dirs:  2075
current working directory:  /global/scratch/mmanning/chench/test3/SDNY
current working directory:  /global/scratch/mmanning/chench/test3/SDNY/605
current working directory:  /global/scratch/mmanning/chench/test3/SDNY/605/226
current working directory:  /global/scratch/mmanning/chench/test3/SDNY/605/49
current working directory:  /global/scratch/mmanning/chench/test3/SDNY/605/381
current working directory:  /global/scratch/mmanning/chench/test3/SDNY/605/143
current working directory:  /global/scratch/mmanning/chench/test3/SDNY/605/417
current working directory:  /global/scratch/mmanning/chench/test3/SDNY/605/132
current working directory:  /global/scratch/mmanning/chench/test3/SDNY/605/393
current working directory:  /global/scratch/mmanning/chench/test3/SDNY/605/369
current working directory:  /global/scratch/mmanning/chench/test3/SDNY/605/256
current working directory:  /global/scratch/mmanning/chench/test3/SDNY/605/139
current working directory:  /global/scratch/mmann

In [18]:
import shutil

os.chdir(scratchDataDirectory)
print ('current working directory: ', os.getcwd())
shutil.make_archive(projectname, 'zip', scratchDataDirectory)

print('completed zip: ', os.stat(projectname + '.zip'))
    

current working directory:  /global/scratch/mmanning/chenchSAVTXT2
completed zip:  os.stat_result(st_mode=33188, st_ino=144132016428936861, st_dev=4079531836, st_nlink=1, st_uid=42702, st_gid=501, st_size=1197537195, st_atime=1494447820, st_mtime=1494447950, st_ctime=1494447950)


#### Move the resulting zip file to Box.

In [35]:
#folderId = find_folder_id(boxProjectFolder)
folderId = find_folder_id('ThisIsATest')
print ("folderId: ", folderId )
upload_folder = client.folder(folder_id=folderId).get()
objUploaded = upload_folder.upload(scratchDataDirectory + projectname + '.zip')  
print ("obj file id: ", objUploaded['id'] )

folderId:  20926572129


ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
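
The upload above ended with a reset connection. One common response to transient network errors is to retry with an increasing delay. The sketch below shows the idea with a stand-in function instead of a real upload. `retry` and `flaky_upload` are hypothetical names, and the exception class raised by the HTTP layer in the notebook may not derive from `OSError`, so the `exceptions` argument would need to be set to match. Whether retrying is enough for an archive of this size is not something the source establishes.

```python
import time


def retry(operation, attempts=3, delay=1.0, exceptions=(OSError,)):
    """Call `operation()` until it succeeds or `attempts` tries have failed."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay * 2 ** (attempt - 1))  # exponential backoff


# Stand-in for the upload call: fails twice, then succeeds.
calls = {'count': 0}


def flaky_upload():
    calls['count'] += 1
    if calls['count'] < 3:
        raise ConnectionResetError(104, 'Connection reset by peer')
    return 'uploaded'


result = retry(flaky_upload, attempts=5, delay=0.01)
print(result, calls['count'])
```

In the notebook the operation could be wrapped as `retry(lambda: upload_folder.upload(scratchDataDirectory + projectname + '.zip'))`, with `exceptions` adjusted to the error type actually raised.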