# Data Transfer - GOES-16

This notebook outlines a process for transferring data from RCC's Minio server to the OSDC Griffin Ceph storage. It runs the AWS CLI as a Python subprocess. This is necessary (instead of, say, boto) because the OSDC S3 service uses a version of Ceph that requires AWS v2 signatures.

We rely on awscli 1.09 or earlier to work around this issue and connect to both resources.

`pip install "awscli>=1.0.0,<=1.09.0"`

The goal is to create scripts that transfer event data for Hurricanes Irma, Maria, and Jose (all category 4 or higher) to OSDC Griffin, so that the community can use and study the data more easily. Our Griffin credentials have been loaded into a Google Compute VM as the default AWS profile, so the commands need no explicit `--profile {myprofile}` option.

The data is at: https://griffin-objstore.opensciencedatacloud.org/noaa-goes16-hurricane-archive-2017/

For more information visit:  http://edc.occ-data.org/

In [None]:
import subprocess, os, shutil

# Create Bucket

## Delete If It Exists

In [None]:
deletebucket = 'aws s3 rb s3://noaa-goes16-hurricane-archive-2017/ --endpoint-url https://griffin-objstore.opensciencedatacloud.org --force'
#deleteB=subprocess.Popen(deletebucket, shell=True, stdout = subprocess.PIPE)

## Make Bucket

In [None]:
makebucket = 'aws s3 mb s3://noaa-goes16-hurricane-archive-2017/ --endpoint-url https://griffin-objstore.opensciencedatacloud.org'
makeB = subprocess.Popen(makebucket, shell=True, stdout=subprocess.PIPE)
print(makeB.communicate()[0].decode())  # wait for the command and show its output
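`subprocess.Popen` returns as soon as the process starts, so it can help to wrap commands in a helper that waits for completion and reports the exit code. The following is a minimal sketch under stated assumptions: `run_cmd` is a hypothetical helper that is not part of this notebook, and the demo runs a trivial Python one-liner as a stand-in for the real `aws` command.

```python
import subprocess
import sys


def run_cmd(args):
    """Run a command given as an argument list, wait for it, and
    return (exit code, combined output). Hypothetical helper."""
    # stderr is merged into stdout so CLI error messages are captured too.
    result = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode, result.stdout


# Harmless stand-in for the real bucket command.
code, output = run_cmd([sys.executable, "-c", "print('bucket ok')"])
print(code, output.strip())
```

A nonzero exit code is a signal to stop before starting any transfers.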

# Data Transfer

## Set Global Vars

In [None]:
folder = '/home/wells_walt/Hurricane'
localdir = folder + '/.'
sensorlist = ['ABI-L1b-RadC', 'ABI-L1b-RadF', 'ABI-L1b-RadM', 'ABI-L2-CMIPC', 
              'ABI-L2-CMIPF', 'ABI-L2-CMIPM', 'ABI-L2-MCMIPC', 'ABI-L2-MCMIPF',
              'ABI-L2-MCMIPM']

## Get Date Ranges for Hurricanes of Interest

We pick three major 2017 hurricanes that reached category 4 or higher. Because their date ranges overlap a great deal, we put them into a single bucket.

In [None]:
# Each range is [first day, last day], given as day of year in 2017.
irma_range = [242, 259]   # Formed: August 30, 2017; Dissipated: September 16, 2017
maria_range = [259, 276]  # Formed: September 16, 2017; Dissipated: October 3, 2017
jose_range = [248, 269]   # Formed: September 5, 2017; Dissipated: September 26, 2017
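The numbers in these ranges are days of the year, which is how the source bucket organizes its 2017 folders. As a quick check, they can be derived from the calendar dates in the comments. This is a small sketch; `day_of_year` is a hypothetical helper name.

```python
from datetime import date


def day_of_year(year, month, day):
    """Return the 1-based day of the year for a calendar date."""
    return date(year, month, day).timetuple().tm_yday


# Formation and dissipation dates taken from the comments above.
irma_range = [day_of_year(2017, 8, 30), day_of_year(2017, 9, 16)]
maria_range = [day_of_year(2017, 9, 16), day_of_year(2017, 10, 3)]
jose_range = [day_of_year(2017, 9, 5), day_of_year(2017, 9, 26)]
print(irma_range, maria_range, jose_range)
```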

Here we grab all the data in the combined range (days 242 through 276) covering Irma, Maria, and Jose.

In [None]:
all_range = list(range(242, 276+1))
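The combined range can also be computed from the three per-storm ranges instead of being typed by hand. The sketch below assumes each range is an inclusive `[first, last]` pair; `storm_ranges` and `days` are illustrative names.

```python
irma_range = [242, 259]
maria_range = [259, 276]
jose_range = [248, 269]

storm_ranges = [irma_range, maria_range, jose_range]

# Collect every day covered by at least one storm, counting each day once.
days = set()
for first, last in storm_ranges:
    days.update(range(first, last + 1))

all_range = sorted(days)
print(all_range[0], all_range[-1], len(all_range))
```

Because the three ranges overlap, the union is one contiguous block of days. If the ranges had gaps between them, this approach would skip the uncovered days.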

## Helper:  Delete Pulled Data

In [None]:
def cleanhouse(folder):
    """Delete every file and subdirectory inside folder, keeping folder itself."""
    # https://stackoverflow.com/questions/185936/how-to-delete-the-contents-of-a-folder-in-python
    for the_file in os.listdir(folder):
        file_path = os.path.join(folder, the_file)
        try:
            if os.path.isfile(file_path):
                os.unlink(file_path)
            elif os.path.isdir(file_path):
                shutil.rmtree(file_path)
        except Exception as e:
            print(e)

## Helper:  Push and Pull

In [None]:
def pulldata(sensor, day):
    # Copy one sensor/day prefix from the RCC Minio server into localdir.
    rccloc = 's3://noaa-goes16/' + sensor + '/2017/' + str(day) + '/ '
    pullcmd = 'aws s3 cp ' + rccloc + localdir + ' --no-sign-request --endpoint-url https://osdc.rcc.uchicago.edu --no-verify-ssl --recursive'
    print(pullcmd)
    pull = subprocess.Popen(pullcmd, shell=True, stdout=subprocess.PIPE)
    pull.communicate()  # wait for the download to finish before returning

def pushdata(sensor, day):
    # Copy the contents of localdir to the matching sensor/day prefix on Griffin.
    bucketloc = ' s3://noaa-goes16-hurricane-archive-2017/' + sensor + '/' + str(day) + '/'
    pushcmd = 'aws s3 cp ' + localdir + bucketloc + ' --endpoint-url https://griffin-objstore.opensciencedatacloud.org --acl public-read --recursive'
    print(pushcmd)
    push = subprocess.Popen(pushcmd, shell=True, stdout=subprocess.PIPE)
    push.communicate()  # wait for the upload to finish so cleanup cannot delete files mid-transfer
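Building the command by string concatenation depends on stray spaces inside the pieces, which is easy to get wrong. One alternative is to build an argument list, which `subprocess` accepts without `shell=True`. This is a sketch only: `build_pull_args` is a hypothetical function, and the flags are the same ones used in the pull command above.

```python
def build_pull_args(sensor, day, localdir):
    """Return the pull command as an argument list (hypothetical helper)."""
    source = "s3://noaa-goes16/{}/2017/{}/".format(sensor, day)
    return [
        "aws", "s3", "cp", source, localdir,
        "--no-sign-request",
        "--endpoint-url", "https://osdc.rcc.uchicago.edu",
        "--no-verify-ssl",
        "--recursive",
    ]


args = build_pull_args("ABI-L1b-RadC", 242, "/home/wells_walt/Hurricane/.")
# Joining with spaces is for display only; the list itself is what gets executed.
print(" ".join(args))
```

Each list element is passed to the program as one argument, so no element needs leading or trailing spaces.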

In [None]:
#Test Smaller Range
#sensorlist = ['ABI-L1b-RadC', 'ABI-L1b-RadF']
#all_range = list(range(242, 243+1))

## Do the thing

In [None]:
for sensor in sensorlist:
    for day in all_range:
        pulldata(sensor, day)
        pushdata(sensor, day)
        cleanhouse(folder)
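Before starting the full run, it can be useful to list the (sensor, day) pairs the loop will visit without transferring anything. A minimal dry-run sketch follows; `jobs` is an illustrative name, and the sensor list and day range are copied from the cells above.

```python
from itertools import product

sensorlist = ['ABI-L1b-RadC', 'ABI-L1b-RadF', 'ABI-L1b-RadM', 'ABI-L2-CMIPC',
              'ABI-L2-CMIPF', 'ABI-L2-CMIPM', 'ABI-L2-MCMIPC', 'ABI-L2-MCMIPF',
              'ABI-L2-MCMIPM']
all_range = list(range(242, 276 + 1))

# product() yields pairs in the same order as the nested for loops:
# every day for the first sensor, then every day for the next one.
jobs = list(product(sensorlist, all_range))
print(len(jobs), "pull/push cycles")
print("first:", jobs[0], "last:", jobs[-1])
```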

# Make Bucket Public

In [None]:
makepublic = 'aws s3api put-bucket-acl --endpoint-url https://griffin-objstore.opensciencedatacloud.org --bucket noaa-goes16-hurricane-archive-2017 --acl public-read'
makePub=subprocess.Popen(makepublic, shell=True, stdout = subprocess.PIPE)

In [None]:
cleanhouse(folder)