# Upload data

The following tasks have already been completed. To replicate this analysis with a different service account or project, you may need to perform some or all of them.

- Create a service account.
 - https://cloud.google.com/iam/docs/creating-managing-service-accounts#creating_a_service_account
- Create a credentials file.
 - https://cloud.google.com/storage/docs/authentication#generating-a-private-key
- Run the script to register the service account with Terra.
 - https://github.com/broadinstitute/terra-tools/tree/master/scripts/register_service_account
 - In the docker run command provided at the link above, change the relative path to the credentials file to an absolute path, e.g.
 - -v /home/mitchac/credentials.json:/svc.json
- Grant your service account 'writer' permissions on your workspace.
- Upload the credentials.json file for your service account to the files area of this workspace.
- Grant your service account 'reader' permissions on the relevant methods in the Broad Methods Repository.

## Copy singlem-wdl git repo

In [22]:
#!git clone https://github.com/wwood/singlem-wdl.git ~/git/singlem-wdl
!cd ~/git/singlem-wdl && git pull --ff-only

remote: Enumerating objects: 25, done.
remote: Counting objects: 100% (24/24), done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 19 (delta 8), reused 15 (delta 4), pack-reused 0
Unpacking objects: 100% (19/19), done.
From https://github.com/wwood/singlem-wdl
   894a30b..0ba9b72  main       -> origin/main
Updating 894a30b..0ba9b72
Fast-forward
 autotest/terra-configs/test/SubmissionRequest.json |    11 +
 ..._data_table => sra_20210614_2.terra_data_table} |     2 +-
 ...table => sra_20210614_2_sets1.terra_data_table} | 19350 +++++++++----------
 3 files changed, 9687 insertions(+), 9676 deletions(-)
 create mode 100644 autotest/terra-configs/test/SubmissionRequest.json
 rename runlists/{sra_20210421_2.terra_data_table => sra_20210614_2.terra_data_table} (99%)
 rename runlists/{sra_20210421_2_sets1.terra_data_table => sra_20210614_2_sets1.terra_data_table} (95%)


In [23]:
# Ensure that the local checkout is at the expected commit
import subprocess
current_commit = subprocess.check_output(['bash','-c',"cd ~/git/singlem-wdl && git log --oneline |awk '{print $1}' |head -1"])
if current_commit != b'0ba9b72\n':
    raise Exception("Unexpected git commit found: {}".format(current_commit))
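The same check could be written without the shell pipeline. The following is a minimal sketch, assuming `git` is available on the PATH; `short_head` and `require_commit` are hypothetical helper names introduced here for illustration and are not part of singlem-wdl.

```python
import os
import subprocess


def short_head(repo_dir):
    """Return the abbreviated hash of HEAD for the repository at repo_dir."""
    out = subprocess.check_output(
        ["git", "-C", os.path.expanduser(repo_dir), "rev-parse", "--short", "HEAD"]
    )
    return out.decode().strip()


def require_commit(found, expected):
    """Raise if the found commit hash does not match the expected one."""
    # Abbreviated hashes can differ in length, so compare on the shorter prefix.
    n = min(len(found), len(expected))
    if n == 0 or found[:n] != expected[:n]:
        raise RuntimeError("Unexpected git commit found: {}".format(found))
    return found
```

It might be used as `require_commit(short_head("~/git/singlem-wdl"), "0ba9b72")`. Comparing decoded strings avoids having to match the trailing newline in the raw bytes.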

## Copy the credentials.json file to the disk of your notebook instance

In [12]:
!gsutil cp gs://fc-833c2d81-556a-4c83-aed7-21f884f6fec0/notebooks/credentials.json .
!ls

Copying gs://fc-833c2d81-556a-4c83-aed7-21f884f6fec0/notebooks/credentials.json...
/ [1 files][  2.3 KiB/  2.3 KiB]                                                
Operation completed over 1 objects/2.3 KiB.                                      
Cost-estimator-gbp-summary.ipynb  test.ipynb
credentials.json		  Test-singlem-disk-size.ipynb
Run-singlem-workflow.ipynb	  Update-data-ben.ipynb
Run_workflow_on_data_set.ipynb	  Upload-data.ipynb
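Before authenticating, it can be worth confirming that the copied file really is a service account key. The sketch below is one possible check, assuming the key file is JSON containing at least the `type`, `client_email` and `private_key` fields; `check_key_file` and `REQUIRED_KEYS` are hypothetical names used only for this example.

```python
import json

# Fields assumed to be present in a Google service account key file.
REQUIRED_KEYS = ("type", "client_email", "private_key")


def check_key_file(text):
    """Return the service account email if the JSON text looks like a key file."""
    key = json.loads(text)
    missing = [k for k in REQUIRED_KEYS if k not in key]
    if missing:
        raise ValueError("Key file is missing fields: {}".format(", ".join(missing)))
    if key["type"] != "service_account":
        raise ValueError("Unexpected credential type: {}".format(key["type"]))
    # Only the email is returned, so the private key is never printed.
    return key["client_email"]
```

For example, `check_key_file(open("credentials.json").read())` should return the address of the service account that the next step activates.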


## Authenticate your service account

In [13]:
!gcloud auth activate-service-account --key-file credentials.json
!gcloud auth list

Activated service account credentials for: [terra-api@maximal-dynamo-308105.iam.gserviceaccount.com]
                              Credentialed Accounts
ACTIVE  ACCOUNT
        pet-101246808612078416795@firstterrabillingaccount.iam.gserviceaccount.com
*       terra-api@maximal-dynamo-308105.iam.gserviceaccount.com

To set the active account, run:
    $ gcloud config set account `ACCOUNT`



## Install and import

In [14]:
!pip install requests_toolbelt

Collecting requests_toolbelt
  Downloading requests_toolbelt-0.9.1-py2.py3-none-any.whl (54 kB)
Collecting requests<3.0.0,>=2.0.1
  Downloading requests-2.25.1-py2.py3-none-any.whl (61 kB)
Collecting idna<3,>=2.5
  Downloading idna-2.10-py2.py3-none-any.whl (58 kB)
Collecting certifi>=2017.4.17
  Downloading certifi-2021.5.30-py2.py3-none-any.whl (145 kB)
Collecting chardet<5,>=3.0.2
  Downloading chardet-4.0.0-py2.py3-none-any.whl (178 kB)
Collecting urllib3<1.27,>=1.21.1
  Downloading urllib3-1.26.5-py2.py3-none-any.whl (138 kB)

In [15]:
import requests
import os
import json
from requests_toolbelt.multipart.encoder import MultipartEncoder

## Actually do the upload
The only thing you need to edit in the following code is the path to your TSV file. Note that this command is not idempotent: if your workspace already contains entities of the type you are importing, it returns a 400 error. On success it should return a 200 status code.
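Whether an import collides with existing data depends on the entity type declared in the file. As a rough illustration, the sketch below reads that type from a header line, assuming the Terra load-file convention in which the first column is named like `entity:<type>_id` or `membership:<type>_id`. The function name `entity_type_from_header` and the column names in the usage note are hypothetical.

```python
def entity_type_from_header(header_line):
    """Extract the entity type from the first column name of a TSV header line."""
    first_column = header_line.rstrip("\r\n").split("\t")[0]
    prefix, sep, name = first_column.partition(":")
    # Only the two prefixes assumed above are accepted in this sketch.
    if not sep or prefix not in ("entity", "membership"):
        raise ValueError("Unrecognised first column: {!r}".format(first_column))
    if not name.endswith("_id") or name == "_id":
        raise ValueError("First column does not end in '_id': {!r}".format(first_column))
    return name[: -len("_id")]
```

Given a header such as `"entity:sample_id\tsra_run"`, the function would return `"sample"`, which is the type to look for in the workspace before uploading.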

In [18]:
import os
home_dir = os.path.expanduser('~')

In [26]:
def upload_tsv(file_path):
    """POST a Terra data table (TSV) to the workspace's flexibleImportEntities endpoint."""
    # Fetch an access token for the service account via the gcloud CLI
    token = os.popen('gcloud auth --account=terra-api@maximal-dynamo-308105.iam.gserviceaccount.com print-access-token').read().rstrip()

    url = 'https://api.firecloud.org/api/workspaces/firstterrabillingaccount/singlem-pilot-2/flexibleImportEntities'

    # Keep the file open only for the duration of the request
    with open(file_path, 'rb') as tsv_file:
        m = MultipartEncoder(
            fields={"workspaceNamespace": "firstterrabillingaccount","workspaceName": "singlem-pilot-2",
                    'entities': ('filename', tsv_file, 'text/plain')}
            )

        head = {'accept': '*/*','Content-Type': m.content_type, 'Authorization': 'Bearer {}'.format(token)}

        r = requests.post(url, data=m, headers=head)
    # Print the response object, e.g. <Response [200]>
    print(r)

In [27]:
upload_tsv(home_dir+'/git/singlem-wdl/runlists/sra_20210614_2.terra_data_table')

<Response [502]>


In [29]:
!date
upload_tsv(home_dir+'/git/singlem-wdl/runlists/sra_20210614_2_sets1.terra_data_table')

Tue Jun 22 04:51:26 UTC 2021
<Response [502]>


In [31]:
# Despite the 502 responses above, the uploads appeared to succeed after waiting a few minutes.
# In future, this could be verified through the API.
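One way such a check might look is a small polling loop. This is only a sketch: `wait_for_entities` is a hypothetical helper, and `fetch_counts` stands for any callable that returns a dictionary of entity counts per type. How those counts are obtained from the Terra API is deliberately left open, since this notebook does not show a suitable endpoint or its response format.

```python
import time


def wait_for_entities(fetch_counts, entity_type, expected, attempts=10, delay=60):
    """Poll until the workspace reports at least `expected` entities of a type.

    fetch_counts is any callable returning a dict of {entity_type: count}.
    Returns True once the count is reached, False if all attempts are used up.
    """
    for attempt in range(attempts):
        counts = fetch_counts()
        if counts.get(entity_type, 0) >= expected:
            return True
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

With the defaults this would wait up to roughly nine minutes, which is in line with the "some minutes" observed above; both `attempts` and `delay` are arbitrary starting values.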

The following function is useful for previewing the request that will be sent to the API. It is not required to upload a data file.

In [21]:
# def pretty_print_POST(req):
#     """
#     At this point it is completely built and ready
#     to be fired; it is "prepared".

#     However pay attention at the formatting used in 
#     this function because it is programmed to be pretty 
#     printed and may differ from the actual request.
#     """
#     print('{}\n{}\r\n{}\r\n\r\n{}'.format(
#         '-----------START-----------',
#         req.method + ' ' + req.url,
#         '\r\n'.join('{}: {}'.format(k, v) for k, v in req.headers.items()),
#         req.body,
#     ))

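# # url, data and head must be defined first, as inside upload_tsv (where the body is named m)
# req = requests.Request('POST', url, data=data, headers=head)
# prepared = req.prepare()
# # Pass the prepared request: only it has the .body attribute used above
# pretty_print_POST(prepared)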