# Google Cloud Storage for SOM
This is a quick demo of functions under development to upload Stanford Medicine Radiology data, specifically text and images, to Google Cloud Storage. Our storage strategy is as follows:

- metadata goes into [Datastore](https://cloud.google.com/datastore/)
- image and other objects go into [Storage](https://cloud.google.com/storage/)

The links to images in storage are found by querying Datastore. Both databases can be connected to various cloud applications, exported to BigQuery to run a small query (harhar), and accessed via APIs we will develop.

## The Radiology Client
This quick demo shows an instance of a `som.storage.google.client`, which has high-level `ModelBase` and `BatchManager` classes to represent entities (collection, image, etc.). The batch manager is simply a way to hold buckets of these things and perform operations on them in transactions. This demo will show lines from a few functions that are particular to the radiology client.

First we will import two functions. The first is our radiology client, and the second will take a zipped up collection folder (images and text) and convert it to a data structure that we know how to parse.

In [1]:
from som.storage.google.radiology import Client
from som.wordfish.structures import structure_dataset

Environment message level found to be DEBUG


We now want to convert our zipped cookie patients into a data structure to hand to the client.

In [2]:
compressed_data = '../../../wordfish-standard/demo/cookies.zip'
structures = structure_dataset(compressed_data,clean_up=False)

INFO:som:collections found: 1
INFO:som:collection cookies does not have metadata file cookies.json
INFO:som:Found 112 entity folders in collection.
INFO:som:entity 1fb162c5-b465-42d9-a135-9d190d159dfe does not have metadata file 1fb162c5-b465-42d9-a135-9d190d159dfe.json
INFO:som:entity 1fb162c5-b465-42d9-a135-9d190d159dfe has 1 text
INFO:som:entity 1fb162c5-b465-42d9-a135-9d190d159dfe has 2 images
INFO:som:entity 47483111-4aa1-42cf-805f-b4e79857a9dc does not have metadata file 47483111-4aa1-42cf-805f-b4e79857a9dc.json
INFO:som:entity 47483111-4aa1-42cf-805f-b4e79857a9dc does not have text.
INFO:som:entity 47483111-4aa1-42cf-805f-b4e79857a9dc has 2 images
INFO:som:entity b815b02c-1ed2-4efc-b809-76c006d20f7c does not have metadata file b815b02c-1ed2-4efc-b809-76c006d20f7c.json
INFO:som:entity b815b02c-1ed2-4efc-b809-76c006d20f7c does not have text.
INFO:som:entity b815b02c-1ed2-4efc-b809-76c006d20f7c has 2 images
INFO:som:entity fe1c1509-ac12-42f2-a450-1884466a02de does not have metadata

What you see above is the [wordfish module](https://github.com/radinformatics/som-tools/tree/master/som/wordfish) opening up the zip, parsing through it, and extracting information about the collection, the entities within it, and the images and text that each entity has. At the end you see that we have found 112 valid entities. Part of generating the data structures includes validation, so we identify buggy or broken data from the start. Next, let's make a client.

## The Client
The base client holds handles to each of our databases, storage and datastore, along with functions to get/update/delete things. They are coupled in this way because we always want the datastore to be paired with storage: if an object is updated in storage, we want it to be updated in datastore as well. To make it, we just do this:

      radiology_client = Client()
      INFO:googleapiclient.discovery:URL being requested: GET https://www.googleapis.com/storage/v1/b/radiology?alt=json
      ...
      INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
      INFO:oauth2client.client:Refreshing access_token

I've typed this in manually with some content removed, because I'm not sure to what extent the printed token output could leak secrets. In this call we see two things happen: the client asks for a connection to radiology storage and then to datastore, and our access token is refreshed.

Now we are going to cheat a little: we are going to copy lines from within the `radiology_client.upload_dataset()` function, which would be called like:

      radiology_client.upload_dataset(structures)

to get images into storage, and metadata into datastore. Let's first look at what the structures look like. Structures is a list, with each item being a collection (corresponding to some logical grouping of entities with images and text). A zipped up file having more than one folder at the base would indicate more than one collection. Let's look at `structures[0]` corresponding to the `cookies` collection:

In [4]:
structure = structures[0]
structure

{'collection': {'entities': [{'entity': {'id': '/tmp/tmpao9qbr48/z98k9dn8/cookies/1fb162c5-b465-42d9-a135-9d190d159dfe',
     'images': [{'metadata': '/tmp/tmpao9qbr48/z98k9dn8/cookies/1fb162c5-b465-42d9-a135-9d190d159dfe/images/image1/overlay1.json',
       'original': '/tmp/tmpao9qbr48/z98k9dn8/cookies/1fb162c5-b465-42d9-a135-9d190d159dfe/images/image1/overlay1.png'},
      {'original': '/tmp/tmpao9qbr48/z98k9dn8/cookies/1fb162c5-b465-42d9-a135-9d190d159dfe/images/image1/image1.png'}],
     'texts': [{'original': '/tmp/tmpao9qbr48/z98k9dn8/cookies/1fb162c5-b465-42d9-a135-9d190d159dfe/text/description1/description1.txt'}]}},
   {'entity': {'id': '/tmp/tmpao9qbr48/z98k9dn8/cookies/47483111-4aa1-42cf-805f-b4e79857a9dc',
     'images': [{'metadata': '/tmp/tmpao9qbr48/z98k9dn8/cookies/47483111-4aa1-42cf-805f-b4e79857a9dc/images/image1/overlay1.json',
       'original': '/tmp/tmpao9qbr48/z98k9dn8/cookies/47483111-4aa1-42cf-805f-b4e79857a9dc/images/image1/overlay1.png'},
      {'original': 
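
To make the shape of this structure concrete, here is a minimal sketch that walks one collection and counts the images and texts per entity. The miniature `structure` below is a hypothetical stand-in with placeholder paths, and `summarize_collection` is an illustrative helper, not part of `som`. It assumes an entity without text may simply lack the `texts` key, so it reads both lists defensively.

```python
import os

# Hypothetical miniature of one item in `structures`; paths are placeholders.
structure = {
    "collection": {
        "name": "/tmp/example/cookies",
        "entities": [
            {"entity": {
                "id": "/tmp/example/cookies/entity-a",
                "images": [
                    {"original": "/tmp/example/overlay1.png",
                     "metadata": "/tmp/example/overlay1.json"},
                    {"original": "/tmp/example/image1.png"},
                ],
                "texts": [{"original": "/tmp/example/description1.txt"}],
            }},
            {"entity": {
                "id": "/tmp/example/cookies/entity-b",
                "images": [{"original": "/tmp/example/image1.png"}],
            }},
        ],
    }
}


def summarize_collection(structure):
    """Return {entity_uid: (n_images, n_texts)} for one collection structure."""
    summary = {}
    for item in structure["collection"]["entities"]:
        entity = item["entity"]
        # The entity uid is the last component of its folder path.
        uid = os.path.basename(entity["id"])
        summary[uid] = (len(entity.get("images", [])),
                        len(entity.get("texts", [])))
    return summary


summary = summarize_collection(structure)
```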

Let's first create the collection, which is just an entry of metadata into datastore. 

In [9]:
import os
collection_name = os.path.basename(structure['collection']['name'])

# Put all stuff to add into a fields dictionary
fields = {'uid':collection_name}

# Create the collection (this actually calls get_or_create)
col = radiology_client.get_collection(fields)

# Here is our Collection Model
print('Here is the collection model.')
print(col)

# Here is the datastore object
print('Here is the datastore object')
print(col.this)

Here is the collection model.
Collection:cookies
Here is the datastore object
<Entity('Collection', 'cookies') {'updated': datetime.datetime(2017, 3, 11, 22, 1, 24, 824523), 'name': 'cookies', 'created': datetime.datetime(2017, 3, 9, 22, 51, 25, 335335, tzinfo=<UTC>), 'uid': 'cookies'}>


Now we are going to create an entity, which is a different object, but it has the collection we just created as a parent. Normally a key for something looks like this:

       ('Kind','unique_id')
       
so for the collection we have

       ('Collection','cookies')
       
and you can think about relationships like a folder structure. A child of the Collection cookies is just referenced after it:

       ('Parent','unique_id','Child','child_id')

so our entity will look like this:

       ('Collection','cookies','Entity','entity_id')

We could also build a key for it on its own, although datastore treats a key without the parent path as a different, parentless key:

       ('Entity','entity_id')

and this same idea extends to when we want to add images, text, etc.

       ('Collection','cookies','Entity','entity_id','Image','image_id')
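
The folder analogy can be sketched with plain tuples. The helpers `make_key`, `child_key`, and `parent_key` are hypothetical names used only for this illustration. As far as I know, the `key()` method of the `google-cloud-datastore` client accepts the same flat sequence of kind and id values, but the sketch avoids the library so it runs without credentials.

```python
def make_key(*path):
    """Build a flat key path from alternating kind / id values."""
    if len(path) == 0 or len(path) % 2 != 0:
        raise ValueError("a key path needs complete (kind, id) pairs")
    return tuple(path)


def child_key(parent, kind, uid):
    """Append one (kind, id) pair to a parent key path."""
    return parent + (kind, uid)


def parent_key(key):
    """Drop the last (kind, id) pair; a top-level key has no parent."""
    return key[:-2] or None


collection = make_key("Collection", "cookies")
entity = child_key(collection, "Entity", "entity_id")
image = child_key(entity, "Image", "image_id")
```

Walking back up with `parent_key(image)` gives the entity key, and `parent_key(collection)` gives `None`.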
       
So let's add an entity.

In [10]:
contender = structure['collection']['entities'][0]
print(contender)

{'entity': {'texts': [{'original': '/tmp/tmpao9qbr48/z98k9dn8/cookies/1fb162c5-b465-42d9-a135-9d190d159dfe/text/description1/description1.txt'}], 'images': [{'original': '/tmp/tmpao9qbr48/z98k9dn8/cookies/1fb162c5-b465-42d9-a135-9d190d159dfe/images/image1/overlay1.png', 'metadata': '/tmp/tmpao9qbr48/z98k9dn8/cookies/1fb162c5-b465-42d9-a135-9d190d159dfe/images/image1/overlay1.json'}, {'original': '/tmp/tmpao9qbr48/z98k9dn8/cookies/1fb162c5-b465-42d9-a135-9d190d159dfe/images/image1/image1.png'}], 'id': '/tmp/tmpao9qbr48/z98k9dn8/cookies/1fb162c5-b465-42d9-a135-9d190d159dfe'}}


In [12]:
fields = {'uid': os.path.basename(contender['entity']['id']),
          'collection':col }
entity = radiology_client.get_entity(fields)

# Here is OUR Entity model:
print(entity)

# And here is the datastore entity model, which just happens to call its general model an Entity too:
print(entity.this)

Collection:cookies/Entity:1fb162c5-b465-42d9-a135-9d190d159dfe
<Entity('Collection', 'cookies', 'Entity', '1fb162c5-b465-42d9-a135-9d190d159dfe') {'updated': datetime.datetime(2017, 3, 11, 22, 6, 27, 84864), 'created': datetime.datetime(2017, 3, 11, 21, 20, 11, 643976, tzinfo=<UTC>), 'uid': '1fb162c5-b465-42d9-a135-9d190d159dfe'}>


That's all I've done thus far. These creation functions don't show adding files to storage, because that comes with images and text, but you can imagine the same workflow also adding objects to storage and saving their links as fields on the respective datastore model. I will add to this demo when I fine-tune that a bit more.
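
Until then, here is a rough sketch of how that paired workflow might look, using two dictionaries in place of storage and datastore. Everything in it is an assumption for illustration: the function names `upload_image` and `image_links`, the `storage_link` field, and the object layout inside the bucket are all hypothetical. Only the bucket name `radiology` comes from the request URL shown earlier.

```python
import os

# In-memory stand-ins so the sketch runs without credentials (not the real clients).
fake_storage = {}    # object path -> bytes
fake_datastore = {}  # key path tuple -> dict of fields


def upload_image(collection_uid, entity_uid, image_path, data, bucket="radiology"):
    """Store the bytes, then record a link to them under the entity's key path."""
    image_uid = os.path.basename(image_path)
    # Hypothetical object layout: <collection>/<entity>/images/<file name>
    object_path = "/".join([collection_uid, entity_uid, "images", image_uid])
    fake_storage[object_path] = data

    key = ("Collection", collection_uid, "Entity", entity_uid, "Image", image_uid)
    fake_datastore[key] = {
        "uid": image_uid,
        "storage_link": "gs://{}/{}".format(bucket, object_path),
    }
    return key


def image_links(collection_uid):
    """Find storage links by scanning the metadata, never the bucket itself."""
    return sorted(
        fields["storage_link"]
        for key, fields in fake_datastore.items()
        if key[:2] == ("Collection", collection_uid) and "Image" in key[::2]
    )


key = upload_image("cookies", "entity-1", "/tmp/example/image1.png", b"not really a png")
links = image_links("cookies")
```

Keeping the link as a field on the metadata record is what lets a query against datastore locate the object in storage.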