# Working with Data

This tutorial shows how to browse for data in the Unity system that an application can later use as input when you submit a job. Job submission is covered in the next tutorial. Run each cell in order (Shift+Enter). The notes indicate when you need to edit code to customize something (e.g., to specify a data collection) and when running the cell will prompt you for input (e.g., for your username and password).

In [None]:
import requests
import getpass
import json
from IPython.display import JSON

First, we need some predefined environment variables.

In [None]:
# This portion of the code is env specific for Dev, Test, Ops, etc. 
# define the environment as our test venue
env = {
    "clientId":"71894molftjtie4dvvkbjeard0",
    "url":"https://58nbcawrvb.execute-api.us-west-2.amazonaws.com/test/"
      }

# The auth_json is a template for authorizing with AWS Cognito to obtain a token that can be used for calls to the data service.
# For now this is just an empty data structure. You will be prompted for your username and password in a few steps.
auth_json = '''{
     "AuthParameters" : {
        "USERNAME" : "",
        "PASSWORD" : ""
     },
     "AuthFlow" : "USER_PASSWORD_AUTH",
     "ClientId" : ""
  }'''

### Authentication Code

The function below is a helper for getting an access token for Unity SDS services. You must pass the token along with every API request in order to access the various Unity SDS services.

In [None]:
# This function takes a username, password, and client ID and fetches a Cognito access token.
def get_token(username, password, clientID):
    aj = json.loads(auth_json)
    aj['AuthParameters']['USERNAME'] = username
    aj['AuthParameters']['PASSWORD'] = password
    aj['ClientId'] = clientID
    token = None
    try:
        response = requests.post(
            'https://cognito-idp.us-west-2.amazonaws.com',
            headers={
                "Content-Type": "application/x-amz-json-1.1",
                "X-Amz-Target": "AWSCognitoIdentityProviderService.InitiateAuth"
            },
            json=aj
        )
        token = response.json()['AuthenticationResult']['AccessToken']
    # Catch only the failures expected here: network errors, a non-JSON body, or a missing token field
    except (requests.exceptions.RequestException, ValueError, KeyError):
        print("Error, check username and password and try again.")
    return token

### Prompt for your Unity username and password

These are required to get the token (described above) to connect to the data services.

In [None]:
print("Please enter your username...")
user_name = input()

print("Please enter your password...")
password = getpass.getpass()

In [None]:
token = get_token(user_name, password, env['clientId'])

if token:
    print("Token received.")
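Later cells build the `Authorization` header by concatenating `"Bearer "` with `token`, which raises a `TypeError` if `get_token` returned `None`. A minimal sketch of a guard, assuming a hypothetical helper named `bearer_headers` that is not part of the original notebook:

```python
def bearer_headers(token):
    """Build the Authorization header used for Unity SDS API requests.

    Hypothetical helper: it fails early with a clear message when no
    token is available, instead of failing later inside a request call.
    """
    if not token:
        raise ValueError("No access token; re-run the login cells above.")
    return {"Authorization": "Bearer " + token}


# Example with a placeholder value (not a real token)
print(bearer_headers("example-token"))
```

With this in place, a request could be written as `requests.get(url, headers=bearer_headers(token))`.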

## List Available Data Collections in the Unity System

Data is organized into Collections. Any particular data file will be in at least one Collection.

In [None]:
# The DAPA-request endpoint to retrieve collections is the base URL plus the following:
url = env['url'] + "am-uds-dapa/collections"

# Make a GET request at the URL you have constructed, using your access token
response = requests.get(url, headers={"Authorization": "Bearer " + token})

print("Data Collections at " + url)
# To see raw JSON of the API response, uncomment this line:
#print(json.dumps(response.json()))

features = response.json()['features']

for data_set in features:
    print(data_set['id'])

print("\nFull JSON response object:")
JSON(response.json())
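When the list of Collections is long, a substring match on the ID is one way to narrow it down. The sketch below runs on a hand-written sample that mimics the `features` list; `matching_collection_ids` is a hypothetical helper, and every ID except the first is made up for illustration.

```python
def matching_collection_ids(features, text):
    """Return the IDs of collections whose ID contains `text` (case-insensitive)."""
    needle = text.lower()
    return [f["id"] for f in features if needle in f["id"].lower()]


sample_features = [
    {"id": "urn:nasa:unity:uds_local_test:TEST1:SNDR_SNPP_ATMS_L1B_OUTPUT___1"},
    {"id": "urn:nasa:unity:example:EXAMPLE:COLLECTION_A___1"},  # made-up ID
    {"id": "urn:nasa:unity:example:EXAMPLE:COLLECTION_B___2"},  # made-up ID
]

print(matching_collection_ids(sample_features, "atms"))
```

In the notebook, pass `response.json()['features']` in place of `sample_features`.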

## Given a Collection (above), list the files within that Collection

Executing this cell will retrieve all the files in a Collection defined by the data_set variable. Then it will print out the name and href location of each (up to a limit defined in this code block).

To see a different data Collection, change the data_set variable to one of the other Collections you found in the step above. To limit your query to something other than 100 files, uncomment the params.append() call in the code and change its value.

In [None]:
data_set = "urn:nasa:unity:uds_local_test:TEST1:SNDR_SNPP_ATMS_L1B_OUTPUT___1"
url = env['url'] + "am-uds-dapa/collections/"+data_set+"/items"

params = []
#params.append(("limit", 20))

response = requests.get(url, headers={"Authorization": "Bearer " + token}, params=params)

print("Endpoint: " + url)
print(f"Total number of files: {response.json()['numberMatched']}")
print("File IDs, titles, and hrefs in Collection " + data_set + "\n")

features = response.json()['features']

for data_file in features:
    print("For " + data_file['id'])
    print("File:\t\t" + data_file['assets']['data']['href'])
    print("Metadata:\t" + data_file['assets']['metadata__data']['href'])
    print("")


print("Full JSON response object:")
JSON(response.json())
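The loop above indexes `assets['data']` and `assets['metadata__data']` directly, so an item missing either asset would raise a `KeyError`. A more tolerant sketch, assuming each asset is a dictionary with an `href` key as in the cell above (`asset_hrefs` and the sample item are hypothetical):

```python
def asset_hrefs(item):
    """Map each asset name of an item to its href, skipping assets without one."""
    assets = item.get("assets", {})
    return {name: asset["href"] for name, asset in assets.items() if "href" in asset}


# Made-up item that mimics the shape used in the cell above
sample_item = {
    "id": "example_file",
    "assets": {
        "data": {"href": "s3://example-bucket/example_file.nc"},
        "metadata__data": {"href": "s3://example-bucket/example_file.nc.cas"},
    },
}

for name, href in asset_hrefs(sample_item).items():
    print(f"{name}:\t{href}")
```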


## Filter the results above by time

The standards-based API used by the Unity SDS Data Store, DAPA, has a variety of filtering options. Currently we have implemented a time-based filter. See more about the Data Access and Processing API at: https://docs.ogc.org/per/20-025r1.html#_dapa_overview

This cell will filter the full list of files in the Collection with ID = data_set by a start and end time defined by the datetime parameter.
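The `datetime` value is a start and an end timestamp joined by `/`. A small sketch of building that string from Python `datetime` objects, assuming UTC timestamps with a trailing `Z` as in the example value used in this tutorial (`dapa_interval` is a hypothetical helper):

```python
from datetime import datetime, timezone


def dapa_interval(start, end):
    """Format two timezone-aware datetimes as 'start/end' in UTC."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if start > end:
        raise ValueError("start must not be after end")
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return (start.astimezone(timezone.utc).strftime(fmt)
            + "/" + end.astimezone(timezone.utc).strftime(fmt))


interval = dapa_interval(
    datetime(2000, 11, 1, tzinfo=timezone.utc),
    datetime(2022, 11, 1, 2, 31, 12, tzinfo=timezone.utc),
)
print(interval)
```

The result can be passed as the value in `params.append(("datetime", interval))`.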

In [None]:
data_set = "urn:nasa:unity:uds_local_test:TEST1:SNDR_SNPP_ATMS_L1B_OUTPUT___1"
url = env['url'] + "am-uds-dapa/collections/"+data_set+"/items"
# The datetime, limit, and offset parameters are included due to a current bug in the API Gateway that sets these values to 'none'.
# Example date/time params

params = []
# Add a datetime range (start/end) to your request
params.append(("datetime", "2000-11-01T00:00:00Z/2022-11-01T02:31:12Z"))

# limit - how many results to return in a single request
#params.append(("limit", 10))

response = requests.get(url, headers={"Authorization": "Bearer " + token}, params=params)

print(f"Total number of files: {response.json()['numberMatched']}")
print("File IDs, datetimes, and hrefs in Collection " + data_set + "\n")

features = response.json()['features']
while len(features) > 0:
    for data_file in features:
        print(data_file['id'])
        print(data_file['properties']['created'])
        print(data_file['assets']['metadata__data']['href'])
        print(data_file['assets']['data']['href'])
        print("")
    # Get the next page of results; stop when the response has no 'next' link
    next_link = next((item for item in response.json()['links'] if item['rel'] == 'next'), None)
    if next_link is None:
        break
    response = requests.get(next_link['href'], headers={"Authorization": "Bearer " + token}, params=params)
    features = response.json()['features']
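The paging logic can also be separated from the printing. The sketch below is one possible shape, not the notebook's own method: a generator that yields every feature and follows `rel == 'next'` links until a page is empty or has no such link. `iter_features` is a hypothetical name, and the pages are small fake dictionaries so the example runs without network access.

```python
def iter_features(first_page, fetch_page):
    """Yield features from first_page and every page reachable via 'next' links.

    fetch_page is any callable that takes an href and returns the parsed
    JSON of that page as a dictionary.
    """
    page = first_page
    while page.get("features"):
        yield from page["features"]
        next_href = next(
            (link["href"] for link in page.get("links", []) if link.get("rel") == "next"),
            None,
        )
        if next_href is None:
            break
        page = fetch_page(next_href)


# Fake pages standing in for API responses
fake_pages = {
    "page-2": {"features": [{"id": "c"}], "links": []},
}
first = {
    "features": [{"id": "a"}, {"id": "b"}],
    "links": [{"rel": "next", "href": "page-2"}],
}

ids = [feature["id"] for feature in iter_features(first, fake_pages.__getitem__)]
print(ids)
```

In the notebook, `fetch_page` could be a small function that calls `requests.get(href, headers=...)` with the bearer token and returns `.json()`.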

## Create a new Collection

In [None]:
collection_id = "urn:nasa:unity:uds_local_test:TEST1:NEW_COLLECTION_EXAMPLE_L1B___5"
collection = {
  "type": "Collection",
  "id": collection_id,
  "stac_version": "1.0.0",
  "description": "TODO",
  "links": [
    {
      "rel": "root",
      "href": "./collection.json?bucket=unknown_bucket&regex=%7BcmrMetadata.Granule.Collection.ShortName%7D___%7BcmrMetadata.Granule.Collection.VersionId%7D",
      "type": "application/json",
      "title": "test_file01.nc"
    },
    {
      "rel": "item",
      "href": "./collection.json?bucket=protected&regex=%5Etest_file.%2A%5C.nc%24",
      "type": "data",
      "title": "test_file01.nc"
    },
    {
      "rel": "item",
      "href": "./collection.json?bucket=protected&regex=%5Etest_file.%2A%5C.nc%5C.cas%24",
      "type": "metadata",
      "title": "test_file01.nc.cas"
    },
    {
      "rel": "item",
      "href": "./collection.json?bucket=private&regex=%5Etest_file.%2A%5C.cmr%5C.xml%24",
      "type": "metadata",
      "title": "test_file01.cmr.xml"
    }
  ],
  "stac_extensions": [],
  "extent": {
    "spatial": {
      "bbox": [
        [
          0,
          0,
          0,
          0
        ]
      ]
    },
    "temporal": {
      "interval": [
        [
          "2022-10-04T00:00:00.000000Z",
          "2022-10-04T23:59:59.999999Z"
        ]
      ]
    }
  },
  "license": "proprietary",
  "summaries": {
    "granuleId": [
      "^test_file.*$"
    ],
    "granuleIdExtraction": [
      "(^test_file.*)(\\.nc|\\.nc\\.cas|\\.cmr\\.xml)"
    ],
    "process": [
      "stac"
    ]
  }
}

url = env['url'] + "am-uds-dapa/collections"
response = requests.post(url, headers={"Authorization": "Bearer " + token}, json=collection)
print(response)
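The `regex` query parameters in the `href` values above are percent-encoded regular expressions. The sketch below decodes one with the standard library and checks it against the sample titles. It also applies the `granuleIdExtraction` pattern to show which part of a file name the first group captures. How the service itself applies these patterns is not shown in this tutorial, so treat this as an illustration only.

```python
import re
from urllib.parse import quote, unquote

# Value of the regex parameter from the first "item" link
encoded = "%5Etest_file.%2A%5C.nc%24"
pattern = unquote(encoded)
print(pattern)

# Encoding the decoded pattern again reproduces the original value
assert quote(pattern, safe="") == encoded

# The trailing $ means the .nc pattern does not match the .nc.cas file
print(bool(re.match(pattern, "test_file01.nc")))
print(bool(re.match(pattern, "test_file01.nc.cas")))

# Same pattern as the "granuleIdExtraction" entry, written as a raw string
extraction = r"(^test_file.*)(\.nc|\.nc\.cas|\.cmr\.xml)"
for name in ["test_file01.nc", "test_file01.nc.cas", "test_file01.cmr.xml"]:
    print(name, "->", re.match(extraction, name).group(1))
```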

## Get newly created Collection

The collection creation may take a minute, so if the new collection isn't returned immediately, please retry.

In [None]:
url = env['url'] + "am-uds-dapa/collections/" + collection_id
response = requests.get(url, headers={"Authorization": "Bearer " + token})
print("Full JSON response object:")
JSON(response.json())
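Instead of re-running the cell by hand, the retry can be scripted. The following is a rough sketch with a hypothetical helper, `poll_until_available`; the number of attempts and the delay are arbitrary choices, and the example uses a stand-in for the real request so it runs without network access.

```python
import time


def poll_until_available(fetch, attempts=6, delay_seconds=10.0, sleep=time.sleep):
    """Call `fetch` until it returns something other than None, or attempts run out."""
    for attempt in range(attempts):
        result = fetch()
        if result is not None:
            return result
        # Wait before the next try, but not after the last one
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return None


# Stand-in for the real request: "not found" twice, then a collection
replies = iter([None, None, {"id": "example-collection"}])
found = poll_until_available(lambda: next(replies), sleep=lambda seconds: None)
print(found)
```

In the notebook, `fetch` could wrap the `requests.get` call above and return `response.json()` when `response.status_code == 200`, otherwise `None`. Whether the service answers with a non-200 status while the collection is still being created is an assumption to verify.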

## Explore on your own

Given the endpoints above for finding collections and then finding data within those collections, try to craft a query by copying cells to find data from one of the other collections in the list above.

Some things to try:

* Find data in the Unity system for the L0_SNPP_EphAtt product type
* Find data in the Unity system for the L1 SounderSIPS 
* Filter the collections above on a numer



## Credential-less data download

When accessing data stores within the same venue, you'll be able to download data from S3 without credentials. 

**Note:** this requires the following libraries. To install them, run the command below in a Jupyter terminal:

```
conda install xarray netcdf4 hdf5 boto3 matplotlib
```


In [None]:
import boto3

In [None]:
s3 = boto3.client('s3')
s3.download_file('uds-test-cumulus-protected', 'SNDR_SNPP_ATMS_L1A___1/SNDR.SNPP.ATMS.L1A.nominal2.04.nc', 'test_file11.nc')
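`download_file` takes the bucket and the key as separate arguments. If the `href` values returned by the item queries are `s3://` URLs (an assumption; check the output of the earlier cells), they can be split with the standard library. `split_s3_url` is a hypothetical helper, and the URL below is assembled from the bucket and key used in the cell above.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split an s3://bucket/key URL into (bucket, key)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an s3:// URL: {url}")
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_url(
    "s3://uds-test-cumulus-protected/SNDR_SNPP_ATMS_L1A___1/SNDR.SNPP.ATMS.L1A.nominal2.04.nc"
)
print(bucket)
print(key)
```

The two values could then be passed to `s3.download_file(bucket, key, 'local_name.nc')`.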

In [None]:
import xarray as xr
ds = xr.open_dataset('test_file11.nc')
ds

In [None]:
ds.band_surf_alt.plot()