# Working with Data

This tutorial helps you become familiar with browsing for data that an application can use as input when you submit a job. Job submission is covered in the next tutorial. Run each cell in order (Shift+Enter). The notes indicate when you need to edit code to customize it (e.g., to specify a data collection) versus when running the cell will prompt you for input (e.g., for your username and password).

In [1]:
import requests
import getpass
import json
from IPython.display import JSON

from unity_sds_client.unity import Unity
from unity_sds_client.unity import UnityEnvironments
from unity_sds_client.unity_session import UnitySession
from unity_sds_client.unity_services import UnityServices as services
from unity_sds_client.resources.collection import Collection

In [2]:
# We will set the environment to 'DEV' here but this should be set to test or prod eventually.
s = Unity(UnityEnvironments.DEV)
# set the venue for interacting with venue specific services
# if your venue id is a single string, use the following

Please enter your Unity username:  gangl
Please enter your Unity password:  ········


## List Available Data Collections in the Unity System

Data is organized into Collections. Any particular data file will be in at least one Collection.

In [3]:
dataManager = s.client(services.DATA_SERVICE)
collections = dataManager.get_collections()
for c in collections:
    print(c.collection_id)


urn:nasa:unity:unity:dev:gangl___2
urn:nasa:unity:unity:dev:gangl___1
URN:NASA:UNITY:UDS_LOCAL_TEST:DEV:UDS_COLLECTION___2402011700
urn:nasa:unity:unity:dev:SBG-L2A_CORFL___1
urn:nasa:unity:unity:dev:SBG-L2A_RSRFL___1
urn:nasa:unity:unity:dev:SBG-L2A_RFL___1
urn:nasa:unity:unity:dev:SBG-L1B_PRE___1
urn:nasa:unity:unity:dev:SBG-L2B_VEGBIOCHEM___1
urn:nasa:unity:unity:dev:SBG-L2B_FRCOV___1
urn:nasa:unity:uds_local_test:DEV1:NEW_COLLECTION_EXAMPLE_L1B___NGA10


## Given a collection (above), List the files within that collection

Executing this cell retrieves the files in the Collection identified by the collection_id variable. It then prints the begin time and ID of each dataset, followed by the location of each of its data files.

To see a different Collection, change the collection_id variable to one of the other Collections you found in the step above. If your version of the cell includes a params.append() call, change its value to limit the query to something other than 100 files.

In [5]:
collection_id = "urn:nasa:unity:unity:dev:SBG-L1B_PRE___1"
cd = dataManager.get_collection_data(Collection(collection_id))
for dataset in cd:
    print(f'dataset begin time: {dataset.data_begin_time}')
    print(f'dataset id: {dataset.id}')
    for f in dataset.datafiles:
        print(f)
        #print("\t" + f.location + ", roles: " + str(f.roles) + ", type: " + f.type + ", description: " + f.description + ", title: " + f.title)

dataset begin time: 2024-01-03T13:19:36Z
dataset id: urn:nasa:unity:unity:dev:SBG-L1B_PRE___1:SISTER_EMIT_L1B_RDN_20240103T131936_001
unity_sds_client.resources.DataFile(location=s3://sps-dev-ds-storage/urn:nasa:unity:unity:dev:SBG-L1B_PRE___1/urn:nasa:unity:unity:dev:SBG-L1B_PRE___1:SISTER_EMIT_L1B_RDN_20240103T131936_001/SISTER_EMIT_L1B_RDN_20240103T131936_001.json)
unity_sds_client.resources.DataFile(location=s3://sps-dev-ds-storage/urn:nasa:unity:unity:dev:SBG-L1B_PRE___1/urn:nasa:unity:unity:dev:SBG-L1B_PRE___1:SISTER_EMIT_L1B_RDN_20240103T131936_001/SISTER_EMIT_L1B_RDN_20240103T131936_001_OBS.met.json)
unity_sds_client.resources.DataFile(location=s3://sps-dev-ds-storage/urn:nasa:unity:unity:dev:SBG-L1B_PRE___1/urn:nasa:unity:unity:dev:SBG-L1B_PRE___1:SISTER_EMIT_L1B_RDN_20240103T131936_001/SISTER_EMIT_L1B_RDN_20240103T131936_001.met.json)
unity_sds_client.resources.DataFile(location=s3://sps-dev-ds-storage/urn:nasa:unity:unity:dev:SBG-L1B_PRE___1/urn:nasa:unity:unity:dev:SBG-L1B_PRE_

## Get a Token!

For some operations, it's helpful to get the token that allows you to communicate with the Unity services. This token can be used in curl commands or in other tools outside of the unity-py ecosystem.

In [6]:
token = s._session.get_auth().get_token()
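As a minimal sketch of using the token outside unity-py, the snippet below attaches it to an HTTP request using only the standard library. The `Bearer` scheme, the `build_request` helper, the example URL, and the placeholder token are assumptions for illustration and are not taken from the Unity documentation; substitute the endpoint and header format your service expects. The request is only constructed here, not sent.

```python
import urllib.request


def build_request(url: str, token: str) -> urllib.request.Request:
    """Return a GET request that carries the token in an Authorization header."""
    # Assumption: the service accepts the token as an HTTP bearer token.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# Hypothetical endpoint and placeholder token, for illustration only.
example_request = build_request("https://example.com/collections", "PLACEHOLDER_TOKEN")
print(example_request.get_header("Authorization"))
```

In the notebook, you would pass the `token` variable from the cell above instead of the placeholder, and send the request with `urllib.request.urlopen` or an equivalent client.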

## Create a Collection

In [7]:
# To create a collection, we are required to set the project and venue to which the collection will belong.
s.set_project("unity")
s.set_venue("dev")
dataManager = s.client(services.DATA_SERVICE)

# All collection ids follow the pattern: urn:nasa:unity:{project}:{venue}:{collection_name}___{version}.
collection_id = "urn:nasa:unity:unity:dev:gangl___2"
dataManager.create_collection(Collection(collection_id))
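The collection IDs listed earlier in this tutorial separate the collection name from the version with three underscores. The sketch below builds and parses IDs of that shape so you can check an ID before creating a collection. The `CollectionId` type and both helper functions are hypothetical and are not part of unity-py; the parsing rules are inferred only from the example IDs shown in this notebook.

```python
from typing import NamedTuple

URN_PREFIX = "urn:nasa:unity"


class CollectionId(NamedTuple):
    project: str
    venue: str
    name: str
    version: str


def build_collection_id(project: str, venue: str, name: str, version: str) -> str:
    """Assemble an ID of the form urn:nasa:unity:{project}:{venue}:{name}___{version}."""
    return f"{URN_PREFIX}:{project}:{venue}:{name}___{version}"


def parse_collection_id(collection_id: str) -> CollectionId:
    """Split a collection ID into its parts, raising ValueError if it does not fit."""
    parts = collection_id.split(":", 5)
    # Some IDs in the listing are upper case, so compare the prefix case-insensitively.
    if len(parts) != 6 or ":".join(parts[:3]).lower() != URN_PREFIX:
        raise ValueError(f"not a Unity collection id: {collection_id!r}")
    project, venue, name_and_version = parts[3], parts[4], parts[5]
    # Names may contain single underscores, so split on the last triple underscore.
    name, separator, version = name_and_version.rpartition("___")
    if not separator or not name or not version:
        raise ValueError(f"missing '___' version separator: {collection_id!r}")
    return CollectionId(project, venue, name, version)


print(parse_collection_id("urn:nasa:unity:unity:dev:SBG-L1B_PRE___1"))
print(build_collection_id("unity", "dev", "gangl", "2"))
```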

## View recently created collection

Collection creation is an asynchronous operation, so there may be a delay between the creation request and the moment the new collection shows up in the response.


In [8]:
dataManager = s.client(services.DATA_SERVICE)
collections = dataManager.get_collections()
for c in collections:
    if c.collection_id == collection_id:
        print(c.collection_id)

urn:nasa:unity:unity:dev:gangl___2
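Because the new collection may not appear immediately, one option is to poll the listing a few times before giving up. The helper below is a minimal sketch of that idea; `wait_for_collection`, its parameters, and the default retry values are hypothetical and not part of unity-py. It takes any callable that returns the current collection IDs, so it is demonstrated here with a stand-in listing function rather than a live service.

```python
import time
from typing import Callable, Iterable


def wait_for_collection(
    list_ids: Callable[[], Iterable[str]],
    target_id: str,
    attempts: int = 10,
    delay_seconds: float = 5.0,
) -> bool:
    """Call list_ids() up to `attempts` times; return True once target_id is listed."""
    for attempt in range(attempts):
        if target_id in set(list_ids()):
            return True
        # Sleep between attempts, but not after the final one.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False


# Stand-in listing: empty on the first call, populated on the second.
responses = iter([[], ["urn:nasa:unity:unity:dev:gangl___2"]])
found = wait_for_collection(
    lambda: next(responses),
    "urn:nasa:unity:unity:dev:gangl___2",
    attempts=3,
    delay_seconds=0,
)
print(found)
```

In the notebook, the listing callable could be `lambda: [c.collection_id for c in dataManager.get_collections()]`, reusing the calls from the cell above.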


## Add data files to a collection - Coming soon

Data files are added via STAC catalogs. Below we will upload several files, create a STAC entry for them, and then request that they be _cataloged_ in the system. Within Unity, the creation/storage of a file and the cataloging of that file are separate events. This may change in the future, but for now it offers some flexibility for transient files.
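To give a rough idea of what a STAC entry looks like, the sketch below builds a minimal STAC Item as a plain dictionary, using the dataset ID, begin time, and one file location from the listing earlier in this tutorial. It follows the generic STAC 1.0.0 Item layout only; the `make_stac_item` helper and the `data` asset key are hypothetical, and which additional fields Unity requires, how the item is linked to its collection, and how cataloging is requested are not covered here.

```python
import json
from typing import Dict


def make_stac_item(item_id: str, datetime_utc: str, assets: Dict[str, str]) -> dict:
    """Build a minimal STAC Item; `assets` maps an asset key to its href."""
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        # No spatial footprint in this sketch; STAC allows a null geometry.
        "geometry": None,
        "properties": {"datetime": datetime_utc},
        "links": [],
        "assets": {key: {"href": href} for key, href in assets.items()},
    }


item = make_stac_item(
    "urn:nasa:unity:unity:dev:SBG-L1B_PRE___1:SISTER_EMIT_L1B_RDN_20240103T131936_001",
    "2024-01-03T13:19:36Z",
    {
        # Hypothetical asset key; the href is taken from the file listing above.
        "data": "s3://sps-dev-ds-storage/urn:nasa:unity:unity:dev:SBG-L1B_PRE___1/"
        "urn:nasa:unity:unity:dev:SBG-L1B_PRE___1:SISTER_EMIT_L1B_RDN_20240103T131936_001/"
        "SISTER_EMIT_L1B_RDN_20240103T131936_001.json"
    },
)
print(json.dumps(item, indent=2))
```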

## Credential-less data download -- Coming Soon

When accessing data stores within the **same venue**, you'll be able to download data from S3 without credentials. 

**Note:** the following libraries are required. To install them, run the command below in a Jupyter terminal:

```
conda install xarray netcdf4 hdf5 boto3 matplotlib
```


In [None]:
import sys
!{sys.executable} -m pip install boto3
import boto3

In [None]:
s3 = boto3.client('s3')
#s3://ssips-test-ds-storage-reproc/urn:nasa:unity:ssips:TEST1:CHRP_16_DAY_REBIN___1/urn:nasa:unity:ssips:TEST1:CHRP_16_DAY_REBIN___1:SNDR_tile_2016_s320_N16p50_E120p00_L1_AQ_v1_D_2311021698943223.nc/SNDR_tile_2016_s320_N16p50_E120p00_L1_AQ_v1_D_2311021698943223.nc
s3.download_file('ssips-test-ds-storage-reproc', 'urn:nasa:unity:unity:dev:SBG-L2A_RFL___1/urn:nasa:unity:unity:dev:SBG-L2A_RFL___1:SISTER_EMIT_L2A_RFL_20240103T131936_001/SISTER_EMIT_L2A_RFL_20240103T131936_001.png', "file.png")

The file should now appear in the directory tree to the left in Jupyter.
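The file listing earlier in this tutorial prints each location as a single `s3://` URL, while `download_file` takes the bucket and the object key as separate arguments. The helper below is a minimal sketch of splitting one into the other; `split_s3_url` is a hypothetical name and is not part of unity-py or boto3.

```python
from typing import Tuple


def split_s3_url(url: str) -> Tuple[str, str]:
    """Split 's3://bucket/key/with/slashes' into ('bucket', 'key/with/slashes')."""
    prefix = "s3://"
    if not url.startswith(prefix):
        raise ValueError(f"not an s3:// URL: {url!r}")
    # The bucket is everything up to the first slash; the rest is the object key.
    bucket, _, key = url[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"URL must contain both a bucket and a key: {url!r}")
    return bucket, key


location = (
    "s3://sps-dev-ds-storage/urn:nasa:unity:unity:dev:SBG-L1B_PRE___1/"
    "urn:nasa:unity:unity:dev:SBG-L1B_PRE___1:SISTER_EMIT_L1B_RDN_20240103T131936_001/"
    "SISTER_EMIT_L1B_RDN_20240103T131936_001.json"
)
bucket, key = split_s3_url(location)
# Use the last path segment of the key as the local file name.
local_name = key.rsplit("/", 1)[-1]
print(bucket)
print(local_name)
```

With these values, the download call from the cell above would become `s3.download_file(bucket, key, local_name)`.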

In [None]:
!{sys.executable} -m pip install xarray netCDF4 matplotlib
import xarray as xr
import netCDF4

# Open a NetCDF file from the working directory; the file must already have been downloaded.
ds = xr.open_dataset('SNDR_tile_2016_s320_S38p50_E010p00_L1_AQ_v1_D_2312131702486391.nc')
ds

In [None]:
ds["sat_zen"].plot()

In [None]:
ds["rad"].plot()