# Working with STAC

STAC (SpatioTemporal Asset Catalog) is the interface a notebook uses to find its input data and to describe its output data. In this workbook we show how to read the contents of a STAC catalog and get the locations of the data products to use as input to a process. Once the process has run, we show how to use the unity_py library to write a STAC output.

The unity_py library is required. Install it with one of the following commands:

1. pypi: `pip install unity-sds-client`
2. From git: main branch: `python -m pip install git+https://github.com/unity-sds/unity-py.git`
3. From git: a specific branch: `python -m pip install git+https://github.com/unity-sds/unity-py.git@develop`
 

In [1]:
from unity_sds_client.unity_exception import UnityException
from unity_sds_client.resources.collection import Collection, Dataset, DataFile

### Reading in STAC

Using the unity_py `Collection` object, read in files from a STAC catalog. The STAC catalog should be an input to the application you are developing.

In [2]:
collection = Collection.from_stac("data/SBG-L1B-PRE/catalog.json")

In [3]:
datasets = collection.datasets
len(datasets)

1

In [4]:
# A convenience method for finding only the assets with specific "keys" (here, data)
data_files = collection.data_locations()

In [5]:
len(data_files)

10
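To make the idea concrete, here is a minimal sketch of how asset locations could be collected from a STAC catalog by hand, using only the standard library. It is not the unity_py implementation; the function name `asset_locations`, the `asset_keys` parameter, and the throwaway demo catalog are all hypothetical, and the sketch assumes every `href` is a relative file path.

```python
import json
import tempfile
from pathlib import Path


def asset_locations(catalog_path, asset_keys=None):
    """Return absolute paths of the item assets referenced by a STAC catalog.

    asset_keys: optional set of asset keys to keep; None keeps every asset.
    Assumes every href is a relative file path (hypothetical simplification).
    """
    catalog_path = Path(catalog_path).resolve()
    catalog = json.loads(catalog_path.read_text())
    locations = []
    for link in catalog.get("links", []):
        if link.get("rel") != "item":
            continue  # only follow links that point at items
        item_path = (catalog_path.parent / link["href"]).resolve()
        item = json.loads(item_path.read_text())
        for key, asset in item.get("assets", {}).items():
            if asset_keys is not None and key not in asset_keys:
                continue
            # Asset hrefs are resolved relative to the item file
            locations.append(str((item_path.parent / asset["href"]).resolve()))
    return locations


# Build a tiny throwaway catalog so the sketch runs on its own.
demo_dir = Path(tempfile.mkdtemp())
demo_item = {
    "type": "Feature",
    "id": "demo_item",
    "assets": {
        "data": {"href": "./demo_item.bin"},
        "metadata": {"href": "./demo_item.met.json"},
    },
}
(demo_dir / "demo_item.json").write_text(json.dumps(demo_item))
demo_catalog = {
    "type": "Catalog",
    "id": "demo",
    "links": [{"rel": "item", "href": "./demo_item.json"}],
}
(demo_dir / "catalog.json").write_text(json.dumps(demo_catalog))

all_files = asset_locations(demo_dir / "catalog.json")
data_only = asset_locations(demo_dir / "catalog.json", asset_keys={"data"})
print(all_files)
print(data_only)
```

In practice `collection.data_locations()` does this work for you; the sketch is only meant to show what a catalog, its items, and their assets look like on disk.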

In [6]:
data_files[0:20]

['/home/jovyan/sounder-sips-tutorial/jupyter-notebooks/tutorials/data/SBG-L1B-PRE/./SISTER_EMIT_L1B_RDN_20231206T160939_001.met.json',
 '/home/jovyan/sounder-sips-tutorial/jupyter-notebooks/tutorials/data/SBG-L1B-PRE/./SISTER_EMIT_L1B_RDN_20231206T160939_001.hdr',
 '/home/jovyan/sounder-sips-tutorial/jupyter-notebooks/tutorials/data/SBG-L1B-PRE/./SISTER_EMIT_L1B_RDN_20231206T160939_001_LOC.bin',
 '/home/jovyan/sounder-sips-tutorial/jupyter-notebooks/tutorials/data/SBG-L1B-PRE/./SISTER_EMIT_L1B_RDN_20231206T160939_001_OBS.bin',
 '/home/jovyan/sounder-sips-tutorial/jupyter-notebooks/tutorials/data/SBG-L1B-PRE/./SISTER_EMIT_L1B_RDN_20231206T160939_001_LOC.met.json',
 '/home/jovyan/sounder-sips-tutorial/jupyter-notebooks/tutorials/data/SBG-L1B-PRE/./SISTER_EMIT_L1B_RDN_20231206T160939_001_LOC.hdr',
 '/home/jovyan/sounder-sips-tutorial/jupyter-notebooks/tutorials/data/SBG-L1B-PRE/./SISTER_EMIT_L1B_RDN_20231206T160939_001.png',
 '/home/jovyan/sounder-sips-tutorial/jupyter-notebooks/tutorials

In [7]:
for dataset in collection.datasets:
    print(f'dataset begin time: {dataset.data_begin_time}')
    print(f'dataset name: {dataset.id}')
    for f in dataset.datafiles:
        print("\t" + f.location + ", roles: " + str(f.roles) + ", type: " + f.type + ", description: " + f.description + ", title: " + f.title)

dataset begin time: 2023-12-11T22:22:08.874292+00:00
dataset name: SISTER_EMIT_L1B_RDN_20231206T160939_001
	/home/jovyan/sounder-sips-tutorial/jupyter-notebooks/tutorials/data/SBG-L1B-PRE/./SISTER_EMIT_L1B_RDN_20231206T160939_001.met.json, roles: [], type: , description: , title: SISTER_EMIT_L1B_RDN_20231206T160939_001.met.json file
	/home/jovyan/sounder-sips-tutorial/jupyter-notebooks/tutorials/data/SBG-L1B-PRE/./SISTER_EMIT_L1B_RDN_20231206T160939_001.hdr, roles: [], type: , description: , title: SISTER_EMIT_L1B_RDN_20231206T160939_001.hdr file
	/home/jovyan/sounder-sips-tutorial/jupyter-notebooks/tutorials/data/SBG-L1B-PRE/./SISTER_EMIT_L1B_RDN_20231206T160939_001_LOC.bin, roles: [], type: , description: , title: SISTER_EMIT_L1B_RDN_20231206T160939_001_LOC.bin file
	/home/jovyan/sounder-sips-tutorial/jupyter-notebooks/tutorials/data/SBG-L1B-PRE/./SISTER_EMIT_L1B_RDN_20231206T160939_001_OBS.bin, roles: [], type: , description: , title: SISTER_EMIT_L1B_RDN_20231206T160939_001_OBS.bin file
	

In [8]:
# Run my program on this list of files
# This is a simple example of calling a program with the data files as inputs.
# The outputs of this "process" are written to some directory

def my_process(data_files):
    print("Processing your data files...")


In [9]:
# Call the process
my_process(data_files)

# Some additional metadata
my_process_version = "1.0.1"

Processing your data files...


### Output STAC Creation

The output of `my_process` is, ostensibly, some set of output files. The output directory should be configurable and passed in, but for this example let's assume the files are written to `data/SBG-L2A-RESAMPLE`, relative to this notebook.
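As an illustration of a configurable output directory, the sketch below passes the directory in as an argument and returns the paths it wrote. The function `run_my_process`, the `_processed` suffix, and the input paths are hypothetical placeholders, not part of the tutorial's real process.

```python
import tempfile
from pathlib import Path


def run_my_process(data_files, output_dir):
    """Hypothetical stand-in: writes one placeholder output per input .bin file."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for location in data_files:
        source = Path(location)
        if source.suffix != ".bin":
            continue  # skip headers, metadata, and browse images
        target = output_dir / f"{source.stem}_processed.bin"
        target.write_bytes(b"")  # a real process would write science data here
        outputs.append(str(target))
    return outputs


# Use a temporary directory so the sketch does not touch real data.
demo_output_dir = Path(tempfile.mkdtemp()) / "SBG-L2A-RESAMPLE"
demo_outputs = run_my_process(["/in/a.bin", "/in/a.hdr", "/in/b.bin"], demo_output_dir)
print(demo_outputs)
```

Returning the list of written files makes it straightforward to describe them in the output STAC afterwards.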

We must write a STAC file so that follow-on tasks work appropriately within Unity. Examples of follow-on tasks are "cataloging" or "persisting" the data files from the output directory to persistent storage (e.g. S3). Without staging the products out, they exist only on the instance that _generated_ them; in a scalable system, products generated on different machines need to be stored in a persistent location.

The unity_py library allows for the creation of STAC files based on the data you've created. Since some metadata are specific to the products being generated, it is the responsibility of the project to generate "good enough" metadata for use in the Unity system.

Below is an example of creating STAC from a set of output files. 

 - Collection: a collection of products.
 - Dataset: metadata and files that correspond to some space/time measurement. This is the same as a "granule".
 - DataFile: a file representing some part of the dataset. Data, metadata, images, etc. are all valid types. The 'type' must be unique when converting to STAC.
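For orientation, the sketch below shows roughly what one dataset looks like as a STAC item: a JSON object with an `id`, time properties, and a dictionary of assets. It follows the field names of the STAC 1.0.0 Item specification but is not the unity_py serializer; `dataset_to_item`, the asset keys, and the example hrefs are hypothetical. Because assets live in a dictionary, two assets cannot share a key, which may be related to the uniqueness requirement above (that link is a guess, not something the library documents here).

```python
import json


def dataset_to_item(dataset_id, start_time, end_time, files):
    """Build a minimal STAC item dict.

    files: list of (asset_key, href, media_type, roles) tuples (hypothetical shape).
    """
    assets = {}
    for key, href, media_type, roles in files:
        if key in assets:
            # A JSON object cannot hold two entries under the same key
            raise ValueError(f"duplicate asset key: {key}")
        asset = {"href": href, "roles": list(roles)}
        if media_type:
            asset["type"] = media_type
        assets[key] = asset
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": dataset_id,
        "geometry": None,  # no footprint in this sketch
        "properties": {
            "datetime": None,  # allowed when start/end datetimes are given
            "start_datetime": start_time,
            "end_datetime": end_time,
        },
        "links": [],
        "assets": assets,
    }


example_item = dataset_to_item(
    "example_granule",
    "2023-06-15T01:31:12.467113Z",
    "2023-06-15T01:36:12.467113Z",
    [
        ("data", "./example_granule.bin", None, ["data"]),
        ("browse", "./example_granule.png", "image/png", ["browse"]),
    ],
)
print(json.dumps(example_item, indent=2))
```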


### A note about output data

Because data are transferred between systems and disks, "absolute" paths are not ideal in the STAC catalog. The translation from absolute to relative paths is handled automatically by the unity_py library, with the following stipulations:

* URIs are not made relative; if you create a DataFile with a location that begins with `https:`, `http:`, or `s3:`, it will not be converted to a relative path.
* If a relative path is given as the location of a DataFile, it will not be modified.
* If a DataFile has an absolute path (e.g. `/data/users/mydata...`), it will be converted to a relative path IF AND ONLY IF the `Collection.to_stac` method is called with a path that includes the asset. So if `Collection.to_stac(collection, "/data/users/mydata")` is used as the STAC location, the above example DataFile will be made relative. If `Collection.to_stac(collection, "/data/users/my_other_data")` is used, it will not be converted to a relative path.
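The three rules above can be sketched as a small pure function. This is an illustration of the stated behavior, not the unity_py source; the name `to_catalog_href` and the `./` prefix on relative results are assumptions.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def to_catalog_href(location, stac_dir):
    """Apply the three path rules to a single DataFile location (sketch only)."""
    # Rule 1: URIs such as https:, http:, or s3: are left untouched
    if urlparse(location).scheme in ("http", "https", "s3"):
        return location
    path = PurePosixPath(location)
    # Rule 2: relative paths are not modified
    if not path.is_absolute():
        return location
    # Rule 3: absolute paths become relative only when they sit under stac_dir
    try:
        return "./" + str(path.relative_to(PurePosixPath(stac_dir)))
    except ValueError:
        return location  # asset lives outside the STAC directory


print(to_catalog_href("s3://bucket/granule.bin", "/data/users/mydata"))
print(to_catalog_href("granule.bin", "/data/users/mydata"))
print(to_catalog_href("/data/users/mydata/granule.bin", "/data/users/mydata"))
print(to_catalog_href("/data/users/mydata/granule.bin", "/data/users/my_other_data"))
```

Only the third call returns a rewritten path (`./granule.bin`); the other three locations come back unchanged.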

In [14]:
from pathlib import Path
import datetime
import os

output_directory = str(Path("data/SBG-L2A-RESAMPLE").resolve())
collection  = Collection("my_process_output_collection")

# Create a Dataset for the collection
dataset_name = "SISTER_EMIT_L2A_RSRFL_20231206T160939_000"
dataset_start_time = "2023-06-15T01:31:12.467113Z"
dataset_end_time = "2023-06-15T01:36:12.467113Z"
dataset_create_time = datetime.datetime.now(datetime.timezone.utc).isoformat()
dataset = Dataset(dataset_name, collection.collection_id, dataset_start_time, dataset_end_time, dataset_create_time)
dataset.add_property("tag", "reprocessing")

for path in Path(output_directory).rglob('*'):
    if not path.is_file():
        continue  # rglob also yields directories
    location = str(path.resolve())
    # DataFile arguments: type, location, roles = [], title = "", description = ""
    if path.suffix == ".bin":
        dataset.add_data_file(DataFile("binary", location, ["data"]))
    elif path.suffix == ".png":
        dataset.add_data_file(DataFile("image/png", location, ["browse"]))
    else:
        dataset.add_data_file(DataFile(None, location, ["metadata"]))
        
# Add the STAC file we are creating
dataset.add_data_file(DataFile("text/json", os.path.join(output_directory, dataset_name + ".json"), ["metadata"]))
print(os.path.join(output_directory, dataset_name + ".json"))
# Add the dataset to the collection
collection.add_dataset(dataset)



/home/jovyan/sounder-sips-tutorial/jupyter-notebooks/tutorials/data/SBG-L2A-RESAMPLE/SISTER_EMIT_L2A_RSRFL_20231206T160939_000.json


In [15]:
len(collection.datasets)

1

In [16]:
Collection.to_stac(collection, output_directory)

The command above writes a `catalog.json` file into `output_directory` (here, `/home/jovyan/sounder-sips-tutorial/jupyter-notebooks/tutorials/data/SBG-L2A-RESAMPLE/catalog.json`). A corresponding item file is created for each dataset (e.g. `SISTER_EMIT_L2A_RSRFL_20231206T160939_000.json`) in the same directory.

In [17]:
new_collection = Collection.from_stac(os.path.join(output_directory, "catalog.json"))

In [18]:
for dataset in new_collection.datasets:
    print(f'dataset begin time: {dataset.data_begin_time}')
    print(f'dataset name: {dataset.id}')
    for f in dataset.datafiles:
        print("\t" + f.location + ", roles: " + str(f.roles) + ", type: " + f.type + ", description: " + f.description + ", title: " + f.title)

dataset begin time: 2023-06-15T01:31:12.467113Z
dataset name: SISTER_EMIT_L2A_RSRFL_20231206T160939_000
	/home/jovyan/sounder-sips-tutorial/jupyter-notebooks/tutorials/data/SBG-L2A-RESAMPLE/./SISTER_EMIT_L2A_RSRFL_20231206T160939_000.hdr, roles: ['metadata'], type: , description: , title: None file
	/home/jovyan/sounder-sips-tutorial/jupyter-notebooks/tutorials/data/SBG-L2A-RESAMPLE/./SISTER_EMIT_L2A_RSRFL_20231206T160939_000_UNC.bin, roles: ['data'], type: , description: , title: binary file
	/home/jovyan/sounder-sips-tutorial/jupyter-notebooks/tutorials/data/SBG-L2A-RESAMPLE/./SISTER_EMIT_L2A_RSRFL_20231206T160939_000_UNC.hdr, roles: ['metadata'], type: , description: , title: None file
	/home/jovyan/sounder-sips-tutorial/jupyter-notebooks/tutorials/data/SBG-L2A-RESAMPLE/./SISTER_EMIT_L2A_RSRFL_20231206T160939_000.bin, roles: ['data'], type: , description: , title: binary file
	/home/jovyan/sounder-sips-tutorial/jupyter-notebooks/tutorials/data/SBG-L2A-RESAMPLE/./SISTER_EMIT_L2A_RSRFL_2023