# Working with STAC

STAC (SpatioTemporal Asset Catalog) is the interface a notebook uses to access its data. In this notebook we show how to read the contents of a STAC catalog and get the locations of data products to use as input to a process. Once the process has run, we show how to use the unity_py library to write a STAC output.

The unity_py library is required. It can be installed with one of the following commands:

1. PyPI (coming soon): `pip install unity_py`
2. From git, main branch: `python -m pip install git+https://github.com/unity-sds/unity-py.git`
3. From git, a specific branch (here `develop`): `python -m pip install git+https://github.com/unity-sds/unity-py.git@develop`
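If you are unsure whether an install succeeded, a quick check like the one below can be run in a cell. It is a generic standard-library sketch (the `is_installed` helper is hypothetical, not part of unity_py) that only reports whether the package can be found on the current Python path.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if the top-level `package` can be found for import."""
    return importlib.util.find_spec(package) is not None


# Report whether one of the installs above made unity_py importable.
print("unity_py installed:", is_installed("unity_py"))
```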
 

In [None]:
from unity_py.unity_exception import UnityException
from unity_py.resources.collection import Collection, Dataset, DataFile

### Reading in STAC

Using the unity_py `Collection` class, read in files from a STAC catalog. The STAC catalog should be an input to the application you are developing.
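For intuition about what reading a catalog involves, here is a standard-library sketch that walks a local `catalog.json`, following `item` links and recursing into `child` links. It is not how `Collection.from_stac` is implemented: the `read_stac_items` helper is hypothetical, it assumes local files (no `http` or `s3` hrefs), and the asset hrefs inside each Item are returned as written, without being resolved.

```python
import json
from pathlib import Path


def read_stac_items(catalog_path):
    """Return every STAC Item (as a dict) reachable from a local catalog file.

    Follows "item" links and recurses into "child" links; other link types
    (self, root, parent) are ignored. Relative hrefs are resolved against
    the directory of the catalog that contains them.
    """
    catalog_path = Path(catalog_path)
    catalog = json.loads(catalog_path.read_text())
    items = []
    for link in catalog.get("links", []):
        # An absolute href replaces the parent directory entirely.
        target = catalog_path.parent / link["href"]
        if link.get("rel") == "item":
            items.append(json.loads(target.read_text()))
        elif link.get("rel") == "child":
            items.extend(read_stac_items(target))
    return items
```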

In [None]:
collection = Collection.from_stac("/unity/ads/scratch/gangl/chirp/catalog.json")

In [None]:
datasets = collection._datasets
len(datasets)

In [None]:
# A convenience method for finding only assets with specific keys (here, "data")
data_files = collection.data_locations(["data"])
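Filtering by asset key can be pictured as follows: given STAC Items as plain dictionaries, keep the `href` of every asset whose key is in the requested list. This is a hypothetical helper (`filter_asset_hrefs`) meant only to illustrate the idea; unity_py's `data_locations` may differ in details such as how it resolves relative paths.

```python
def filter_asset_hrefs(items, asset_keys):
    """Collect the hrefs of assets whose key is in `asset_keys`.

    `items` is a list of STAC Item dicts; hrefs are returned unchanged,
    in item order and then asset order.
    """
    locations = []
    for item in items:
        for key, asset in item.get("assets", {}).items():
            if key in asset_keys:
                locations.append(asset["href"])
    return locations
```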

In [None]:
len(data_files)

In [None]:
data_files[0:20]

In [None]:
# Run my program on this list of files.
# This is a placeholder for calling a program with the data files as inputs;
# a real process would write its outputs to some directory.

def chirp_rebin(data_files):
    print(f"Processing {len(data_files)} data files...")


In [None]:
# Call the process
chirp_rebin(data_files)

# Some additional metadata
chirp_rebin_version = "1.0.1"

### Output STAC Creation

The output of `chirp_rebin` is, ostensibly, a set of output files. The output directory should be configurable and passed in, but for this example, assume the files are written to `/unity/ads/scratch/altinok/tiling_output/v1`.
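A minimal sketch of what "configurable and passed in" could look like: the process takes the output directory as a parameter and returns the paths it wrote, so the caller knows exactly what to catalog. The name `chirp_rebin_to_dir` and the `_rebin.nc` naming scheme are hypothetical, and the placeholder text files stand in for real NetCDF output.

```python
from pathlib import Path


def chirp_rebin_to_dir(data_files, output_dir):
    """Hypothetical stand-in process that writes into a caller-chosen directory.

    Writes one placeholder output per input and returns the written paths.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for data_file in data_files:
        # Name each output after its input; a real process would write NetCDF here.
        out_path = output_dir / f"{Path(data_file).stem}_rebin.nc"
        out_path.write_text(f"rebinned from {data_file}\n")
        outputs.append(out_path)
    return outputs
```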

We must write a STAC file so that follow-on tasks work correctly within Unity. Follow-on tasks include "cataloging" or "persisting" the data files from the output directory to persistent storage (e.g. S3). Without being staged out, the files exist only on the instance that _generated_ the products; in a scalable system, products generated on different machines need to be stored in a persistent location.

The unity_py library allows for the creation of STAC files based on the data you've created. Since some metadata are specific to the products being generated, it is the responsibility of the project to generate "good enough" metadata for use in the Unity system.
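What counts as "good enough" is up to each project. As one example of a cheap sanity check, the hypothetical helpers below (not part of unity_py) verify that a dataset's start and end times parse as timezone-aware ISO 8601 timestamps and are in order.

```python
from datetime import datetime


def parse_utc(timestamp):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    # datetime.fromisoformat only accepts 'Z' from Python 3.11 on, so normalize it.
    if timestamp.endswith("Z"):
        timestamp = timestamp[:-1] + "+00:00"
    return datetime.fromisoformat(timestamp)


def check_time_range(start, end):
    """Raise ValueError unless both times parse, carry a timezone, and start <= end."""
    start_dt, end_dt = parse_utc(start), parse_utc(end)
    if start_dt.tzinfo is None or end_dt.tzinfo is None:
        raise ValueError("timestamps must include a timezone")
    if start_dt > end_dt:
        raise ValueError(f"start {start} is after end {end}")
```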

The example below creates STAC from a set of output files using three classes:

 - Collection: a collection of products.
 - Dataset: metadata and files that correspond to one space/time measurement. This is the same as a "granule".
 - DataFile: a file representing some part of the dataset. Data, metadata, images, etc. are all valid types. The 'type' must be unique within a Dataset when converting to STAC.
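The hierarchy above can be pictured with a few plain dataclasses. These `*Sketch` classes are hypothetical stand-ins, not unity_py's `Collection`, `Dataset`, and `DataFile`; they enforce the uniqueness rule on the assumption that each type ends up as an asset key in the dataset's STAC Item.

```python
from dataclasses import dataclass, field


@dataclass
class DataFileSketch:
    type: str      # e.g. "data", "metadata", "browse"
    location: str  # path or URI of the file


@dataclass
class DatasetSketch:
    name: str
    files: list = field(default_factory=list)

    def add_data_file(self, data_file):
        # Assumed: each type becomes an asset key, so duplicates are rejected.
        if any(f.type == data_file.type for f in self.files):
            raise ValueError(f"duplicate data file type: {data_file.type!r}")
        self.files.append(data_file)


@dataclass
class CollectionSketch:
    collection_id: str
    datasets: list = field(default_factory=list)
```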


### A note about output data

Because data are transferred between systems and disks, absolute paths are not ideal in a STAC catalog. The translation from absolute to relative paths is handled automatically by unity_py, with the following stipulations:

* URIs are not made relative: if you create a DataFile with a location starting with `https:`, `http:`, or `s3:`, it will not be converted to a relative path.
* If a relative path is given as the location of a DataFile, it will not be modified.
* If a DataFile has an absolute path (e.g. `/data/users/mydata...`), it will be converted to a relative path IF AND ONLY IF `Collection.to_stac` is called with a directory that contains the asset. So if `Collection.to_stac(collection, "/data/users/mydata")` is used as the STAC location, the example DataFile above will be made relative; if `Collection.to_stac(collection, "/data/users/my_other_data")` is used, it will not be converted to a relative path.
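The three rules above can be expressed as a small function. This is a hypothetical sketch (`stac_href` is not a unity_py function) that assumes POSIX-style paths; it uses `os.path.commonpath`, which compares whole path components rather than string prefixes, to decide whether the STAC directory contains the file.

```python
import os
from urllib.parse import urlparse


def stac_href(location, stac_dir):
    """Apply the relative-path rules to one DataFile location (sketch)."""
    # Rule 1: URIs with a scheme (http, https, s3, ...) are left untouched.
    if urlparse(location).scheme:
        return location
    # Rule 2: relative paths are left untouched.
    if not os.path.isabs(location):
        return location
    # Rule 3: absolute paths become relative only if they sit under stac_dir.
    stac_dir = os.path.abspath(stac_dir)
    normalized = os.path.normpath(location)
    if os.path.commonpath([stac_dir, normalized]) == stac_dir:
        return os.path.relpath(normalized, stac_dir)
    return location
```

For example, `stac_href("/data/users/mydata/a.nc", "/data/users/mydata")` returns `"a.nc"`, while the same call with `"/data/users/my_other_data"` returns the absolute path unchanged.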

In [None]:
from pathlib import Path
import datetime

collection = Collection("SNDR13CHRP1AQCal_rebin")

for path in Path('/unity/ads/scratch/altinok/tiling_output/v1').rglob('*.nc'):
    # Create a Dataset for each output file
    dataset_name = path.name
    # Placeholder times: a real process would take these from the product itself
    dataset_start_time = "2023-06-15T01:31:12.467113Z"
    dataset_end_time = "2023-06-15T01:36:12.467113Z"
    # Creation time as a timezone-aware UTC ISO 8601 string
    dataset_create_time = datetime.datetime.now(datetime.timezone.utc).isoformat()
    dataset = Dataset(dataset_name, collection.collection_id, dataset_start_time, dataset_end_time, dataset_create_time)

    # Register the output file as the dataset's "data" file
    dataset.add_data_file(DataFile("data", str(path.resolve())))
    # Example product-specific metadata (the cloud cover value is a placeholder)
    dataset.add_property("percent_cloud_cover", .01)
    dataset.add_property("pge_version", chirp_rebin_version)

    # Add the dataset to the collection
    collection.add_dataset(dataset)



In [None]:
len(collection._datasets)

In [None]:
Collection.to_stac(collection, "/unity/ads/scratch/altinok/tiling_output/")

The command above writes a catalog.json file at `/unity/ads/scratch/altinok/tiling_output/catalog.json`. A corresponding item file is created for each dataset (e.g. `SNDR_tile_2016_s320_S19p25_W055p00_L1_AQ_v1_D_2305241684955813.nc.json`) in the same directory.
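For reference, each item file is a STAC Item: a GeoJSON Feature with `properties` and `assets`. The hypothetical `sketch_stac_item` function below builds a minimal Item with the fields the STAC 1.0 Item specification requires when no single `datetime` is given. The exact fields unity_py writes may differ, and placing custom properties such as `pge_version` under `properties` is an assumption here.

```python
import json


def sketch_stac_item(item_id, start, end, created, assets, extra_properties=None):
    """Build a minimal STAC Item dict (sketch of the spec's required fields).

    `assets` maps asset keys (e.g. "data") to hrefs.
    """
    properties = {
        # With datetime set to null, the spec requires start/end datetimes.
        "datetime": None,
        "start_datetime": start,
        "end_datetime": end,
        "created": created,
    }
    properties.update(extra_properties or {})
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": None,  # allowed by the spec; bbox is then omitted
        "properties": properties,
        "links": [],
        "assets": {key: {"href": href} for key, href in assets.items()},
    }


item = sketch_stac_item(
    "granule.nc",
    "2023-06-15T01:31:12.467113Z",
    "2023-06-15T01:36:12.467113Z",
    "2023-06-16T00:00:00+00:00",
    {"data": "granule.nc"},
    {"pge_version": "1.0.1"},
)
print(json.dumps(item, indent=2))
```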