In [1]:
import pystac
from datetime import datetime
from pystac.extensions.projection import ProjectionExtension

# EarthCODE publishing

At EarthCODE we aim not just to store data but to make it easily accessible and [FAIR](https://esa-earthcode.github.io/documentation/Community%20and%20Best%20Practices/FAIR%20and%20Open%20Science%20Best%20Practices/). We do this by collecting rich metadata. This notebook guides you through the process and gathers the information we need to help you in the best possible way. To process your data we need five things from you:

1. Information about your ESA project
2. Information about your data/product
3. Information about the actual files/data
4. The data itself, if you want us to host it
5. Information about the workflow/code you used to generate the data.


For steps 1, 2 and 5 we provide the variables you need to define. Step 3 is the most time-consuming and depends heavily on how you expect users to access your data, so we provide guides and examples of how other ESA projects have done this.

Once all of this is done, you have to:
 1. Open a pull request using the publishing options*.
 2. Send us an email (to earthcode@esa.int) **with your ESA TO in cc**. In the email body:
    - confirm that the ESA TO has signed off on your product/project
    - confirm the license for the data product
 3. (If you want us to store it for you) Send us the data in that email.

* You can also send us the metadata by email.

> At any point during this process you can contact us and we can help you!

If you want to see how your data will be presented, head over to the Open Science Catalog to see examples: https://opensciencedata.esa.int/

## 1. `project` metadata

The `project` STAC Collection provides a general description of the project: consortium, time span, related themes, etc. Edit the variables below to specify all the required information.

In [2]:
# Define id, title, description, project status, license
project_id = "" # a custom id of the project, it can be related to the title
project_title = "" # the title of your project
project_description = "" # a description of the project
project_status = "" # project status, pick from - ongoing, completed

# overall license for all related data that will be uploaded from the project, e.g. CC-BY-4.0
# if you have multiple licenses, you can pick 'various'
project_license = ''

# Define spatial extent of the project study area in EPSG:4326
# if you have multiple disjoint study areas, specify the bounding box that covers all of them
# NOTE: the values are listed as min longitude, min latitude, max longitude, max latitude,
# e.g. the whole globe is -180.0, -90.0, 180.0, 90.0
project_s, project_w, project_n, project_e = -180.0, -90.0, 180.0, 90.0

# the project start and end times
project_start_year, project_start_month, project_start_day = 2021, 1, 1
project_end_year, project_end_month, project_end_day = 2021, 12, 31

# Define the links to the project website and the EO4Society page
website_link = ""
eo4socity_link = ""

# Define project themes. Pick one or more from:
# - atmosphere, cryosphere, land, magnetosphere-ionosphere, oceans, solid-earth.
project_themes = [""]

# List the consortium members
consortium_members = ["", ""]
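
Before opening a pull request it can help to sanity-check the values above. The following is a minimal sketch, not part of the EarthCODE tooling: `check_project_metadata` is a hypothetical helper, and the allowed statuses and themes are simply copied from the comments in the cell above. It assumes the bounding box does not cross the antimeridian.

```python
from datetime import date

# Allowed values, copied from the comments in the cell above.
ALLOWED_STATUS = {"ongoing", "completed"}
ALLOWED_THEMES = {
    "atmosphere", "cryosphere", "land",
    "magnetosphere-ionosphere", "oceans", "solid-earth",
}


def check_project_metadata(status, themes, bbox, start, end):
    """Return a list of human-readable problems (empty if none are found).

    bbox is (min_lon, min_lat, max_lon, max_lat) in EPSG:4326.
    Assumption: the box does not cross the antimeridian.
    """
    problems = []
    if status not in ALLOWED_STATUS:
        problems.append(f"unknown status: {status!r}")
    unknown = sorted(set(themes) - ALLOWED_THEMES)
    if unknown:
        problems.append(f"unknown themes: {unknown}")
    min_lon, min_lat, max_lon, max_lat = bbox
    if not (-180.0 <= min_lon <= max_lon <= 180.0):
        problems.append("longitudes must satisfy -180 <= min <= max <= 180")
    if not (-90.0 <= min_lat <= max_lat <= 90.0):
        problems.append("latitudes must satisfy -90 <= min <= max <= 90")
    if start > end:
        problems.append("start date is after end date")
    return problems


# Illustrative values only; replace them with your own variables.
problems = check_project_metadata(
    status="ongoing",
    themes=["land", "oceans"],
    bbox=(-180.0, -90.0, 180.0, 90.0),
    start=date(2021, 1, 1),
    end=date(2021, 12, 31),
)
print(problems)
```

An empty list means none of these basic checks failed; it does not guarantee that the metadata will pass catalog validation.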

## 2. `product` / dataset metadata

The `product` STAC Collection provides a general description of all project outputs which will be stored on the OSC. Most of these metadata fields should already be available and can be extracted from your data or documentation.

> You can attach one or more products to a project. If you have more than one, repeat steps 2 and 3 for each.


In [3]:
# Define id, title, description, product status
product_id = ""
product_title = ""
product_description = ""
product_status = "completed"

# Define the product license
product_license = 'CC-BY-4.0'

# Define at most five keywords for the product
product_keywords = [ 
    "",
    ""
] 

# Define the spatial extent in EPSG:4326. If the dataset covers discontinuous regions,
# add the bounding box boundaries for each region.
# e.g. a dataset with global coverage is:
# product_s, product_w, product_n, product_e = [-180.0], [-90.0], [180.0], [90.0]
product_s = []
product_w = []
product_n = []
product_e = []

# Define the temporal extent
product_start_year, product_start_month, product_start_day = 2021, 1, 1
product_end_year, product_end_month, product_end_day = 2021, 12, 31


# Define the semantic region covered by this product, e.g. Belgium
product_region = ""

# Define product themes, e.g. land. Pick one or more from:
# - atmosphere, cryosphere, land, magnetosphere-ionosphere, oceans, solid-earth.
product_themes = [""]

# Define the Sentinel missions used in the product, e.g. "sentinel-2"
# Pick one or more from - https://github.com/ESA-EarthCODE/open-science-catalog-metadata/tree/main/eo-missions


# Define output variables and input parameters, e.g. "crop-yield-forecast"
# Pick one or more from https://github.com/ESA-EarthCODE/open-science-catalog-metadata/tree/main/variables
# If you don't think your parameters or variables are available, send us their names and descriptions and we can add them to the list
product_variables = [   ]
product_parameters = [  ]

# Define the DOI if available, e.g. "https://doi.org/10.57780/s3d-83ad619", else None
product_doi = None
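
For a product that covers several disjoint regions, the STAC Collection specification expects the first bounding box of the spatial extent to be the overall box, followed by one box per region. The sketch below shows one way to assemble that list. `spatial_extent_bboxes` and its explicit `wests`/`souths`/`easts`/`norths` parameters are hypothetical names chosen for clarity; how they map onto the `product_*` lists depends on how the rest of your notebook consumes those variables. It assumes no region crosses the antimeridian.

```python
def spatial_extent_bboxes(wests, souths, easts, norths):
    """Build the bbox list for a STAC Collection spatial extent.

    Each argument is a list with one value per region, in EPSG:4326.
    With a single region the list holds just that box; with several,
    the overall (union) box comes first, followed by each region's box.
    """
    boxes = [list(b) for b in zip(wests, souths, easts, norths)]
    if len(boxes) <= 1:
        return boxes
    union = [min(wests), min(souths), max(easts), max(norths)]
    return [union] + boxes


# Two made-up regions, purely for illustration.
bboxes = spatial_extent_bboxes(
    wests=[-10.0, 20.0],
    souths=[35.0, 50.0],
    easts=[5.0, 30.0],
    norths=[45.0, 60.0],
)
print(bboxes)
# [[-10.0, 35.0, 30.0, 60.0], [-10.0, 35.0, 5.0, 45.0], [20.0, 50.0, 30.0, 60.0]]
```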

> We strongly prefer data in [cloud-optimised formats](https://esa-earthcode.github.io/documentation/Community%20and%20Best%20Practices/Data%20and%20Workflow%20Best%20Practices/Data/), since it makes storage and access much easier. If your data is not in one of the specified formats, please contact us before you continue. We can help you transform the data!

## 3.  File / `asset` level metadata

The third step is to describe *ALL* the files associated with the `product / dataset` you want to upload as STAC Items and Assets. This is the most time-consuming step. There are multiple strategies for doing this; we are flexible, and it is up to you to decide how to do it, as long as the result conforms to the standard STAC specification. The main consideration should be the usability of the data.

You can find examples of how other ESA projects are doing this on the tutorials page: https://esa-earthcode.github.io/tutorials/index-1/. There are multiple examples on the web as well.

Below we offer a few options for you to try. The code does not generalise fully, so we only offer a few libraries and pointers to get you started.
You have to tailor the code to your data, but it is generally easier to extract metadata from the files you have than to create it manually.
You also do not have to use Python libraries. Any programming language will do, since we are only interested in the generated metadata.

> We can support you throughout this process, just contact us!

- Example from https://esa-earthcode.github.io/tutorials/prr-stac-introduction/
This approach works with NetCDF, GeoTIFF and Zarr files. If you have more than one file, you have to extract the metadata for each. We have multiple examples of how other projects have done this here: https://esa-earthcode.github.io/tutorials/index-1/

In [6]:

from xstac import xarray_to_stac
import json
import shapely
import xarray as xr
import numpy as np


In [None]:
# define the path to the file
filepath = 'https://data.aviso.altimetry.fr/aviso-gateway/data/indicators/OHC_EEI/4DAtlantic_OHC/OHC_4DATLANTIC_200204_202309_V3-0.nc'

ds = xr.open_dataset(filepath + '#mode=bytes')

# sometimes attributes are not json serialisable, so we convert them to JSON serialisable formats
def convert_to_json_serialisable(attrs):
    attrs = attrs.copy()
    for attr in attrs.keys():
        if isinstance(attrs[attr], np.ndarray):
            attrs[attr] = attrs[attr].tolist()
    return attrs

for var in ds.data_vars:
    ds[var].attrs = convert_to_json_serialisable(ds[var].attrs)

# Describe the first file following the datacube STAC extension.
# All information is extracted from the metadata / data already present in the file; we only specify
# the template and which information is extracted.

# cast to plain Python floats so that the bbox is JSON serialisable
bbox = [
    float(ds['longitude'].values.min()),
    float(ds['latitude'].values.min()),
    float(ds['longitude'].values.max()),
    float(ds['latitude'].values.max()),
]
geometry = json.loads(json.dumps(shapely.box(*bbox).__geo_interface__))

template = {
    "id": f"{product_id}-{'OHC_4DATLANTIC'.lower()}",
    "type": "Feature",
    "stac_version": "1.1.0",
    "description": ds.attrs['summary'],
    "title": 'OHC 4D Atlantic',
    # in the properties you have to map the information you want to extract from the source .nc file
    "properties": {
            "history": ds.attrs['history'],
            "source": ds.attrs['source'],
            "comment": ds.attrs['comment'],
            "references": ds.attrs['references'],
            "version": ds.attrs['version'],
            "conventions": ds.attrs['Conventions'],
            "contact": ds.attrs['contact'],
            "start_datetime": ds.attrs['start_date'] + 'T00:00:00Z',
            "end_datetime": ds.attrs['end_date'] + 'T00:00:00Z',
    },
    "geometry": geometry,
    "bbox": bbox,
    "assets": {
        "data": {
            "href": f"./{product_id}/OHC_4DATLANTIC_200204_202309_V3-0.nc",  # or local path; same file as `filepath` above
            "type": "application/x-netcdf",
            "roles": ["data"],
            "title": 'OHC 4D Atlantic'
        }
    }
}

# 3. Generate the STAC Item
item = xarray_to_stac(
    ds,
    template,
    temporal_dimension="time" if 'time' in ds.coords else False,
    x_dimension='longitude',
    y_dimension='latitude',
    reference_system=False
)

In [9]:
item
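
If you have many files, the parts of the template that depend only on the file name can be generated in a loop. The sketch below is one possible way to do that, using only the standard library. `item_template` is a hypothetical helper, the `example.org` URLs and `my-product` id are placeholders, and the NetCDF media type is an assumption. The geometry, bbox and properties still have to be filled in from each opened dataset, as in the cell above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def item_template(file_url, product_id, title):
    """Return the file-name-dependent part of a STAC Item template."""
    name = PurePosixPath(urlparse(file_url).path).name  # e.g. "FILE_A.nc"
    stem = PurePosixPath(name).stem                     # e.g. "FILE_A"
    return {
        "id": f"{product_id}-{stem.lower()}",
        "type": "Feature",
        "stac_version": "1.1.0",
        "title": title,
        "properties": {},  # fill from the dataset attributes
        "assets": {
            "data": {
                "href": f"./{product_id}/{name}",
                "type": "application/x-netcdf",  # assumption: NetCDF files
                "roles": ["data"],
                "title": title,
            }
        },
    }


# Placeholder URLs, for illustration only.
file_urls = [
    "https://example.org/data/FILE_A.nc",
    "https://example.org/data/FILE_B.nc",
]
templates = [item_template(url, "my-product", "My product") for url in file_urls]
print([t["id"] for t in templates])
# ['my-product-file_a', 'my-product-file_b']
```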

- Example from https://esa-earthcode.github.io/tutorials/creating-stac-catalog-from-prr-example/ .
This example shows how to create STAC Items and Asset-level metadata in a more manual way. It is applicable to all file types.


In [17]:
import pystac
import os
from datetime import datetime, timezone


# Basic metadata
doc_href = "/d/S3_AMPLI_User_Handbook.pdf"  # Relative or absolute href
doc_title = "Sentinel-3 Altimetry over Land Ice: AMPLI level-2 Products"
doc_description = "User Handbook for Sentinel-3 Altimetry over Land Ice: AMPLI level-2 Products"
# Parse the issue date (DD/MM/YYYY) into a timezone-aware UTC datetime
dt_utc = datetime.strptime("07/05/2025", "%d/%m/%Y").replace(tzinfo=timezone.utc)

# Create STAC item
item = pystac.Item(
    id="sentinel-3-ampli-user-handbook",
    geometry=None,
    bbox=None,
    datetime=dt_utc,
    properties={
        "title": doc_title,
        "description": doc_description,
        "reference": "CLS-ENV-MU-24-0389",
        # stored as an ISO 8601 string so the properties stay JSON serialisable
        "issue_n": dt_utc.isoformat()
    }
)

# Add asset for the PDF
item.add_asset(
    key="documentation",
    asset=pystac.Asset(
        href=doc_href,
        media_type="application/pdf",
        roles=["documentation"],
        title=doc_title
    )
)

In [18]:
item
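
STAC expects datetimes in UTC, formatted according to RFC 3339. When you write Items by hand in another tool or language, the date string has to be produced explicitly. A minimal sketch with the standard library follows; `to_stac_datetime` is a hypothetical helper, and it assumes the input date is already expressed in UTC.

```python
from datetime import datetime, timezone


def to_stac_datetime(text, fmt="%d/%m/%Y"):
    """Parse a date string and return it as an RFC 3339 UTC timestamp.

    Assumption: the input date is already in UTC.
    """
    dt = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


stamp = to_stac_datetime("07/05/2025")
print(stamp)  # 2025-05-07T00:00:00Z
```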

## 4. Access to your data files.

**If you want us to store your data, you have to give us access to it as well**. We strongly prefer data in [cloud-optimised formats](https://esa-earthcode.github.io/documentation/Community%20and%20Best%20Practices/Data%20and%20Workflow%20Best%20Practices/Data/), since it makes storage and access much easier. In addition please tell us:
- the total size of the data
- the data format
- whether you plan on updating it, and with what frequency
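
The total size and the file formats can be gathered with a few lines of standard-library Python. This is only a sketch: `summarise_data_dir` is a hypothetical helper, and it identifies formats purely by file extension. The demo runs on a temporary directory with two dummy files; point it at your own data folder instead.

```python
import tempfile
from collections import Counter
from pathlib import Path


def summarise_data_dir(root):
    """Return (total size in bytes, {file extension: file count}) for a directory tree."""
    total_bytes = 0
    formats = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            total_bytes += path.stat().st_size
            formats[path.suffix.lower() or "(no extension)"] += 1
    return total_bytes, dict(formats)


# Demo on a temporary directory with two dummy files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.nc").write_bytes(b"\0" * 10)
    (Path(tmp) / "b.tif").write_bytes(b"\0" * 5)
    total, formats = summarise_data_dir(tmp)

print(f"{total} bytes", formats)
```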


If you don't want us to host the data, but prefer to host it on a public long-term storage platform such as Zenodo or PANGAEA, you do not have to give us access.

## 5. `workflow` / code metadata
We also strongly encourage projects to add information about the workflow and code they used to create the product/dataset.

In [None]:
# Define id, title, description, keywords, license
workflow_id = ""
workflow_title = ""
workflow_description = ""
workflow_keywords = ["", ""]
workflow_license = 'CC-BY-4.0'

# the data formats the workflow takes as input and produces as output, e.g. GeoTIFF, NetCDF
workflow_formats = []

# Define which project the workflow is associated with
workflow_project = ""
workflow_project_title = ""


# Define themes, e.g. land. Pick one or more from:
# - atmosphere, cryosphere, land, magnetosphere-ionosphere, oceans, solid-earth.
workflow_themes = []


# contacts
workflow_contracts_info = [
]

# Define the code URL, e.g. https://github.com/ESA-EarthCODE/open-science-catalog-metadata
codeurl = ''

# Publishing

Once all of this is done, you have to:
 1. Open a pull request using the publishing options.
 2. Send us an email (to earthcode@esa.int) **with your ESA TO in cc**. In the email body:
    - confirm that the ESA TO has signed off on your product/project
    - confirm the license for the data product and/or workflow
 3. (If you want us to store it for you) Send us the data in that email.

Alternatively, you can send us the metadata, together with the notebook or a link to it if you have questions.
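
If you send the metadata by email, one option is to dump the variables defined in this notebook to a single JSON file. The sketch below groups variables by their `project_`, `product_` and `workflow_` prefixes. `collect_metadata` is a hypothetical helper, not part of any EarthCODE tooling; in the notebook you could pass `globals()` as the namespace. Variables without one of these prefixes (for example `website_link`, `consortium_members` or `codeurl`) would need to be added by hand, and all values must be JSON serialisable.

```python
import json

PREFIXES = ("project_", "product_", "workflow_")


def collect_metadata(namespace, prefixes=PREFIXES):
    """Group variables from a namespace dict by their name prefix."""
    grouped = {prefix.rstrip("_"): {} for prefix in prefixes}
    for name, value in namespace.items():
        for prefix in prefixes:
            if name.startswith(prefix):
                grouped[prefix.rstrip("_")][name[len(prefix):]] = value
    return grouped


# Stand-in for the notebook namespace; in the notebook, use globals().
example = {
    "project_id": "my-project",
    "product_id": "my-product",
    "workflow_id": "my-workflow",
    "codeurl": "",
}
metadata = collect_metadata(example)
print(json.dumps(metadata, indent=2))
```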
