# EarthCODE publishing guide

At EarthCODE we aim not just to store data but to make it easily accessible and [FAIR](https://esa-earthcode.github.io/documentation/Community%20and%20Best%20Practices/FAIR%20and%20Open%20Science%20Best%20Practices/). We do this by collecting rich metadata. This notebook guides you through that process and gathers as much information as possible from you, so that we can help you in the best possible way. 
To process your data and publish it on the Open Science Data Catalogue, we need five things from you to get started:

1. Information about your ESA (EO Programme) funded project
2. Information about your product / dataset
3. Information about the actual files/data
4. Information on whether you need your results to be stored permanently by ESA
5. Information about the workflow/code you used to generate the data


For steps 1, 2 and 5 we provide the required parameters and fields in the code cells of this notebook, which you need to update with information from your project/product. Step 3 is the most time-consuming and depends a lot on how you plan for users to access your data; we therefore provide guides and examples of how other ESA projects have done this.

Once you have provided the inputs, save the notebook and the generated metadata files locally. Then:

1. Either
   * create a local copy of the Open Science Catalog and open a Pull Request to publish your data (see the steps at the end of this notebook), **or**
   * send us the generated metadata from steps 1, 2 and 5. You can use our self-publishing options (see the examples/ folder or the [tutorials](https://esa-earthcode.github.io/tutorials/index-2/)) or attach the metadata to the email in step 2.
2. Send us an email (to [earth-code@esa.int](mailto:earth-code@esa.int)) **with your ESA TO in cc**. In the email body:
    - **confirm that the ESA TO has signed off on your product/project**
    - confirm the **license for the data product**
3. Provide us with access to the data/files themselves if you would like to store them in the ESA-provided repository (see section 4).

> At any point during this process you can [contact us](mailto:earth-code@esa.int) and we can help you!

If you want to see how your data will be presented, head over to the Open Science Catalog to browse the entire database: [https://opensciencedata.esa.int/](https://opensciencedata.esa.int/)


You can use this notebook to edit the code to make it fit your project/data/workflow.

In [None]:
import pystac
from datetime import datetime

from earthcode.static import (
    create_product_collection,
    create_project_collection,
    create_workflow_collection,
    manually_add_product_links,
)

# Section 1: How to publish new data into the catalog? <br>
The Open Science Catalog is built on the [SpatioTemporal Asset Catalog (STAC)](https://stacspec.org/en) specification, a standardised format for describing geospatial data. New entries must therefore conform to that specification.<br> The cells below let you prepare metadata that conforms to these standards. 

## 1. `project` metadata

The project STAC Collection provides a general description of your ESA-funded project, including its official title, short description, time span, consortium members involved, related themes, etc. <br> Edit the parameters below to specify all the required information. See the helper descriptions in the comments inside the code cell. 

> See **example project metadata** directly in the Open Science Catalogue metadata repository on GitHub to compare the list of required parameters and their format: [See example project: WAPOSAL](https://github.com/ESA-EarthCODE/open-science-catalog-metadata/blob/main/projects/waposal/collection.json)


**LICENSE:** In this step you are required to select one of the available licenses for **all your products** generated by the project. <br> Please have a look at the list of available licenses and pick the one that applies to your datasets: [osc-licence schemas](https://github.com/ESA-EarthCODE/open-science-catalog-validation/blob/main/schemas/license.json). <br>

> *If your project generates and distributes datasets under multiple licenses, please select 'various' and define the specific license for each of your products separately in the later steps.*


In [2]:
# BASIC INFORMATION ABOUT THE PROJECT: define id, title, description, status, and license for your project
# A custom id for the project. It can be related to the title, e.g. 4datlantic-ohc. Use the dash "-" symbol to separate words in the id.
project_id = "" 
# The title of your project, e.g. 4DAtlantic-OHC. This should correspond to the project title in the ESA contract.
project_title = "" 
# A short description of the project:
project_description = ""
# Project status: pick from - ongoing or completed
project_status = ""

# Overall license for all related data that will be uploaded from the project, e.g. CC-BY-4.0. See the note in the markdown cell above for the full list of available licenses.
# If you have multiple licenses, you can pick 'various'
project_license = '' 

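# Define the spatial extent of the project study area in EPSG:4326.
# If you have multiple disjoint study areas, specify the bounding box that covers all of them.
# NOTE: despite the variable names, the values are passed on in STAC bbox order, i.e. (west, south, east, north):
# e.g. project_s, project_w, project_n, project_e = -180.0, -90.0, 180.0, 90.0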
project_s, project_w, project_n, project_e = -180.0, -90.0, 180.0, 90.0 

# The project start and end times
project_start_year, project_start_month, project_start_day = 2021, 1, 1
project_end_year, project_end_month, project_end_day = 2021,12,31

# Define the links to the project website and to the dedicated project page on EO4Society. Discover the list of published projects here: https://eo4society.esa.int/projects/
website_link = ""
eo4socity_link = ""

# Define project themes, according to OSC ontology. Pick one or more from:
# - atmosphere, cryosphere, land, magnetosphere-ionosphere, oceans, solid-earth. 
# See the list here: https://github.com/ESA-EarthCODE/open-science-catalog-metadata/blob/main/themes/catalog.json
project_themes = [""]

# Provide the name and e-mail address of the ESA Technical Officer (TO) supporting your project:
to_name = ''    # Full name and surname
to_email = ''   

# List the consortium members as tuples in the format (name, contact_email), for example - ('University A', "contact@universitya.fr")
consortium_members = [('', '')]

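Before generating the collection, it can help to sanity-check the values entered above. The following is a minimal sketch using only the standard library. The helper name `check_project_inputs` and its rules (lowercase dash-separated id, status of `ongoing` or `completed`, bounding box ordered west, south, east, north, start date not after end date) are illustrative assumptions drawn from the comments in the cell above; they are not part of the `earthcode` library.

```python
import re
from datetime import datetime


# Hypothetical helper, not part of the earthcode library.
def check_project_inputs(project_id, status, bbox, start, end):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    # ids are expected to be lowercase words separated by dashes, e.g. "4datlantic-ohc"
    if not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", project_id):
        problems.append(f"id {project_id!r} should be lowercase words joined by '-'")
    if status not in ("ongoing", "completed"):
        problems.append(f"status {status!r} should be 'ongoing' or 'completed'")
    # Assumption: the box does not cross the antimeridian, so west <= east.
    west, south, east, north = bbox
    if not (-180 <= west <= east <= 180 and -90 <= south <= north <= 90):
        problems.append(f"bbox {bbox} is not ordered west, south, east, north in EPSG:4326")
    if start > end:
        problems.append("start date is after the end date")
    return problems


print(check_project_inputs(
    "4datlantic-ohc", "ongoing", (-180.0, -90.0, 180.0, 90.0),
    datetime(2021, 1, 1), datetime(2021, 12, 31),
))
```

An empty list means none of these simple checks found a problem; it does not replace `project_collection.validate()`.
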
In [None]:
# combine the spatial and temporal extent
spatial_extent = pystac.SpatialExtent([[project_s, project_w, project_n, project_e]])
temporal_extent = pystac.TemporalExtent(
    [[datetime(project_start_year, project_start_month, project_start_day), 
      datetime(project_end_year, project_end_month, project_end_day)]])
project_extent = pystac.Extent(spatial=spatial_extent, temporal=temporal_extent)

# generate project collection
project_collection = create_project_collection(
    project_id, 
    project_title,
    project_description, 
    project_status,
    project_license,
    project_extent,
    project_themes,
    to_name,
    to_email,
    consortium_members,
    website_link,
    eo4socity_link=eo4socity_link
)

# validate the collection
project_collection.validate()

# Preview your project collection metadata: 
project_collection
# save the generated collection to a file(OPTIONAL)
#project_collection.save_object(dest_href='./project_collection.json')

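If you save the collection with `save_object`, the resulting JSON file can be inspected with the standard library alone. The sketch below lists which top-level fields required by the STAC Collection specification are missing from a document. The helper name `missing_collection_fields` and the sample dictionary are illustrative, and this quick look is not a substitute for `validate()`.

```python
import json

# Top-level fields required of every STAC Collection by the STAC specification.
REQUIRED_FIELDS = ("type", "stac_version", "id", "description", "license", "extent", "links")


# Hypothetical helper for a quick look at a saved collection.
def missing_collection_fields(collection_doc):
    """Return the required top-level fields absent from a collection dictionary."""
    return [field for field in REQUIRED_FIELDS if field not in collection_doc]


# Illustrative document; in practice, load your own file with json.load().
sample = json.loads("""
{
  "type": "Collection",
  "stac_version": "1.0.0",
  "id": "example-project",
  "description": "An example project",
  "license": "CC-BY-4.0",
  "links": []
}
""")
print(missing_collection_fields(sample))  # the sample deliberately lacks "extent"
```
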
## 2. `product` / dataset metadata

The `product` STAC Collection provides a general metadata description of all project outputs that will be discoverable on the Open Science Catalogue (OSC). Most of these metadata fields should already be available and can be extracted from your data or documentation.

> You can **attach one or more products to a single project**! So if you have more than one, you have to redo steps 2. and 3. for each!

**LICENSE:** In this step you are required to select one of the available licenses for **each of your products**. <br> Please have a look at the list of available licenses and pick the one that applies to your dataset: [osc-licence schemas](https://github.com/ESA-EarthCODE/open-science-catalog-validation/blob/main/schemas/license.json). <br>
>*If a product has no defined license, we **cannot proceed with publishing the dataset**. Please specify the 'various' license if you have doubts and need more time to assign the correct license but still want to publish the product!*<br>

**PRODUCT EO-MISSIONS:** In this step you are required to select one or more EO Missions that you have used to generate your product.
Please have a look at the **defined list of EO missions** available in the OSC under: (https://github.com/ESA-EarthCODE/open-science-catalog-metadata/tree/main/eo-missions) and searchable under: https://opensciencedata.esa.int/eo-missions/catalog <br> 
>*If your product uses or complements in-situ data collections, or results from numerical models, please select: ["in-situ-observations"] or ["numerical-models"]*<br>

**PRODUCT VARIABLES:** In this step you are required to select one or more variables that your product describes.
Please have a look at the **defined list of geophysical variables** available in the OSC under: (https://github.com/ESA-EarthCODE/open-science-catalog-metadata/tree/main/variables). You can also explore the list of variables under: https://opensciencedata.esa.int/variables/catalog

> Variables are defined in OSC as geophysical, climate and environmental variables selected from [WMO OSCAR Database](https://space.oscar.wmo.int/variables), complemented by the [GCMD Keywords Database](https://gcmd.earthdata.nasa.gov/KeywordViewer/scheme/Earth%20Science?gtm_scheme=Earth%20Science)

**PRODUCT PARAMETER:** 
Please provide a parameter linked to the product, in alignment with the **CF convention** standard. See the full list under: [https://cfconventions.org/](https://cfconventions.org/)

**PRODUCT DOI:**
EarthCODE now offers DOI assignment for products/datasets published on the Open Science Data Catalogue. The process is still manual: it goes through the ESA TEllUs service and is handled by the EarthCODE Data Stewardship team on behalf of the Project PI. 
> If you would like to assign a DOI to your data, please contact the EarthCODE Team, who will support you in this process: [earth-code@esa.int](mailto:earth-code@esa.int)!

In [3]:
# BASIC INFORMATION ABOUT THE PRODUCT
# Define id, title, description, product status, license.
# A custom id for the product (must be different from the project id!). It can be related to the title, e.g. 4datlantic-ohc-dataset. Use the dash "-" symbol to separate words in the id.
product_id = ""
product_title = ""
product_description = ""
# Product status: pick from - ongoing or completed
product_status = ""

# Define the product license, e.g. CC-BY-4.0. See the note in the markdown cell above for the full list of available licenses.
# If you have multiple licenses, you can pick 'various'
product_license = 'CC-BY-4.0'

# Define at most five keywords for the product. You can use any short text that allows users to discover your product.
product_keywords = [ 
    "",
    ""
] 

# Define the spatial extent of the PRODUCT/DATASET in EPSG:4326. If the dataset covers disjoint regions,
# add the bounding box boundaries for each region.
# NOTE: despite the variable names, the values are passed on in STAC bbox order, i.e. (west, south, east, north).
# e.g.
# - a dataset with global coverage is: product_s, product_w, product_n, product_e = [-180.0], [-90.0], [180.0], [90.0]
# - a dataset with multiple disjoint regions is: 
#       product_s, product_w, product_n, product_e = [-180.0, -180.0], [-90.0, -90.0], [180.0, 180.0], [90.0, 90.0]
product_s = [-180.0, -180.0]
product_w = [-90.0, -90.0]
product_n = [180.0, 180.0]
product_e = [90.0, 90.0]


# Define the temporal extent of PRODUCT/ DATASET
product_start_year, product_start_month, product_start_day = 2021, 1, 1
product_end_year, product_end_month, product_end_day = 2021,12,31


# Define the semantic region covered by this product, i.e. Belgium, Global etc. 
product_region = ""

# Define product themes i.e. land. Pick one or more from:
# - atmosphere, cryosphere, land, magnetosphere-ionosphere, oceans, solid-earth.
# See the list here: https://github.com/ESA-EarthCODE/open-science-catalog-metadata/blob/main/themes/catalog.json
product_themes = [""]

# Define the EO mission(s) used to generate the product, e.g. "sentinel-2"
# Pick one or more from - https://github.com/ESA-EarthCODE/open-science-catalog-metadata/tree/main/eo-missions
# e.g. product_missions = ['in-situ-observations', 'grace', 'numerical-models']
product_missions = []

# Define the variables that best describe your product/dataset:
# Pick one or more from https://github.com/ESA-EarthCODE/open-science-catalog-metadata/tree/main/variables
# Please see the comment in the markdown cell above to select the right variable and provide it in the correct format, e.g. "crop-yield-forecast"
product_variables = []
# Define the parameters describing your product in the standardised CF convention format, e.g. "leaf_area_index".
# Please see the description in the markdown cell above to select the right parameter and provide it in the correct format
product_parameters = []

# Provide the DOI assigned to your product, e.g. "https://doi.org/10.57780/s3d-83ad619". If your product does not have one, leave it as None
# See the description in the markdown cell above to request a DOI if needed.
product_doi = None

# Define the related project id and title
# These must match the new or an already existing project in the catalog! Otherwise the correct links cannot be produced!
# See the previous 'project metadata' step, and copy and paste these parameters according to your previous inputs.
project_id = ''
project_title = ''

In [None]:
# combine the spatial and temporal extent
spatial_extent = pystac.SpatialExtent([list(data) for data in zip(product_s, product_w, product_n, product_e)])
temporal_extent = pystac.TemporalExtent(
    [[datetime(product_start_year, product_start_month, product_start_day), 
      datetime(product_end_year, product_end_month, product_end_day)]])
product_extent = pystac.Extent(spatial=spatial_extent, temporal=temporal_extent)

# generate the product collection
product_collection = create_product_collection(product_id, product_title, product_description, 
                              product_extent, product_license,
                              product_keywords, product_status, product_region,
                              product_themes, product_missions, product_variables,
                              project_id, project_title,)


# validate the collection
product_collection.validate()
product_collection

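The spatial extent in the cell above is built by pairing the four coordinate lists position by position, giving one bounding box per region. A minimal sketch of that step in plain Python follows. `build_bboxes` is a hypothetical helper, and the length check is an added assumption: `zip` silently drops unmatched values when the lists differ in length, so an explicit check makes such mistakes visible.

```python
# Hypothetical helper mirroring how four coordinate lists become one bbox per region.
def build_bboxes(first, second, third, fourth):
    """Zip four equally long coordinate lists into a list of 4-element bounding boxes."""
    lengths = {len(first), len(second), len(third), len(fourth)}
    if len(lengths) != 1:
        raise ValueError("all four coordinate lists must have the same length")
    return [list(box) for box in zip(first, second, third, fourth)]


# Two illustrative regions, each ending up as [west, south, east, north] in EPSG:4326.
print(build_bboxes([-10.0, 100.0], [35.0, -45.0], [5.0, 155.0], [45.0, -10.0]))
```
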
### Add Documentation and access link to your datasets

To contribute to the Open Science Catalog, your research data and workflows/code must be hosted on remote, persistent storage that allows discovery. Examples include:

* Repository provided by ESA
* S3-compatible object storage - permanent and public
* GitHub for workflow/methods and code
* Zenodo, CEDA, Dataverse, or other persistent archives

>**DATA FORMATS**: We strongly recommend storing data in [cloud-optimised formats](https://esa-earthcode.github.io/documentation/Community%20and%20Best%20Practices/Data%20and%20Workflow%20Best%20Practices/Data/), since this makes storage and access much easier. If your data is not in one of the specified formats, please contact us before you continue. We can help you transform the data!

Please **provide us with the link to your files/datasets**, pointing to where they are physically stored, so that our team can test it before publishing. <br> 
If this link is permanent, it will be added to the metadata on the Open Science Catalog and will therefore also be discoverable by others. Otherwise, our team will get back to you to convert your dataset into a supported format and upload it to ESA-provided storage. 

> At this stage, please provide the access link to your data and documentation (e.g. User Handbook, Product Validation Report or other documentation relevant to the dataset) in the cell below.<br>



In [None]:
# Define the relevant links to complete your data description. 
# Link to an external data collection (STAC Catalog of your dataset), if available. If not, leave as None
item_link = '' # This link can also be generated at a later stage (see the description in Step 3)
# Please provide a valid link for accessing the data. If you have a password-protected repository, please inform our team about it and provide us with access (via email)
access_link = ''   # Please insert a valid https:// link here
# Link to the documentation.
documentation_link = '' # Please insert a link to the documentation here; alternatively, set it to None and send us the documentation via email.

manually_add_product_links(product_collection, access_link, documentation_link, item_link,)

In [None]:
product_collection

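Since the links above end up in the published metadata, a quick syntactic check can catch empty or malformed values early. This is a sketch under the assumption that every link should be an `https://` URL; `is_https_url` is a hypothetical helper, and it does not test whether the server actually responds.

```python
from urllib.parse import urlparse


# Hypothetical helper: a syntactic check only, it does not contact the server.
def is_https_url(link):
    """Return True if link is a well-formed https:// URL with a host name."""
    if not isinstance(link, str):
        return False
    parts = urlparse(link.strip())
    return parts.scheme == "https" and bool(parts.netloc)


for candidate in ("https://opensciencedata.esa.int/", "", "ftp://example.org/data.nc"):
    print(repr(candidate), is_https_url(candidate))
```
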
### Save the product and project collection locally

In [None]:
# Save the product and project collection locally (OPTIONAL)
project_collection.save_object(dest_href='./project_collection.json')  # rename your file and set the output path according to your needs
product_collection.save_object(dest_href='./product_collection.json')  # rename your file and set the output path according to your needs

## 3.  How to add File / `asset` level metadata to your Product? 

If your data does not yet have a persistent, online and preferably cloud-optimised repository, we recommend uploading your results to the **ESA-sponsored repository** granted by EarthCODE. The repository provides access to data, workflows, experiments and documentation from ESA projects, organised across Collections and accessible via the [STAC API](https://github.com/radiantearth/stac-api-spec). <br>
Each Collection contains [STAC Items](https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md), with their related Assets stored within the repository. Therefore, to **upload your data to the repository, you have to generate a STAC collection that describes your data, code and documentation**.

This step shows you how to describe *ALL* the different files associated with the `product / dataset` you want to upload as STAC Items and Assets. This is the most time-consuming step. There are multiple strategies for doing this; we are flexible, and it is up to you to decide how to do it, as long as the data conforms to the standard STAC specification.<br> **The main consideration should be the usability of the data!**<br>
> Learn more about STAC specification here: https://stacspec.org/en

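To make the idea of file-level metadata concrete, the sketch below describes a single file as a STAC Item written as a plain Python dictionary. It is only an illustration of the fields the STAC Item specification requires; the helper `make_item`, the asset key `data`, the item id and the example URL are all hypothetical, and the tutorials referenced in this section show library-based approaches.

```python
import json
from datetime import datetime, timezone


# Hypothetical helper: builds a STAC Item as a plain dictionary, without pystac.
def make_item(item_id, bbox, when, href, media_type):
    """Describe one file as a STAC Item with a single asset named 'data'."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": [west, south, east, north],
        # footprint polygon derived from the bounding box (closed ring)
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north], [west, north], [west, south],
            ]],
        },
        "properties": {
            "datetime": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
        "links": [],
        "assets": {"data": {"href": href, "type": media_type, "roles": ["data"]}},
    }


# Illustrative values only; the URL is a placeholder, not a real dataset.
item = make_item(
    "example-2021-01",
    (-10.0, 35.0, 5.0, 45.0),
    datetime(2021, 1, 1, tzinfo=timezone.utc),
    "https://example.org/data/example-2021-01.nc",
    "application/x-netcdf",
)
print(json.dumps(item, indent=2))
```
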
### Explore available tutorials on EarthCODE portal
If you are new to the STAC specification and how it applies to your dataset, we have many [tutorials](https://esa-earthcode.github.io/tutorials/prr-stac-introduction/) available from the EarthCODE Portal and executable from the designated [workspace](https://workspace.earthcode.eox.at/). 
The tutorials explain how to generate STAC Items from the most commonly used data formats: `netcdf, geotiff and zarr files`. If you have more than one file, you have to extract the metadata for each. The code does not generalise fully, so we only offer a few libraries and pointers to get you started. 
You have to tailor the code to your data, but the tutorials should generally facilitate this task.

> **Further examples can be found under**: https://esa-earthcode.github.io/tutorials/index-1/ <br>
A more manual way to create STAC Items and Asset-level metadata is shown in the following [example](https://esa-earthcode.github.io/tutorials/creating-stac-catalog-from-prr-example/) (applicable to all file types, including documentation)<br>
*The provided examples use the Python programming language, but you are free to explore options in other programming languages if you are more comfortable with them. In that case, please share with us the STAC Collections generated by your script*.  

> We can support you through this process, just [**contact us**](mailto:earth-code@esa.int) or post in the [FORUM](https://discourse-earthcode.eox.at/c/technical-support/8)!

### Provide us links to your STAC Collections with Items (optional) 
If you have followed the tutorials described above and generated STAC Items from your datasets, provide us with the links to them so that your `product_collection` can be updated.<br>
> Run this cell **only if you have successfully generated your STAC Collection**; otherwise [**contact us**](mailto:earth-code@esa.int).<br>

See example below to check how to provide this information. 

In [None]:
from earthcode.static import manually_add_product_links

# Define the relevant data links to be manually added
# link to an external data collection if available. If not, leave as None
item_link = 'https://s3.waw4-1.cloudferro.com/EarthCODE/Catalogs/4datlantic-ohc/collection.json'
# Link to accessing the data, this link is required. Leave as None, if you are adding children in this notebook.
access_link = f'https://opensciencedata.esa.int/stac-browser/#/external/{item_link}'
#Link to the documentation, leave as None, if not available
documentation_link = 'https://www.aviso.altimetry.fr/fileadmin/documents/data/tools/OHC-EEI/OHCATL-DT-035-MAG_EDD_V3.0.pdf'


manually_add_product_links(product_collection, access_link, documentation_link, item_link,)

## 4. Store your results in Long-Term Repository provided by ESA

**To store your data along with the newly created metadata of your product, you need to provide us with access to the data itself.** <br> We strongly prefer data in [cloud-optimised formats](https://esa-earthcode.github.io/documentation/Community%20and%20Best%20Practices/Data%20and%20Workflow%20Best%20Practices/Data/), since this makes storage and access much easier. In addition, please provide us with the following information: 
- the total size of the data
- the data format
- whether you plan on updating it, and with what frequency

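The total size and the formats of a local dataset can be gathered with a few lines of standard-library Python. The helper `summarise_data_dir` below is a hypothetical sketch that treats the file extension as the format, which is only an approximation (for example, Zarr stores consist of many files without a telling extension).

```python
import tempfile
from collections import Counter
from pathlib import Path


# Hypothetical helper: summarise a local data directory before contacting the team.
def summarise_data_dir(root):
    """Return (total size in bytes, Counter of file extensions) for all files under root."""
    total_bytes = 0
    formats = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            total_bytes += path.stat().st_size
            formats[path.suffix.lower() or "<no extension>"] += 1
    return total_bytes, formats


# Demonstration on a throwaway directory; pass your own data folder instead.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "sample.nc").write_bytes(b"\0" * 2048)
    size, formats = summarise_data_dir(demo_dir)
    print(f"{size} bytes in {sum(formats.values())} file(s): {dict(formats)}")
```
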

If you already have access to, and would like to keep your dataset in, a **public, long-term storage repository** such as Zenodo or PANGAEA, please specify the **`"Access"`** link in the metadata description of your product by providing a **valid URL** in **Step 2** of this notebook.

## 5. `workflow` / code metadata
We also strongly encourage projects to add information about the workflow/code used to create the product to make the outputs fully reproducible. <br>
`Workflows` are defined as the code and workflows associated with a project, that have been used to generate a specific product. Workflows follow OGC record specifications in contrast to OSC Projects and Products entries. However, the metadata of a workflow is also expressed in JSON format. <br>
To discover the specification used in the workflows, explore the documentation here: [https://esa-earthcode.github.io/tutorials/osc-pr-manual/#id-2-3-add-new-workflow](https://esa-earthcode.github.io/tutorials/osc-pr-manual/#id-2-3-add-new-workflow) <br> 

**LICENSE**: In this step you are required to select one of the available licenses for each of your workflows.
Please have a look at the list of available licenses and pick the one that applies to your workflow:  [osc-licence schemas](https://github.com/ESA-EarthCODE/open-science-catalog-validation/blob/main/schemas/license.json).

>*If a workflow has no defined license, we cannot proceed with publishing it. If you have doubts and need more time to assign the correct license but still want to publish the workflow, please specify the 'various' license and contact the EarthCODE team.*

In [5]:
# BASIC INFORMATION ABOUT THE WORKFLOW
# A custom id for the workflow (must be different from the project and product ids!). It can be related to the title, e.g. world-cereal-algorithm.
# Use the dash "-" symbol to separate words in the id.
workflow_id = ""
workflow_title=""
workflow_description= "" 
# Define at most five keywords for the workflow. You can use any short text that allows users to discover your workflow.
workflow_keywords= ["", ""] 
# Define the license of the workflow, e.g. CC-BY-4.0. See the note in the markdown cell above for the full list of available licenses.
# If you have multiple licenses, you can pick 'various'
workflow_license = 'CC-BY-4.0' 

# The DATA formats the workflow takes as input and produces as output, e.g. GeoTIFF, NetCDF
workflow_formats = ['netcdf64']

# Define which project the workflow is associated with.
# If you are adding to an existing project, see the ids and titles here:
# - https://github.com/ESA-EarthCODE/open-science-catalog-metadata/tree/main/projects
# These must match the new or an already existing project in the catalog! Otherwise the correct links cannot be produced!
project_id = ""
project_title = ""


# Define product themes i.e. land. Pick one or more from:
# - atmosphere, cryosphere, land, magnetosphere-ionosphere, oceans, solid-earth.
# See the list here: https://github.com/ESA-EarthCODE/open-science-catalog-metadata/blob/main/themes/catalog.json
workflow_themes = ['']


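# List the contacts as tuples in the format (name, contact_email), for example - ('Magellium', "contact@magellium.fr")
workflow_contracts_info = [('Magellium', "contact@magellium.fr")]

# Define the repository where the workflow/code can be accessed. Provide an active URL below, e.g. https://github.com/ESA-EarthCODE/open-science-catalog-metadata
codeurl = ''

# NOTE: `workflow_collection` must exist before the two lines below can run. It is not created in this cell
# as written: build it first with create_workflow_collection (imported at the top of the notebook),
# passing the parameters defined above; help(create_workflow_collection) shows the expected arguments.
# validate the collection
workflow_collection.validate()
workflow_collection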

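The comments above require the workflow id to differ from the project and product ids. A small check along these lines can flag accidental reuse; `duplicated_ids` is a hypothetical helper and the ids shown are illustrative values only.

```python
# Hypothetical helper: ids of the project, product and workflow must all differ.
def duplicated_ids(**ids):
    """Return the sorted id values that are used by more than one entry."""
    seen = {}
    for entry, value in ids.items():
        seen.setdefault(value, []).append(entry)
    return sorted(value for value, entries in seen.items() if len(entries) > 1)


# Illustrative ids; the workflow id deliberately repeats the project id.
print(duplicated_ids(
    project="4datlantic-ohc",
    product="4datlantic-ohc-dataset",
    workflow="4datlantic-ohc",
))
```
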
In [None]:
# Save the workflow collection locally (OPTIONAL)
workflow_collection.save_object(dest_href='./workflow_collection.json')  # rename your file and set the output path according to your needs

In [None]:
from pathlib import Path
from earthcode.git_add import save_product_collection_to_catalog
from earthcode.git_add import save_project_collection_to_osc

# Specify the absolute path to the local OSC fork
catalog_root = Path('C:/Users/<your-username>/Documents/open-science-catalog-metadata/')

# save the project entry and add the required links
save_project_collection_to_osc(project_collection, catalog_root)

# save the product and add the required links
save_product_collection_to_catalog(product_collection, catalog_root)

## Next steps: 

Once you have provided your inputs: 
 1. Open a pull request, either using this notebook (see the sections below) or using one of the other methods described here: [https://esa-earthcode.github.io/tutorials/index-2/](https://esa-earthcode.github.io/tutorials/index-2/)
 2. Send us an email ([earth-code@esa.int](mailto:earth-code@esa.int)) **with your ESA TO in cc**. In the email body:
    - confirm that the **ESA TO has signed off on your product/project**
    - confirm the **license** for the data product and/or workflow
 3. Provide us with access to the data (see step 4 above)  

Alternatively, if you have questions, you can send us the metadata and the notebook (or a link to it) via email.


# Section 2. Add new entries to Open Science Catalogue with Pull Request (GitHub)

This part of the notebook shows how to add a new entry - i.e. `product, project, workflow` - to the Open Science Catalogue (OSC) via a GitHub Pull Request. 
This can be done using the [Graphical User Interface](https://esa-earthcode.github.io/tutorials/git-clerk-example/) within the [EarthCODE workspace](https://workspace.earthcode.eox.at/), by manually creating the entries on GitHub in a web browser, or by using platform-specific tools. 
This notebook covers how publishing can be done locally, without the need for a supporting platform or the installation of additional tools. <br> It covers the following steps:
1. Setting up a local copy (fork) of the OSC
2. Embedding new OSC entries into the Catalog
3. Validating the new entries
4. Opening a Pull Request to add the new entries
5. Checking the status of your Pull Request

> To proceed with this step, **you need an active GitHub account**. If you do not have one, please [create an account](https://github.com/signup) to get started first. 

## 1. Setup a local Copy of the OSC

You can add new content to the OSC via a GitHub [Pull Request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request). To do this, you need to [fork](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/about-forks) the OSC repository, embed the new information into the existing catalog and open a Pull Request so that it can be merged. The steps below describe the process.

0. (if needed) [Install git](https://github.com/git-guides/install-git) & create a [GitHub account](https://docs.github.com/en/get-started/start-your-journey/creating-an-account-on-github) 
1. [Fork](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo) the open science catalog repository on GitHub - [https://github.com/ESA-EarthCODE/open-science-catalog-metadata](https://github.com/ESA-EarthCODE/open-science-catalog-metadata)
2. [Clone](https://docs.github.com/en/repositories/creating-and-managing-repositories/cloning-a-repository) your *forked* repository
   `git clone https://github.com/<your-gh-username>/open-science-catalog-metadata.git`
3. Set the current working directory to your **local clone** of the open science catalog metadata repository.
   `cd ./open-science-catalog-metadata/`
4. Create a new branch in the local clone
   `git checkout -b project_branch`

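If you prefer to stay inside the notebook, the clone and branch steps above can be expressed as argument lists for `subprocess`. The sketch below only builds and prints the commands; `fork_setup_commands` is a hypothetical helper, and the username and branch name are placeholders to replace with your own.

```python
import shlex


# Hypothetical helper: only builds the commands; nothing is executed here.
def fork_setup_commands(github_username, branch_name):
    """Return the git clone and branch-creation commands as argument lists."""
    repo = "open-science-catalog-metadata"
    return [
        ["git", "clone", f"https://github.com/{github_username}/{repo}.git"],
        # -C runs git inside the freshly cloned folder without changing directory
        ["git", "-C", repo, "checkout", "-b", branch_name],
    ]


for command in fork_setup_commands("your-gh-username", "project_branch"):
    print(shlex.join(command))
    # To execute from the notebook instead of printing:
    # import subprocess; subprocess.run(command, check=True)
```
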
## 2. Embedding your newly created entries to the local copy of the open-science-catalog-metadata repository
All OSC entries are interlinked to enable efficient search and analysis. For example, projects have associated products, themes and missions, and in turn products link back to their projects, etc. Most of these links can be generated automatically from the existing information in an OSC entry using the associated `earthcode` library functions.

To use these functions you need a **local copy of the OSC**, preferably a fork, so that later, you can easily open a PR. The functions will save your newly created OSC entries and make changes to existing OSC entries, in order to conform to the required structure.

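Because these functions write into the local clone, it is worth confirming that the path you pass really points at the catalog. The sketch below checks for a few top-level folders that this guide links to in the metadata repository; `missing_catalog_folders` is a hypothetical helper, the folder list is not exhaustive, and the path is a placeholder.

```python
from pathlib import Path

# Top-level folders of the metadata repository that this guide links to.
EXPECTED_FOLDERS = ("projects", "themes", "eo-missions", "variables")


# Hypothetical helper, not part of the earthcode library.
def missing_catalog_folders(catalog_root):
    """Return the expected folders that are absent under catalog_root."""
    root = Path(catalog_root)
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]


# Placeholder path; point it at your own clone.
missing = missing_catalog_folders("./open-science-catalog-metadata")
print("looks like an OSC clone" if not missing else f"missing folders: {missing}")
```
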

In [None]:
from pathlib import Path
from earthcode.git_add import save_product_collection_to_catalog
from earthcode.git_add import save_project_collection_to_osc

# Specify the absolute path to the local OSC fork
catalog_root = Path('C:/Users/<your-username>/Documents/open-science-catalog-metadata/') # This should be a path to the local copy of the open-science-catalog-metadata repository

# save the project entry and add the required links
save_project_collection_to_osc(project_collection, catalog_root)

# save the product and add the required links
save_product_collection_to_catalog(product_collection, catalog_root)

# save the workflow and add the required links - keep the line below commented out if you have no workflow; to use it, uncomment it and import save_workflow_collection_to_catalog first
# save_workflow_collection_to_catalog(workflow_collection, catalog_root)

## 3. Validation

There will be two types of checks before accepting your entry into the main OSC:
1. Automatic verification
2. Semantic validation


Before running any of the checks, you need to store your entries on disk in the OSC directory. This is required in order to check that all links are generated correctly. You can see the results of the automatic checks, and any errors they report, using the library.

In [5]:
from earthcode.validator import validateOSCEntry
validateOSCEntry(project_collection.to_dict(), catalog_root)

[]

In [6]:
validateOSCEntry(product_collection.to_dict(), catalog_root)

[]

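The empty lists shown above suggest that the validator returns one entry per problem found. Under that assumption, which is inferred from the outputs rather than from the library documentation, several results can be summarised at a glance. `summarise_validation` is a hypothetical helper and the sample results are made up for illustration.

```python
# Hypothetical helper; assumes each validation result is a list with one entry per
# problem, and that an empty list (as in the outputs above) means nothing was found.
def summarise_validation(results):
    """Map each entry name to 'ok' or to a count of reported problems."""
    return {
        name: "ok" if not errors else f"{len(errors)} problem(s)"
        for name, errors in results.items()
    }


# Illustrative results; in the notebook these would come from validateOSCEntry.
print(summarise_validation({"project": [], "product": ["example problem"]}))
```
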
## 4. Open a PR to add new entries

After the validation passes, you are ready to propose your changes to the existing [open-science-catalog-metadata](https://github.com/ESA-EarthCODE/open-science-catalog-metadata) repository, so that your datasets and project can be published. 
Using the terminal: 
1. Commit the changes to the branch you created in your local copy of the repository: <br>
   `cd ./open-science-catalog-metadata` <br>
   `git checkout <branch-name>`<br>
   `git add .`<br>
   `git commit -m "Adding new product_v2.0"`<br>
2. [Push](https://docs.github.com/en/get-started/using-git/pushing-commits-to-a-remote-repository) the changes to your fork:<br>
   `git push --set-upstream origin <branch-name>`
3. Open a [pull request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request) against the main open science catalog repository, for example with the GitHub CLI (`gh`):<br>
   `gh pr create -f`

## 5. Check the status of your PR directly in GitHub

After creating the Pull Request, you should see it in the list: https://github.com/ESA-EarthCODE/open-science-catalog-metadata/pulls

Check the status of your PR under: https://github.com/ESA-EarthCODE/open-science-catalog-metadata/actions

> Changes to the OSC content will be reviewed by the EarthCODE Data Steward team. In case any changes are needed to your inputs, you will be contacted by the team.