# Evaluation Setup

The following setup is based on our methodology described in the paper
_“Towards Standardization of the Earth Observation Data Product Supply Chain – Are OCI Artifacts the Key to Ubiquitous and Scalable EO Data Handling?”_

## Prerequisites

Before getting started, ensure the following tools are installed:

- [ORAS CLI](https://oras.land/)
- `tar`, `tree`, and `jq` (available on most Unix-like systems)
- [Docker](https://www.docker.com/) (if you plan to run a local OCI registry)

To demonstrate the ubiquity and usability of OCI registries in real-world scenarios, we evaluated the following five registries:

- Docker Hub: `docker.io/versioneer` (public; see https://hub.docker.com/repositories/versioneer?search=pastis)
- Quay.io: `quay.io/versioneer-inc` (public; see https://quay.io/repository/versioneer-inc/pastis-2433?tab=tags and https://quay.io/repository/versioneer-inc/pastis-t4?tab=tags)
- Harbor (hosted by OVHCloud): `qr2wz4td.c1.de1.container-registry.ovh.net/versioneer` (private)
- Amazon Elastic Container Registry (ECR): `767397985165.dkr.ecr.eu-central-1.amazonaws.com/versioneer` (private)
- Docker Registry (local): `localhost:5000`, run locally from the open-source [distribution](https://github.com/distribution/distribution) reference implementation

The repositories on Docker Hub and Quay.io are publicly accessible in read-only mode, so you can inspect the evaluation artifacts directly. Harbor and AWS ECR are private instances, so you will need your own cloud subscription and authentication credentials to use them. The local Docker registry is the easiest option for following along. You can start it with:

```bash
docker run -d -p 5000:5000 --name registry registry:2
```

> Note: You can also run other OCI-compliant registries such as [Harbor](https://github.com/goharbor/harbor) or [Quay](https://github.com/quay/quay) locally.
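Before pushing anything, it can help to confirm that the local registry is actually listening. The sketch below probes the `/v2/` API version check endpoint defined by the OCI Distribution Specification. The helper names `registry_api_url` and `registry_is_up` are hypothetical and not part of this repository, and plain HTTP is assumed because the local registry above is started without TLS.

```python
import urllib.error
import urllib.request


def registry_api_url(registry: str, scheme: str = "http") -> str:
    """Build the URL of the registry's API version check endpoint."""
    return f"{scheme}://{registry}/v2/"


def registry_is_up(registry: str, scheme: str = "http", timeout: float = 3.0) -> bool:
    """Return True if something answers like an OCI registry on /v2/."""
    url = registry_api_url(registry, scheme)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        # A 401 still shows that a registry is listening; it only wants credentials.
        return err.code == 401
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar network errors.
        return False
```

Calling `registry_is_up("localhost:5000")` after the `docker run` command above should return `True` once the container has started. For the hosted registries, pass `scheme="https"`.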

For the Python dependencies, a requirements file is provided for convenience:

```bash
pip install -r requirements.txt
```

## Reference Dataset and Partitioning Strategy

We used the _Panoptic Agricultural Satellite Time Series_ ([PASTIS](https://github.com/VSainteuf/pastis-benchmark)) dataset as our reference, as it integrates diverse Earth Observation (EO) modalities, including:

- Optical time-series data from Sentinel-2  
- Radar time-series data from Sentinel-1  
- Very High Resolution (VHR) imagery from SPOT satellites  
- Curated annotations, including label masks and semantic classifications  

Rather than preserving the original organization—which grouped data by source and required consumers to search and filter for relevant information—we restructured the dataset into an analysis-ready format using two distinct partitioning schemes:

- PASTIS-2433: The entire PASTIS dataset is split into 2,433 individual per-patch subsets. Each patch is packaged into a separate TAR archive and added as a layer within a single OCI artifact. The final artifact includes 2,433 layers of approximately 30–35 MB each, as well as a [config](data/config-2433.json) object that describes the metadata for each patch. This layout enables fine-grained access and maximizes deduplication across patches.

- PASTIS-t4: The dataset is instead divided into four larger spatial tiles. Each tile represents a distinct region and is packaged into a TAR archive, then added as a layer in the OCI artifact. This results in 4 layers of approximately 15–20 GB each, plus a [config](data/config-t4.json) object that captures tile-level metadata. This approach enables high-throughput data access optimized for regional analysis.
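The per-patch scheme can be illustrated with a minimal sketch. It is not the notebook code from `scripts/`. It assumes the source files are `.npy` arrays whose names end in a five-digit patch ID (for example `DATA_S2/S2_10000.npy`), and the function names `group_by_patch` and `write_patch_archives` are hypothetical.

```python
import re
import tarfile
from collections import defaultdict
from pathlib import Path

# Assumed naming convention: "<modality>_<5-digit patch id>.npy"
PATCH_ID = re.compile(r"_(\d{5})\.npy$")


def group_by_patch(root: Path) -> dict[str, list[Path]]:
    """Collect all files below root, keyed by the patch ID in their name."""
    groups = defaultdict(list)
    for path in sorted(root.rglob("*.npy")):
        match = PATCH_ID.search(path.name)
        if match:
            groups[match.group(1)].append(path)
    return dict(groups)


def write_patch_archives(root: Path, out_dir: Path) -> list[Path]:
    """Write one uncompressed TAR archive per patch, named <patch id>.tar."""
    out_dir.mkdir(parents=True, exist_ok=True)
    archives = []
    for patch_id, files in sorted(group_by_patch(root).items()):
        archive = out_dir / f"{patch_id}.tar"
        with tarfile.open(archive, "w") as tar:
            for path in files:
                # Store every modality of a patch under one top-level folder.
                arcname = (Path(patch_id) / path.relative_to(root)).as_posix()
                tar.add(path, arcname=arcname)
        archives.append(archive)
    return archives
```

Each resulting archive then holds every modality for a single patch, so a consumer can fetch one layer and get everything needed for that location.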

For our own convenience, the TAR archives (both the 2,433 patches and the 4 tiles) were uploaded to an object storage bucket on OVHCloud. These were used as the source to package the evaluation OCI artifacts before pushing them to various registries.

The partitioning scripts are available here:

- [scripts/0_initial-partioning-2433.ipynb](scripts/0_initial-partioning-2433.ipynb)
- [scripts/0_initial-partitioning-t4.ipynb](scripts/0_initial-partitioning-t4.ipynb)

The scripts used to generate the config files—required for building the OCI artifacts and pushing them to a registry—are available here:

- [scripts/1_generate-config-2433.ipynb](scripts/1_generate-config-2433.ipynb)
- [scripts/1_generate-config-t4.ipynb](scripts/1_generate-config-t4.ipynb)

The original spatial metadata for the PASTIS dataset is provided as a [GeoJSON file](data/metadata_pastis.geojson) in the `data/` folder.
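As a rough illustration of how this metadata relates to the tile-based partitioning, the sketch below counts patches per tile in a GeoJSON `FeatureCollection`. The property name `TILE` is an assumption about the metadata schema and may need adjusting. `load_geojson` and `patches_per_tile` are hypothetical helpers, not part of the repository scripts.

```python
import json
from collections import Counter
from pathlib import Path


def load_geojson(path) -> dict:
    """Read a GeoJSON file into a plain dictionary."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def patches_per_tile(geojson: dict, tile_key: str = "TILE") -> Counter:
    """Count features per tile; tile_key is an assumed property name."""
    return Counter(
        feature.get("properties", {}).get(tile_key, "<unknown>")
        for feature in geojson.get("features", [])
    )
```

For example, `patches_per_tile(load_geojson("data/metadata_pastis.geojson"))` would return one counter entry per tile if the assumed property exists. Otherwise, all patches end up under `<unknown>`.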

> Note: We’ve created a small sample of PASTIS-2433 with just 3 layers and a matching [config file](sample/config.json). It’s located in the `sample/` folder to help you quickly explore the data and get started.

## Packaging and Publishing EO Patches

The following code demonstrates how to package the small sample of the PASTIS-2433 dataset and push it to a local OCI registry at `localhost:5000/pastis-2433:sample`.

For comparison, equivalent Bash scripts are available in the `scripts/` directory, which use the ORAS CLI to perform the same steps. For example, see [2a_push_dockerhub-2433.sh](/scripts/2a_push_docker)

```python
import os
import re
import subprocess
import sys

data_dir = "sample"
registry = "localhost:5000"
repo = f"{registry}/pastis-2433:sample"

print(f"Preparing to push to {repo}")

# Collect the per-patch archives (e.g. 10000.tar) as OCI layers.
# sorted() keeps the layer order deterministic; os.listdir() order is arbitrary.
layers = []
tar_pattern = re.compile(r"^\d{5}\.tar$")
for filename in sorted(os.listdir(data_dir)):
    if tar_pattern.match(filename):
        full_path = os.path.join(data_dir, filename)
        layers.append(f"{full_path}:application/vnd.oci.image.layer.v1.tar")

if not layers:
    print("No valid layers found")
    sys.exit(1)

print(f"Found {len(layers)} layer(s):")
for layer in layers:
    # Strip the ":<media type>" suffix to show only the file path.
    print("  -", layer.rsplit(":", 1)[0])

# Push all layers plus the config object as a single OCI artifact.
config_path = os.path.join(data_dir, "config.json")
cmd = [
    "oras", "push", "--verbose", repo,
    "--artifact-type", "application/vnd.whatever.v1+tar",
    "--config", f"{config_path}:application/vnd.oci.image.config.v1+json"
] + layers

subprocess.run(cmd, check=True)
```


Output:

```text
Preparing to push to localhost:5000/pastis-2433:sample
Found 3 layer(s):
  - sample/10000.tar
  - sample/10001.tar
  - sample/10002.tar
Preparing sample/10000.tar
Preparing sample/10001.tar
Preparing sample/10002.tar
Uploading 6f57fa9c759f sample/10002.tar
Uploading b9af2a69dee3 sample/10001.tar
Uploading 7cff937ff47c sample/10000.tar
Uploading e4c1009e385d application/vnd.oci.image.config.v1+json
Uploaded  e4c1009e385d application/vnd.oci.image.config.v1+json
Uploaded  6f57fa9c759f sample/10002.tar
Uploaded  7cff937ff47c sample/10000.tar
Uploaded  b9af2a69dee3 sample/10001.tar
Uploading 72bf0b123756 application/vnd.oci.image.manifest.v1+json
Uploaded  72bf0b123756 application/vnd.oci.image.manifest.v1+json
Pushed [registry] localhost:5000/pastis-2433:sample
ArtifactType: application/vnd.whatever.v1+tar
Digest: sha256:72bf0b123756669a8b9b34dfd4beb898dc9ab1eb7171bcb804de9d38e1371c9c

CompletedProcess(args=['oras', 'push', '--verbose', 'localhost:5000/pastis-2433:sample', '--artifact-type', 'application/vnd.whatever.v1+tar', '--config', 'sample/config.json:application/vnd.oci.image.config.v1+json', 'sample/10000.tar:application/vnd.oci.image.layer.v1.tar', 'sample/10001.tar:application/vnd.oci.image.layer.v1.tar', 'sample/10002.tar:application/vnd.oci.image.layer.v1.tar'], returncode=0)
```
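To check what was pushed, one option is to fetch the manifest and list its layers. The following is a minimal sketch, assuming the ORAS CLI is on the `PATH` and the push above succeeded. It relies on the `org.opencontainers.image.title` annotation that ORAS attaches to each file layer. The helper names `fetch_manifest` and `summarize_layers` are hypothetical.

```python
import json
import subprocess

# Annotation key under which ORAS records the file name of each layer.
TITLE = "org.opencontainers.image.title"


def fetch_manifest(ref: str) -> dict:
    """Fetch the manifest for ref via the ORAS CLI and parse it as JSON."""
    result = subprocess.run(
        ["oras", "manifest", "fetch", ref],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def summarize_layers(manifest: dict) -> list[tuple[str, str, int]]:
    """Return (title, digest, size in bytes) for every layer in the manifest."""
    return [
        (
            layer.get("annotations", {}).get(TITLE, "<untitled>"),
            layer["digest"],
            layer["size"],
        )
        for layer in manifest.get("layers", [])
    ]
```

For the sample artifact, `summarize_layers(fetch_manifest("localhost:5000/pastis-2433:sample"))` should list the three archives. A digest from that list could then be passed to `oras blob fetch` to retrieve a single patch without pulling the whole artifact.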