## Exploration of the AI4Life platform to extract metadata

+ A review of the available resources suggests that the BioImage Model Zoo does not maintain a centralized JSON index or API for accessing rdf.yaml files.

+ The rdf.yaml (resource description) files adhere to a standardized format defined by the bioimage.io community, which enables consistent descriptions of bioimage models and other resources. The repository provides example files, but no centralized JSON index listing all available rdf.yaml files.

+ To access and work with these files programmatically, you can use the `bioimageio.core` Python package, which can load, validate, and interact with bioimage.io resources.

+ Download files programmatically: even without a centralized JSON index, the files can be retrieved by scraping the website or by using the available tools.
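As a small illustration of the last point, the sketch below builds an rdf.yaml URL from a resource nickname and a version. The base URL and path layout are an assumption inferred from the example URLs used later in this notebook, not a documented API, and `build_rdf_url` is a hypothetical helper name.

```python
from urllib.parse import quote

# Assumption: base URL and path layout are inferred from the example URLs
# used later in this notebook; they are not a documented, stable API.
BASE_URL = "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io"


def build_rdf_url(resource_id: str, version: str) -> str:
    """Return the assumed rdf.yaml URL for a resource nickname and version."""
    return f"{BASE_URL}/{quote(resource_id)}/{quote(version)}/files/rdf.yaml"


print(build_rdf_url("affable-shark", "1.2"))
```

The resulting string can be passed to `load_description`, as done in the cells below. If the storage layout changes, only `BASE_URL` and the format string need updating.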

## Implementation

In [2]:

!pip install bioimageio.core


Collecting bioimageio.core
  Obtaining dependency information for bioimageio.core from https://files.pythonhosted.org/packages/32/33/10262047b6dc5dfea11571009135413f473201a42d077e7e57551107abf4/bioimageio_core-0.8.0-py3-none-any.whl.metadata
  Downloading bioimageio_core-0.8.0-py3-none-any.whl.metadata (23 kB)
Collecting bioimageio.spec==0.5.4.1 (from bioimageio.core)
  Obtaining dependency information for bioimageio.spec==0.5.4.1 from https://files.pythonhosted.org/packages/c0/b5/0e199ece54e44e9183b117cad9676398bcdeb1c5b8e6554aa44e223dbb74/bioimageio.spec-0.5.4.1-py3-none-any.whl.metadata
  Downloading bioimageio.spec-0.5.4.1-py3-none-any.whl.metadata (11 kB)
Collecting h5py (from bioimageio.core)
  Obtaining dependency information for h5py from https://files.pythonhosted.org/packages/25/61/d897952629cae131c19d4c41b2521e7dd6382f2d7177c87615c2e6dced1a/h5py-3.13.0-cp312-cp312-win_amd64.whl.metadata
  Downloading h5py-3.13.0-cp312-cp312-win_amd64.whl.metadata (2.5 kB)
Collecting imagecod


[notice] A new release of pip is available: 23.2.1 -> 25.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip


### Enable pretty_validation_errors

In [33]:
from bioimageio.spec.pretty_validation_errors import (
    enable_pretty_validation_errors_in_ipynb,
)

enable_pretty_validation_errors_in_ipynb()

In [37]:
import bioimageio.core
from bioimageio.core import load_description


## Load model from the BioImage Model Zoo

bioimage.io resources may be identified via their bioimage.io ID, e.g. "affable-shark" or the DOI of their Zenodo backup.

Both of these options may be version specific ("affable-shark/1" or a version specific Zenodo backup DOI).

Alternatively, any rdf.yaml source, single file or in a .zip, may be loaded by providing its local path or URL.
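To make the different source kinds concrete, here is a minimal sketch that guesses which kind a given string is. `classify_source` is a hypothetical helper written for this notebook; it is a heuristic and does not reflect how `bioimageio.core` resolves sources internally.

```python
import re
from urllib.parse import urlparse

# A DOI starts with "10.", a registrant code, then "/" and a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def classify_source(source: str) -> str:
    """Classify a source string as 'url', 'doi', 'path' or 'id' (heuristic)."""
    if urlparse(source).scheme in ("http", "https"):
        return "url"
    if DOI_PATTERN.match(source):
        return "doi"
    # Assumption: local sources are recognised by their file extension only.
    if source.endswith((".yaml", ".yml", ".zip")):
        return "path"
    # Anything else is treated as a bioimage.io ID, e.g. "affable-shark/1".
    return "id"


for example in ("affable-shark", "affable-shark/1", "10.5281/zenodo.6287342"):
    print(example, "->", classify_source(example))
```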

In [5]:
BMZ_MODEL_ID = ""  # "affable-shark"
BMZ_MODEL_DOI = ""  # "10.5281/zenodo.6287342"
BMZ_MODEL_URL = "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/affable-shark/1.2/files/rdf.yaml" 

In [8]:
from bioimageio.core import load_description

# Load the model description
# ------------------------------------------------------------------------------
if BMZ_MODEL_ID != "":
    model = load_description(BMZ_MODEL_ID)
    print(
        f"\nThe model '{model.name}' with ID '{BMZ_MODEL_ID}' has been correctly loaded."
    )
elif BMZ_MODEL_DOI != "":
    model = load_description(BMZ_MODEL_DOI)
    print(
        f"\nThe model '{model.name}' with DOI '{BMZ_MODEL_DOI}' has been correctly loaded."
    )
elif BMZ_MODEL_URL != "":
    model = load_description(BMZ_MODEL_URL)
    print(
        f"\nThe model '{model.name}' with URL '{BMZ_MODEL_URL}' has been correctly loaded."
    )
else:
    print("\nPlease specify a model ID, DOI or URL")

if "draft" in BMZ_MODEL_ID or "draft" in BMZ_MODEL_DOI or "draft" in BMZ_MODEL_URL:
    print(
        f"\nThis is the DRAFT version of '{model.name}'. \nDraft versions have not been reviewed by the Bioimage Model Zoo Team and may contain harmful code. Run with caution."
    )


The model 'NucleiSegmentationBoundaryModel' with URL 'https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/affable-shark/1.2/files/rdf.yaml' has been correctly loaded.


In [17]:
from bioimageio.core import test_model

test_summary = test_model(model)
test_summary.display()

2025-04-23 09:57:43.985 | DEBUG    | bioimageio.core._resource_tests:_test_model_inference:593 - starting 'Reproduce test outputs from test inputs (pytorch_state_dict)'
2025-04-23 09:57:44.231 | INFO     | bioimageio.core._resource_tests:_test_model_inference_parametrized:705 - Testing inference with 6 different inputs (B, N): {(1, 2), (2, 1), (1, 1), (2, 0), (2, 2), (1, 0)}
2025-04-23 09:57:44.401 | DEBUG    | bioimageio.core._resource_tests:_test_model_inference:593 - starting 'Reproduce test outputs from test inputs (torchscript)'
2025-04-23 09:57:44.704 | INFO     | bioimageio.core._resource_tests:_test_model_inference_parametrized:705 - Testing inference with 6 different inputs (B, N): {(1, 2), (2, 1), (1, 1), (2, 0), (2, 2), (1, 0)}


| ❌ | bioimageio format validation |
|---|---|
| status | failed |
| source | https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/affable-shark/1.2/files/rdf.yaml |
| id | 10.5281/zenodo.5764892/6647674 |
| format version | model 0.5.4 |
| bioimageio.spec | 0.5.4.1 |
| bioimageio.core | 0.8.0 |

|  | Location | Details |
|---|---|---|
| ✔️ |  | Successfully created `ModelDescr` object. |
| 🔍 | `context:perform_io_checks` | True |
| 🔍 | `context:root` | https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/affable-shark/1.2/files |
| 🔍 | `context:known_files.zero_mean_unit_variance.ijm` | 767f2c3a50e36365c30b9e46e57fcf82e606d337e8a48d4a2440dc512813d186 |
| 🔍 | `context:known_files.test_input_0.npy` | c29bd6e16e3f7856217b407ba948222b1c2a0da41922a0f79297e25588614fe2 |
| 🔍 | `context:known_files.sample_input_0.tif` | a24b3c708b6ca6825494eb7c5a4d221335fb3eef5eb9d03f4108907cdaad2bf9 |
| 🔍 | `context:known_files.test_output_0.npy` | 510181f38930e59e4fd8ecc03d6ea7c980eb6609759655f2d4a41fe36108d5f5 |
| 🔍 | `context:known_files.sample_output_0.tif` | e8f99aabe8405427f515eba23a49f58ba50302f57d1fdfd07026e1984f836c5e |
| 🔍 | `context:known_files.unet.py` | 7f5b15948e8e2c91f78dcff34fbf30af517073e91ba487f3edb982b948d099b3 |
| 🔍 | `context:known_files.weights.pt` | 608f52cd7f5119f7a7b8272395b0c169714e8be34536eaf159820f72a1d6a5b7 |
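The `known_files` entries in the validation summary pair each file name with a hex digest, which appears to be a SHA-256 checksum (the same value shows up as `sha256=` in the tensor descriptions further down). A minimal sketch, assuming the files have already been downloaded into a local folder, can re-check them with the standard library. `sha256_of_file` and `find_mismatches` are hypothetical helper names, not part of `bioimageio.core`.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(folder: Path, known_files: dict[str, str]) -> list[str]:
    """Return names of files that are missing or whose digest differs."""
    mismatches = []
    for name, expected in known_files.items():
        candidate = folder / name
        if not candidate.is_file() or sha256_of_file(candidate) != expected:
            mismatches.append(name)
    return mismatches
```

For example, `find_mismatches(Path("downloads"), {"weights.pt": "608f52cd..."})` would return an empty list when the local copy matches; the folder name and the shortened digest here are placeholders.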


In [52]:
from pathlib import Path
import yaml

output_folder = Path("output")
output_folder.mkdir(exist_ok=True)

print(f"Folder '{output_folder}' created (or already exists)")

# # 2. Define the full path for the new YAML file
# yaml_file_path = output_folder / "my_data.yaml"


Folder 'output' created (or already exists)


### Inspect the model metadata

In [57]:
from bioimageio.core import load_description

# URL of the RDF (resource description file) of an example model
model_url = "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/affable-shark/1.2/files/rdf.yaml"

# Load the resource description
rdf = load_description(model_url)


# Print some metadata
print('____________________________________________________________________________________')
print("Model Name:", rdf.name)
print("Description:", rdf.description)
print("Inputs:", rdf.inputs)
print("Outputs:", rdf.outputs)
print('_________________________________________________________________________________________________________________________________')



____________________________________________________________________________________
Model Name: NucleiSegmentationBoundaryModel
Description: Nucleus segmentation for fluorescence microscopy
Inputs: [InputTensorDescr(id='input0', description='', axes=[BatchAxis(id='batch', description='', type='batch', size=None), ChannelAxis(id='channel', description='', type='channel', channel_names=['channel0']), SpaceInputAxis(size=ParameterizedSize(min=64, step=16), id='y', description='', type='space', unit=None, scale=1.0, concatenable=False), SpaceInputAxis(size=ParameterizedSize(min=64, step=16), id='x', description='', type='space', unit=None, scale=1.0, concatenable=False)], test_tensor=FileDescr(source=RelativePath('test_input_0.npy'), sha256='c29bd6e16e3f7856217b407ba948222b1c2a0da41922a0f79297e25588614fe2'), sample_tensor=FileDescr(source=RelativePath('sample_input_0.tif'), sha256='a24b3c708b6ca6825494eb7c5a4d221335fb3eef5eb9d03f4108907cdaad2bf9'), data=IntervalOrRatioDataDescr(type='floa
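The `ParameterizedSize(min=64, step=16)` entries in the output above describe a family of valid axis sizes of the form `min + n * step` for `n = 0, 1, 2, ...`. The following sketch enumerates and checks such sizes; `valid_sizes` and `is_valid_size` are hypothetical helpers written for illustration and are not part of `bioimageio.spec`.

```python
def valid_sizes(minimum: int, step: int, count: int) -> list[int]:
    """Return the first `count` sizes of the form minimum + n * step."""
    return [minimum + n * step for n in range(count)]


def is_valid_size(size: int, minimum: int, step: int) -> bool:
    """Check whether `size` equals minimum + n * step for some n >= 0."""
    return size >= minimum and (size - minimum) % step == 0


print(valid_sizes(64, 16, 4))      # [64, 80, 96, 112]
print(is_valid_size(256, 64, 16))  # True
print(is_valid_size(100, 64, 16))  # False
```

For the `x` and `y` axes of this model, an input image would therefore need to be padded or cropped to one of these sizes before inference.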

In [39]:
from bioimageio.core import load_description
from bioimageio.core import load_model_description
import random

# Known rdf.yaml URLs; add more model URLs as needed
model_urls = [
    "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/affable-shark/1.2/files/rdf.yaml",
    "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/chatty-frog/1/files/rdf.yaml",
    "https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/hiding-tiger/1.1/files/rdf.yaml",
]
# Downloaded files are cached automatically in the 'bioimageio\Cache' folder.

# Select up to 3 random models (never more than the list contains)
random_models = random.sample(model_urls, min(3, len(model_urls)))

# Load and validate each model's rdf.yaml

for url in random_models:
    rdf = load_description(url)
    rdf_model = load_model_description(url)
    print('_________________________________________________________________________________________________________________________________')
    print("Model Name:", rdf.name)
    print("Model id:", rdf_model.id)
    print("Authors:", rdf.authors)
    print("Cite:", rdf.cite)
    print("Config:", rdf.config)
    print("Description:", rdf.description)
    print("Inputs:", rdf.inputs)
    print("Outputs:", rdf.outputs)
    print("License:", rdf.license)
    print("Links:", rdf.links)
    print('_________________________________________________________________________________________________________________________________')


_________________________________________________________________________________________________________________________________
Model Name: NucleiSegmentationBoundaryModel
Model id: 10.5281/zenodo.5764892/6647674
Authors: [Author(affiliation='EMBL Heidelberg', email=None, orcid=None, name='Constantin Pape', github_user='constantinpape')]
Cite: [CiteEntry(text='training library', doi='10.5281/zenodo.5108853', url=None), CiteEntry(text='architecture', doi='10.1007/978-3-319-24574-4_28', url=None), CiteEntry(text='segmentation algorithm', doi='10.1038/nmeth.4151', url=None), CiteEntry(text='data', doi=None, url='https://www.nature.com/articles/s41592-019-0612-7')]
Config: bioimageio=BioimageioConfig(reproducibility_tolerance=(), nickname='affable-shark', nickname_icon='🦈', thumbnails={'cover.png': 'cover.thumbnail.png'}) _conceptdoi='10.5281/zenodo.5764892' deepimagej={'allow_tiling': True, 'model_keys': None, 'prediction': {'postprocess': [{'spec': None}], 'preprocess': [{'kwargs': 'ze



_________________________________________________________________________________________________________________________________
Model Name: StarDist H&E Nuclei Segmentation
Model id: 10.5281/zenodo.6338614/6338615
Authors: [Author(affiliation=None, email=None, orcid=None, name='Uwe Schmidt', github_user='uschmidt83'), Author(affiliation=None, email=None, orcid=None, name='Martin Weigert', github_user='maweigert')]
Cite: [CiteEntry(text='Cell Detection with Star-Convex Polygons', doi='10.1007/978-3-030-00934-2_30', url=None), CiteEntry(text='Star-convex Polyhedra for 3D Object Detection and Segmentation in Microscopy', doi='10.1109/WACV45572.2020.9093435', url=None)]
Config: {'bioimageio': {'nickname': 'chatty-frog', 'nickname_icon': '🐸', 'thumbnails': {'example_histo.jpg': 'example_histo.thumbnail.png', 'stardist_logo.jpg': 'stardist_logo.thumbnail.png'}}, 'deepimagej': {'allow_tiling': True, 'model_keys': None, 'prediction': {'postprocess': [{'spec': None}], 'preprocess': [{'kwargs'
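Printing the fields is useful for inspection, but for metadata extraction it may be more convenient to collect them into plain records and save them. The sketch below is one possible approach, using JSON from the standard library. `collect_metadata` and `save_metadata` are hypothetical helpers, the chosen field list is an assumption, and a `SimpleNamespace` stands in for a loaded description so the example runs without network access.

```python
import json
from pathlib import Path
from types import SimpleNamespace

# Assumption: a small subset of the attributes printed in the loop above.
FIELDS = ("name", "id", "description", "license")


def collect_metadata(description, fields=FIELDS) -> dict:
    """Copy selected attributes into a dict, using None for missing ones."""
    return {field: getattr(description, field, None) for field in fields}


def save_metadata(records: list[dict], destination: Path) -> None:
    """Write records as JSON; values JSON cannot encode fall back to str()."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    destination.write_text(
        json.dumps(records, indent=2, default=str), encoding="utf-8"
    )


# Stand-in for an object returned by load_description (values taken from
# the output above; the license attribute is deliberately left out).
example = SimpleNamespace(
    name="NucleiSegmentationBoundaryModel",
    id="10.5281/zenodo.5764892/6647674",
    description="Nucleus segmentation for fluorescence microscopy",
)
print(collect_metadata(example))
```

In the notebook, calling `collect_metadata(rdf)` inside the loop and then `save_metadata(records, output_folder / "metadata.json")` would write one file per run; the file name is only a suggestion. Nested fields such as authors or citations are converted with `str()` here, so a real export would likely need a dedicated serializer for them.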

In [None]:
# import yaml
# from rdflib import Graph, Namespace, URIRef, Literal
# from rdflib.namespace import RDF, RDFS

# # Step 1: Load RDF YAML file
# with open("model_rdf.yaml", "r") as f:
#     rdf_yaml = yaml.safe_load(f)

# # Step 2: Create RDF Graph
# g = Graph()

# # Define custom namespace (optional - replace with actual if known)
# BIOIMAGE = Namespace("https://bioimage.io/")

# # Example: Convert some fields to RDF triples (you'll need to adapt this to your RDF spec)
# model_uri = URIRef(rdf_yaml.get("id", "https://bioimage.io/example-model"))

# g.add((model_uri, RDF.type, BIOIMAGE.Model))
# g.add((model_uri, RDFS.label, Literal(rdf_yaml.get("name", "Unknown model"))))

# # Add authors (if available)
# authors = rdf_yaml.get("authors", [])
# for author in authors:
#     g.add((model_uri, BIOIMAGE.author, Literal(author.get("name", "Unknown Author"))))

# # Add more fields as needed...

# # Step 3: Serialize to Turtle format
# g.serialize(destination="model.ttl", format="turtle")

# print("Conversion complete! Turtle file saved as 'model.ttl'")
