This document describes a manifest file accompanying each containerized software algorithm (a computational tool) in order to
- make computational tools interoperable with other tools in terms of their inputs and outputs,
- chain multiple tools into computational workflows to perform complex computations, and
- execute workflows in distributed computational environments, such as computer clusters, computer clouds, and high-performance computing (HPC) environments.

Overall, the goal is to create Findable, Accessible, Interoperable, and Reusable (FAIR) Containerized Computational Software (FAIR-CCS).
The term computational software is used interchangeably with tool or algorithm (computational tool or computational algorithm). In the context of interoperable computational software, a computational tool or algorithm is also called a plugin (computational plugin), since its interoperability allows it to be plugged into a chain of algorithms (i.e., a computational workflow).
This repository for the specification of a manifest file consists of:
- `schema` folder: JSON schema with all supported fields (entries)
  - The schema for a manifest file consists of sections describing inputs, outputs, UI, and resource requirements. The sections for `inputs` and `outputs` allow chaining containers’ inputs and outputs. The section `ui` allows on-the-fly generation of a web user interface for collecting input arguments for running an application packaged in the container. The section `resourceRequirements` allows schedulers of container-based workflows to optimally choose computational nodes for running containers on distributed computational resources.
- `docs` folder: Documentation about each field in a manifest file and general guidelines for building interoperable containerized computational tools
- `sample-tools` folder: Image thresholding and cropping algorithms packaged into interoperable containerized tools
- `request-for-feedback` folder: A list of questions about the manifest file to provide feedback on
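To make the four schema sections concrete, the following Python sketch builds a hypothetical manifest for a thresholding tool and reports which of the four sections are absent. The tool name, the field names inside each section (`name`, `type`, `required`, `key`, `title`, `ramMin`, `coresMin`), and the `missing_sections` helper are illustrative assumptions, not entries taken from the schema; the JSON schema in the schema folder remains the authoritative reference.

```python
import json
from typing import List

# Hypothetical manifest for a thresholding tool.
# Only the four section names come from the text above;
# every field inside them is an assumption made for illustration.
manifest = {
    "name": "example/threshold-tool",
    "version": "0.1.0",
    "inputs": [
        {"name": "inputImages", "type": "collection", "required": True},
        {"name": "thresholdValue", "type": "number", "required": True},
    ],
    "outputs": [
        {"name": "outputImages", "type": "collection"},
    ],
    "ui": [
        {"key": "inputs.inputImages", "title": "Input image collection"},
        {"key": "inputs.thresholdValue", "title": "Threshold value"},
    ],
    "resourceRequirements": {"ramMin": 2048, "coresMin": 1},
}

# The four sections named in the description of the schema.
EXPECTED_SECTIONS = ("inputs", "outputs", "ui", "resourceRequirements")


def missing_sections(candidate: dict) -> List[str]:
    """Return the expected top-level sections that the manifest lacks."""
    return [section for section in EXPECTED_SECTIONS if section not in candidate]


if __name__ == "__main__":
    print(json.dumps(manifest, indent=2))
    print("Missing sections:", missing_sections(manifest))
```

A check like this only confirms that the sections exist. Validating the content of each section would require the actual JSON schema and a schema validator.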
A prototype of a container manifest was designed and tested by the Web Image Processing Pipelines project developed at NIST. The discussions about specifications of a container manifest file began at the 1st workshop on Interoperability of Web Computational Plugins for Large Microscopy Image Analyses URL. The workshop report can be found at this URL. Additional contributions to the specifications of a container manifest file came from the Polus-AI project developed at NCATS NIH.
- With the increasing size of collected data, distributed computational environments provide an acceleration option for completing data analyses over very large data collections and for federated learning over many data collections.
- Analysis tools are heterogeneous: they are written in multiple programming languages and depend on many other software libraries. Containerization offers a valuable solution for executing such tools in distributed computational environments, where each computational node may have a different hardware and software configuration.
- To facilitate the reuse of tools and the creation of increasingly complex computational analyses (workflows), containerized software tools must be interoperable as they are chained into workflows.

The motivation behind this manifest specification lies in defining fields that describe each containerized software tool so that the tools can be chained into workflows and executed in distributed computational environments.
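As a rough illustration of how manifest fields could support chaining, the sketch below pairs the outputs of one tool with the inputs of another when their declared types match. The manifests, the field names (`name`, `type`), the type labels (`collection`, `number`), and the `compatible_links` helper are all assumptions made for this example; an actual workflow system may apply richer compatibility rules.

```python
from typing import List, Tuple

# Hypothetical, heavily reduced manifests for two tools.
# Field names and type labels are assumptions, not taken from the schema.
threshold = {
    "name": "threshold",
    "inputs": [
        {"name": "inputImages", "type": "collection"},
        {"name": "thresholdValue", "type": "number"},
    ],
    "outputs": [{"name": "outputImages", "type": "collection"}],
}

crop = {
    "name": "crop",
    "inputs": [
        {"name": "inputImages", "type": "collection"},
        {"name": "cropWidth", "type": "number"},
    ],
    "outputs": [{"name": "croppedImages", "type": "collection"}],
}


def compatible_links(upstream: dict, downstream: dict) -> List[Tuple[str, str]]:
    """Pair each upstream output with every downstream input of the same type."""
    links = []
    for out in upstream.get("outputs", []):
        for inp in downstream.get("inputs", []):
            if out["type"] == inp["type"]:
                links.append((out["name"], inp["name"]))
    return links


if __name__ == "__main__":
    # Which outputs of the thresholding tool could feed the cropping tool?
    print(compatible_links(threshold, crop))
```

Matching on a declared type is the simplest possible rule. It shows why each tool needs to describe its inputs and outputs in a shared vocabulary before workflows can be assembled automatically.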
The initial application use cases come from the biomedical microscopy imaging domain, since advancements in microscope design and acquisition automation have enabled the generation of terabyte-sized image collections in relatively short time spans. Examples of existing software tools for microscopy imaging use cases can be found in the GitHub repositories at NIST and at NCATS NIH. The software tools can also be searched and found via a tool registry, currently available for NIST tools at this URL.
Other application use cases can be supported, for example, chemistry analyses, molecular modeling, genomics, or bioinformatics. The manifest specification is mainly focused on information pertinent to container execution and chaining into workflows (while being agnostic to the application context of container execution).
- Containerized Tool Manifest - JSON schema that defines manifest file entries
- Manifest Schema documentation - In-depth documentation of the manifest JSON schema
- Best practices - Best practices and guidelines for building interoperable containerized tools
- Online Manifest Generator - Online creation and validation of manifest files
- Simple examples of interoperable containerized tools
  - Basic thresholding in Python - Example of an interoperable containerized tool for image thresholding
  - Image crop operation in Python - Example of an interoperable containerized tool for image cropping
- Under construction
Please do not hesitate to email our team if the current specification should be modified to meet your needs.
